github.com/spencermountain/compromise @14.15.1 sqlite

4,123 symbols · 14,685 edges · 1,023 files · 62 documented (2%)

README

compromise

modest natural language processing

npm install compromise

<sub>
  by
  <a href="https://spencermounta.in/">Spencer Kelly</a> and
  <a href="https://github.com/spencermountain/compromise/graphs/contributors">
    many contributors
  </a>
</sub>

<sub>
 <a href="https://github.com/nlp-compromise/fr-compromise">french</a> • <a href="https://github.com/nlp-compromise/de-compromise">german</a>  • <a href="https://github.com/nlp-compromise/it-compromise">italian</a> • <a href="https://github.com/nlp-compromise/es-compromise">spanish</a>
</sub>

don't you find it strange, how easy text is to make, and how hard it is to actually parse and use?

compromise tries its best to turn text into data.

it makes limited and sensible decisions.

it's not as smart as you'd think.

import nlp from 'compromise'

let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'

don't be fancy, at all:

if (doc.has('simon says #Verb')) {
  return true
}

grab parts of the text:

let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"

and get data:

import plg from 'compromise-speech'
nlp.extend(plg)

let doc = nlp('Milwaukee has certainly had its share of visitors..')
doc.compute('syllables')
doc.places().json()
/*
[{
  "text": "Milwaukee",
  "terms": [{
    "normal": "milwaukee",
    "syllables": ["mil", "wau", "kee"]
  }]
}]
*/

avoid the problems of brittle parsers:

let doc = nlp("we're not gonna take it..")

doc.has('gonna') // true
doc.has('going to') // true (implicit)

// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it..'

and whip stuff around like it's data:

let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(20)
doc.text()
// 'ninety five thousand and seventy two'

-because it actually is-

let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'

Use it on the client-side:

<script src="https://unpkg.com/compromise"></script>
<script>
  var doc = nlp('two bottles of beer')
  doc.numbers().minus(1)
  document.body.innerHTML = doc.text()
  // 'one bottle of beer'
</script>

or likewise:

import nlp from 'compromise'

var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'

compromise is ~250kb (minified).

it's pretty fast. It can run on keypress.

it works mainly by conjugating all forms of a basic word list.

The final lexicon is ~14,000 words.

you can read more about how it works, here. it's weird.
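To picture what "conjugating all forms of a basic word list" could look like, here is a minimal sketch. The `conjugate` and `buildLexicon` helpers and their suffix rules are assumptions made for illustration, not compromise's actual rules or internals; only the tag names are borrowed from the library.

```javascript
// Toy conjugation rules. These are NOT compromise's real rules,
// just enough to show one base word fanning out into several forms.
const conjugate = (verb) => {
  const stem = verb.endsWith('e') ? verb.slice(0, -1) : verb
  return {
    Infinitive: verb,
    PresentTense: verb + 's',
    PastTense: stem + 'ed',
    Gerund: stem + 'ing',
  }
}

// Expand a small base list into a word -> tag lookup (hypothetical helper).
const buildLexicon = (verbs) => {
  const lexicon = {}
  for (const verb of verbs) {
    for (const [tag, form] of Object.entries(conjugate(verb))) {
      lexicon[form] = tag
    }
  }
  return lexicon
}

const lexicon = buildLexicon(['walk', 'bake'])
console.log(Object.keys(lexicon).length) // 8
console.log(lexicon['baking']) // 'Gerund'
```

Two base words become eight lexicon entries here, which hints at how a short word list can grow into a much larger lookup.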

okay -

`compromise/one`

A tokenizer of words, sentences, and punctuation.

import nlp from 'compromise/one'

let doc = nlp("Wayne's World, party time")
let data = doc.json()
/* [{
  normal: "wayne's world party time",
  terms: [{ text: "Wayne's", normal: "wayne" },
    ...
  ]
}]
*/

compromise/one splits your text up, wraps it in a handy API,

and does nothing else -

/one is quick - most sentences take a 10th of a millisecond.

It can do ~1mb of text a second - or 10 Wikipedia pages.

Infinite Jest takes 3s.

You can also parallelize, or stream text to it with compromise-speed.
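The splitting step described in this section can be pictured with a toy tokenizer. This is only a sketch under simple assumptions (sentences end at `.`, `?` or `!`, and terms are separated by whitespace); the `tokenize` function is hypothetical and far cruder than what compromise/one does.

```javascript
// Hypothetical toy tokenizer: sentences -> terms, each with a normalized form.
const tokenize = (text) =>
  text
    .split(/(?<=[.?!])\s+/) // break after sentence-ending punctuation
    .filter((sentence) => sentence.length > 0)
    .map((sentence) => ({
      text: sentence,
      terms: sentence.split(/\s+/).map((word) => ({
        text: word,
        // lowercase, drop punctuation, then drop a trailing possessive 's
        normal: word
          .toLowerCase()
          .replace(/[^a-z0-9']/g, '')
          .replace(/'s$/, ''),
      })),
    }))

const sentences = tokenize("Wayne's World, party time. Excellent!")
console.log(sentences.length) // 2
console.log(sentences[0].terms.map((t) => t.normal))
// [ 'wayne', 'world', 'party', 'time' ]
```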

`compromise/two`

A part-of-speech tagger, and grammar-interpreter.

import nlp from 'compromise/two'

let doc = nlp("Wayne's World, party time")
let str = doc.match('#Possessive #Noun').text()
// "Wayne's World"

compromise/two automatically calculates the very basic grammar of each word.

this is more useful than people sometimes realize.

Light grammar helps you write cleaner templates, and get closer to the information.

compromise has 83 tags, arranged in a handsome graph.

#FirstName → #Person → #ProperNoun → #Noun

you can see the grammar of each word by running doc.debug()

you can see the reasoning for each tag with nlp.verbose('tagger').
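The tag chain above (#FirstName → #Person → #ProperNoun → #Noun) suggests that a specific tag implies its more general parents. A minimal sketch of that idea follows, assuming a plain child-to-parent map; the `parents` table and `impliedTags` helper are illustrative names, not part of compromise's API, and only the one chain quoted above is modelled.

```javascript
// Each tag points to its parent. Only the chain quoted above is included.
const parents = {
  FirstName: 'Person',
  Person: 'ProperNoun',
  ProperNoun: 'Noun',
  Noun: null,
}

// Walk up the graph to collect a tag plus every tag it implies.
const impliedTags = (tag) => {
  const found = []
  let current = tag
  while (current) {
    found.push(current)
    current = parents[current]
  }
  return found
}

// A hypothetical term tagged only with FirstName
const tags = impliedTags('FirstName')
console.log(tags) // [ 'FirstName', 'Person', 'ProperNoun', 'Noun' ]
console.log(tags.includes('Noun')) // true
```

With a structure like this, a broad pattern such as #Noun can also cover a word that was only given a narrow tag.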

if you prefer Penn tags, you can derive them with:

let doc = nlp('welcome thrillho')
doc.compute('penn')
doc.json()

`compromise/three`

Phrase and sentence tooling.

import nlp from 'compromise/three'

let doc = nlp("Wayne's World, party time")
let str = doc.people().normalize().text()
// "wayne"

compromise/three is a set of tooling to zoom into and operate on parts of a text.

.numbers() grabs all the numbers in a document, for example - and extends it with new methods, like .subtract().

When you have a phrase, or group of words, you can see additional metadata about it with .json()

let doc = nlp('four out of five dentists')
console.log(doc.fractions().json())
/*[{
    text: 'four out of five',
    terms: [ [Object], [Object], [Object], [Object] ],
    fraction: { numerator: 4, denominator: 5, decimal: 0.8 }
  }
]*/

let doc = nlp('$4.09CAD')
doc.money().json()
/*[{
    text: '$4.09CAD',
    terms: [ [Object] ],
    number: { prefix: '$', num: 4.09, suffix: 'cad'}
  }
]*/
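To show where metadata like `{ prefix, num, suffix }` might come from, here is a small regex-based sketch. `parseMoney` is a hypothetical helper written for this illustration, assuming one optional currency symbol, a decimal number, and an optional three-letter currency code; it is not how compromise parses money.

```javascript
// Hypothetical helper: split a money string into prefix, number and suffix.
const parseMoney = (str) => {
  const m = str.trim().match(/^([$£€])?(\d+(?:\.\d+)?)([a-z]{3})?$/i)
  if (!m) return null // not something this toy pattern recognises
  return {
    prefix: m[1] || '',
    num: Number(m[2]),
    suffix: (m[3] || '').toLowerCase(),
  }
}

console.log(parseMoney('$4.09CAD')) // { prefix: '$', num: 4.09, suffix: 'cad' }
console.log(parseMoney('dentists')) // null
```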

Extension points

exported contracts: how you extend this code

DateView (Interface)

(no doc) [4 implementers]

plugins/dates/index.d.ts

TypedPlugin (Interface)

(no doc)

types/one.d.ts

TypedPlugin (Interface)

TypedPlugin (Interface)

(no doc)

types/three.d.ts

ReplaceWithProps (Interface)

(no doc)

types/view/one.d.ts

ParagraphView (Interface)

(no doc)

plugins/paragraphs/index.d.ts

SpeedMethods (Interface)

(no doc)

plugins/speed/index.d.ts

Core symbols

most depended-on inside this repo

has

called by 2350

plugins/paragraphs/src/api.js

match

called by 2124

plugins/paragraphs/src/api.js

forEach

called by 2118

plugins/paragraphs/src/api.js

map

called by 903

plugins/paragraphs/src/api.js

end

called by 864

plugins/dates/builds/compromise-dates.cjs

filter

called by 538

plugins/paragraphs/src/api.js

Shape

Function 2,695

Method 1,062

Class 328

Interface 38

Languages

TypeScript · 100%

Modules by API surface

builds/three/compromise-three.mjs · 443 symbols

builds/three/compromise-three.cjs · 443 symbols

builds/compromise.js · 443 symbols

plugins/dates/builds/compromise-dates.cjs · 297 symbols

builds/two/compromise-two.mjs · 225 symbols

builds/two/compromise-two.cjs · 225 symbols

plugins/dates/builds/compromise-dates.mjs · 207 symbols

plugins/dates/builds/compromise-dates.min.js · 207 symbols

builds/one/compromise-one.mjs · 156 symbols

builds/one/compromise-one.cjs · 156 symbols

src/3-three/numbers/numbers/api.js · 26 symbols

plugins/paragraphs/src/api.js · 26 symbols

Used by 4 indexed graphs

manifest dependencies, hub-wide

github.com/ZuodaoTech/everyone-can-use-english

github.com/harvard-edge/cs249r_book

github.com/mastra-ai/mastra

github.com/supermemoryai/supermemory

Dependencies

from manifests, versioned

@rollup/plugin-commonjs 24.0.1 · 1×

@rollup/plugin-node-resolve 16.0.3 · 1×

@rollup/plugin-terser 1.0.0 · 1×

colorette 2.0.16 · 1×

compromise 13.0.0 · 1×

compromise 13.1.0 · 1×

compromise 13.1.1 · 1×

compromise 13.10.0 · 1×

compromise 13.10.1 · 1×

compromise 13.10.2 · 1×

compromise 13.2.0 · 1×

compromise 13.3.0 · 1×

For agents

$ claude mcp add compromise \
  -- python -m otcore.mcp_server <graph>
