Repository: github.com/clips/pattern

Function ngrams

pattern/metrics.py, lines 341–351

Returns a list of n-grams (tuples of n successive words) from the given string. Leading and trailing punctuation is stripped from each word, and tokens left empty after stripping are dropped.
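Read literally, "tuples of n successive words" means every window of width n that slides over the word list. The sketch below shows only that windowing step, assuming the text has already been split into words; `sliding_ngrams` is a hypothetical helper, not part of pattern. It zips n progressively shifted slices of the list, which for n ≥ 1 yields the same tuples as the `range(len(s)-n+1)` comprehension in the source.

```python
def sliding_ngrams(words, n=3):
    # Hypothetical helper (not in pattern): zip n shifted copies of the list.
    # zip stops at the shortest slice, so the result has len(words) - n + 1
    # tuples, or none at all when there are fewer than n words.
    return list(zip(*(words[i:] for i in range(n))))

words = ["the", "cat", "sat", "on", "the", "mat"]
print(sliding_ngrams(words, 3))
# [('the', 'cat', 'sat'), ('cat', 'sat', 'on'), ('sat', 'on', 'the'), ('on', 'the', 'mat')]
```

Six words give 6 - 3 + 1 = 4 trigrams; a list shorter than n gives an empty result rather than an error.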

Signature: ngrams(string, n=3, punctuation=PUNCTUATION, **kwargs)
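To show what the two main parameters do, the example below runs a standalone copy of the function from pattern/metrics.py (docstring omitted, body unchanged) so it works without pattern installed. `n` sets the window size, and `punctuation` is the set of characters stripped from both ends of each word. The listed body does not use `**kwargs`.

```python
# Standalone copy of pattern.metrics.ngrams (docstring omitted, body unchanged).
PUNCTUATION = ".,;:!?()[]{}`''\"@#$^&*+-|=~_"

def ngrams(string, n=3, punctuation=PUNCTUATION, **kwargs):
    s = string
    s = s.replace(".", " .")
    s = s.replace("?", " ?")
    s = s.replace("!", " !")
    s = [w.strip(punctuation) for w in s.split()]
    s = [w.strip() for w in s if w.strip()]
    return [tuple(s[i:i+n]) for i in range(len(s)-n+1)]

# n controls the window size; words keep their original case.
print(ngrams("The cat sat on the mat.", n=2))
# [('The', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]

# With the default n=3, a two-word input has no trigram.
print(ngrams("Hello, world!"))
# []

# punctuation="" disables stripping: "Hello," keeps its comma, and the "!"
# that replace() split off survives as a token of its own.
print(ngrams("Hello, world!", n=2, punctuation=""))
# [('Hello,', 'world'), ('world', '!')]
```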

Source:

PUNCTUATION = ".,;:!?()[]{}`''\"@#$^&*+-|=~_"

def ngrams(string, n=3, punctuation=PUNCTUATION, **kwargs):
    """ Returns a list of n-grams (tuples of n successive words) from the given string.
        Punctuation marks are stripped from words.
    """
    s = string
    s = s.replace(".", " .")
    s = s.replace("?", " ?")
    s = s.replace("!", " !")
    s = [w.strip(punctuation) for w in s.split()]
    s = [w.strip() for w in s if w.strip()]
    return [tuple(s[i:i+n]) for i in range(len(s)-n+1)]
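The source implies a few behaviours that the one-line description does not spell out. The checks below run against a standalone copy of the function (docstring omitted); they are observations about this version of the code, not documented guarantees.

```python
# Standalone copy of pattern.metrics.ngrams (docstring omitted, body unchanged).
PUNCTUATION = ".,;:!?()[]{}`''\"@#$^&*+-|=~_"

def ngrams(string, n=3, punctuation=PUNCTUATION, **kwargs):
    s = string
    s = s.replace(".", " .")
    s = s.replace("?", " ?")
    s = s.replace("!", " !")
    s = [w.strip(punctuation) for w in s.split()]
    s = [w.strip() for w in s if w.strip()]
    return [tuple(s[i:i+n]) for i in range(len(s)-n+1)]

# "." is padded with a space before splitting, so a sentence-final period
# becomes its own token, which strip() empties and the filter drops.
# As a result, n-grams run across sentence boundaries.
print(ngrams("Stop. Go now", n=2))
# [('Stop', 'Go'), ('Go', 'now')]

# The same padding splits a decimal number at its point.
print(ngrams("3.14 is pi", n=2))
# [('3', '14'), ('14', 'is'), ('is', 'pi')]

# strip() removes only leading and trailing characters, so an inner
# apostrophe survives even though "'" is in PUNCTUATION.
print(ngrams("don't stop", n=2))
# [("don't", 'stop')]
```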

Callers (1)

intertextuality (function) · 0.70

Calls (3)

len (function) · 0.85
strip (method) · 0.80
split (method) · 0.45

Tested by

no test coverage detected
