hub / github.com/ddbourgin/numpy-ml / tokenize_whitespace

Function tokenize_whitespace

numpy_ml/preprocessing/nlp.py:64–74

Split a string at any whitespace characters, optionally removing punctuation and stop-words in the process.

(
    line, lowercase=True, filter_stopwords=True, filter_punctuation=True, **kwargs,
)
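
To make the effect of each flag concrete, here is a minimal, self-contained sketch of the behavior described above. It does not import numpy-ml: `tokenize_whitespace_sketch`, the stand-in helpers, and the tiny stop-word set are all assumptions made for illustration, and the library's real `strip_punctuation` and `remove_stop_words` may behave differently (for example, by using a much larger stop-word list).

```python
import string

# Hypothetical stand-ins for the library helpers; the real ones live in
# numpy_ml/preprocessing/nlp.py and use their own stop-word list.
_STOP_WORDS = {"the", "a", "an", "and", "of", "is"}
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(word):
    """Remove ASCII punctuation characters from a single token."""
    return word.translate(_PUNCT_TABLE)


def remove_stop_words(words):
    """Drop tokens that appear in the illustrative stop-word set."""
    return [w for w in words if w.lower() not in _STOP_WORDS]


def tokenize_whitespace_sketch(
    line, lowercase=True, filter_stopwords=True, filter_punctuation=True
):
    """Split on whitespace, then optionally strip punctuation and stop-words."""
    line = line.lower() if lowercase else line
    # str.split() with no arguments splits on any run of whitespace
    words = line.split()
    if filter_punctuation:
        words = [strip_punctuation(w) for w in words]
    return remove_stop_words(words) if filter_stopwords else words


text = "The quick, brown fox\tis   fast!"

# All filters on (the defaults)
print(tokenize_whitespace_sketch(text))
# → ['quick', 'brown', 'fox', 'fast']

# Whitespace splitting and lowercasing only
print(tokenize_whitespace_sketch(text, filter_stopwords=False, filter_punctuation=False))
# → ['the', 'quick,', 'brown', 'fox', 'is', 'fast!']
```

Because the split happens before punctuation is stripped, punctuation attached to a word (`quick,`) is removed from the token and never creates an extra split point.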

Source

def tokenize_whitespace(
    line, lowercase=True, filter_stopwords=True, filter_punctuation=True, **kwargs,
):
    """
    Split a string at any whitespace characters, optionally removing
    punctuation and stop-words in the process.
    """
    line = line.lower() if lowercase else line
    # str.split() with no arguments splits on any run of whitespace
    words = line.split()
    # Assign the stripped tokens back to `words` (not `line`) so that the
    # punctuation filter actually affects the returned tokens
    words = [strip_punctuation(w) for w in words] if filter_punctuation else words
    return remove_stop_words(words) if filter_stopwords else words

Callers

nothing calls this directly

Calls 2

strip_punctuation · Function · 0.85
remove_stop_words · Function · 0.85

Tested by

no test coverage detected