hub / github.com/Morizeyao/GPT2-Chinese / whitespace_tokenize

Function whitespace_tokenize

tokenizations/tokenization_bert_word_level.py:80–86

Runs basic whitespace cleaning and splitting on a piece of text.

Signature: whitespace_tokenize(text)

Source from the content-addressed store, hash-verified

def whitespace_tokenize(text):
    """Runs basic whitespace cleaning and splitting on a piece of text."""
    text = text.strip()
    if not text:
        return []
    tokens = text.split()
    return tokens
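
Usage sketch

The following is a minimal, self-contained sketch of how the function behaves on a few inputs. The function body is copied from the listing so the snippet runs on its own; the sample strings and the `samples` name are illustrative assumptions, not taken from the repository.

```python
def whitespace_tokenize(text):
    """Runs basic whitespace cleaning and splitting on a piece of text."""
    text = text.strip()   # drop leading/trailing whitespace
    if not text:          # empty or whitespace-only input yields no tokens
        return []
    return text.split()   # split on any run of whitespace


# Illustrative inputs (assumed for this sketch, not from the repository).
samples = [
    "  hello   world \n",   # repeated spaces and a trailing newline
    "",                     # empty string
    " \t\n ",               # whitespace only
    "你好 世界",             # Chinese text separated by an ASCII space
    "你好\u3000世界",        # separated by a full-width (ideographic) space
    "你好世界",              # no whitespace at all: stays a single token
]

for sample in samples:
    print(repr(sample), "->", whitespace_tokenize(sample))
```

Because `str.split()` with no arguments already ignores leading and trailing whitespace and returns an empty list for whitespace-only strings, the explicit `strip()` and the empty check do not change the result; they only make the intent explicit. Note also that text without any whitespace, as is common in Chinese, comes back as a single token, so any finer segmentation has to happen in the callers.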

Callers 2

tokenize · Method · 0.70
tokenize · Method · 0.70

Calls

no outgoing calls

Tested by

no test coverage detected