hub / github.com/Morizeyao/GPT2-Chinese / tokenize

Method tokenize

tokenizations/bpe_tokenizer.py:86–89
(self, text)

Source from the content-addressed store, hash-verified

84          return text
85
86      def tokenize(self, text):
87          bpe_tokens = []
88          bpe_tokens.extend(bpe_token for bpe_token in self.bpe(text).split(' '))
89          return bpe_tokens
90
91      def convert_tokens_to_ids(self, tokens):
92          return [self.encoder.get(token, 1) for token in tokens]
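
The listing shows that tokenize splits the string returned by self.bpe(text) on single spaces, and that convert_tokens_to_ids maps any token missing from self.encoder to id 1. The sketch below illustrates that flow in isolation. ToyTokenizer, its constructor, and its bpe body are hypothetical stand-ins, not the repository's code: the real bpe method applies learned merges, while this placeholder simply puts a space between every character.

```python
class ToyTokenizer:
    """Hypothetical stand-in; only tokenize and convert_tokens_to_ids mirror the listing."""

    def __init__(self, encoder):
        # encoder: dict mapping token string -> integer id (assumed shape)
        self.encoder = encoder

    def bpe(self, text):
        # Placeholder, NOT the real merge algorithm: each character
        # becomes its own space-separated piece.
        return ' '.join(text)

    def tokenize(self, text):
        # Same result as the listed method: split the bpe output on ' '.
        return self.bpe(text).split(' ')

    def convert_tokens_to_ids(self, tokens):
        # Tokens missing from the encoder fall back to id 1, as in the listing.
        return [self.encoder.get(token, 1) for token in tokens]


tok = ToyTokenizer({'你': 5, '好': 6})
tokens = tok.tokenize('你好吗')
ids = tok.convert_tokens_to_ids(tokens)
print(tokens)  # ['你', '好', '吗']
print(ids)     # [5, 6, 1]
```

One detail worth keeping in mind: in Python, ''.split(' ') returns [''] rather than an empty list, so if bpe ever returned an empty string, tokenize would yield a single empty token. Whether the real bpe can return an empty string is not visible in this listing.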

Callers 6

encode · Method · 0.95
build_files · Function · 0.45
build_files · Function · 0.45
main · Function · 0.45
build_files · Function · 0.45
main · Function · 0.45

Calls 1

bpe · Method · 0.95

Tested by

no test coverage detected