hub / github.com/zai-org/CogView / tokenize

Method tokenize

data_utils/sp_tokenizer.py:89–92
(self, text)

Source from the content-addressed store, hash-verified

87          return text
88
89      def tokenize(self, text):
90          bpe_tokens = []
91          bpe_tokens.extend(bpe_token for bpe_token in self.bpe(text).split(' '))
92          return bpe_tokens
93
94      def convert_tokens_to_ids(self, tokens):
95          return [self.encoder.get(token, 1) for token in tokens]
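
The listing above can be exercised in isolation with a small stand-in class. This is a minimal sketch, not the repository's tokenizer: `ToyTokenizer`, its constructor, and especially the pass-through `bpe` are hypothetical, since the real `bpe` and `encoder` are not shown here. Only `tokenize` and `convert_tokens_to_ids` follow the listing. It illustrates two behaviours visible in the source: splitting on a single space (`split(' ')`) keeps empty strings when spaces repeat, and tokens missing from the encoder fall back to id 1.

```python
class ToyTokenizer:
    """Hypothetical stand-in used only to illustrate the two listed methods."""

    def __init__(self, encoder):
        # Assumption: encoder is a plain dict mapping token -> integer id.
        self.encoder = encoder

    def bpe(self, text):
        # Placeholder: the real bpe() is not part of the listing. Here it
        # returns the text unchanged, so pieces are whatever the spaces delimit.
        return text

    def tokenize(self, text):
        # Same result as the listed method: split the bpe output on ' '.
        return self.bpe(text).split(' ')

    def convert_tokens_to_ids(self, tokens):
        # As in the listing, unknown tokens map to id 1.
        return [self.encoder.get(token, 1) for token in tokens]


tok = ToyTokenizer({"hello": 5, "world": 7})
tokens = tok.tokenize("hello  world")   # note the double space
ids = tok.convert_tokens_to_ids(tokens)
print(tokens)  # ['hello', '', 'world']
print(ids)     # [5, 1, 7]
```

Under these assumptions the empty string produced by the double space is not in the encoder, so it receives the fallback id 1. Whether the real `bpe` output can contain consecutive spaces depends on code not shown here.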

Callers 1

encode · Method · 0.95

Calls 1

bpe · Method · 0.95

Tested by

no test coverage detected