hub / github.com/brightmart/albert_zh / tokenize

Method tokenize

tokenization.py:172–178
(self, text)

Source from the content-addressed store, hash-verified

170          self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)
171
172      def tokenize(self, text):
173          split_tokens = []
174          for token in self.basic_tokenizer.tokenize(text):
175              for sub_token in self.wordpiece_tokenizer.tokenize(token):
176                  split_tokens.append(sub_token)
177
178          return split_tokens
179
180      def convert_tokens_to_ids(self, tokens):
181          return convert_by_vocab(self.vocab, tokens)
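
The method in the listing runs two stages: a basic tokenizer splits the text into tokens, then a wordpiece tokenizer splits each token into sub-tokens, and the results are flattened into one list. The sketch below reproduces only that nested-loop shape. The helpers basic_tokenize and wordpiece_tokenize, the greedy longest-match-first rule, the "##" continuation prefix, and toy_vocab are all assumptions made for illustration; they are simplified stand-ins and not the repository's BasicTokenizer or WordpieceTokenizer classes.

```python
# Illustrative stand-ins only. These helpers are hypothetical and much
# simpler than the tokenizer classes defined in tokenization.py.

def basic_tokenize(text):
    """Assumed first stage: lowercase the text and split on whitespace."""
    return text.lower().split()


def wordpiece_tokenize(token, vocab, unk_token="[UNK]"):
    """Assumed second stage: greedy longest-match-first split of one token."""
    pieces = []
    start = 0
    while start < len(token):
        end = len(token)
        match = None
        # Shrink the candidate from the right until it is found in vocab.
        while start < end:
            piece = token[start:end]
            if start > 0:
                piece = "##" + piece  # non-initial pieces carry a ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]  # nothing fits, so the whole token is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Same nested-loop structure as the tokenize method in the listing."""
    split_tokens = []
    for token in basic_tokenize(text):
        for sub_token in wordpiece_tokenize(token, vocab):
            split_tokens.append(sub_token)
    return split_tokens


toy_vocab = {"un", "##aff", "##able", "play", "##ing", "[UNK]"}
result = tokenize("Unaffable playing xyz", toy_vocab)
print(result)
# ['un', '##aff', '##able', 'play', '##ing', '[UNK]']
```

The output is a flat list, so a caller cannot tell from it alone which sub-tokens came from the same input token; in this sketch only the "##" prefix marks a continuation piece.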

Callers 8

convert_single_example · Function · 0.45
convert_single_example · Function · 0.45
convert_single_example · Function · 0.45

Calls

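no outgoing calls detected (the method body does call self.basic_tokenizer.tokenize and self.wordpiece_tokenizer.tokenize)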

Tested by

no test coverage detected