hub / github.com/deepspeedai/DeepSpeedExamples / EncodeAsIds

Method EncodeAsIds

Megatron-LM/data_utils/tokenization.py:301–308

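Encodes text with the text tokenizer, shifts every Id up by num_command_tokens to make room for the command tokens, and attaches the command tokens to the returned tokenization.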

(self, text, process_fn=None)

Source from the content-addressed store, hash-verified

299         return self._text_token_vocab
300
301     def EncodeAsIds(self, text, process_fn=None):
302         """
303         encode text using text tokenizer and shift Id values for command tokens
304         """
305         tokenization = self.text_tokenizer.EncodeAsIds(text, process_fn=process_fn)
306         tokenization.tokenization = [t+self.num_command_tokens for t in tokenization.tokenization]
307         tokenization.set_command_tokens(self._command_tokens)
308         return tokenization
309
310     def EncodeAsTokens(self, text, process_fn=None):
311         """
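
The Id shift in lines 305–307 can be illustrated with a minimal, self-contained sketch. The classes below (Tokenization, ToyTextTokenizer, ToyWrapper) are hypothetical stand-ins, not the repository's classes; the whitespace vocabulary, the two command tokens, and the assumption that process_fn is applied to the text before tokenizing are all illustrative only.

```python
class Tokenization:
    """Hypothetical stand-in for the object returned by the text tokenizer."""

    def __init__(self, tokenization, text=None):
        self.tokenization = list(tokenization)
        self.text = text
        self.command_tokens = None

    def set_command_tokens(self, command_tokens):
        self.command_tokens = command_tokens


class ToyTextTokenizer:
    """Hypothetical text tokenizer: whitespace-separated words map to Ids from 0."""

    def __init__(self, vocab):
        self.vocab = {word: i for i, word in enumerate(vocab)}

    def EncodeAsIds(self, text, process_fn=None):
        # Assumption: process_fn preprocesses the raw text before tokenizing.
        if process_fn is not None:
            text = process_fn(text)
        return Tokenization([self.vocab[w] for w in text.split()], text=text)


class ToyWrapper:
    """Hypothetical wrapper that mirrors the three steps of the listed method."""

    def __init__(self, text_tokenizer, command_tokens):
        self.text_tokenizer = text_tokenizer
        self._command_tokens = command_tokens
        self.num_command_tokens = len(command_tokens)

    def EncodeAsIds(self, text, process_fn=None):
        # 1. Delegate to the underlying text tokenizer.
        tokenization = self.text_tokenizer.EncodeAsIds(text, process_fn=process_fn)
        # 2. Shift every text Id up by the number of command tokens.
        tokenization.tokenization = [
            t + self.num_command_tokens for t in tokenization.tokenization
        ]
        # 3. Attach the command tokens to the result.
        tokenization.set_command_tokens(self._command_tokens)
        return tokenization


tokenizer = ToyWrapper(ToyTextTokenizer(["hello", "world"]), ["<pad>", "<eos>"])
result = tokenizer.EncodeAsIds("Hello world", process_fn=str.lower)
print(result.tokenization)  # [2, 3]
```

In this sketch the text tokenizer alone would return [0, 1]; with two command tokens the wrapper returns [2, 3], so under this toy setup Ids 0 and 1 stay free for the command tokens.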

Callers 9

__call__ · Method · 0.95
generate_samples · Function · 0.45
__init__ · Method · 0.45
get_eval_data · Function · 0.45
EncodeAsIds · Method · 0.45
__getitem__ · Method · 0.45
__getitem__ · Method · 0.45
getidx · Method · 0.45
sentence_tokenize · Method · 0.45

Calls 1

set_command_tokens · Method · 0.80

Tested by

no test coverage detected