hub / github.com/deepspeedai/DeepSpeedExamples / pad_seq

Method pad_seq

Megatron-LM/data_utils/datasets.py:551–555
(self, seq)

Source from the content-addressed store, hash-verified

549         return tokens
550
551     def pad_seq(self, seq):
552         total_tokens = self.max_seq_len + 1
553         num_pad_tokens = max(0, total_tokens - len(seq))
554         seq += [self.tokenizer.get_command('pad').Id]*(num_pad_tokens)
555         return seq
556
557     def contains_sentence_end(self, tok):
558         tok = self.tokenizer.IdToToken(tok)
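
A minimal sketch of how pad_seq behaves, assuming a stand-in tokenizer. StubTokenizer, PadDemo, and the pad Id of 0 are hypothetical names and values introduced only for illustration; they are not part of the repository. The method body is copied from the listing above. The sketch shows three properties that follow from that body: sequences are padded to max_seq_len + 1 tokens, the input list is extended in place (the `+=` on a list mutates it), and sequences that are already long enough are returned unchanged rather than truncated.

```python
from types import SimpleNamespace


class StubTokenizer:
    """Hypothetical stand-in that only mimics get_command('pad').Id."""

    def get_command(self, name):
        # Assumption: the pad command exposes an integer Id; 0 is arbitrary.
        if name != 'pad':
            raise KeyError(name)
        return SimpleNamespace(Id=0)


class PadDemo:
    """Hypothetical host class carrying only what pad_seq needs."""

    def __init__(self, max_seq_len, tokenizer):
        self.max_seq_len = max_seq_len
        self.tokenizer = tokenizer

    def pad_seq(self, seq):
        # Same logic as the listed method.
        total_tokens = self.max_seq_len + 1
        num_pad_tokens = max(0, total_tokens - len(seq))
        seq += [self.tokenizer.get_command('pad').Id] * num_pad_tokens
        return seq


demo = PadDemo(max_seq_len=4, tokenizer=StubTokenizer())

# A short sequence is padded up to max_seq_len + 1 = 5 tokens.
short = [7, 8, 9]
padded = demo.pad_seq(short)

# A sequence longer than 5 tokens gets zero pad tokens and is not truncated.
too_long = [1, 2, 3, 4, 5, 6, 7]
unchanged = demo.pad_seq(too_long)

print(padded)            # the padded list
print(padded is short)   # True: the caller's list was extended in place
print(len(unchanged))    # length is still 7
```

Because the list is modified in place, a caller that needs the unpadded tokens afterwards would have to pass a copy, for example `demo.pad_seq(list(tokens))`.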

Callers 1

__getitem__ · Method · 0.95

Calls 1

get_command · Method · 0.80

Tested by

no test coverage detected