Repository: github.com/deepspeedai/DeepSpeedExamples

Method __getitem__

Megatron-LM/data_utils/datasets.py, lines 368–381

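gets the index'th string from the dataset and returns it as a dict holding the (optionally tokenized or preprocessed) text, its length, and its label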

(self, index)


    def __getitem__(self, index):
        """gets the index'th string from the dataset"""
        # Input text: tokenize if a tokenizer is attached (the tokenizer also
        # receives preprocess_fn); otherwise apply preprocess_fn on its own.
        x = self.X[index]
        if self.tokenizer is not None:
            x = self.tokenizer.EncodeAsIds(x, self.preprocess_fn)
        elif self.preprocess_fn is not None:
            x = self.preprocess_fn(x)
        # Label: only string labels are encoded/preprocessed;
        # non-string labels (e.g. integer class ids) pass through unchanged.
        y = self.Y[index]
        if isinstance(y, str):
            if self.tokenizer is not None:
                y = self.tokenizer.EncodeAsIds(y, self.preprocess_fn)
            elif self.preprocess_fn is not None:
                y = self.preprocess_fn(y)
        # 'length' is measured after any transformation of x.
        return {'text': x, 'length': len(x), 'label': y}

    def __len__(self):
        return len(self.X)
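To make the three branches concrete, the minimal, self-contained sketch below copies the method body into a hypothetical `MiniLabeledDataset` class and drives it with a hypothetical `ToyTokenizer`. Neither name exists in DeepSpeedExamples. The toy tokenizer is an assumption chosen for readability: it applies `process_fn` and then maps each character to its code point, so its output supports `len()`, which is all `__getitem__` needs.

```python
from typing import Callable, List, Optional, Sequence, Union


class ToyTokenizer:
    """Hypothetical stand-in: applies process_fn, then maps each character to its code point."""

    def EncodeAsIds(self, text: str, process_fn: Optional[Callable[[str], str]] = None) -> List[int]:
        if process_fn is not None:
            text = process_fn(text)
        return [ord(ch) for ch in text]


class MiniLabeledDataset:
    """Hypothetical container reproducing the __getitem__ logic listed above."""

    def __init__(self, X: Sequence[str], Y: Sequence[Union[str, int]],
                 tokenizer=None, preprocess_fn=None):
        self.X, self.Y = X, Y
        self.tokenizer = tokenizer
        self.preprocess_fn = preprocess_fn

    def __getitem__(self, index):
        x = self.X[index]
        if self.tokenizer is not None:
            x = self.tokenizer.EncodeAsIds(x, self.preprocess_fn)
        elif self.preprocess_fn is not None:
            x = self.preprocess_fn(x)
        y = self.Y[index]
        if isinstance(y, str):  # only string labels are transformed
            if self.tokenizer is not None:
                y = self.tokenizer.EncodeAsIds(y, self.preprocess_fn)
            elif self.preprocess_fn is not None:
                y = self.preprocess_fn(y)
        return {'text': x, 'length': len(x), 'label': y}

    def __len__(self):
        return len(self.X)


# One dataset per branch: no transformation, preprocess only, tokenizer.
raw = MiniLabeledDataset(["Hi"], [1])
pre = MiniLabeledDataset(["Hi"], ["Yes"], preprocess_fn=str.lower)
tok = MiniLabeledDataset(["Hi"], ["ok"], tokenizer=ToyTokenizer(), preprocess_fn=str.lower)

print(raw[0])  # {'text': 'Hi', 'length': 2, 'label': 1}
print(pre[0])  # {'text': 'hi', 'length': 2, 'label': 'yes'}
print(tok[0])  # {'text': [104, 105], 'length': 2, 'label': [111, 107]}
```

Note that `'length'` counts characters in the first two cases but token ids in the third, because `len()` is taken after whichever transformation ran; the integer label in the first case is returned untouched.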

Callers

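nothing calls this directly; Python invokes it implicitly whenever the dataset is indexed (dataset[index]), e.g. by a data loader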
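Because `dataset[index]` is routed to `__getitem__`, the typical consumer is a batching step such as the function passed to PyTorch's `DataLoader` through its `collate_fn` argument. The sketch below is a hypothetical `pad_collate` (not taken from DeepSpeedExamples) showing how the `'length'` field could drive right-padding. It assumes `'text'` is a plain sequence of token ids and that 0 is the pad id; `TinyDataset` is likewise a hypothetical stand-in that returns items of the same shape.

```python
from typing import Any, Dict, List


def pad_collate(items: List[Dict[str, Any]], pad_id: int = 0) -> Dict[str, List]:
    """Hypothetical collate: right-pads each 'text' id list to the longest 'length' in the batch."""
    max_len = max(item['length'] for item in items)
    texts = [list(item['text']) + [pad_id] * (max_len - item['length']) for item in items]
    return {
        'text': texts,
        'length': [item['length'] for item in items],
        'label': [item['label'] for item in items],
    }


class TinyDataset:
    """Hypothetical dataset whose items have the same dict shape as __getitem__ above."""

    def __init__(self, rows: List[List[int]]):
        self.rows = rows

    def __getitem__(self, index: int) -> Dict[str, Any]:
        ids = self.rows[index]
        return {'text': ids, 'length': len(ids), 'label': index % 2}

    def __len__(self) -> int:
        return len(self.rows)


ds = TinyDataset([[5, 6, 7], [8]])
# ds[i] is sugar for type(ds).__getitem__(ds, i); no code names the method explicitly.
batch = pad_collate([ds[i] for i in range(len(ds))])
print(batch)  # {'text': [[5, 6, 7], [8, 0, 0]], 'length': [3, 1], 'label': [0, 1]}
```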

Calls 1

EncodeAsIds · method · 0.45
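`__getitem__` relies on only two properties of its tokenizer: an `EncodeAsIds(text, process_fn)` method, and a return value that supports `len()`, since the method computes `len(x)` on it. The sketch below names that duck-typed contract with a hypothetical `typing.Protocol` called `SupportsEncodeAsIds` and satisfies it with a hypothetical `WhitespaceTokenizer`; neither is part of the repository, and the real tokenizer may return a richer object than a list.

```python
from typing import Callable, Dict, List, Optional, Protocol, Sized, runtime_checkable


@runtime_checkable
class SupportsEncodeAsIds(Protocol):
    """Hypothetical protocol naming the tokenizer surface __getitem__ depends on."""

    def EncodeAsIds(self, text: str, process_fn: Optional[Callable[[str], str]] = None) -> Sized:
        ...


class WhitespaceTokenizer:
    """Hypothetical tokenizer: applies process_fn, then assigns ids to whitespace-separated words."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def EncodeAsIds(self, text: str, process_fn: Optional[Callable[[str], str]] = None) -> List[int]:
        if process_fn is not None:
            text = process_fn(text)
        # setdefault assigns the next free id the first time a word is seen.
        return [self.vocab.setdefault(word, len(self.vocab)) for word in text.split()]


tok = WhitespaceTokenizer()
print(isinstance(tok, SupportsEncodeAsIds))  # True (checks only that the method exists)
print(tok.EncodeAsIds("The cat saw the cat", str.lower))  # [0, 1, 2, 0, 1]
```

A runtime-checkable protocol verifies only that `EncodeAsIds` is present, not its signature or return type, so it documents the contract rather than enforcing it.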

Tested by

no test coverage detected