hub / github.com/LargeWorldModel/LWM / load_dataset

Method load_dataset

lwm/data.py:35–49
(cls, config, tokenizer, **kwargs)

Source from the content-addressed store, hash-verified

    @classmethod
    def load_dataset(cls, config, tokenizer, **kwargs):
        config = cls.get_default_config(config)
        if config.type == 'huggingface':
            text_processor = TextProcessor(config.text_processor, tokenizer)
            return HuggingfaceDataset(
                config.huggingface_dataset, tokenizer, text_processor, **kwargs
            )
        elif config.type == 'json':
            text_processor = TextProcessor(config.text_processor, tokenizer)
            return JsonDataset(config.json_dataset, tokenizer, text_processor, **kwargs)
        elif config.type == 'json_vision':
            vision_text_processor = VisionTextProcessor(config.vision_text_processor, tokenizer)
            return JsonVisionDataset(config.json_vision_dataset, tokenizer, vision_text_processor, **kwargs)
        else:
            raise ValueError(f'Unknown dataset type: {config.type}')

    def __init__(self):
        raise ValueError('DatasetFactory is a static class and should not be instantiated.')
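
The dispatch on `config.type` shown above can be illustrated with a minimal, self-contained sketch. It does not import LWM: `DatasetFactorySketch`, the stand-in `JsonDataset`, the `path` field, and the `seed` keyword are all hypothetical names chosen for illustration, and only the `'json'` branch and the error branch are mirrored. The real method also calls `get_default_config` and builds a `TextProcessor`, which this sketch replaces with a plain callable.

```python
from types import SimpleNamespace


class JsonDataset:
    """Hypothetical stand-in; the real class lives in lwm/data.py."""

    def __init__(self, config, tokenizer, text_processor, **kwargs):
        self.config = config
        self.tokenizer = tokenizer
        self.text_processor = text_processor
        self.kwargs = kwargs


class DatasetFactorySketch:
    """Simplified dispatch on config.type (assumption: only two branches shown)."""

    def __init__(self):
        # Same guard as the original: the factory is used only through its classmethod.
        raise ValueError('DatasetFactorySketch is a static class and should not be instantiated.')

    @classmethod
    def load_dataset(cls, config, tokenizer, **kwargs):
        if config.type == 'json':
            # The real method builds a TextProcessor here; a plain callable stands in for it.
            text_processor = str.strip
            return JsonDataset(config.json_dataset, tokenizer, text_processor, **kwargs)
        raise ValueError(f'Unknown dataset type: {config.type}')


# Known type: extra keyword arguments are forwarded to the dataset constructor.
config = SimpleNamespace(type='json', json_dataset=SimpleNamespace(path='data.jsonl'))
dataset = DatasetFactorySketch.load_dataset(config, tokenizer=None, seed=0)
print(type(dataset).__name__)

# Unknown type: the factory raises ValueError instead of returning None.
try:
    DatasetFactorySketch.load_dataset(SimpleNamespace(type='parquet'), tokenizer=None)
except ValueError as err:
    print(err)
```

Keeping the branch selection in one classmethod means a caller only needs a config object and a tokenizer; the concrete dataset class is chosen by the `type` field rather than by the caller.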

Callers 1

main · Function · 0.80

Calls 6

TextProcessor · Class · 0.85
HuggingfaceDataset · Class · 0.85
JsonDataset · Class · 0.85
VisionTextProcessor · Class · 0.85
JsonVisionDataset · Class · 0.85
get_default_config · Method · 0.45

Tested by

no test coverage detected