hub / github.com/Tele-AI/Telechat / create_dataset

Function create_dataset

deepspeed-telechat/utils/data/data_utils.py:88–92
( dataset_name, dataset_weight, output_path, seed)

Source from the content-addressed store, hash-verified

86      return all_lines
87
88  def create_dataset( dataset_name, dataset_weight, output_path, seed):
89      raw_dataset = get_raw_dataset(dataset_name, output_path, seed)
90      train_dataset = raw_dataset.get_train_data()
91      train_dataset = get_weight_data(train_dataset, dataset_weight)
92      return train_dataset
93
94  def process_concat_data(text, tokenizer, max_seq_len, args):
95      texts = text.split("<_end>")
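
The listing shows create_dataset chaining three steps: load a raw dataset, take its training split, then apply a weight. The sketch below exercises that flow in isolation. Only the body of create_dataset comes from the listing; StubRawDataset and both helper bodies are hypothetical stand-ins, because the real get_raw_dataset and get_weight_data are not shown here. In particular, treating dataset_weight as a sample-count multiplier is an assumption, not the repository's confirmed behavior.

```python
class StubRawDataset:
    """Hypothetical stand-in for whatever get_raw_dataset really returns."""

    def __init__(self, dataset_name, output_path, seed):
        self.dataset_name = dataset_name
        self.output_path = output_path
        self.seed = seed

    def get_train_data(self):
        # Four fake training samples, named after the dataset.
        return [f"{self.dataset_name}-sample-{i}" for i in range(4)]


def get_raw_dataset(dataset_name, output_path, seed):
    # Assumption: the real function selects a dataset class by name.
    return StubRawDataset(dataset_name, output_path, seed)


def get_weight_data(train_dataset, dataset_weight):
    # Assumption: the weight scales the sample count. The integer part
    # repeats the whole set; the fractional part keeps a leading slice.
    samples = list(train_dataset)
    whole, frac = divmod(dataset_weight, 1)
    return samples * int(whole) + samples[: int(len(samples) * frac)]


def create_dataset(dataset_name, dataset_weight, output_path, seed):
    # Same three steps as the listing above.
    raw_dataset = get_raw_dataset(dataset_name, output_path, seed)
    train_dataset = raw_dataset.get_train_data()
    train_dataset = get_weight_data(train_dataset, dataset_weight)
    return train_dataset


if __name__ == "__main__":
    half = create_dataset("demo", 0.5, "/tmp/out", 42)
    print(len(half))  # 2 of the 4 stub samples
```

Under these stand-ins, a weight of 0.5 keeps half of the stub samples and a weight of 2 repeats them twice. The actual weighting semantics should be checked against get_weight_data in data_utils.py.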

Callers 1

create_prompt_dataset · Function · 0.85

Calls 3

get_raw_dataset · Function · 0.85
get_weight_data · Function · 0.85
get_train_data · Method · 0.45

Tested by

no test coverage detected