github.com/lm-sys/FastChat

Function make_supervised_data_module

fastchat/train/train.py, lines 235–253

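Builds the training dataset, and optionally an evaluation dataset, for supervised fine-tuning. The docstring also mentions a collator, but the function returns only a dict with the keys `train_dataset` and `eval_dataset`; `eval_dataset` is `None` when `data_args.eval_data_path` is unset.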

(
    tokenizer: transformers.PreTrainedTokenizer, data_args
)
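`data_args` is untyped in the signature, but the body reads only three attributes from it: `data_path`, `eval_data_path`, and `lazy_preprocess`. The sketch below is a minimal, hypothetical stand-in (`DataArgs` is an illustrative name, not FastChat's class) that satisfies exactly that contract; FastChat's own argument dataclass may define more fields and different defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataArgs:
    """Hypothetical stand-in exposing only the attributes
    make_supervised_data_module reads from data_args."""

    data_path: str  # JSON file holding the training records
    eval_data_path: Optional[str] = None  # falsy value -> no eval dataset is built
    lazy_preprocess: bool = False  # True selects LazySupervisedDataset


args = DataArgs(data_path="train.json", lazy_preprocess=True)
print(args.eval_data_path)  # None
```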


```python
def make_supervised_data_module(
    tokenizer: transformers.PreTrainedTokenizer, data_args
) -> Dict:
    """Make dataset and collator for supervised fine-tuning."""
    dataset_cls = (
        LazySupervisedDataset if data_args.lazy_preprocess else SupervisedDataset
    )
    rank0_print("Loading data...")

    train_json = json.load(open(data_args.data_path, "r"))
    train_dataset = dataset_cls(train_json, tokenizer=tokenizer)

    if data_args.eval_data_path:
        eval_json = json.load(open(data_args.eval_data_path, "r"))
        eval_dataset = dataset_cls(eval_json, tokenizer=tokenizer)
    else:
        eval_dataset = None

    return dict(train_dataset=train_dataset, eval_dataset=eval_dataset)
```
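The control flow can be exercised without a model or tokenizer. The sketch below mirrors it using hypothetical stubs (`EagerDataset`, `LazyDataset`, `load_json`, and `make_data_module` are illustrative names, not FastChat's). Unlike the original's bare `open(...)`, it reads files inside a `with` block so each handle is closed deterministically.

```python
import json
import os
import tempfile
from types import SimpleNamespace
from typing import Dict, Optional


class EagerDataset:
    """Hypothetical stand-in for SupervisedDataset (would tokenize everything up front)."""

    def __init__(self, raw_data, tokenizer):
        self.raw_data = raw_data
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.raw_data)


class LazyDataset(EagerDataset):
    """Hypothetical stand-in for LazySupervisedDataset (would tokenize per item on access)."""


def load_json(path: str):
    # The context manager closes the file even if json.load raises.
    with open(path, "r") as f:
        return json.load(f)


def make_data_module(tokenizer, data_args) -> Dict[str, Optional[EagerDataset]]:
    # Pick the dataset class once; both splits share the same preprocessing mode.
    dataset_cls = LazyDataset if data_args.lazy_preprocess else EagerDataset
    train_dataset = dataset_cls(load_json(data_args.data_path), tokenizer=tokenizer)
    eval_dataset = None
    if data_args.eval_data_path:  # None or "" -> skip building an eval split
        eval_dataset = dataset_cls(load_json(data_args.eval_data_path), tokenizer=tokenizer)
    return dict(train_dataset=train_dataset, eval_dataset=eval_dataset)


with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train.json")
    with open(train_path, "w") as f:
        json.dump([{"id": "a"}, {"id": "b"}], f)
    args = SimpleNamespace(data_path=train_path, eval_data_path=None, lazy_preprocess=True)
    module = make_data_module(tokenizer=None, data_args=args)

print(type(module["train_dataset"]).__name__, len(module["train_dataset"]), module["eval_dataset"])
# LazyDataset 2 None
```

Because `dataset_cls` is chosen once, the train and eval splits can never end up with different preprocessing modes.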

Callers (2)

train (function) · 0.90
train (function) · 0.70
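The returned keys, `train_dataset` and `eval_dataset`, match keyword arguments of `transformers.Trainer`, so a caller such as `train` can unpack the dict directly with `**`. The sketch below shows that pattern against a hypothetical `StubTrainer` so it runs without `transformers` installed; it illustrates the unpacking idea and is not a copy of the callers' code.

```python
from typing import Any, Optional


class StubTrainer:
    """Hypothetical stand-in accepting the same dataset keywords as transformers.Trainer."""

    def __init__(
        self,
        model: Any = None,
        train_dataset: Optional[Any] = None,
        eval_dataset: Optional[Any] = None,
        **kwargs: Any,
    ):
        self.model = model
        self.train_dataset = train_dataset
        self.eval_dataset = eval_dataset


# Same shape as make_supervised_data_module's return value.
data_module = dict(train_dataset=["example"], eval_dataset=None)
trainer = StubTrainer(model="model", **data_module)
print(trainer.train_dataset, trainer.eval_dataset)  # ['example'] None
```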

Calls (1)

rank0_print (function) · 0.70
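`rank0_print` keeps the "Loading data..." message from being repeated by every worker in a multi-GPU run. Its implementation is not shown on this page; the sketch below is a hypothetical version that assumes the rank comes from the `LOCAL_RANK` environment variable, which launchers such as `torchrun` export per process. FastChat's own helper may obtain the rank differently.

```python
import os


def rank0_print_sketch(*args) -> bool:
    """Hypothetical helper: print only on local rank 0. Returns True if it printed."""
    # torchrun exports LOCAL_RANK for each worker; a missing variable is
    # treated as a single-process run (rank 0).
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    if local_rank == 0:
        print(*args)
        return True
    return False


rank0_print_sketch("Loading data...")  # prints only where LOCAL_RANK is 0 or unset
```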

Tested by

no test coverage detected
