hub / github.com/huggingface/datasets / generate_example_dataset

Function generate_example_dataset

benchmarks/utils.py:47–64
(dataset_path, features, num_examples=100, seq_shapes=None)

Source from the content-addressed store, hash-verified

def generate_example_dataset(dataset_path, features, num_examples=100, seq_shapes=None):
    dummy_data = generate_examples(features, num_examples=num_examples, seq_shapes=seq_shapes)

    with ArrowWriter(features=features, path=dataset_path) as writer:
        for key, record in dummy_data:
            example = features.encode_example(record)
            writer.write(example)

        num_final_examples, num_bytes = writer.finalize()

    if not num_final_examples == num_examples:
        raise ValueError(
            f"Error writing the dataset, wrote {num_final_examples} examples but should have written {num_examples}."
        )

    dataset = datasets.Dataset.from_file(filename=dataset_path, info=datasets.DatasetInfo(features=features))

    return dataset
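
The function above follows a write, finalize, verify pattern: stream records through a writer, ask the writer how many it actually wrote, and raise if that count differs from the requested one. The sketch below illustrates only that pattern with the standard library. `CountingWriter` and `write_and_verify` are hypothetical names introduced for illustration; `CountingWriter` is a JSON Lines stand-in, not the real `ArrowWriter`, and it makes no claim about how `ArrowWriter` is implemented.

```python
import json
import os
import tempfile


class CountingWriter:
    """Hypothetical stand-in for a dataset writer: emits JSON Lines and counts them."""

    def __init__(self, path):
        self.path = path
        self._file = None
        self._num_examples = 0
        self._num_bytes = 0

    def __enter__(self):
        # newline="\n" keeps the byte count identical across platforms.
        self._file = open(self.path, "w", encoding="utf-8", newline="\n")
        return self

    def __exit__(self, exc_type, exc, tb):
        self._file.close()
        return False  # do not swallow exceptions

    def write(self, example):
        line = json.dumps(example) + "\n"
        self._file.write(line)
        self._num_examples += 1
        self._num_bytes += len(line.encode("utf-8"))

    def finalize(self):
        # Flush pending data and report what was written.
        self._file.flush()
        return self._num_examples, self._num_bytes


def write_and_verify(path, records, num_examples):
    """Write records to path and fail loudly if the written count is unexpected."""
    with CountingWriter(path) as writer:
        for record in records:
            writer.write(record)
        num_written, num_bytes = writer.finalize()

    if num_written != num_examples:
        raise ValueError(
            f"Error writing the dataset, wrote {num_written} examples "
            f"but should have written {num_examples}."
        )
    return num_written, num_bytes


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "example.jsonl")
    records = [{"id": i, "text": f"row {i}"} for i in range(5)]
    print(write_and_verify(out_path, records, num_examples=5))
```

Checking the count after `finalize()` rather than counting inside the loop means the check reflects what the writer reports as written, which is the value the original function compares against `num_examples`.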

Callers 3

benchmark_iterating · Function · 0.90
benchmark_map_filter · Function · 0.90

Calls 6

ArrowWriter · Class · 0.90
finalize · Method · 0.80
generate_examples · Function · 0.70
encode_example · Method · 0.45
write · Method · 0.45
from_file · Method · 0.45

Tested by

no test coverage detected