hub / github.com/Turing-Project/WriteGPT / input_fn_builder

Function input_fn_builder

LanguageNetwork/GPT2/train/dataloader.py:40–75

Creates an `input_fn` closure to be passed to TPUEstimator.

(input_files,
 seq_length,
 is_training,
 num_cpu_threads=8,
 evaluate_for_fixed_number_of_steps=True)
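The builder follows a closure pattern: configuration is captured once, and the returned `input_fn` later receives a `params` dictionary from which it reads `batch_size`. The following is a minimal pure-Python sketch of that pattern only. It does not use TensorFlow, and `make_input_fn` and its list-based records are hypothetical stand-ins, not the repository's code.

```python
def make_input_fn(records, seq_length):
    """Hypothetical stand-in for a builder that returns an input_fn closure."""

    def input_fn(params):
        # The caller (an estimator, in the real code) supplies `params`;
        # the batch size is only known at call time.
        batch_size = params["batch_size"]
        # Every record must have the fixed length captured by the closure.
        for record in records:
            if len(record) != seq_length:
                raise ValueError("record length does not match seq_length")
        # Split the records into consecutive batches.
        return [records[i:i + batch_size]
                for i in range(0, len(records), batch_size)]

    return input_fn


# Build once, then call with whatever params the caller provides.
input_fn = make_input_fn(records=[[1, 2], [3, 4], [5, 6]], seq_length=2)
batches = input_fn({"batch_size": 2})
```

Because `seq_length` lives in the closure and `batch_size` arrives through `params`, the same `input_fn` can be reused with different batch sizes without rebuilding it.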

Source from the content-addressed store, hash-verified

def input_fn_builder(input_files,
                     seq_length,
                     is_training,
                     num_cpu_threads=8,
                     evaluate_for_fixed_number_of_steps=True):
    """Creates an `input_fn` closure to be passed to TPUEstimator."""

    def input_fn(params):
        """The actual input function."""
        batch_size = params["batch_size"]
        name_to_features = {
            "input_ids": tf.FixedLenFeature([seq_length], tf.int64),
        }
        # For training, we want a lot of parallel reading and shuffling.
        # For eval, we want no shuffling and parallel reading doesn't matter.

        d = tf.data.TFRecordDataset(input_files)
        # If we evaluate for a fixed number of steps we don't want to encounter
        # out-of-range exceptions.
        if evaluate_for_fixed_number_of_steps:
            d = d.repeat()

        # We must `drop_remainder` on training because the TPU requires fixed
        # size dimensions. For eval, we assume we are evaluating on the CPU or GPU
        # and we *don't* want to drop the remainder, otherwise we wont cover
        # every sample.
        #d = d.apply(
        #    tf.data.experimental.map_and_batch(
        #        lambda record: _decode_record(record, name_to_features),
        #        batch_size=batch_size,
        #        num_parallel_batches=num_cpu_threads,
        #        drop_remainder=True))
        print("the actual lens of data is>>>>>>>>>>>>>>>>>>>>>>>>>>>> ", d)
        return d

    return input_fn


# ~~~~~~~~~~~~~~ This is for classification / AF ~~~~~~~~~~~~~~~~~~
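The comments in the listing describe two behaviours: repeating the input so a fixed number of steps never runs out of data, and dropping the final short batch so every batch has the same size. In the listing the batching call is commented out, so the returned dataset is unbatched. The sketch below illustrates the two behaviours with plain Python iterators. It is an analogy under stated assumptions, not TensorFlow code, and the `batch` helper is hypothetical.

```python
import itertools


def batch(records, batch_size, drop_remainder):
    """Hypothetical helper: yield lists of up to batch_size records."""
    it = iter(records)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if not chunk:
            return
        if len(chunk) < batch_size and drop_remainder:
            # Fixed-size batches only: discard the short final batch.
            return
        yield chunk


records = [0, 1, 2, 3, 4]

# Single pass that keeps the short final batch, so every sample is covered.
eval_batches = list(batch(records, 2, drop_remainder=False))

# Single pass with fixed-size batches only; the last record is dropped.
train_batches = list(batch(records, 2, drop_remainder=True))

# Repeated input never runs out, so any fixed number of steps can be taken.
repeated = itertools.cycle(records)
fixed_steps = list(itertools.islice(batch(repeated, 2, True), 4))
```

With five records and a batch size of two, dropping the remainder loses one sample per pass, which is why the source comment prefers to keep the remainder for evaluation.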

Callers 2

main · Function · 0.90
main · Function · 0.90

Calls

no outgoing calls

Tested by

no test coverage detected