Repository: github.com/togethercomputer/RedPajama-Data

Method _wrap_reader

app/src/artifacts/hash_dist.py:67–75

Wrap the reader so that it can be used with multiprocessing; otherwise, pickling of records fails.

Signature: _wrap_reader()

Source (with surrounding context, lines 65–78):

```python
reader = Reader(schema=[("text", str)])

def _wrap_reader():
    r""" wrap reader so that it can be used with multiprocessing.
    Otherwise, pickling of records fails. """
    for record in reader.read(
            uri="file://" + datafile,
            max_samples=self._num_samples,
            return_idx=False
    ):
        # Yield the plain str field instead of the record object, so only
        # picklable values are handed to worker processes.
        yield record.text

global_dist = np.zeros(self._buckets, dtype=np.int64)
```
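The effect of yielding `record.text` can be illustrated with a minimal sketch. It assumes the reader's record objects fail to pickle because pickle cannot look up their class by name; the source does not say why pickling fails, and `make_records`, `Record`, `is_picklable`, and `wrap_reader` are hypothetical names used only for illustration. Since `multiprocessing` pickles each item it sends to a worker process, a generator that yields plain `str` values avoids the problem:

```python
import pickle
from typing import Iterable, Iterator, NamedTuple


def make_records(texts: Iterable[str]) -> list:
    """Hypothetical stand-in for a reader whose record type is built at runtime.

    A class defined inside a function cannot be located by pickle,
    so instances of it fail to pickle.
    """
    class Record(NamedTuple):
        text: str

    return [Record(t) for t in texts]


def is_picklable(obj: object) -> bool:
    """Return True if pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def wrap_reader(records: Iterable) -> Iterator[str]:
    """Yield only the plain str field of each record, mirroring _wrap_reader."""
    for record in records:
        yield record.text


records = make_records(["first document", "second document"])
print(is_picklable(records[0]))                            # False
print(all(is_picklable(t) for t in wrap_reader(records)))  # True
```

The record objects stay inside the generator; only their strings reach a consumer such as a process pool.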

Callers

nothing calls this directly

Calls (1)

read (method) · 0.80

Tested by

no test coverage detected