hub / github.com/deepspeedai/DeepSpeedExamples / get_splits

Function get_splits

Megatron-LM/scripts/split_json.py:31–54
(lines, line_counts)

Source from the content-addressed store, hash-verified

    return lines

def get_splits(lines, line_counts):
    # Flatten the per-file line lists, recording for every line its index
    # within its file (line_idx) and the index of that file (file_mappings).
    all_lines = []
    line_idx = []
    file_mappings = []
    for i, l in enumerate(lines):
        all_lines.extend(l)
        line_idx.extend(list(range(len(l))))
        file_mappings.extend([i]*len(l))

    # Apply one shared permutation to all three lists so they stay aligned
    # (relies on `import random` at module level).
    indices = list(range(len(all_lines)))
    random.shuffle(indices)
    all_lines = [all_lines[idx] for idx in indices]
    line_idx = [line_idx[idx] for idx in indices]
    file_mappings = [file_mappings[idx] for idx in indices]

    # Cut the shuffled lines into consecutive chunks of the requested sizes.
    splits = []
    mappings = []
    start = 0
    for end in line_counts:
        end += start
        splits.append(all_lines[start:end])
        mappings.append(format_mappings(line_idx[start:end], file_mappings[start:end]))
        start = end
    return splits, mappings

def format_mappings(line_idx, file_mappings):
    lines = []
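
As a usage illustration, the sketch below calls a condensed copy of get_splits on two small in-memory "files" and checks that each mapping entry points back to the line's original position. The body of format_mappings is not shown in this excerpt, so the version here is a hypothetical stand-in that returns (file index, line index) pairs; the real function may format its output differently. The sample data and the fixed seed are also assumptions made only for the example.

```python
import random

def format_mappings(line_idx, file_mappings):
    # Hypothetical stand-in: the real body is not shown in the excerpt.
    # Each line is described here as a (file index, line index) pair.
    return list(zip(file_mappings, line_idx))

def get_splits(lines, line_counts):
    # Same logic as the listing above, condensed.
    all_lines, line_idx, file_mappings = [], [], []
    for i, l in enumerate(lines):
        all_lines.extend(l)
        line_idx.extend(range(len(l)))
        file_mappings.extend([i] * len(l))

    indices = list(range(len(all_lines)))
    random.shuffle(indices)
    all_lines = [all_lines[idx] for idx in indices]
    line_idx = [line_idx[idx] for idx in indices]
    file_mappings = [file_mappings[idx] for idx in indices]

    splits, mappings = [], []
    start = 0
    for count in line_counts:
        end = start + count
        splits.append(all_lines[start:end])
        mappings.append(format_mappings(line_idx[start:end], file_mappings[start:end]))
        start = end
    return splits, mappings

random.seed(0)  # fixed seed only so the example is repeatable
files = [["a0", "a1", "a2"], ["b0", "b1"]]  # lines of two input files
splits, mappings = get_splits(files, [3, 1, 1])

print([len(s) for s in splits])  # [3, 1, 1]

# Every mapping entry points back to the line's original position.
for split, mapping in zip(splits, mappings):
    for text, (file_i, line_i) in zip(split, mapping):
        assert files[file_i][line_i] == text
```

Note that line_counts is expected to sum to at most the total number of lines; because Python slicing does not raise on out-of-range bounds, larger counts would silently yield short or empty splits.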

Callers 1

split_json.py · File · 0.85

Calls 3

format_mappings · Function · 0.85
extend · Method · 0.80
append · Method · 0.80

Tested by

no test coverage detected