hub / github.com/PaddlePaddle/PaddleRec / cut

Method cut

models/treebased/data/data_cutter.py:28–56
(self)


def cut(self):
    # Requires `import random` at module level.
    user_behav = dict()  # user id -> list of that user's raw input lines
    user_ids = list()    # user ids in first-seen order
    with open(self._input) as f:
        for line in f:
            arr = line.strip().split(',')
            # Stop reading at the first line that does not have exactly 5 fields.
            if len(arr) != 5:
                break

            if arr[0] not in user_behav:
                user_ids.append(arr[0])
                user_behav[arr[0]] = list()

            user_behav[arr[0]].append(line)

    # Hold out self._number randomly chosen users for the test set.
    random.shuffle(user_ids)
    test_user_ids = user_ids[:self._number]
    train_user_ids = user_ids[self._number:]

    # write train data set
    with open(self._train, 'w') as f:
        for uid in train_user_ids:
            for line in user_behav[uid]:
                f.write(line)

    # write test data set
    with open(self._test, 'w') as f:
        for uid in test_user_ids:
            for line in user_behav[uid]:
                f.write(line)
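The method above splits by user rather than by row: every line belonging to a held-out user goes to the test file, so no user appears in both sets. A minimal sketch of the same idea on an in-memory list is shown below. The helper name `split_by_user`, its `seed` parameter, and the sample rows are hypothetical and not part of PaddleRec; only the five-field, comma-separated layout with the user id in the first column is taken from the source.

```python
import random
from collections import defaultdict


def split_by_user(lines, num_test_users, seed=None):
    """Hold out every row of `num_test_users` randomly chosen users.

    Hypothetical helper for illustration only: it takes a list of lines
    instead of the file paths stored on the class, and returns the two
    splits instead of writing them to disk.
    """
    rng = random.Random(seed)       # local RNG so a seed makes the split repeatable
    user_rows = defaultdict(list)   # user id -> that user's raw lines
    for line in lines:
        fields = line.strip().split(',')
        if len(fields) != 5:
            break                   # mirrors cut(): stop at the first malformed line
        user_rows[fields[0]].append(line)

    user_ids = list(user_rows)      # dict keys keep first-seen order
    rng.shuffle(user_ids)
    test_ids = user_ids[:num_test_users]
    train_ids = user_ids[num_test_users:]

    train = [row for uid in train_ids for row in user_rows[uid]]
    test = [row for uid in test_ids for row in user_rows[uid]]
    return train, test


# Made-up rows: user id first, then four placeholder fields.
sample = [
    "u1,a,b,c,1\n",
    "u2,a,b,c,2\n",
    "u1,d,e,f,3\n",
    "u3,a,b,c,4\n",
    "u3,d,e,f,5\n",
]
train, test = split_by_user(sample, num_test_users=1, seed=0)
train_users = {row.split(',')[0] for row in train}
test_users = {row.split(',')[0] for row in test}
```

Passing a seed is a design choice of this sketch, not of the original method, which shuffles with the module-level `random` state and therefore produces a different split on each run unless the caller seeds it.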

Callers 5

preprocess.py · File · 0.80
preprocess.py · File · 0.80
preprocess.py · File · 0.80
preprocess.py · File · 0.80
data_cutter.py · File · 0.80

Calls

no outgoing calls

Tested by

no test coverage detected