hub / github.com/microsoft/qlib / setup_data

Method setup_data

examples/benchmarks/TRA/src/dataset.py:112–146
(self, handler_kwargs: dict = None, **kwargs)

Source from the content-addressed store, hash-verified

        super().__init__(handler, segments, **kwargs)

    def setup_data(self, handler_kwargs: dict = None, **kwargs):
        super().setup_data()

        # change index to <code, date>
        # NOTE: we will use inplace sort to reduce memory use
        df = self.handler._data
        df.index = df.index.swaplevel()
        df.sort_index(inplace=True)

        self._data = df["feature"].values.astype("float32")
        self._label = df["label"].squeeze().astype("float32")
        self._index = df.index

        # add memory to feature
        self._data = np.c_[self._data, np.zeros((len(self._data), self.num_states), dtype=np.float32)]

        # padding tensor
        self.zeros = np.zeros((self.seq_len, self._data.shape[1]), dtype=np.float32)

        # pin memory
        if self.pin_memory:
            self._data = _to_tensor(self._data)
            self._label = _to_tensor(self._label)
            self.zeros = _to_tensor(self.zeros)

        # create batch slices
        self.batch_slices = _create_ts_slices(self._index, self.seq_len)

        # create daily slices
        index = [slc.stop - 1 for slc in self.batch_slices]
        act_index = self.restore_index(index)
        daily_slices = {date: [] for date in sorted(act_index.unique(level=1))}
        for i, (code, date) in enumerate(act_index):
            daily_slices[date].append(self.batch_slices[i])
        self.daily_slices = list(daily_slices.values())

    def _prepare_seg(self, slc, **kwargs):
        fn = _get_date_parse_fn(self._index[0][1])
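
The index swap and the "add memory to feature" step can be tried in isolation on a toy frame. The sketch below is illustrative only: the frame, the level names `datetime` and `instrument`, the column names `f0`, `f1`, `y`, and `num_states = 3` are all assumptions rather than values from the repository. It shows the in-place reordering to <code, date> followed by zero-filled memory columns appended with `np.c_`.

```python
import numpy as np
import pandas as pd

# Toy frame indexed by <date, code> with two-level columns (assumed layout).
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-02", "2020-01-03"]), ["A", "B"]],
    names=["datetime", "instrument"],
)
columns = pd.MultiIndex.from_tuples(
    [("feature", "f0"), ("feature", "f1"), ("label", "y")]
)
df = pd.DataFrame(
    np.arange(12, dtype="float64").reshape(4, 3), index=index, columns=columns
)

# Swap the index levels to <code, date>, then sort in place to avoid a copy.
df.index = df.index.swaplevel()
df.sort_index(inplace=True)

# Split features and labels, casting to float32 as in the method above.
features = df["feature"].values.astype("float32")
label = df["label"].squeeze().astype("float32")

# Append zero-initialised "memory" columns to every feature row.
num_states = 3  # hypothetical value, chosen for illustration
data = np.c_[features, np.zeros((len(features), num_states), dtype=np.float32)]
```

After the sort, all rows of one instrument are contiguous and ordered by date, which is what makes fixed-length time-series slices over row positions possible.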

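The "create daily slices" step groups the batch slices by the date of the last row each slice covers. Here is a minimal sketch of that grouping, with several assumptions. The slices are written by hand, since `_create_ts_slices` is not shown on this page. A plain positional lookup stands in for `restore_index`, whose body is also not shown. The dates are plain strings.

```python
import pandas as pd

# Sorted <code, date> index, as produced by the swap-and-sort step.
index = pd.MultiIndex.from_tuples(
    [("A", "2020-01-02"), ("A", "2020-01-03"),
     ("B", "2020-01-02"), ("B", "2020-01-03")],
    names=["instrument", "datetime"],
)

# Hypothetical slices: each one ends at a different row of the index.
batch_slices = [slice(0, 1), slice(0, 2), slice(2, 3), slice(2, 4)]

# Position of the last row covered by each slice.
end_rows = [slc.stop - 1 for slc in batch_slices]

# Stand-in for restore_index: look up the <code, date> pairs by position.
act_index = index[end_rows]

# One bucket per date, in chronological order, each holding that day's slices.
daily = {date: [] for date in sorted(act_index.unique(level=1))}
for i, (code, date) in enumerate(act_index):
    daily[date].append(batch_slices[i])
daily_slices = list(daily.values())
```

Each element of `daily_slices` then holds every slice whose final row falls on the same date, so one element corresponds to one trading day across all instruments.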
Callers 2

dump_and_load_dataset · Method · 0.45
rolling_process · Method · 0.45

Calls 5

restore_index · Method · 0.95
values · Method · 0.80
_to_tensor · Function · 0.70
_create_ts_slices · Function · 0.70
sort_index · Method · 0.45

Tested by

no test coverage detected