hub / github.com/hankcs/HanLP / split

Method split

hanlp/common/dataset.py:223–248

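Split the dataset into contiguous subsets, in their existing order (no shuffling). Each subset except the last receives up to math.ceil(len(self) * ratio) items. The last subset takes whatever remains, so it can be smaller than its ratio suggests, or even empty.
Args: *ratios: The ratio for each subset. Any numeric type is accepted, and the values are normalized by their sum, so ``8, 1, 1`` is equivalent to ``0.8, 0.1, 0.1``.
Returns: list[TransformableDataset]: A list of subsets, one per ratio.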

(self, *ratios)
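The following minimal sketch shows how the ratios become index ranges. It reproduces the boundary arithmetic on a plain sample count. `split_boundaries` is a hypothetical helper written for illustration and is not part of HanLP. It assumes the same three steps as the source listing: normalize, round each size up, then pin the last boundary to the sample count.

```python
import math
from typing import List, Tuple


def split_boundaries(n: int, *ratios: float) -> List[Tuple[int, int]]:
    """Hypothetical helper mirroring the boundary logic of ``split``.

    Returns (begin, end) index pairs for a dataset of ``n`` samples.
    """
    total = sum(ratios)
    normalized = [x / total for x in ratios]  # e.g. 8, 1, 1 -> 0.8, 0.1, 0.1
    bounds = []
    prev = 0
    for r in normalized:
        cur = prev + math.ceil(n * r)  # advance by the rounded-up subset size
        bounds.append([prev, cur])
        prev = cur
    bounds[-1][1] = n  # the last subset ends at n and absorbs rounding error
    return [(b, e) for b, e in bounds]


print(split_boundaries(10, 8, 1, 1))  # [(0, 8), (8, 9), (9, 10)]
print(split_boundaries(10, 1, 1, 1))  # [(0, 4), (4, 8), (8, 10)]
```

The second call shows the effect of rounding up. Three equal ratios over 10 items produce sizes 4, 4, 2 rather than the more even 4, 3, 3.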


def split(self, *ratios):
    """Split dataset into subsets.

    Args:
        *ratios: The ratios for each subset. They can be any type of numbers which will be normalized. For example,
            ``8, 1, 1`` are equivalent to ``0.8, 0.1, 0.1``.

    Returns:
        list[TransformableDataset]: A list of subsets.
    """
    # Relies on module-level `import math` and `from copy import copy`.
    # Normalize the ratios so they sum to 1.
    ratios = [x / sum(ratios) for x in ratios]
    chunks = []
    prev = 0
    for r in ratios:
        # Advance by ceil(len * r); cumulative rounding may overshoot len(self).
        cur = prev + math.ceil(len(self) * r)
        chunks.append([prev, cur])
        prev = cur
    # Pin the final boundary to len(self): the last subset absorbs rounding error.
    chunks[-1][1] = len(self)
    outputs = []
    for b, e in chunks:
        # Shallow-copy the dataset object, then slice its samples (and cache, if any).
        dataset = copy(self)
        dataset.data = dataset.data[b:e]
        if dataset.cache:
            dataset.cache = dataset.cache[b:e]
        outputs.append(dataset)
    return outputs
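The copy-and-slice step can be exercised without installing HanLP. The sketch below uses a hypothetical `ToyDataset` class that holds only `data`, `cache`, and a hypothetical shared `transform` attribute, and that carries the same `split` logic as the method above. It makes no other assumptions about `TransformableDataset`.

```python
import math
from copy import copy


class ToyDataset:
    """Hypothetical stand-in with only the attributes ``split`` touches."""

    def __init__(self, data, use_cache=False):
        self.data = list(data)
        # Optional per-sample cache, laid out like HanLP's [None] * len(self.data).
        self.cache = [None] * len(self.data) if use_cache else None
        self.transform = []  # hypothetical attribute that split never re-slices

    def __len__(self):
        return len(self.data)

    def split(self, *ratios):
        # Same logic as the HanLP method.
        ratios = [x / sum(ratios) for x in ratios]
        chunks = []
        prev = 0
        for r in ratios:
            cur = prev + math.ceil(len(self) * r)
            chunks.append([prev, cur])
            prev = cur
        chunks[-1][1] = len(self)
        outputs = []
        for b, e in chunks:
            dataset = copy(self)  # shallow copy: new object, shared attribute values
            dataset.data = dataset.data[b:e]  # slicing rebinds data to a new list
            if dataset.cache:
                dataset.cache = dataset.cache[b:e]
            outputs.append(dataset)
        return outputs


full = ToyDataset(range(10), use_cache=True)
train, dev, test = full.split(8, 1, 1)
print(len(train), len(dev), len(test))  # 8 1 1
print(test.data, len(test.cache))       # [9] 1
print(len(full))                        # 10 (the original is left intact)
print(train.transform is full.transform)  # True
```

Because `copy` is shallow, any attribute that is not re-sliced (here the hypothetical `transform` list) is the same object in the original and in every subset. Mutating that object in place therefore affects all of them.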

Callers 15

_init_new_embeddings · Method · 0.45
_tokenize · Method · 0.45
_tok_bpe · Method · 0.45
get_frames · Method · 0.45
dfs_linearize_tokenize · Function · 0.45
_tokenize · Method · 0.45
_split_name_ops · Function · 0.45
forward · Method · 0.45

Calls 1

append · Method · 0.45

Tested by

no test coverage detected