hub / github.com/huggingface/datasets / approximate_mode

Function approximate_mode

src/datasets/utils/stratify.py:4–51

Computes approximate mode of multivariate hypergeometric. This is an approximation to the mode of the multivariate hypergeometric given by class_counts and n_draws. It shouldn't be off by more than one. It is the most likely outcome of drawing n_draws many samples from the population given by class_counts.

(class_counts, n_draws, rng)
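
To make the "shouldn't be off by more than one" claim concrete, the sketch below compares a largest-remainder allocation with a brute-force exact mode on a tiny population. It uses only the standard library. `largest_remainder` and `exact_mode` are hypothetical helper names written for this illustration and are not part of `datasets`; the stand-in also breaks ties by index order instead of randomly, which is an assumption made to keep the output deterministic.

```python
from itertools import product
from math import comb, floor


def largest_remainder(class_counts, n_draws):
    """Floor the proportional shares, then give leftovers to the largest remainders.

    Illustrative stand-in only: ties go to the lowest index, not to a random pick.
    """
    total = sum(class_counts)
    shares = [n_draws * c / total for c in class_counts]
    alloc = [floor(s) for s in shares]
    leftover = n_draws - sum(alloc)
    # Sort indices by descending remainder (share - floor).
    by_remainder = sorted(range(len(shares)), key=lambda i: alloc[i] - shares[i])
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


def exact_mode(class_counts, n_draws):
    """Brute-force the most probable outcome of the multivariate hypergeometric.

    The probability of an outcome is proportional to the product of
    binomial coefficients C(class_count, drawn), so comparing those
    products is enough to find the mode.
    """
    best, best_weight = None, -1
    for outcome in product(*(range(c + 1) for c in class_counts)):
        if sum(outcome) != n_draws:
            continue
        weight = 1
        for c, k in zip(class_counts, outcome):
            weight *= comb(c, k)
        if weight > best_weight:
            best, best_weight = list(outcome), weight
    return best


counts, draws = [5, 3, 2], 4
approx = largest_remainder(counts, draws)
exact = exact_mode(counts, draws)
print(approx, exact)  # prints [2, 1, 1] [2, 1, 1]
```

The brute-force search is exponential in the number of classes, so it is only useful as a sanity check on very small inputs.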

Source from the content-addressed store, hash-verified

def approximate_mode(class_counts, n_draws, rng):
    """Computes approximate mode of multivariate hypergeometric.
    This is an approximation to the mode of the multivariate
    hypergeometric given by class_counts and n_draws.
    It shouldn't be off by more than one.
    It is the most likely outcome of drawing n_draws many
    samples from the population given by class_counts.
    Args
    ----------
    class_counts : ndarray of int
        Population per class.
    n_draws : int
        Number of draws (samples to draw) from the overall population.
    rng : random state
        Used to break ties.
    Returns
    -------
    sampled_classes : ndarray of int
        Number of samples drawn from each class.
        np.sum(sampled_classes) == n_draws

    """
    # this computes a bad approximation to the mode of the
    # multivariate hypergeometric given by class_counts and n_draws
    continuous = n_draws * class_counts / class_counts.sum()
    # floored means we don't overshoot n_draws, but probably undershoot
    floored = np.floor(continuous)
    # we add samples according to how much "left over" probability
    # they had, until we arrive at n_draws
    need_to_add = int(n_draws - floored.sum())
    if need_to_add > 0:
        remainder = continuous - floored
        values = np.sort(np.unique(remainder))[::-1]
        # add according to remainder, but break ties
        # randomly to avoid biases
        for value in values:
            (inds,) = np.where(remainder == value)
            # if we need_to_add less than what's in inds
            # we draw randomly from them.
            # if we need to add more, we add them all and
            # go to the next value
            add_now = min(len(inds), need_to_add)
            inds = rng.choice(inds, size=add_now, replace=False)
            floored[inds] += 1
            need_to_add -= add_now
            if need_to_add == 0:
                break
    return floored.astype(np.int64)


def stratified_shuffle_split_generate_indices(y, n_train, n_test, rng, n_splits=10):
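
The steps in `approximate_mode` can be traced by hand on a small input where every class ties. The sketch below recomputes the intermediate quantities with plain Python lists instead of NumPy arrays and uses `random.Random` as a stand-in for `rng`; both substitutions are assumptions made to keep the example dependency-free, and it handles only the single tie level that this input produces.

```python
import random
from math import floor

class_counts = [10, 10, 10]  # three equally sized classes
n_draws = 4

total = sum(class_counts)
continuous = [n_draws * c / total for c in class_counts]  # ideal fractional shares
floored = [floor(x) for x in continuous]                  # cannot overshoot n_draws
need_to_add = n_draws - sum(floored)                      # draws still unassigned
remainder = [x - f for x, f in zip(continuous, floored)]

print(floored, need_to_add)  # prints [1, 1, 1] 1

# All three remainders are identical, so the leftover draw is a pure tie.
top = max(remainder)
tied = [i for i, r in enumerate(remainder) if r == top]

rng = random.Random(0)  # stand-in for the rng argument (assumption)
for i in rng.sample(tied, need_to_add):
    floored[i] += 1

# One class now holds 2 draws and the others 1; which class depends on the seed.
print(floored)
```

Without the random tie-break, the extra draw would always go to the same class, which would bias repeated splits toward it.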

Calls 2

sort (method) · 0.45
unique (method) · 0.45

Tested by

no test coverage detected