import numpy as np


def approximate_mode(class_counts, n_draws, rng):
    """Compute the approximate mode of a multivariate hypergeometric distribution.

    This is an approximation to the mode of the multivariate
    hypergeometric distribution given by ``class_counts`` and ``n_draws``.
    The mode is the most likely outcome of drawing ``n_draws`` samples
    without replacement from the population given by ``class_counts``.
    Each entry of the result should differ from the exact mode by at
    most one.

    Parameters
    ----------
    class_counts : ndarray of int
        Population per class.
    n_draws : int
        Number of draws (samples to draw) from the overall population.
    rng : numpy.random.RandomState or numpy.random.Generator
        Random number generator used to break ties.

    Returns
    -------
    sampled_classes : ndarray of int
        Number of samples drawn from each class, satisfying
        ``np.sum(sampled_classes) == n_draws``.
    """
    class_counts = np.asarray(class_counts)
    # The expected count of each class, n_draws * class_counts / total,
    # is a rough, non-integer approximation to the mode.
    continuous = n_draws * class_counts / class_counts.sum()
    # Flooring guarantees we never overshoot n_draws, but we will
    # usually undershoot it.
    floored = np.floor(continuous)
    # Hand out the missing samples according to how much fractional
    # "left over" each class had, until the total reaches n_draws.
    need_to_add = int(n_draws - floored.sum())
    if need_to_add > 0:
        remainder = continuous - floored
        # Distinct remainders in decreasing order (np.unique sorts ascending).
        values = np.unique(remainder)[::-1]
        # Add according to remainder, but break ties randomly to avoid
        # a systematic bias toward particular classes.
        for value in values:
            (inds,) = np.where(remainder == value)
            # If fewer samples are still needed than there are tied classes,
            # pick that many of them at random; otherwise add one to every
            # tied class and move on to the next remainder value.
            add_now = min(len(inds), need_to_add)
            inds = rng.choice(inds, size=add_now, replace=False)
            floored[inds] += 1
            need_to_add -= add_now
            if need_to_add == 0:
                break
    return floored.astype(np.int64)
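The three steps of the function (expected counts, flooring, largest-remainder top-up) can be traced by hand on a small population. The sketch below is a minimal illustration assuming a hypothetical population of 5, 3 and 2 members and 4 draws; it repeats the same rounding rule inline. Because the fractional remainders in this example are all distinct, it uses a stable `argsort` in place of the random tie-breaking, which is only equivalent when there are no ties.

```python
import numpy as np

# Hypothetical example population: three classes with 5, 3 and 2 members.
class_counts = np.array([5, 3, 2])
n_draws = 4

# Step 1: expected count per class, n_draws * class_counts / total.
continuous = n_draws * class_counts / class_counts.sum()  # about [2.0, 1.2, 0.8]

# Step 2: flooring can only undershoot n_draws.
floored = np.floor(continuous)  # [2.0, 1.0, 0.0], one draw short
need_to_add = int(n_draws - floored.sum())  # 1

# Step 3: give the missing draw(s) to the largest fractional remainders.
# The remainders are about [0.0, 0.2, 0.8] with no ties, so a stable
# descending sort stands in for the random tie-breaking here.
remainder = continuous - floored
order = np.argsort(-remainder, kind="stable")
floored[order[:need_to_add]] += 1

result = floored.astype(np.int64)
print(result)  # [2 1 1]
```

The missing draw goes to the third class because its remainder (0.8) is the largest, giving counts that sum to `n_draws`.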
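For small inputs, the exact mode can be found by brute force and compared against the approximation. The probability of an outcome k is proportional to the product of binomial coefficients C(c_i, k_i), because the common denominator C(sum(c), n_draws) is the same for every outcome. The sketch below assumes Python 3.8+ (for `math.comb` and `math.prod`); the helper name `exact_modes` is hypothetical and not part of this module.

```python
import itertools
from math import comb, prod


def exact_modes(class_counts, n_draws):
    """Return every most likely outcome of a multivariate hypergeometric.

    Brute force over all count vectors k with sum(k) == n_draws and
    0 <= k[i] <= class_counts[i]. P(k) is proportional to
    prod_i comb(class_counts[i], k[i]), so the shared denominator
    is ignored. Only practical for small populations.
    """
    best, modes = -1, []
    ranges = [range(c + 1) for c in class_counts]
    for k in itertools.product(*ranges):
        if sum(k) != n_draws:
            continue  # not a feasible outcome of n_draws draws
        weight = prod(comb(c, ki) for c, ki in zip(class_counts, k))
        if weight > best:
            best, modes = weight, [k]
        elif weight == best:
            modes.append(k)  # tied for most likely
    return modes


print(exact_modes([5, 3, 2], 4))          # [(2, 1, 1)]
print(len(exact_modes([1, 1, 1, 1], 2)))  # 6
```

For `class_counts=[5, 3, 2]` and `n_draws=4`, the unique exact mode is (2, 1, 1), which is also what the floor-and-largest-remainder rule produces. With four singleton classes and two draws, all six pairs are equally likely. This is the kind of tie that `approximate_mode` resolves with `rng` rather than always favoring the same classes.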
def stratified_shuffle_split_generate_indices(y, n_train, n_test, rng, n_splits=10):