MCPcopy
hub / github.com/scikit-learn/scikit-learn / _approximate_mode

Function _approximate_mode

sklearn/utils/extmath.py:1416–1482  ·  view source on GitHub ↗

Computes approximate mode of multivariate hypergeometric. This is an approximation to the mode of the multivariate hypergeometric given by class_counts and n_draws. It shouldn't be off by more than one. It is the mostly likely outcome of drawing n_draws many samples from the po

(class_counts, n_draws, rng)

Source from the content-addressed store, hash-verified

1414
1415
1416def _approximate_mode(class_counts, n_draws, rng):
1417 """Computes approximate mode of multivariate hypergeometric.
1418
1419 This is an approximation to the mode of the multivariate
1420 hypergeometric given by class_counts and n_draws.
1421 It shouldn't be off by more than one.
1422
1423 It is the mostly likely outcome of drawing n_draws many
1424 samples from the population given by class_counts.
1425
1426 Parameters
1427 ----------
1428 class_counts : ndarray of int
1429 Population per class.
1430 n_draws : int
1431 Number of draws (samples to draw) from the overall population.
1432 rng : random state
1433 Used to break ties.
1434
1435 Returns
1436 -------
1437 sampled_classes : ndarray of int
1438 Number of samples drawn from each class.
1439 np.sum(sampled_classes) == n_draws
1440
1441 Examples
1442 --------
1443 >>> import numpy as np
1444 >>> from sklearn.utils.extmath import _approximate_mode
1445 >>> _approximate_mode(class_counts=np.array([4, 2]), n_draws=3, rng=0)
1446 array([2, 1])
1447 >>> _approximate_mode(class_counts=np.array([5, 2]), n_draws=4, rng=0)
1448 array([3, 1])
1449 >>> _approximate_mode(class_counts=np.array([2, 2, 2, 1]),
1450 ... n_draws=2, rng=0)
1451 array([0, 1, 1, 0])
1452 >>> _approximate_mode(class_counts=np.array([2, 2, 2, 1]),
1453 ... n_draws=2, rng=42)
1454 array([1, 1, 0, 0])
1455 """
1456 rng = check_random_state(rng)
1457 # this computes a bad approximation to the mode of the
1458 # multivariate hypergeometric given by class_counts and n_draws
1459 continuous = class_counts / class_counts.sum() * n_draws
1460 # floored means we don't overshoot n_samples, but probably undershoot
1461 floored = np.floor(continuous)
1462 # we add samples according to how much "left over" probability
1463 # they had, until we arrive at n_samples
1464 need_to_add = int(n_draws - floored.sum())
1465 if need_to_add > 0:
1466 remainder = continuous - floored
1467 values = np.sort(np.unique(remainder))[::-1]
1468 # add according to remainder, but break ties
1469 # randomly to avoid biases
1470 for value in values:
1471 (inds,) = np.where(remainder == value)
1472 # if we need_to_add less than what's in inds
1473 # we draw randomly from them.

Callers 3

_iter_indicesMethod · 0.90
resampleFunction · 0.90
test_approximate_modeFunction · 0.90

Calls 2

check_random_stateFunction · 0.90
minFunction · 0.85

Tested by 1

test_approximate_modeFunction · 0.72

Used in the wild real call sites across dependent graphs

searching dependent graphs…