Detect if a feature is indistinguishable from a constant feature. The detection is based on its computed variance and on the theoretical error bounds of the '2 pass algorithm' for variance computation. See "Algorithms for computing the sample variance: analysis and recommendations", by Chan, Golub, and LeVeque.
_is_constant_feature(var, mean, n_samples)
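Written out, the check performed by `_is_constant_feature` compares the computed variance σ̂² of a feature against a bound built from the number of samples n, the computed mean x̄, and the machine epsilon ε of the highest-precision float dtype available on the device:

```latex
\hat{\sigma}^2 \;\le\; n\,\varepsilon\,\hat{\sigma}^2 \;+\; \left(n\,\bar{x}\,\varepsilon\right)^2
```

One informal reading, offered as an interpretation rather than a statement from the source: the first term allows for a relative error of order nε in the computed variance. The second term covers a truly constant column. Its computed mean can be off by roughly n·|x̄|·ε, so the squared deviations left over are at most about (n·x̄·ε)². A feature whose variance does not rise above this rounding noise is treated as constant.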
```python
# Private helpers from scikit-learn's array API support module.
from sklearn.utils._array_api import (
    _max_precision_float_dtype,
    get_namespace_and_device,
)


def _is_constant_feature(var, mean, n_samples):
    """Detect if a feature is indistinguishable from a constant feature.

    The detection is based on its computed variance and on the theoretical
    error bounds of the '2 pass algorithm' for variance computation.

    See "Algorithms for computing the sample variance: analysis and
    recommendations", by Chan, Golub, and LeVeque.
    """
    # Variance is computed with the highest-precision float dtype available
    # on the device (float64 where supported), so take eps from that dtype.
    xp, _, device_ = get_namespace_and_device(var, mean)
    max_float_dtype = _max_precision_float_dtype(xp=xp, device=device_)
    eps = xp.finfo(max_float_dtype).eps

    upper_bound = n_samples * eps * var + (n_samples * mean * eps) ** 2
    return var <= upper_bound
```
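To see the bound at work outside scikit-learn's array-API plumbing, here is a minimal NumPy-only sketch. It assumes plain float64 NumPy arrays and hard-codes float64 epsilon in place of `_max_precision_float_dtype`. The name `is_constant_feature_np` is hypothetical and not part of scikit-learn. A column holding one repeated value is flagged as constant even if its computed variance is not exactly zero, while a column of random normal draws is not.

```python
import numpy as np


def is_constant_feature_np(var, mean, n_samples):
    """Hypothetical NumPy-only variant of _is_constant_feature.

    Assumes float64 inputs, so machine epsilon is taken from float64
    instead of being derived from the array namespace and device.
    """
    eps = np.finfo(np.float64).eps
    # Same bound as the original: a relative term plus the squared
    # rounding error attributable to the computed mean.
    upper_bound = n_samples * eps * var + (n_samples * mean * eps) ** 2
    # Element-wise comparison: one boolean per feature.
    return var <= upper_bound


rng = np.random.default_rng(0)
n_samples = 1_000
X = np.column_stack(
    [
        np.full(n_samples, 0.1),  # constant feature; 0.1 is inexact in binary
        rng.normal(size=n_samples),  # genuinely varying feature
    ]
)

var = X.var(axis=0)  # two-pass: mean first, then mean squared deviation
mean = X.mean(axis=0)
mask = is_constant_feature_np(var, mean, n_samples)

# The computed variance of the constant column may be a tiny positive
# number rather than exactly 0.0; the bound still classifies it as constant.
print(var[0])
print(mask.tolist())  # [True, False]
```

The point of a bound that scales with n and the mean, rather than a test such as `var == 0` or a fixed absolute threshold, is that rounding noise grows with the magnitude of the data. A fixed threshold would be too strict for constant columns of large values and too loose for genuinely varying columns of small values.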