```python
def _preprocess_data(
    X,
    y,
    *,
    fit_intercept,
    copy=True,
    sample_weight=None,
    check_input=True,
    rescale_with_sw=True,
):
    """Common data preprocessing for fitting linear models.

    This helper is in charge of the following steps:

    - `sample_weight` is assumed to be `None` or a validated array with the same
      dtype as `X`.
    - If `check_input=True`, perform standard input validation of `X`, `y`.
    - Perform copies if requested to avoid side effects in case of in-place
      modifications of the input.

    Then, if `fit_intercept=True`, this preprocessing centers the data as follows:

    - if `X` is dense, center the data and store the mean vector in `X_offset`.
    - if `X` is sparse, store the mean in `X_offset` without centering `X`. The
      centering is expected to be handled by the linear solver where appropriate.
    - in either case, always center `y` and store the mean in `y_offset`.
    - both `X_offset` and `y_offset` are weighted means when `sample_weight` is
      not `None`.

    If `fit_intercept=False`, no centering is performed and `X_offset`, `y_offset`
    are set to zero.

    If `rescale_with_sw=True` and `sample_weight` is not `None`, each row of `X`
    and `y` is rescaled by the square root of the corresponding sample weight.

    Returns
    -------
    X_out : {ndarray, sparse matrix} of shape (n_samples, n_features)
        If `copy=True`, a copy of the input X is made; otherwise operations are
        in-place. If the input X is dense and `fit_intercept=True`, X_out is
        centered.
    y_out : {ndarray, sparse matrix} of shape (n_samples,) or (n_samples, n_targets)
        The (possibly copied) y, centered when `fit_intercept=True`.
    X_offset : ndarray of shape (n_features,)
        The (weighted) mean per column of input X, or zeros if
        `fit_intercept=False`.
    y_offset : float or ndarray of shape (n_targets,)
        The (weighted) mean of y per target, or zero if `fit_intercept=False`.
    X_scale : ndarray of shape (n_features,)
        Always an array of ones. TODO: refactor the code base to make it
        possible to remove this unused variable.
    sample_weight_sqrt : ndarray of shape (n_samples,) or None
        `np.sqrt(sample_weight)`, or None if no rescaling is performed.
    """
    xp, _, device_ = get_namespace_and_device(X, y, sample_weight)
    n_samples, n_features = X.shape
    X_is_sparse = sp.issparse(X)
```
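The centering step for a dense `X` can be sketched with plain NumPy. `center_data` is a hypothetical helper, not part of scikit-learn: it skips input validation, the copy policy, sparse handling and array-API support. The intercept-recovery line assumes the standard identity `intercept = y_offset - X_offset @ coef`, which this excerpt does not itself show; the last check illustrates one way a solver can account for the offset when a sparse `X` is left uncentered.

```python
import numpy as np


def center_data(X, y, sample_weight=None, fit_intercept=True):
    """Hypothetical helper sketching weighted centering of dense X and y.

    Not scikit-learn's implementation: no validation, copy policy,
    sparse handling or array-API support.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if not fit_intercept:
        # No centering: offsets are zeros (a scalar for 1-D y, one per target otherwise).
        X_offset = np.zeros(X.shape[1])
        y_offset = 0.0 if y.ndim == 1 else np.zeros(y.shape[1])
        return X, y, X_offset, y_offset
    # np.average falls back to the plain mean when weights is None.
    X_offset = np.average(X, axis=0, weights=sample_weight)
    y_offset = np.average(y, axis=0, weights=sample_weight)
    return X - X_offset, y - y_offset, X_offset, y_offset


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 4.0  # noise-free data with intercept 4.0
sw = rng.uniform(0.5, 2.0, size=50)

Xc, yc, X_offset, y_offset = center_data(X, y, sample_weight=sw)

# The weighted column means of the centered X are (numerically) zero.
print(np.average(Xc, axis=0, weights=sw))

# Fit without an intercept column on the centered data, then recover the
# intercept from the offsets.
coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
intercept = y_offset - X_offset @ coef
print(coef, intercept)

# Identity usable when X is kept uncentered (e.g. sparse):
# (X - X_offset) @ coef == X @ coef - X_offset @ coef
print(np.allclose(Xc @ coef, X @ coef - X_offset @ coef))
```

Because the toy data is noise-free, the fit on centered data recovers `true_coef`, and the intercept computed from the offsets comes back as 4.0 up to floating-point error.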
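Rescaling by the square root of the sample weights works because weighted least squares, which minimizes the sum over samples of `sw_i * (y_i - x_i @ w) ** 2`, is ordinary least squares on rows multiplied by `sqrt(sw_i)`. The sketch below checks this numerically; `rescale_with_sqrt_sw` is a hypothetical helper written for illustration, not scikit-learn's internal code.

```python
import numpy as np


def rescale_with_sqrt_sw(X, y, sample_weight):
    """Hypothetical helper: multiply each row of X and y by sqrt(sample_weight)."""
    sw_sqrt = np.sqrt(sample_weight)
    X_rescaled = X * sw_sqrt[:, None]  # scale row i of X by sqrt(sw_i)
    # 1-D y scales element-wise; 2-D y (n_samples, n_targets) scales row-wise.
    y_rescaled = y * sw_sqrt if y.ndim == 1 else y * sw_sqrt[:, None]
    return X_rescaled, y_rescaled, sw_sqrt


rng = np.random.default_rng(42)
X = rng.normal(size=(40, 3))
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=0.1, size=40)
sw = rng.uniform(0.1, 3.0, size=40)

Xr, yr, sw_sqrt = rescale_with_sqrt_sw(X, y, sw)

# Ordinary least squares on the rescaled data...
coef_rescaled, *_ = np.linalg.lstsq(Xr, yr, rcond=None)
# ...matches the weighted normal equations (X^T W X) w = X^T W y, W = diag(sw).
coef_weighted = np.linalg.solve(X.T @ (sw[:, None] * X), X.T @ (sw * y))
print(np.allclose(coef_rescaled, coef_weighted))

# For any coefficient vector, the unweighted loss on the rescaled data equals
# the weighted loss on the original data.
w = rng.normal(size=3)
loss_rescaled = np.sum((yr - Xr @ w) ** 2)
loss_weighted = np.sum(sw * (y - X @ w) ** 2)
print(np.isclose(loss_rescaled, loss_weighted))
```

Folding the weights into the data this way lets a solver that only minimizes an unweighted squared loss handle `sample_weight` without further changes.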