hub / github.com/scikit-learn/scikit-learn / _preprocess_data

Function _preprocess_data

sklearn/linear_model/_base.py:113–220

Common data preprocessing for fitting linear models. This helper is in charge of the following steps:

- `sample_weight` is assumed to be `None` or a validated array with the same dtype as `X`.
- If `check_input=True`, perform standard input validation of `X`, `y`.
- Perform copies if requested to avoid side-effects in case of in-place modifications of the input.

(
    X,
    y,
    *,
    fit_intercept,
    copy=True,
    sample_weight=None,
    check_input=True,
    rescale_with_sw=True,
)

Source from the content-addressed store, hash-verified

def _preprocess_data(
    X,
    y,
    *,
    fit_intercept,
    copy=True,
    sample_weight=None,
    check_input=True,
    rescale_with_sw=True,
):
    """Common data preprocessing for fitting linear models.

    This helper is in charge of the following steps:

    - `sample_weight` is assumed to be `None` or a validated array with same dtype as
      `X`.
    - If `check_input=True`, perform standard input validation of `X`, `y`.
    - Perform copies if requested to avoid side-effects in case of inplace
      modifications of the input.

    Then, if `fit_intercept=True` this preprocessing centers both `X` and `y` as
    follows:
        - if `X` is dense, center the data and
          store the mean vector in `X_offset`.
        - if `X` is sparse, store the mean in `X_offset`
          without centering `X`. The centering is expected to be handled by the
          linear solver where appropriate.
        - in either case, always center `y` and store the mean in `y_offset`.
        - both `X_offset` and `y_offset` are always weighted by `sample_weight`
          if not set to `None`.

    If `fit_intercept=False`, no centering is performed and `X_offset`, `y_offset`
    are set to zero.

    If `rescale_with_sw` is True, then X and y are rescaled with the square root of
    sample weights.

    Returns
    -------
    X_out : {ndarray, sparse matrix} of shape (n_samples, n_features)
        If copy=True a copy of the input X is triggered, otherwise operations are
        inplace.
        If input X is dense, then X_out is centered.
    y_out : {ndarray, sparse matrix} of shape (n_samples,) or (n_samples, n_targets)
        Centered copy of y.
    X_offset : ndarray of shape (n_features,)
        The mean per column of input X.
    y_offset : float or ndarray of shape (n_features,)
    X_scale : ndarray of shape (n_features,)
        Always an array of ones. TODO: refactor the code base to make it
        possible to remove this unused variable.
    sample_weight_sqrt : ndarray of shape (n_samples, ) or None
        `np.sqrt(sample_weight)`
    """
    xp, _, device_ = get_namespace_and_device(X, y, sample_weight)
    n_samples, n_features = X.shape
    X_is_sparse = sp.issparse(X)
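The excerpt stops before the body of the function, so the centering and rescaling steps described in the docstring can only be illustrated, not quoted. The following is a minimal NumPy-only sketch of the dense path as the docstring describes it. The function name `preprocess_dense_sketch` is hypothetical, it always copies its inputs, it skips input validation and the sparse case, and returning `None` for the square-root weights when `rescale_with_sw=False` is an assumption rather than something the excerpt confirms.

```python
import numpy as np


def preprocess_dense_sketch(
    X, y, *, fit_intercept, sample_weight=None, rescale_with_sw=True
):
    """Hypothetical re-creation of the documented dense-path behaviour."""
    # Always copy in this sketch, so the caller's arrays are never modified.
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    n_features = X.shape[1]

    if fit_intercept:
        # Offsets are weighted means; weights=None falls back to the plain mean.
        X_offset = np.average(X, axis=0, weights=sample_weight)
        y_offset = np.average(y, axis=0, weights=sample_weight)
        X -= X_offset
        y -= y_offset
    else:
        # No centering: offsets are zero.
        X_offset = np.zeros(n_features)
        y_offset = 0.0 if y.ndim == 1 else np.zeros(y.shape[1])

    # Documented as "always an array of ones".
    X_scale = np.ones(n_features)

    # Assumption: no rescaling (and None) unless weights are given and requested.
    sample_weight_sqrt = None
    if sample_weight is not None and rescale_with_sw:
        sample_weight_sqrt = np.sqrt(np.asarray(sample_weight, dtype=float))
        X *= sample_weight_sqrt[:, np.newaxis]
        y *= sample_weight_sqrt if y.ndim == 1 else sample_weight_sqrt[:, np.newaxis]

    return X, y, X_offset, y_offset, X_scale, sample_weight_sqrt


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
y = np.array([1.0, 2.0, 6.0])
w = np.array([1.0, 1.0, 2.0])

X_out, y_out, X_offset, y_offset, X_scale, sw_sqrt = preprocess_dense_sketch(
    X, y, fit_intercept=True, sample_weight=w
)
print(X_offset)  # weighted column means of X
print(y_offset)  # weighted mean of y
```

With weights `[1, 1, 2]` the third row counts twice, so the offsets are the weighted means rather than the plain column means. After centering and rescaling, the weighted column sums of `X_out` vanish, which is the property a solver relies on when it fits without an explicit intercept column.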

Callers 15

_fit · Method · 0.90
fit · Method · 0.90
fit · Method · 0.90
fit · Method · 0.90
fit · Method · 0.90
fit · Method · 0.90
test_preprocess_data · Function · 0.90
test_csr_preprocess_data · Function · 0.90

Calls 7

get_namespace_and_device · Function · 0.90
check_array · Function · 0.90
supported_float_dtypes · Function · 0.90
_asarray_with_order · Function · 0.90
mean_variance_axis · Function · 0.90
_average · Function · 0.90
_rescale_data · Function · 0.85
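The last two entries in the call list, `_average` and `_rescale_data`, appear to correspond to the weighted centering and the square-root rescaling that the docstring describes. Their bodies are not shown here, so the sketch below does not call them. It only checks, in plain NumPy, the standard identity that motivates those two steps: ordinary least squares on weighted-centered data scaled by `sqrt(sample_weight)` gives the same coefficients as weighted least squares with an explicit intercept, and the intercept can be recovered from the two offsets. All variable names and the synthetic data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + 0.1 * rng.normal(size=50)
w = rng.uniform(0.5, 2.0, size=50)

# Weighted means play the role of X_offset and y_offset.
X_offset = np.average(X, axis=0, weights=w)
y_offset = np.average(y, weights=w)

# Center with the weighted means, then scale each row by sqrt(w_i).
sw_sqrt = np.sqrt(w)
X_pre = (X - X_offset) * sw_sqrt[:, np.newaxis]
y_pre = (y - y_offset) * sw_sqrt

# Ordinary least squares on the preprocessed data, with no intercept column.
coef, *_ = np.linalg.lstsq(X_pre, y_pre, rcond=None)
intercept = y_offset - X_offset @ coef

# Reference: weighted least squares with an explicit intercept column,
# solved through the weighted normal equations.
A = np.column_stack([X, np.ones(len(y))])
W = np.diag(w)
beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)

print(np.allclose(coef, beta[:3]), np.isclose(intercept, beta[3]))
```

The two solutions agree up to floating-point error, which is why a solver that only handles the unweighted, intercept-free problem can be reused once the data has been centered and rescaled this way.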
