hub / github.com/scikit-learn/scikit-learn / partial_fit

Method partial_fit

sklearn/preprocessing/_data.py:931–1090

Online computation of mean and std on X for later scaling. All of X is processed as a single batch. This is intended for cases when :meth:`fit` is not feasible due to very large number of `n_samples` or because X is read from a continuous stream.

(self, X, y=None, sample_weight=None)
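The streaming use case in the summary can be illustrated with a short sketch. It assumes synthetic data and an arbitrary split into ten chunks (both chosen here for illustration, not taken from the source), and compares the result of repeated `partial_fit` calls against a single `fit` on the full array.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))

# Reference: statistics computed from the full array in one call.
full = StandardScaler().fit(X)

# Streaming: the same rows arrive in ten chunks of 100 rows each.
streamed = StandardScaler()
for chunk in np.array_split(X, 10):
    streamed.partial_fit(chunk)

# Both scalers should hold the same statistics up to floating-point rounding.
print(streamed.n_samples_seen_)
print(np.allclose(streamed.mean_, full.mean_))
print(np.allclose(streamed.var_, full.var_))
```

Each call treats its argument as one batch, so the chunk size only affects memory use, not the fitted statistics beyond rounding error.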

Source from the content-addressed store, hash-verified

@_fit_context(prefer_skip_nested_validation=True)
def partial_fit(self, X, y=None, sample_weight=None):
    """Online computation of mean and std on X for later scaling.

    All of X is processed as a single batch. This is intended for cases
    when :meth:`fit` is not feasible due to very large number of
    `n_samples` or because X is read from a continuous stream.

    The algorithm for incremental mean and std is given in Equation 1.5a,b
    in Chan, Tony F., Gene H. Golub, and Randall J. LeVeque. "Algorithms
    for computing the sample variance: Analysis and recommendations."
    The American Statistician 37.3 (1983): 242-247:

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape (n_samples, n_features)
        The data used to compute the mean and standard deviation
        used for later scaling along the features axis.

    y : None
        Ignored.

    sample_weight : array-like of shape (n_samples,), default=None
        Individual weights for each sample.

        .. versionadded:: 0.24
           parameter sample_weight support to StandardScaler.

    Returns
    -------
    self : object
        Fitted scaler.
    """
    xp, _, X_device = get_namespace_and_device(X)
    first_call = not hasattr(self, "n_samples_seen_")
    X = validate_data(
        self,
        X,
        accept_sparse=("csr", "csc"),
        dtype=supported_float_dtypes(xp, X_device),
        ensure_all_finite="allow-nan",
        reset=first_call,
    )
    n_features = X.shape[1]

    callback_ctx = self._init_callback_context()
    callback_ctx.call_on_fit_task_begin(
        estimator=self, X=X, y=y, metadata={"sample_weight": sample_weight}
    )

    if sample_weight is not None:
        sample_weight = _check_sample_weight(sample_weight, X, dtype=X.dtype)

    # Even in the case of `with_mean=False`, we update the mean anyway
    # This is needed for the incremental computation of the var
    # See incr_mean_variance_axis and _incremental_mean_variance_axis

    # if n_samples_seen_ is an integer (i.e. no missing values), we need to
    # transform it to an array of shape (n_features,) required by
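
The docstring cites Chan, Golub, and LeVeque for the incremental mean and variance update. The sketch below shows a pairwise update of that kind for a single feature in pure Python. The `combine` helper and the sample data are hypothetical and written for illustration; this is not scikit-learn's internal implementation, and it ignores sample weights and missing values.

```python
import math
from statistics import fmean, pvariance


def combine(count_a, sum_a, ssd_a, batch):
    """Merge running (count, sum, sum of squared deviations) with a new batch.

    Hypothetical helper: a pairwise update in the style of Chan et al. (1983).
    """
    n = len(batch)
    sum_b = sum(batch)
    mean_b = sum_b / n
    ssd_b = sum((x - mean_b) ** 2 for x in batch)
    if count_a == 0:
        # First batch: nothing to merge with yet.
        return n, sum_b, ssd_b
    m = count_a
    total = m + n
    # Correction term accounting for the difference between the two means.
    correction = (m / (n * total)) * ((n / m) * sum_a - sum_b) ** 2
    return total, sum_a + sum_b, ssd_a + ssd_b + correction


# Illustrative data, consumed in batches of at most three values.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
state = (0, 0.0, 0.0)
for start in range(0, len(data), 3):
    state = combine(*state, data[start:start + 3])

count, total_sum, ssd = state
mean = total_sum / count
variance = ssd / count  # population variance (ddof=0)

# The streamed statistics should match the one-shot values up to rounding.
print(math.isclose(mean, fmean(data)))
print(math.isclose(variance, pvariance(data)))
```

Only three running quantities per feature are kept between batches, which is why the full dataset never has to be held in memory.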

Callers 6

fit · Method · 0.95

test_standard_scaler_partial_fit · Function · 0.95

test_standard_scaler_partial_fit_numerical_stability · Function · 0.95

test_partial_fit_sparse_input · Function · 0.95

test_standard_scaler_transform_with_partial_fit · Function · 0.95

test_scaler_return_identity · Function · 0.95

Calls 15

get_namespace_and_device · Function · 0.90

validate_data · Function · 0.90

supported_float_dtypes · Function · 0.90

_check_sample_weight · Function · 0.90

size · Function · 0.90

mean_variance_axis · Function · 0.90

incr_mean_variance_axis · Function · 0.90

_incremental_mean_and_var · Function · 0.90

_is_constant_feature · Function · 0.85

_handle_zeros_in_scale · Function · 0.85

_init_callback_context · Method · 0.80

call_on_fit_task_begin · Method · 0.80

Tested by 5

test_standard_scaler_partial_fit · Function · 0.76

test_standard_scaler_partial_fit_numerical_stability · Function · 0.76

test_partial_fit_sparse_input · Function · 0.76

test_standard_scaler_transform_with_partial_fit · Function · 0.76

test_scaler_return_identity · Function · 0.76