
Class StandardScaler

sklearn/preprocessing/_data.py:742–1187

Standardize features by removing the mean and scaling to unit variance.

The standard score of a sample `x` is calculated as:

.. code-block:: text

    z = (x - u) / s

where `u` is the mean of the training samples or zero if `with_mean=False`, and `s` is the standard deviation of the training samples or one if `with_std=False`.
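
The standard score `z = (x - u) / s` can be checked by hand against the scaler. The following is a minimal sketch, not taken from the scikit-learn sources: the toy arrays `X_train` and `X_new` are made up for illustration, and it assumes the scaler uses the population standard deviation (`ddof=0`), which is what `numpy.std` computes by default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: 4 samples, 2 features on very different scales.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])

# fit() computes and stores the per-feature mean and standard deviation.
scaler = StandardScaler().fit(X_train)

# Manual standard score with per-feature statistics.
u = X_train.mean(axis=0)
s = X_train.std(axis=0)  # assumption: population std (ddof=0)
z_manual = (X_train - u) / s

# The scaler should reproduce the manual computation.
z_sklearn = scaler.transform(X_train)
matches = np.allclose(z_manual, z_sklearn)

# Later data is scaled with the statistics stored at fit time,
# not with statistics recomputed from the new samples.
X_new = np.array([[2.5, 250.0]])
z_new = scaler.transform(X_new)
```

Here `X_new` sits exactly at the training mean of each feature, so both of its scores come out as zero even though the raw values differ by two orders of magnitude.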

Source from the content-addressed store, hash-verified

class StandardScaler(
    CallbackSupportMixin, OneToOneFeatureMixin, TransformerMixin, BaseEstimator
):
    """Standardize features by removing the mean and scaling to unit variance.

    The standard score of a sample `x` is calculated as:

    .. code-block:: text

        z = (x - u) / s

    where `u` is the mean of the training samples or zero if `with_mean=False`,
    and `s` is the standard deviation of the training samples or one if
    `with_std=False`.

    Centering and scaling happen independently on each feature by computing
    the relevant statistics on the samples in the training set. Mean and
    standard deviation are then stored to be used on later data using
    :meth:`transform`.

    Standardization of a dataset is a common requirement for many
    machine learning estimators: they might behave badly if the
    individual features do not more or less look like standard normally
    distributed data (e.g. Gaussian with 0 mean and unit variance).

    For instance many elements used in the objective function of
    a learning algorithm (such as the RBF kernel of Support Vector
    Machines or the L1 and L2 regularizers of linear models) assume that
    all features are centered around 0 and have variance in the same
    order. If a feature has a variance that is orders of magnitude larger
    than others, it might dominate the objective function and make the
    estimator unable to learn from other features correctly as expected.

    `StandardScaler` is sensitive to outliers, and the features may scale
    differently from each other in the presence of outliers. For an example
    visualization, refer to :ref:`Compare StandardScaler with other scalers
    <plot_all_scaling_standard_scaler_section>`.

    This scaler can also be applied to sparse CSR or CSC matrices by passing
    `with_mean=False` to avoid breaking the sparsity structure of the data.

    Read more in the :ref:`User Guide <preprocessing_scaler>`.

    Parameters
    ----------
    copy : bool, default=True
        If False, try to avoid a copy and do inplace scaling instead.
        This is not guaranteed to always work inplace; e.g. if the data is
        not a NumPy array or scipy.sparse CSR matrix, a copy may still be
        returned.

    with_mean : bool, default=True
        If True, center the data before scaling.
        This does not work (and will raise an exception) when attempted on
        sparse matrices, because centering them entails building a dense
        matrix which in common use cases is likely to be too large to fit in
        memory.
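
The docstring's note on sparse input can be illustrated with a small sketch. It is not part of the scikit-learn sources: the matrix `X` is made-up toy data, and the docstring only promises "an exception" for centering sparse input, so catching `ValueError` here is an assumption about the concrete exception type.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Hypothetical, mostly-zero data stored as a CSR matrix.
X = sparse.csr_matrix(
    np.array(
        [
            [0.0, 2.0, 0.0],
            [0.0, 0.0, 4.0],
            [1.0, 0.0, 0.0],
            [0.0, 6.0, 0.0],
        ]
    )
)

# with_mean=False skips centering: each column is only divided by its
# standard deviation, so zeros stay zeros and the result stays sparse.
scaler = StandardScaler(with_mean=False)
X_scaled = scaler.fit_transform(X)

still_sparse = sparse.issparse(X_scaled)
same_nnz = X_scaled.nnz == X.nnz

# The default with_mean=True is rejected for sparse input, because
# subtracting a nonzero mean would turn every entry into a nonzero.
try:
    StandardScaler().fit(X)
    centering_rejected = False
except ValueError:  # assumption: the exception raised is a ValueError
    centering_rejected = True
```

Because variance does not depend on the mean, the scaled columns still have unit variance even though they are not centered around zero.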

Calls

no outgoing calls
