class StandardScaler(
    CallbackSupportMixin, OneToOneFeatureMixin, TransformerMixin, BaseEstimator
):
    """Standardize features by removing the mean and scaling to unit variance.

    The standard score of a sample `x` is calculated as:

    .. code-block:: text

        z = (x - u) / s

    where `u` is the mean of the training samples or zero if `with_mean=False`,
    and `s` is the standard deviation of the training samples or one if
    `with_std=False`.
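
    As an illustration only, the formula can be checked numerically. This
    sketch assumes NumPy and scikit-learn are installed and uses made-up toy
    data. Note that `s` is the biased (population) standard deviation,
    equivalent to ``numpy.std(x, ddof=0)``.

    .. code-block:: python

        import numpy as np
        from sklearn.preprocessing import StandardScaler

        # Made-up toy data: 4 samples, 2 features on very different scales.
        X = np.array([[1.0, 100.0],
                      [2.0, 200.0],
                      [3.0, 300.0],
                      [4.0, 400.0]])

        # u and s are computed per feature (column), with ddof=0.
        u = X.mean(axis=0)
        s = X.std(axis=0, ddof=0)
        z_manual = (X - u) / s
        z_sklearn = StandardScaler().fit_transform(X)
        assert np.allclose(z_manual, z_sklearn)

        # with_mean=False takes u as zero; with_std=False takes s as one.
        assert np.allclose(StandardScaler(with_mean=False).fit_transform(X), X / s)
        assert np.allclose(StandardScaler(with_std=False).fit_transform(X), X - u)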

    Centering and scaling happen independently on each feature by computing
    the relevant statistics on the samples in the training set. The mean and
    standard deviation are then stored and applied to later data by
    :meth:`transform`.
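
    A minimal sketch of this fit-then-transform split, assuming NumPy and
    scikit-learn are available and using made-up data. The statistics
    learned by :meth:`fit` are exposed as the fitted attributes ``mean_``
    and ``scale_`` and are reused unchanged on new samples.

    .. code-block:: python

        import numpy as np
        from sklearn.preprocessing import StandardScaler

        X_train = np.array([[0.0, 10.0],
                            [2.0, 20.0],
                            [4.0, 30.0]])
        X_new = np.array([[2.0, 20.0],
                          [6.0, 40.0]])

        scaler = StandardScaler().fit(X_train)

        # Per-feature statistics are learned from the training set only.
        assert np.allclose(scaler.mean_, [2.0, 20.0])
        assert np.allclose(scaler.scale_, X_train.std(axis=0))

        # transform reuses the stored statistics; it does not refit on X_new.
        Z_new = scaler.transform(X_new)
        assert np.allclose(Z_new, (X_new - scaler.mean_) / scaler.scale_)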

    Standardization of a dataset is a common requirement for many
    machine learning estimators: they might behave badly if the
    individual features do not roughly resemble standard normally
    distributed data (e.g. Gaussian with zero mean and unit variance).

    For instance, many elements used in the objective function of
    a learning algorithm (such as the RBF kernel of Support Vector
    Machines or the L1 and L2 regularizers of linear models) assume that
    all features are centered around 0 and have variances of the same
    order of magnitude. If a feature has a variance that is orders of
    magnitude larger than the others, it might dominate the objective
    function and prevent the estimator from learning correctly from the
    other features.
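
    The following sketch makes the domination argument concrete. It assumes
    NumPy and scikit-learn and uses made-up data. The hypothetical helper
    ``feature_share`` measures how much each feature contributes to the
    summed pairwise squared Euclidean distances, the quantity an RBF kernel
    ``exp(-gamma * ||a - b||**2)`` depends on. Before scaling, that share is
    proportional to each feature's variance. After scaling, the features
    contribute equally.

    .. code-block:: python

        import numpy as np
        from sklearn.preprocessing import StandardScaler

        # Made-up data: feature 0 varies on the order of 1, feature 1 of 1000.
        X = np.array([[1.0, 1000.0],
                      [2.0, 3000.0],
                      [3.0, 2000.0],
                      [4.0, 4000.0]])

        def feature_share(A):
            # Hypothetical helper: fraction of the summed pairwise squared
            # Euclidean distance that comes from each feature (column).
            diff = A[:, None, :] - A[None, :, :]
            per_feature = (diff ** 2).sum(axis=(0, 1))
            return per_feature / per_feature.sum()

        before = feature_share(X)
        after = feature_share(StandardScaler().fit_transform(X))
        assert before[1] > 0.999               # feature 1 dominates distances
        assert np.allclose(after, [0.5, 0.5])  # equal shares after scaling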

    `StandardScaler` is sensitive to outliers: a few extreme values inflate
    the estimated mean and standard deviation, so the features may end up
    scaled differently from each other in the presence of outliers. For an
    example visualization, refer to :ref:`Compare StandardScaler with other
    scalers <plot_all_scaling_standard_scaler_section>`.
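
    A minimal sketch of this sensitivity, assuming NumPy and scikit-learn and
    using made-up values. Adding one extreme sample to a single feature
    inflates its standard deviation, which squeezes the scaled inliers into
    a much narrower range.

    .. code-block:: python

        import numpy as np
        from sklearn.preprocessing import StandardScaler

        # One feature: five inliers near 1.0, then the same plus an outlier.
        x_clean = np.array([[0.8], [0.9], [1.0], [1.1], [1.2]])
        x_outlier = np.vstack([x_clean, [[100.0]]])

        z_clean = StandardScaler().fit_transform(x_clean)
        z_outlier = StandardScaler().fit_transform(x_outlier)

        # Range covered by the five inliers after scaling.
        spread_clean = z_clean.max() - z_clean.min()
        spread_with_outlier = z_outlier[:5].max() - z_outlier[:5].min()
        assert spread_with_outlier < spread_clean / 100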

    This scaler can also be applied to sparse CSR or CSC matrices by passing
    `with_mean=False` to avoid breaking the sparsity structure of the data.
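
    A minimal sketch, assuming SciPy, NumPy and scikit-learn are installed
    and using a made-up matrix. With ``with_mean=False``, each column of a
    CSR matrix is only divided by its standard deviation, so zeros stay zero
    and the output remains sparse. The default ``with_mean=True`` is rejected
    for sparse input with a ``ValueError``.

    .. code-block:: python

        import numpy as np
        import scipy.sparse as sp
        from sklearn.preprocessing import StandardScaler

        X = sp.csr_matrix(np.array([[0.0, 2.0],
                                    [0.0, 0.0],
                                    [4.0, 0.0],
                                    [0.0, 6.0]]))

        # Scaling only: the sparsity pattern is preserved.
        Z = StandardScaler(with_mean=False).fit_transform(X)
        assert sp.issparse(Z)
        assert Z.nnz == X.nnz

        # Centering would turn the implicit zeros into non-zeros, so it is refused.
        refused = False
        try:
            StandardScaler().fit(X)
        except ValueError:
            refused = True
        assert refused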

    Read more in the :ref:`User Guide <preprocessing_scaler>`.

    Parameters
    ----------
    copy : bool, default=True
        If False, try to avoid a copy and scale in place instead.
        This is not guaranteed to always work in place; e.g. if the data is
        not a NumPy array or scipy.sparse CSR matrix, a copy may still be
        returned.

    with_mean : bool, default=True
        If True, center the data before scaling.
        This does not work (and will raise an exception) when attempted on
        sparse matrices, because centering them entails building a dense
        matrix which in common use cases is likely to be too large to fit in
        memory.