hub / github.com/scikit-learn/scikit-learn / train_test_split

Function train_test_split

sklearn/model_selection/_split.py:2797–2976

Split arrays or matrices into random train and test subsets. Quick utility that wraps input validation, ``next(ShuffleSplit().split(X, y))``, and application to input data into a single call for splitting (and optionally subsampling) data into a one-liner. Read more in the :ref:`User Guide <cross_validation>`.
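
The description above says the function wraps ``next(ShuffleSplit().split(X, y))`` plus indexing. A minimal sketch of that relationship follows; it is an illustration, not the library's actual code path. The toy arrays and the `*_manual` names are assumptions made for this example.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, train_test_split

# Toy data (hypothetical): 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# One-call form: validation, index generation and indexing in a single step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, random_state=0
)

# Manual form: draw a single split of indices, then index each array.
splitter = ShuffleSplit(n_splits=1, test_size=3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y))
X_train_manual, X_test_manual = X[train_idx], X[test_idx]
y_train_manual, y_test_manual = y[train_idx], y[test_idx]
```

With the same `random_state` and sizes, both forms are expected to select the same rows. The one-call form also accepts any number of arrays and keeps them aligned.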

(
    *arrays,
    test_size=None,
    train_size=None,
    random_state=None,
    shuffle=True,
    stratify=None,
)
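
A short usage sketch for the signature above, showing the default sizes, `stratify`, and `shuffle=False`. The data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples with imbalanced labels (15 zeros, 5 ones).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# Defaults: test_size and train_size are both None, so 25% is held out.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Stratified split: class proportions of y are preserved in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=4, stratify=y, random_state=42
)

# No shuffling: the first rows go to train and the last rows to test.
# stratify must stay None when shuffle=False.
X_head, X_tail, y_head, y_tail = train_test_split(
    X, y, test_size=0.25, shuffle=False
)
```

The return value is a flat list of length `2 * len(arrays)`, ordered as train then test for each input array in turn.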

Source

    prefer_skip_nested_validation=True,
)
def train_test_split(
    *arrays,
    test_size=None,
    train_size=None,
    random_state=None,
    shuffle=True,
    stratify=None,
):
    """Split arrays or matrices into random train and test subsets.

    Quick utility that wraps input validation,
    ``next(ShuffleSplit().split(X, y))``, and application to input data
    into a single call for splitting (and optionally subsampling) data into a
    one-liner.

    Read more in the :ref:`User Guide <cross_validation>`.

    Parameters
    ----------
    *arrays : sequence of indexables with same length / shape[0]
        Allowed inputs are lists, numpy arrays, scipy-sparse
        matrices or pandas dataframes.

    test_size : float or int, default=None
        If float, should be between 0.0 and 1.0 and represent the proportion
        of the dataset to include in the test split. If int, represents the
        absolute number of test samples. If None, the value is set to the
        complement of the train size. If ``train_size`` is also None, it will
        be set to 0.25.

    train_size : float or int, default=None
        If float, should be between 0.0 and 1.0 and represent the
        proportion of the dataset to include in the train split. If
        int, represents the absolute number of train samples. If None,
        the value is automatically set to the complement of the test size.

    random_state : int, RandomState instance or None, default=None
        Controls the shuffling applied to the data before applying the split.
        Pass an int for reproducible output across multiple function calls.
        See :term:`Glossary <random_state>`.

    shuffle : bool, default=True
        Whether or not to shuffle the data before splitting. If shuffle=False
        then stratify must be None.

    stratify : array-like, default=None
        If not None, data is split in a stratified fashion, using this as
        the class labels.
        Read more in the :ref:`User Guide <stratification>`.

    Returns
    -------
    splitting : list, length=2 * len(arrays)
        List containing train-test split of inputs.

        .. versionadded:: 0.16
            If the input is sparse, the output will be a
            ``scipy.sparse.csr_matrix``. Else, output type is the same as the

Callers 15

_blobs_dataset · Function · 0.90
_mnist_dataset · Function · 0.90
_digits_dataset · Function · 0.90
_olivetti_faces_dataset · Function · 0.90
_random_dataset · Function · 0.90

Calls 7

indexable · Function · 0.90
_num_samples · Function · 0.90
get_namespace_and_device · Function · 0.90
move_to · Function · 0.90
_safe_indexing · Function · 0.90
_validate_shuffle_split · Function · 0.85
split · Method · 0.45
