hub / github.com/scikit-learn/scikit-learn / train_test_split

Function train_test_split

sklearn/model_selection/_split.py:2797–2976

Split arrays or matrices into random train and test subsets. A quick utility that wraps input validation, ``next(ShuffleSplit().split(X, y))``, and application to the input data in a single call, so that splitting (and optionally subsampling) data becomes a one-liner. Read more in the :ref:`User Guide <cross_validation>`.
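To make the summary concrete, the sketch below compares the one-liner with the manual route through ``ShuffleSplit``. The toy arrays and the choice of ``test_size=0.25`` and ``random_state=0`` are assumptions made for illustration; only the public ``ShuffleSplit`` and ``train_test_split`` APIs are used.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, train_test_split

# Toy data, assumed for illustration only.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Manual route: draw one split of row indices, then index each array.
splitter = ShuffleSplit(test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y))
X_train_manual, X_test_manual = X[train_idx], X[test_idx]

# One-liner: validation, index generation, and indexing in a single call.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# With the same seed and sizes, both routes are expected to select the same rows.
same_rows = np.array_equal(X_train, X_train_manual) and np.array_equal(
    X_test, X_test_manual
)
```

The manual route only indexes ``X`` here; the one-liner applies the same row selection to every array passed in.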

(
    *arrays,
    test_size=None,
    train_size=None,
    random_state=None,
    shuffle=True,
    stratify=None,
)
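A minimal usage sketch of this signature, assuming toy inputs: the array ``groups`` is a hypothetical third input, added only to show that each positional array contributes a train part and a test part to the result, in order.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy inputs with the same length (8 samples); all values are illustrative.
X = np.arange(16).reshape(8, 2)
y = list(range(8))
groups = np.array(list("aabbccdd"))  # hypothetical extra array

# With test_size and train_size both left as None, 25% of samples go to test.
parts = train_test_split(X, y, groups, random_state=42)

# Results come back as (train, test) pairs, one pair per input array.
X_train, X_test, y_train, y_test, g_train, g_test = parts
```

Passing an int as ``random_state`` keeps the split reproducible across calls, which is usually what a test or a tutorial wants.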

Source from the content-addressed store, hash-verified

2795	    prefer_skip_nested_validation=True,
2796	)
2797	def train_test_split(
2798	    *arrays,
2799	    test_size=None,
2800	    train_size=None,
2801	    random_state=None,
2802	    shuffle=True,
2803	    stratify=None,
2804	):
2805	    """Split arrays or matrices into random train and test subsets.
2806
2807	    Quick utility that wraps input validation,
2808	    ``next(ShuffleSplit().split(X, y))``, and application to input data
2809	    into a single call for splitting (and optionally subsampling) data into a
2810	    one-liner.
2811
2812	    Read more in the :ref:`User Guide <cross_validation>`.
2813
2814	    Parameters
2815	    ----------
2816	    *arrays : sequence of indexables with same length / shape[0]
2817	        Allowed inputs are lists, numpy arrays, scipy-sparse
2818	        matrices or pandas dataframes.
2819
2820	    test_size : float or int, default=None
2821	        If float, should be between 0.0 and 1.0 and represent the proportion
2822	        of the dataset to include in the test split. If int, represents the
2823	        absolute number of test samples. If None, the value is set to the
2824	        complement of the train size. If ``train_size`` is also None, it will
2825	        be set to 0.25.
2826
2827	    train_size : float or int, default=None
2828	        If float, should be between 0.0 and 1.0 and represent the
2829	        proportion of the dataset to include in the train split. If
2830	        int, represents the absolute number of train samples. If None,
2831	        the value is automatically set to the complement of the test size.
2832
2833	    random_state : int, RandomState instance or None, default=None
2834	        Controls the shuffling applied to the data before applying the split.
2835	        Pass an int for reproducible output across multiple function calls.
2836	        See :term:`Glossary <random_state>`.
2837
2838	    shuffle : bool, default=True
2839	        Whether or not to shuffle the data before splitting. If shuffle=False
2840	        then stratify must be None.
2841
2842	    stratify : array-like, default=None
2843	        If not None, data is split in a stratified fashion, using this as
2844	        the class labels.
2845	        Read more in the :ref:`User Guide <stratification>`.
2846
2847	    Returns
2848	    -------
2849	    splitting : list, length=2 * len(arrays)
2850	        List containing train-test split of inputs.
2851
2852	        .. versionadded:: 0.16
2853	            If the input is sparse, the output will be a
2854	            ``scipy.sparse.csr_matrix``. Else, output type is the same as the
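The ``stratify`` and ``shuffle`` parameters documented in the listing can be illustrated with a small sketch. The imbalanced toy labels (15 samples of class 0 followed by 5 of class 1) are an assumption chosen so that the class counts in each part are easy to check by hand.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 samples, sorted by class, with a 75% / 25% imbalance.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# Stratified split: the test part keeps roughly the same class ratio as y.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Without shuffling, the test part is simply the tail of the data,
# which here contains only class 1 because the labels are sorted.
_, _, _, y_te_ordered = train_test_split(X, y, test_size=0.2, shuffle=False)

# Combining shuffle=False with stratify is rejected, as the docstring states.
try:
    train_test_split(X, y, shuffle=False, stratify=y)
    raised = False
except ValueError:
    raised = True
```

The unshuffled case shows why stratification matters for ordered data: a plain tail split can leave a class entirely out of the training or test part.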

Callers 15

_blobs_dataset · Function · 0.90
_20newsgroups_highdim_dataset · Function · 0.90
_20newsgroups_lowdim_dataset · Function · 0.90
_mnist_dataset · Function · 0.90
_digits_dataset · Function · 0.90
_synth_regression_dataset · Function · 0.90
_synth_regression_sparse_dataset · Function · 0.90
_synth_classification_dataset · Function · 0.90
_olivetti_faces_dataset · Function · 0.90
_random_dataset · Function · 0.90
test_scalar_fit_param_compat · Function · 0.90
test_train_test_split_errors · Function · 0.90

Calls 7

indexable · Function · 0.90
_num_samples · Function · 0.90
get_namespace_and_device · Function · 0.90
move_to · Function · 0.90
_safe_indexing · Function · 0.90
_validate_shuffle_split · Function · 0.85
split · Method · 0.45
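Among the callees, ``_validate_shuffle_split`` is the step that turns ``test_size`` and ``train_size`` into sample counts. The sketch below is not that function; it is a hypothetical pure-Python helper, ``resolve_split_sizes``, that follows the rules stated in the docstring (float as proportion, int as absolute count, None as complement, 0.25 when both are None). Rounding the test count up and the train count down is an assumption made here, and the real function performs more validation than this.

```python
from math import ceil, floor


def resolve_split_sizes(n_samples, test_size=None, train_size=None,
                        default_test_size=0.25):
    """Hypothetical helper: map test_size/train_size to (n_train, n_test)."""
    # Both unset: fall back to the documented default test proportion.
    if test_size is None and train_size is None:
        test_size = default_test_size

    def to_count(size, rounder):
        # None stays None; floats are proportions; ints are absolute counts.
        if size is None:
            return None
        if isinstance(size, float):
            return int(rounder(size * n_samples))
        return int(size)

    n_test = to_count(test_size, ceil)     # assumption: round test count up
    n_train = to_count(train_size, floor)  # assumption: round train count down

    # Whichever side is unset becomes the complement of the other.
    if n_train is None:
        n_train = n_samples - n_test
    elif n_test is None:
        n_test = n_samples - n_train

    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the number of samples")
    if n_train <= 0:
        raise ValueError("the resulting train set would be empty")
    return n_train, n_test


default_sizes = resolve_split_sizes(10)                      # both None
int_test = resolve_split_sizes(10, test_size=2)              # absolute count
both_given = resolve_split_sizes(8, test_size=0.25, train_size=0.5)
```

When both sizes are given and sum to less than the whole dataset, the remaining samples are left out of both parts, which is the subsampling the summary mentions.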

Tested by 15

test_scalar_fit_param_compat · Function · 0.72
test_train_test_split_errors · Function · 0.72
test_train_test_split_default_test_size · Function · 0.72
test_array_api_train_test_split · Function · 0.72
test_train_test_split · Function · 0.72
test_train_test_split_32bit_overflow · Function · 0.72
test_train_test_split_pandas · Function · 0.72
test_train_test_split_sparse · Function · 0.72
test_train_test_split_mock_pandas · Function · 0.72
test_train_test_split_list_input · Function · 0.72
test_train_test_split_allow_nans · Function · 0.72
test_train_test_split_empty_trainset · Function · 0.72
