Split arrays or matrices into random train and test subsets. Quick utility that wraps input validation, ``next(ShuffleSplit().split(X, y))``, and application to the input data into a single call for splitting (and optionally subsampling) data. Read more in the :ref:`User Guide <cross_validation>`.
(
*arrays,
test_size=None,
train_size=None,
random_state=None,
shuffle=True,
stratify=None,
)
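The size rules that the parameter descriptions spell out can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: `resolve_split_sizes` and `sketch_split` are hypothetical helpers, the rounding (ceiling for a float test size, floor for a float train size) is an assumption, and stratification is left out entirely.

```python
import math
import random


def resolve_split_sizes(n_samples, test_size=None, train_size=None):
    """Hypothetical helper: turn test_size/train_size into absolute counts."""
    # Both None: the documented default is a 0.25 test fraction.
    if test_size is None and train_size is None:
        test_size = 0.25

    def to_count(size, rounding):
        # A float is a proportion of the dataset; an int is an absolute count.
        if isinstance(size, float):
            return int(rounding(size * n_samples))
        return size

    # Assumed rounding: ceil for the test split, floor for the train split.
    n_test = to_count(test_size, math.ceil) if test_size is not None else None
    n_train = to_count(train_size, math.floor) if train_size is not None else None

    # A missing size is the complement of the other one.
    if n_test is None:
        n_test = n_samples - n_train
    if n_train is None:
        n_train = n_samples - n_test
    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the number of samples")
    return n_train, n_test


def sketch_split(data, test_size=None, train_size=None, seed=None, shuffle=True):
    """Hypothetical stand-in for the split itself (no stratification)."""
    n_train, n_test = resolve_split_sizes(len(data), test_size, train_size)
    indices = list(range(len(data)))
    if shuffle:
        # A fixed seed gives the same permutation on every call.
        random.Random(seed).shuffle(indices)
    train_idx = indices[:n_train]
    test_idx = indices[n_train:n_train + n_test]
    return [data[i] for i in train_idx], [data[i] for i in test_idx]
```

Passing both sizes with a sum below the number of samples subsamples the data, which is the "optionally subsampling" case in the summary: for example, `resolve_split_sizes(10, test_size=3, train_size=5)` leaves two samples unused.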
    prefer_skip_nested_validation=True,
)
def train_test_split(
    *arrays,
    test_size=None,
    train_size=None,
    random_state=None,
    shuffle=True,
    stratify=None,
):
    """Split arrays or matrices into random train and test subsets.

    Quick utility that wraps input validation,
    ``next(ShuffleSplit().split(X, y))``, and application to input data
    into a single call for splitting (and optionally subsampling) data into a
    one-liner.

    Read more in the :ref:`User Guide <cross_validation>`.

    Parameters
    ----------
    *arrays : sequence of indexables with same length / shape[0]
        Allowed inputs are lists, numpy arrays, scipy-sparse
        matrices or pandas dataframes.

    test_size : float or int, default=None
        If float, should be between 0.0 and 1.0 and represent the proportion
        of the dataset to include in the test split. If int, represents the
        absolute number of test samples. If None, the value is set to the
        complement of the train size. If ``train_size`` is also None, it will
        be set to 0.25.

    train_size : float or int, default=None
        If float, should be between 0.0 and 1.0 and represent the
        proportion of the dataset to include in the train split. If
        int, represents the absolute number of train samples. If None,
        the value is automatically set to the complement of the test size.

    random_state : int, RandomState instance or None, default=None
        Controls the shuffling applied to the data before applying the split.
        Pass an int for reproducible output across multiple function calls.
        See :term:`Glossary <random_state>`.

    shuffle : bool, default=True
        Whether or not to shuffle the data before splitting. If shuffle=False
        then stratify must be None.

    stratify : array-like, default=None
        If not None, data is split in a stratified fashion, using this as
        the class labels.
        Read more in the :ref:`User Guide <stratification>`.

    Returns
    -------
    splitting : list, length=2 * len(arrays)
        List containing train-test split of inputs.

        .. versionadded:: 0.16
            If the input is sparse, the output will be a
            ``scipy.sparse.csr_matrix``. Else, output type is the same as the