hub / github.com/scikit-learn/scikit-learn / make_classification

Function make_classification

sklearn/datasets/_samples_generator.py:67–387

Generate a random n-class classification problem. This initially creates clusters of points normally distributed (std=1) about vertices of an ``n_informative``-dimensional hypercube with sides of length ``2*class_sep`` and assigns an equal number of clusters to each class. It introduces interdependence between these features and adds various types of further noise to the data.
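The hypercube geometry described above can be probed empirically. The following is a minimal sketch, not part of the library: the helper name `class_mean_distance` is hypothetical, and it assumes that with one cluster per class and no label noise each class mean sits close to its hypercube vertex, so the distance between class means should grow with ``class_sep``.

```python
import numpy as np
from sklearn.datasets import make_classification


def class_mean_distance(class_sep, seed=0):
    """Distance between the two class means (hypothetical helper for illustration)."""
    X, y = make_classification(
        n_samples=2000,
        n_features=2,
        n_informative=2,
        n_redundant=0,
        n_repeated=0,
        n_classes=2,
        n_clusters_per_class=1,  # one cluster per class, so one vertex each
        flip_y=0.0,              # no label noise, so labels follow the clusters
        class_sep=class_sep,
        shuffle=False,
        random_state=seed,
    )
    mean_0 = X[y == 0].mean(axis=0)
    mean_1 = X[y == 1].mean(axis=0)
    return float(np.linalg.norm(mean_0 - mean_1))


near = class_mean_distance(class_sep=1.0)
far = class_mean_distance(class_sep=5.0)
print(f"class_sep=1.0 -> {near:.2f}, class_sep=5.0 -> {far:.2f}")
```

Because two distinct vertices differ by ``2*class_sep`` in at least one coordinate, the second distance should come out near 10 or larger, while the first stays below about 3.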

(
    n_samples=100,
    n_features=20,
    *,
    n_informative=2,
    n_redundant=2,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=2,
    weights=None,
    flip_y=0.01,
    class_sep=1.0,
    hypercube=True,
    shift=0.0,
    scale=1.0,
    shuffle=True,
    random_state=None,
    return_X_y=True,
)
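A short usage sketch for the signature above. The parameter values are arbitrary choices for illustration, and the call relies on the listed default ``return_X_y=True``, so the result unpacks as an ``(X, y)`` tuple.

```python
from collections import Counter

from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=4,
    n_redundant=2,
    n_classes=3,
    weights=[0.6, 0.3, 0.1],  # approximate class proportions
    flip_y=0.0,               # disable random label flipping
    random_state=42,          # fixed seed for reproducible output
)

print(X.shape, y.shape)  # (200, 10) (200,)

# Class counts roughly follow `weights`; exact values depend on rounding.
class_counts = Counter(y.tolist())
print(sorted(class_counts.items()))
```

All parameters after ``n_features`` are keyword-only (note the ``*`` in the signature), so they must be passed by name.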

Source from the content-addressed store, hash-verified

    prefer_skip_nested_validation=True,
)
def make_classification(
    n_samples=100,
    n_features=20,
    *,
    n_informative=2,
    n_redundant=2,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=2,
    weights=None,
    flip_y=0.01,
    class_sep=1.0,
    hypercube=True,
    shift=0.0,
    scale=1.0,
    shuffle=True,
    random_state=None,
    return_X_y=True,
):
    """Generate a random n-class classification problem.

    This initially creates clusters of points normally distributed (std=1)
    about vertices of an ``n_informative``-dimensional hypercube with sides of
    length ``2*class_sep`` and assigns an equal number of clusters to each
    class. It introduces interdependence between these features and adds
    various types of further noise to the data.

    Without shuffling, ``X`` horizontally stacks features in the following
    order: the primary ``n_informative`` features, followed by ``n_redundant``
    linear combinations of the informative features, followed by ``n_repeated``
    duplicates, drawn randomly with replacement from the informative and
    redundant features. The remaining features are filled with random noise.
    Thus, without shuffling, all useful features are contained in the columns
    ``X[:, :n_informative + n_redundant + n_repeated]``.

    Read more in the :ref:`User Guide <sample_generators>`.

    Parameters
    ----------
    n_samples : int, default=100
        The number of samples.

    n_features : int, default=20
        The total number of features. These comprise ``n_informative``
        informative features, ``n_redundant`` redundant features,
        ``n_repeated`` duplicated features and
        ``n_features-n_informative-n_redundant-n_repeated`` useless features
        drawn at random.

    n_informative : int, default=2
        The number of informative features. Each class is composed of a number
        of gaussian clusters each located around the vertices of a hypercube
        in a subspace of dimension ``n_informative``. For each cluster,
        informative features are drawn independently from N(0, 1) and then
        randomly linearly combined within each cluster in order to add
        covariance. The clusters are then placed on the vertices of the
        hypercube.
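The column layout described in the docstring can be checked with ``shuffle=False``. This is a sketch under stated assumptions: it keeps the default ``shift=0.0`` and ``scale=1.0`` so the redundant columns are fit without an intercept, and the feature counts are arbitrary illustrative values.

```python
import numpy as np
from sklearn.datasets import make_classification

n_informative, n_redundant, n_repeated = 3, 2, 2

X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=n_informative,
    n_redundant=n_redundant,
    n_repeated=n_repeated,
    shuffle=False,  # keep the documented column order
    random_state=0,
)

# Slice the blocks in the order the docstring describes.
end_inf = n_informative
end_red = end_inf + n_redundant
end_rep = end_red + n_repeated
informative = X[:, :end_inf]
redundant = X[:, end_inf:end_red]
repeated = X[:, end_red:end_rep]
noise = X[:, end_rep:]  # remaining columns: random noise

# Redundant columns should be linear combinations of the informative ones,
# so a least-squares fit should leave a residual near machine precision.
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
max_residual = float(np.abs(informative @ coef - redundant).max())

# Each repeated column should duplicate an informative or redundant column.
source = X[:, :end_red]
repeats_match = [
    any(np.allclose(repeated[:, j], source[:, i]) for i in range(source.shape[1]))
    for j in range(n_repeated)
]

print(max_residual, repeats_match, noise.shape)
```

With the default ``shuffle=True`` the same columns exist but their positions are permuted, so slicing by position as above no longer identifies them.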

Callers 15

_synth_classification_dataset · Function · 0.90
test_fit_and_score_over_thresholds_curve_scorers · Function · 0.90
test_fit_and_score_over_thresholds_prefit · Function · 0.90
test_fit_and_score_over_thresholds_fit_params · Function · 0.90
test_tuned_threshold_classifier_conflict_cv_refit · Function · 0.90
test_threshold_classifier_estimator_response_methods · Function · 0.90
test_tuned_threshold_classifier_refit · Function · 0.90
test_tuned_threshold_classifier_fit_params · Function · 0.90
test_tuned_threshold_classifier_thresholds_array · Function · 0.90
test_tuned_threshold_classifier_store_cv_results · Function · 0.90
test_tuned_threshold_classifier_cv_float · Function · 0.90
test_tuned_threshold_classifier_error_constant_predictor · Function · 0.90

Calls 5

check_random_state · Function · 0.90
Bunch · Class · 0.90
sum · Function · 0.85
_generate_hypercube · Function · 0.85
format · Method · 0.80

Tested by 15

test_fit_and_score_over_thresholds_curve_scorers · Function · 0.72
test_fit_and_score_over_thresholds_prefit · Function · 0.72
test_fit_and_score_over_thresholds_fit_params · Function · 0.72
test_tuned_threshold_classifier_conflict_cv_refit · Function · 0.72
test_threshold_classifier_estimator_response_methods · Function · 0.72
test_tuned_threshold_classifier_refit · Function · 0.72
test_tuned_threshold_classifier_fit_params · Function · 0.72
test_tuned_threshold_classifier_thresholds_array · Function · 0.72
test_tuned_threshold_classifier_store_cv_results · Function · 0.72
test_tuned_threshold_classifier_cv_float · Function · 0.72
test_tuned_threshold_classifier_error_constant_predictor · Function · 0.72
test_fixed_threshold_classifier_equivalence_default · Function · 0.72
