
Function make_classification

sklearn/datasets/_samples_generator.py:67–387

Generate a random n-class classification problem. This initially creates clusters of points normally distributed (std=1) about vertices of an ``n_informative``-dimensional hypercube with sides of length ``2*class_sep`` and assigns an equal number of clusters to each class. It introduces interdependence between these features and adds various types of further noise to the data.

(
    n_samples=100,
    n_features=20,
    *,
    n_informative=2,
    n_redundant=2,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=2,
    weights=None,
    flip_y=0.01,
    class_sep=1.0,
    hypercube=True,
    shift=0.0,
    scale=1.0,
    shuffle=True,
    random_state=None,
    return_X_y=True,
)
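A minimal usage sketch of the signature above. The argument values are illustrative choices, not taken from the source, and the call relies on the default behaviour of returning an `(X, y)` tuple.

```python
import numpy as np
from sklearn.datasets import make_classification

# Illustrative values; n_classes * n_clusters_per_class (2 * 2 = 4)
# must not exceed 2 ** n_informative (here 8).
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=3,
    n_redundant=2,
    n_classes=2,
    random_state=0,  # fixed seed so repeated runs give the same data
)

print(X.shape)       # feature matrix: one row per sample
print(y.shape)       # integer class label per sample
print(np.unique(y))  # distinct labels present in y
```

Passing an integer `random_state` makes the generated dataset reproducible, which is usually what you want in tests and tutorials.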

Source from the content-addressed store, hash-verified

    prefer_skip_nested_validation=True,
)
def make_classification(
    n_samples=100,
    n_features=20,
    *,
    n_informative=2,
    n_redundant=2,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=2,
    weights=None,
    flip_y=0.01,
    class_sep=1.0,
    hypercube=True,
    shift=0.0,
    scale=1.0,
    shuffle=True,
    random_state=None,
    return_X_y=True,
):
    """Generate a random n-class classification problem.

    This initially creates clusters of points normally distributed (std=1)
    about vertices of an ``n_informative``-dimensional hypercube with sides of
    length ``2*class_sep`` and assigns an equal number of clusters to each
    class. It introduces interdependence between these features and adds
    various types of further noise to the data.

    Without shuffling, ``X`` horizontally stacks features in the following
    order: the primary ``n_informative`` features, followed by ``n_redundant``
    linear combinations of the informative features, followed by ``n_repeated``
    duplicates, drawn randomly with replacement from the informative and
    redundant features. The remaining features are filled with random noise.
    Thus, without shuffling, all useful features are contained in the columns
    ``X[:, :n_informative + n_redundant + n_repeated]``.

    Read more in the :ref:`User Guide <sample_generators>`.

    Parameters
    ----------
    n_samples : int, default=100
        The number of samples.

    n_features : int, default=20
        The total number of features. These comprise ``n_informative``
        informative features, ``n_redundant`` redundant features,
        ``n_repeated`` duplicated features and
        ``n_features-n_informative-n_redundant-n_repeated`` useless features
        drawn at random.

    n_informative : int, default=2
        The number of informative features. Each class is composed of a number
        of gaussian clusters each located around the vertices of a hypercube
        in a subspace of dimension ``n_informative``. For each cluster,
        informative features are drawn independently from N(0, 1) and then
        randomly linearly combined within each cluster in order to add
        covariance. The clusters are then placed on the vertices of the
        hypercube.

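The column layout described in the docstring can be checked empirically. The sketch below is an illustration, not part of the library source: it assumes `shuffle=False` and the default `shift=0.0` and `scale=1.0`, so that the redundant columns should be recoverable as linear combinations of the informative ones without an intercept. The specific sizes are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification

n_informative, n_redundant, n_repeated = 3, 2, 1

X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=n_informative,
    n_redundant=n_redundant,
    n_repeated=n_repeated,
    shuffle=False,   # keep the documented column order
    random_state=0,
)

# Slice the blocks in the order given by the docstring.
n_base = n_informative + n_redundant
n_useful = n_base + n_repeated
informative = X[:, :n_informative]
redundant = X[:, n_informative:n_base]
repeated = X[:, n_base:n_useful]

# Redundant columns: fit them from the informative ones by least squares
# and measure the largest reconstruction error.
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
max_err = np.abs(informative @ coef - redundant).max()

# The whole "useful" block should carry no more rank than the
# informative block alone.
rank = np.linalg.matrix_rank(X[:, :n_useful])

# Each repeated column should match some informative or redundant column.
repeated_ok = all(
    any(np.allclose(repeated[:, j], X[:, k]) for k in range(n_base))
    for j in range(n_repeated)
)

print(max_err, rank, repeated_ok)
```

With `shuffle=True` (the default) the same relationships hold, but the columns are permuted, so the slices above would no longer line up with the feature types.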

Calls 5

check_random_state · Function · 0.90
Bunch · Class · 0.90
sum · Function · 0.85
_generate_hypercube · Function · 0.85
format · Method · 0.80
