Generate a random n-class classification problem. This initially creates clusters of points normally distributed (std=1) about vertices of an ``n_informative``-dimensional hypercube with sides of length ``2*class_sep`` and assigns an equal number of clusters to each class. It introduces interdependence between these features and adds various types of further noise to the data.
```python
make_classification(
    n_samples=100,
    n_features=20,
    *,
    n_informative=2,
    n_redundant=2,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=2,
    weights=None,
    flip_y=0.01,
    class_sep=1.0,
    hypercube=True,
    shift=0.0,
    scale=1.0,
    shuffle=True,
    random_state=None,
    return_X_y=True,
)
```
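The cluster-generation step summarized above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not scikit-learn's implementation. `sketch_hypercube_clusters` is a hypothetical helper. It assumes `hypercube=True`, mixing-matrix entries drawn uniformly from [-1, 1), and samples split as evenly as possible across clusters. It also leaves out redundant, repeated and noise features, along with `weights`, `flip_y`, `shift`, `scale` and shuffling.

```python
import numpy as np


def sketch_hypercube_clusters(n_samples=100, n_informative=2, n_classes=2,
                              n_clusters_per_class=2, class_sep=1.0,
                              random_state=None):
    """Hypothetical helper, a simplified sketch (not scikit-learn's code).

    Draws Gaussian clusters around vertices of an n_informative-dimensional
    hypercube with side length 2 * class_sep and gives each class the same
    number of clusters.
    """
    rng = np.random.default_rng(random_state)
    n_clusters = n_classes * n_clusters_per_class
    if n_clusters > 2 ** n_informative:
        raise ValueError(
            "need n_classes * n_clusters_per_class <= 2 ** n_informative")

    # Pick distinct hypercube vertices. Each vertex is a vector in {-1, +1}^d
    # scaled by class_sep, so adjacent vertices are 2 * class_sep apart.
    vertex_ids = rng.choice(2 ** n_informative, size=n_clusters, replace=False)
    bits = (vertex_ids[:, None] >> np.arange(n_informative)) & 1
    centroids = (2 * bits - 1) * class_sep

    # Split samples as evenly as possible across clusters.
    sizes = np.full(n_clusters, n_samples // n_clusters)
    sizes[: n_samples % n_clusters] += 1

    X = np.empty((n_samples, n_informative))
    y = np.empty(n_samples, dtype=int)
    start = 0
    for k, size in enumerate(sizes):
        stop = start + size
        # Standard-normal points (std=1), then a random linear map to add
        # covariance. Assumption: mixing weights are uniform in [-1, 1).
        points = rng.standard_normal((size, n_informative))
        A = 2 * rng.random((n_informative, n_informative)) - 1
        X[start:stop] = points @ A + centroids[k]
        # Cluster k goes to class k % n_classes, so every class gets
        # n_clusters_per_class clusters.
        y[start:stop] = k % n_classes
        start = stop
    return X, y


X, y = sketch_hypercube_clusters(n_samples=200, random_state=0)
print(X.shape, np.bincount(y))  # (200, 2) [100 100]
```

Assigning cluster `k` to class `k % n_classes` is one simple way to give every class an equal number of clusters. With the defaults used here, each class receives two clusters of 50 points each.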
```python
    prefer_skip_nested_validation=True,
)
def make_classification(
    n_samples=100,
    n_features=20,
    *,
    n_informative=2,
    n_redundant=2,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=2,
    weights=None,
    flip_y=0.01,
    class_sep=1.0,
    hypercube=True,
    shift=0.0,
    scale=1.0,
    shuffle=True,
    random_state=None,
    return_X_y=True,
):
    """Generate a random n-class classification problem.

    This initially creates clusters of points normally distributed (std=1)
    about vertices of an ``n_informative``-dimensional hypercube with sides of
    length ``2*class_sep`` and assigns an equal number of clusters to each
    class. It introduces interdependence between these features and adds
    various types of further noise to the data.

    Without shuffling, ``X`` horizontally stacks features in the following
    order: the primary ``n_informative`` features, followed by ``n_redundant``
    linear combinations of the informative features, followed by ``n_repeated``
    duplicates, drawn randomly with replacement from the informative and
    redundant features. The remaining features are filled with random noise.
    Thus, without shuffling, all useful features are contained in the columns
    ``X[:, :n_informative + n_redundant + n_repeated]``.

    Read more in the :ref:`User Guide <sample_generators>`.

    Parameters
    ----------
    n_samples : int, default=100
        The number of samples.

    n_features : int, default=20
        The total number of features. These comprise ``n_informative``
        informative features, ``n_redundant`` redundant features,
        ``n_repeated`` duplicated features and
        ``n_features-n_informative-n_redundant-n_repeated`` useless features
        drawn at random.

    n_informative : int, default=2
        The number of informative features. Each class is composed of a number
        of gaussian clusters each located around the vertices of a hypercube
        in a subspace of dimension ``n_informative``. For each cluster,
        informative features are drawn independently from N(0, 1) and then
        randomly linearly combined within each cluster in order to add
        covariance. The clusters are then placed on the vertices of the
        hypercube.
```
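The unshuffled column layout described in the docstring (informative, then redundant, then repeated, then noise) can also be sketched with NumPy. `sketch_feature_layout` is a hypothetical helper, not part of scikit-learn. It takes an already generated `(n_samples, n_informative)` block and assumes uniform [-1, 1) weights for the redundant linear combinations. The rank check at the end shows why only ``X[:, :n_informative + n_redundant + n_repeated]`` is useful: those columns span no more than the informative subspace.

```python
import numpy as np


def sketch_feature_layout(X_informative, n_redundant=2, n_repeated=0,
                          n_features=20, random_state=None):
    """Hypothetical helper: stack columns in the unshuffled order."""
    rng = np.random.default_rng(random_state)
    n_samples, n_informative = X_informative.shape
    n_useless = n_features - n_informative - n_redundant - n_repeated
    if n_useless < 0:
        raise ValueError(
            "n_informative + n_redundant + n_repeated must be <= n_features")

    # Redundant features: random linear combinations of the informative ones.
    # Assumption: combination weights are uniform in [-1, 1).
    B = 2 * rng.random((n_informative, n_redundant)) - 1
    X_redundant = X_informative @ B
    X_useful = np.hstack([X_informative, X_redundant])

    # Repeated features: columns drawn with replacement from the
    # informative and redundant columns.
    idx = rng.integers(0, X_useful.shape[1], size=n_repeated)
    X_repeated = X_useful[:, idx]

    # The remaining columns are pure standard-normal noise.
    X_noise = rng.standard_normal((n_samples, n_useless))
    return np.hstack([X_useful, X_repeated, X_noise])


rng = np.random.default_rng(0)
X_inf = rng.standard_normal((50, 3))
X = sketch_feature_layout(X_inf, n_redundant=2, n_repeated=1,
                          n_features=10, random_state=0)
X_first = X[:, : 3 + 2 + 1]  # n_informative + n_redundant + n_repeated
print(X.shape, np.linalg.matrix_rank(X_first))  # (50, 10) 3
```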