`encode(self, examples)`: Encode a collection of multi-featured examples into an `n_dim`-dimensional feature matrix via feature hashing.
```python
        self.sparse = sparse and _SCIPY

    def encode(self, examples):
        """
        Encode a collection of multi-featured examples into an
        `n_dim`-dimensional feature matrix via feature hashing.

        Notes
        -----
        Feature hashing works by applying a hash function to the features of an
        example and using the hash values as column indices in the resulting
        feature matrix. The entry in each hashed feature column holds the value
        of that feature for that example. For example, given the following two
        input examples:

        >>> examples = [
        ...     {"furry": 1, "quadruped": 1, "domesticated": 1},
        ...     {"nocturnal": 1, "quadruped": 1},
        ... ]

        and a hypothetical hash function `H` mapping strings to integers in
        [0, 127], we have:

        >>> feat_mat = np.zeros((2, 128))
        >>> ex1_cols = [H("furry"), H("quadruped"), H("domesticated")]
        >>> ex2_cols = [H("nocturnal"), H("quadruped")]
        >>> feat_mat[0, ex1_cols] = 1
        >>> feat_mat[1, ex2_cols] = 1

        To better handle hash collisions, it is common to multiply each feature
        value by the sign of the hash digest of the corresponding feature name,
        so that colliding features tend to cancel out rather than always adding up.

        Parameters
        ----------
        examples : dict or list of dicts
            A collection of `N` examples, each represented as a dict mapping
            feature names to feature values. A single dict is treated as one
            example.

        Returns
        -------
        table : :py:class:`ndarray <numpy.ndarray>` or :py:class:`csr_matrix <scipy.sparse.csr_matrix>` of shape `(N, n_dim)`
            The encoded feature matrix. A sparse CSR matrix is returned only
            when sparse output was requested and SciPy is available.
        """
        if isinstance(examples, dict):
            examples = [examples]

        sparse = self.sparse
        return self._encode_sparse(examples) if sparse else self._encode_dense(examples)

    def _encode_dense(self, examples):
        N = len(examples)
```