hub / github.com/ddbourgin/numpy-ml / encode

Method encode

numpy_ml/preprocessing/general.py:301–346

Encode a collection of multi-featured examples into an `n_dim`-dimensional feature matrix via feature hashing.

(self, examples)
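The docstring for this method describes feature hashing with a hypothetical hash function `H`. A minimal sketch of that idea follows. It assumes an MD5-based `H` reduced modulo `n_dim`, which is an illustrative choice and not necessarily the hash this class uses, and it accumulates values with `+=` so that two features landing in the same column add up rather than overwrite each other.

```python
import hashlib

import numpy as np


def H(name, n_dim=128):
    """Hypothetical hash: map a feature name to a column index in [0, n_dim - 1]."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_dim


examples = [
    {"furry": 1, "quadruped": 1, "domesticated": 1},
    {"nocturnal": 1, "quadruped": 1},
]

n_dim = 128
feat_mat = np.zeros((len(examples), n_dim))
for row, example in enumerate(examples):
    for name, value in example.items():
        # Accumulate so colliding features add up instead of overwriting.
        feat_mat[row, H(name, n_dim)] += value
```

Both rows share the column `H("quadruped")`, since the same feature name always hashes to the same index.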

Source from the content-addressed store, hash-verified

299         self.sparse = sparse and _SCIPY
300
301     def encode(self, examples):
302         """
303         Encode a collection of multi-featured examples into an
304         `n_dim`-dimensional feature matrix via feature hashing.
305
306         Notes
307         -----
308         Feature hashing works by applying a hash function to the features of an
309         example and using the hash values as column indices in the resulting
310         feature matrix. The entries at each hashed feature column correspond to
311         the values for that example and feature. For example, given the
312         following two input examples:
313
314         >>> examples = [
315             {"furry": 1, "quadruped": 1, "domesticated": 1},
316             {"nocturnal": 1, "quadruped": 1},
317         ]
318
319         and a hypothetical hash function `H` mapping strings to [0, 127], we have:
320
321         >>> feat_mat = np.zeros((2, 128))
322         >>> ex1_cols = [H("furry"), H("quadruped"), H("domesticated")]
323         >>> ex2_cols = [H("nocturnal"), H("quadruped")]
324         >>> feat_mat[0, ex1_cols] = 1
325         >>> feat_mat[1, ex2_cols] = 1
326
327         To better handle hash collisions, it is common to multiply the feature
328         value by the sign of the digest for the corresponding feature name.
329
330         Parameters
331         ----------
332         examples : dict or list of dicts
333             A collection of `N` examples, each represented as a dict where keys
334             correspond to the feature name and values correspond to the feature
335             value.
336
337         Returns
338         -------
339         table : :py:class:`ndarray <numpy.ndarray>` or :py:class:`csr_matrix <scipy.sparse.csr_matrix>` of shape `(N, n_dim)`
340             The encoded feature matrix
341         """
342         if isinstance(examples, dict):
343             examples = [examples]
344
345         sparse = self.sparse
346         return self._encode_sparse(examples) if sparse else self._encode_dense(examples)
347
348     def _encode_dense(self, examples):
349         N = len(examples)
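The Notes section mentions multiplying each feature value by the sign of the digest to soften the effect of collisions. The sketch below shows one way that could look. It assumes the 128-bit MD5 digest is read as a signed integer, so its top bit decides the sign; `hashed_column_and_sign` and `encode_signed` are hypothetical names for illustration, and the listing above does not show how (or whether) this class applies the sign.

```python
import hashlib

import numpy as np


def hashed_column_and_sign(name, n_dim):
    """Hypothetical helper: derive a column index and a +/-1 sign from one digest."""
    digest = int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)
    col = digest % n_dim
    # Treat the top bit of the 128-bit digest as the sign bit.
    sign = -1 if (digest >> 127) & 1 else 1
    return col, sign


def encode_signed(examples, n_dim=8):
    """Dense feature hashing where each value is multiplied by a hashed sign."""
    table = np.zeros((len(examples), n_dim))
    for row, example in enumerate(examples):
        for name, value in example.items():
            col, sign = hashed_column_and_sign(name, n_dim)
            table[row, col] += sign * value
    return table


table = encode_signed([{"furry": 1, "quadruped": 1}, {"nocturnal": 2.0}])
```

With signs that are roughly evenly split between +1 and -1, two features that collide in a column tend to cancel on average instead of always inflating that column.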

Callers 4

_encode_dense · Method · 0.80
_encode_sparse · Method · 0.80
tokenize_words_bytes · Function · 0.80
tokenize_bytes_raw · Function · 0.80

Calls 2

_encode_sparse · Method · 0.95
_encode_dense · Method · 0.95
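The two calls listed here reflect the dispatch at the end of `encode`: a single dict is wrapped in a list, then the sparse or dense path is chosen. A self-contained sketch of that pattern is below. `TinyHasher` is a hypothetical stand-in, not the numpy-ml class, and the bodies of its `_encode_dense` and `_encode_sparse` are assumptions, since the listing above does not show the real ones.

```python
import hashlib

import numpy as np

try:  # SciPy is optional; fall back to dense output without it
    from scipy.sparse import csr_matrix
    _SCIPY = True
except ImportError:
    _SCIPY = False


class TinyHasher:
    """Hypothetical stand-in for illustration; not the numpy-ml class."""

    def __init__(self, n_dim=16, sparse=True):
        self.n_dim = n_dim
        self.sparse = sparse and _SCIPY

    def _col(self, name):
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        return int(digest, 16) % self.n_dim

    def encode(self, examples):
        if isinstance(examples, dict):  # allow a single example
            examples = [examples]
        if self.sparse:
            return self._encode_sparse(examples)
        return self._encode_dense(examples)

    def _encode_dense(self, examples):
        table = np.zeros((len(examples), self.n_dim))
        for row, example in enumerate(examples):
            for name, value in example.items():
                table[row, self._col(name)] += value
        return table

    def _encode_sparse(self, examples):
        rows, cols, vals = [], [], []
        for row, example in enumerate(examples):
            for name, value in example.items():
                rows.append(row)
                cols.append(self._col(name))
                vals.append(value)
        # Duplicate (row, col) pairs are summed when the matrix is built.
        return csr_matrix((vals, (rows, cols)), shape=(len(examples), self.n_dim))
```

Because `self.sparse` is computed as `sparse and _SCIPY`, a request for sparse output quietly degrades to a dense array when SciPy is not installed, so callers that need a specific return type should check it.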

Tested by

no test coverage detected