An ndarray subclass for working with arrays of strings. Factorizes the input array into integers, but overloads equality on strings to check against the factor label.
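As a rough illustration of the idea described above, the following sketch factorizes an array of strings into integer codes with plain NumPy and then answers a string equality check by comparing codes. It is not the LabelArray implementation: the helper `equals_label` and the variable names are hypothetical, and `np.unique` is used here only as a convenient stand-in for whatever factorization routine the class actually uses.

```python
import numpy as np

# Hypothetical illustration only; this is not how LabelArray is implemented.
values = np.asarray(["a", "b", "a", "c"], dtype=object)

# Factorize: sorted unique labels plus one integer code per element.
categories, codes = np.unique(values, return_inverse=True)

# Reverse lookup table from each label to its integer code.
reverse_categories = {label: i for i, label in enumerate(categories)}


def equals_label(codes, label):
    """Compare integer codes against a string via its category code."""
    code = reverse_categories.get(label)
    if code is None:
        # A label that is not a known category cannot match any element.
        return np.zeros(codes.shape, dtype=bool)
    return codes == code


mask = equals_label(codes, "a")
```

The comparison itself touches only the integer codes, so the string is looked up once rather than compared against every element.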
| 89 | |
| 90 | |
| 91 | class LabelArray(ndarray): |
| 92 | """ |
| 93 | An ndarray subclass for working with arrays of strings. |
| 94 | |
| 95 | Factorizes the input array into integers, but overloads equality on strings |
| 96 | to check against the factor label. |
| 97 | |
| 98 | Parameters |
| 99 | ---------- |
| 100 | values : array-like |
| 101 | Array of values that can be passed to np.asarray with dtype=object. |
| 102 | missing_value : str |
| 103 | Scalar value to treat as 'missing' for operations on ``self``. |
| 104 | categories : list[str], optional |
| 105 | List of values to use as categories. If not supplied, categories will |
| 106 | be inferred as the unique set of entries in ``values``. |
| 107 | sort : bool, optional |
| 108 | Whether to sort categories. If sort is False and categories is |
| 109 | supplied, they are left in the order provided. If sort is False and |
| 110 | categories is None, categories will be constructed in an unspecified order. |
| 111 | |
| 112 | Attributes |
| 113 | ---------- |
| 114 | categories : ndarray[str] |
| 115 | An array containing the unique labels of self. |
| 116 | reverse_categories : dict[str -> int] |
| 117 | Reverse lookup table for ``categories``. Stores the index in |
| 118 | ``categories`` at which each unique entry is found. |
| 119 | missing_value : str or None |
| 120 | A sentinel missing value with NaN semantics for comparisons. |
| 121 | |
| 122 | Notes |
| 123 | ----- |
| 124 | Consumers should be cautious when passing instances of LabelArray to numpy |
| 125 | functions. We attempt to disallow as many meaningless operations as |
| 126 | possible, but since a LabelArray is just an ndarray of ints with some |
| 127 | additional metadata, many numpy functions (for example, trigonometric) will |
| 128 | happily accept a LabelArray and treat its values as though they were |
| 129 | integers. |
| 130 | |
| 131 | In a future change, we may be able to disallow more numerical operations by |
| 132 | creating a wrapper dtype which doesn't register an implementation for most |
| 133 | numpy ufuncs. Until that change is made, consumers of LabelArray should |
| 134 | assume that it is undefined behavior to pass a LabelArray to any numpy |
| 135 | ufunc that operates on semantically-numerical data. |
| 136 | |
| 137 | See Also |
| 138 | -------- |
| 139 | https://docs.scipy.org/doc/numpy-1.11.0/user/basics.subclassing.html |
| 140 | """ |
| 141 | SUPPORTED_SCALAR_TYPES = (bytes, unicode, type(None)) |
| 142 | SUPPORTED_NON_NONE_SCALAR_TYPES = (bytes, unicode) |
| 143 | |
| 144 | @preprocess( |
| 145 | values=coerce(list, partial(np.asarray, dtype=object)), |
| 146 | # Coerce ``list`` to ``list`` to make a copy. Code internally may call |
| 147 | # ``categories.insert(0, missing_value)`` which will mutate this list |
| 148 | # in place. |