Discover the top motifs for time series ``T``.

A subsequence, ``Q``, becomes a candidate motif if there are at least ``min_neighbors`` other subsequence matches in ``T`` (outside the exclusion zone) with a distance less than or equal to ``max_distance``.
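The candidate-motif rule above can be illustrated with a brute-force sketch in plain Python. This is not STUMPY's implementation. The helpers ``znorm``, ``znorm_dist``, ``count_neighbors`` and ``is_candidate_motif`` are hypothetical names. The sketch assumes z-normalized Euclidean distance, which corresponds to ``normalize=True`` and ``p=2.0``. It also assumes an exclusion-zone half-width of ``ceil(m / 4)`` and treats a constant subsequence as all zeros. Unlike a real search, it does not apply exclusion zones between matches themselves, so overlapping matches may be counted more than once.

```python
import math


def znorm(x):
    """Return the z-normalized copy of sequence x."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0:
        # Assumption for this sketch: a constant subsequence normalizes to zeros.
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def znorm_dist(a, b):
    """z-normalized Euclidean (p=2) distance between two equal-length sequences."""
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(za, zb)))


def count_neighbors(T, m, i, max_distance, excl_zone=None):
    """Count length-m subsequences of T within max_distance of Q = T[i:i+m].

    Subsequences starting inside the exclusion zone around i are trivial
    matches of Q itself, so they are skipped.
    """
    if excl_zone is None:
        excl_zone = math.ceil(m / 4)  # assumed default half-width
    Q = T[i:i + m]
    count = 0
    for j in range(len(T) - m + 1):
        if abs(j - i) <= excl_zone:
            continue  # overlaps Q too much to count as a neighbor
        if znorm_dist(Q, T[j:j + m]) <= max_distance:
            count += 1
    return count


def is_candidate_motif(T, m, i, min_neighbors, max_distance):
    """Q = T[i:i+m] is a candidate motif if it has >= min_neighbors neighbors."""
    return count_neighbors(T, m, i, max_distance) >= min_neighbors


# Usage: the pattern [0, 1, 2, 1, 0] occurs at positions 0 and 7.
T = [0, 1, 2, 1, 0, 5, 5, 0, 1, 2, 1, 0]
print(count_neighbors(T, 5, 0, max_distance=1e-6))  # 1
print(is_candidate_motif(T, 5, 0, min_neighbors=1, max_distance=1e-6))  # True
print(is_candidate_motif(T, 5, 0, min_neighbors=2, max_distance=1e-6))  # False
```

Raising ``min_neighbors`` or lowering ``max_distance`` makes the test stricter. This is why stricter settings can yield fewer candidate motifs.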
    motifs(
        T,
        P,
        min_neighbors=1,
        max_distance=None,
        cutoff=None,
        max_matches=10,
        max_motifs=1,
        atol=1e-8,
        normalize=True,
        p=2.0,
        T_subseq_isconstant=None,
    )
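The ``P`` argument must be one-dimensional. If the matrix profile was computed with ``k > 1``, the ``P`` parameter description in the source listing asks you to reduce each subsequence's top-k distances to a single value, for example with ``np.mean`` or ``np.min``. With NumPy, that could be a row-wise reduction such as ``np.mean(P_topk, axis=1)``, assuming one row per subsequence and one column per neighbor. That layout is an assumption here. The sketch below performs the same reduction on a list of lists using only the standard library. ``summarize_topk`` and ``P_topk`` are hypothetical names.

```python
def summarize_topk(P_topk, how="mean"):
    """Collapse each row of top-k nearest-neighbor distances into a single value.

    P_topk is assumed to be a list of rows, one row per subsequence, where each
    row holds that subsequence's k nearest-neighbor distances.
    """
    if how == "mean":
        return [sum(row) / len(row) for row in P_topk]
    if how == "min":
        return [min(row) for row in P_topk]
    raise ValueError(f"unsupported summary: {how!r}")


# Usage with k = 2 neighbors for three subsequences.
P_topk = [[1.0, 3.0], [2.0, 2.0], [0.5, 4.5]]
print(summarize_topk(P_topk, "mean"))  # [2.0, 2.0, 2.5]
print(summarize_topk(P_topk, "min"))   # [1.0, 2.0, 0.5]
```

The choice of summary changes which subsequences look most motif-like. In this example, the third subsequence ranks best under ``min`` but worst under ``mean``.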
        ],
    )
    def motifs(
        T,
        P,
        min_neighbors=1,
        max_distance=None,
        cutoff=None,
        max_matches=10,
        max_motifs=1,
        atol=1e-8,
        normalize=True,
        p=2.0,
        T_subseq_isconstant=None,
    ):
        """
        Discover the top motifs for time series ``T``.

        A subsequence, ``Q``, becomes a candidate motif if there are at least
        ``min_neighbors`` other subsequence matches in ``T`` (outside the
        exclusion zone) with a distance less than or equal to ``max_distance``.

        Note that, in the best-case scenario, the returned arrays would have shape
        ``(max_motifs, max_matches)`` and contain only finite values. In practice,
        however, several conditions (see below) must be satisfied for this to be true.
        Any truncation in the number of rows (i.e., motifs) may result from too few
        candidate motifs having at least ``min_neighbors`` matches, or from a candidate
        motif's matrix profile value exceeding ``cutoff``. Similarly, any truncation in
        the number of columns (i.e., matches) may result from too few matches being
        found whose distances to their corresponding candidate motif are less than or
        equal to ``max_distance``. Only motifs and matches that satisfy all of these
        constraints will be returned.

        If you require an output shape of ``(max_motifs, max_matches)``, consider
        specifying a smaller ``min_neighbors``, a larger ``max_distance``, and/or a
        larger ``cutoff``. For example, while it is ill-advised, setting
        ``min_neighbors=1``, ``max_distance=np.inf``, and ``cutoff=np.inf`` will ensure
        that the output arrays have shape ``(max_motifs, max_matches)``. However, given
        the lack of constraints, the quality of each motif and of each match may vary
        drastically. Setting sensible constraints helps produce results that may be
        easier to interpret.

        Parameters
        ----------
        T : numpy.ndarray
            The time series or sequence.

        P : numpy.ndarray
            The (1-dimensional) matrix profile of ``T``. In the case where the matrix
            profile was computed with ``k > 1`` (i.e., top-k nearest neighbors), you
            must summarize the top-k nearest-neighbor distances for each subsequence
            into a single value (e.g., ``np.mean``, ``np.min``, etc.) and then use that
            derived value as your ``P``.

        min_neighbors : int, default 1
            The minimum number of similar matches a subsequence needs to have in order
            to be considered a motif. This defaults to ``1``, which means that a
            subsequence must have at least one similar match in order to be considered
            a motif.

        max_distance : float or function, default None