Discover the top non-normalized motifs (i.e., without z-normalization) for time series `T`. A subsequence, `Q`, becomes a candidate motif if there are at least `min_neighbors` other subsequence matches in `T` (outside the exclusion zone) with a distance less than or equal to `max_distance`.
aamp_motifs(T, P, min_neighbors=1, max_distance=None, cutoff=None, max_matches=10, max_motifs=1, atol=1e-8, p=2.0)
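The candidate-motif criterion described above can be illustrated with a brute-force sketch in plain Python. This is not the library's implementation: the helper names `find_matches` and `is_candidate_motif` are hypothetical, the exclusion zone of `ceil(m / 4)` is an assumption, and the distance is taken to be the non-normalized Minkowski `p`-norm between two subsequences of length `m`.

```python
import math


def find_matches(T, start, m, max_distance, p=2.0, excl_zone=None):
    """Hypothetical helper: return (index, distance) pairs for subsequences of
    T that lie within max_distance of the query T[start:start + m]."""
    if excl_zone is None:
        excl_zone = math.ceil(m / 4)  # assumed exclusion zone size

    query = T[start:start + m]
    candidates = []
    for i in range(len(T) - m + 1):
        if abs(i - start) <= excl_zone:
            continue  # skip trivial matches that overlap the query itself
        # Non-normalized Minkowski p-norm distance (no z-normalization)
        d = sum(abs(a - b) ** p for a, b in zip(query, T[i:i + m])) ** (1.0 / p)
        if d <= max_distance:
            candidates.append((i, d))

    # Accept the closest candidates first and skip any candidate that falls
    # inside the exclusion zone of an already accepted match.
    matches = []
    for i, d in sorted(candidates, key=lambda pair: pair[1]):
        if all(abs(i - j) > excl_zone for j, _ in matches):
            matches.append((i, d))
    return matches


def is_candidate_motif(T, start, m, min_neighbors, max_distance, p=2.0):
    """Hypothetical helper: True if the query has at least min_neighbors matches."""
    return len(find_matches(T, start, m, max_distance, p)) >= min_neighbors


T = [0, 1, 2, 1, 0, 5, 5, 5, 5, 5, 0, 1, 2, 1, 0, 9, 9, 9, 9, 9]
matches = find_matches(T, start=0, m=5, max_distance=1.0)
```

Here the pattern `[0, 1, 2, 1, 0]` recurs once at index 10, so the query at index 0 qualifies as a candidate motif with `min_neighbors=1` but not with `min_neighbors=2`.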
def aamp_motifs(
    T,
    P,
    min_neighbors=1,
    max_distance=None,
    cutoff=None,
    max_matches=10,
    max_motifs=1,
    atol=1e-8,
    p=2.0,
):
    """
    Discover the top non-normalized motifs (i.e., without z-normalization) for time
    series `T`.

    A subsequence, `Q`, becomes a candidate motif if there are at least
    `min_neighbors` other subsequence matches in `T` (outside the exclusion zone)
    with a distance less than or equal to `max_distance`.

    Note that, in the best case scenario, the returned arrays would have shape
    `(max_motifs, max_matches)` and contain all finite values. However, in reality,
    many conditions (see below) need to be satisfied in order for this to be true. Any
    truncation in the number of rows (i.e., motifs) may be the result of insufficient
    candidate motifs with matches greater than or equal to `min_neighbors` or that the
    matrix profile value for the candidate motif was larger than `cutoff`. Similarly,
    any truncation in the number of columns (i.e., matches) may be the result of
    insufficient matches being found with distances (to their corresponding candidate
    motif) that are equal to or less than `max_distance`. Only motifs and matches that
    satisfy all of these constraints will be returned.

    If you must return a shape of `(max_motifs, max_matches)`, then you may consider
    specifying a smaller `min_neighbors`, a larger `max_distance`, and/or a larger
    `cutoff`. For example, while it is ill-advised, setting `min_neighbors=1`,
    `max_distance=np.inf`, and `cutoff=np.inf` will ensure that the shape of the output
    arrays will be `(max_motifs, max_matches)`. However, given the lack of constraints,
    the quality of each motif and the quality of each match may be drastically
    different. Setting appropriate conditions will help ensure appropriately
    constrained results that may be easier to interpret.

    Parameters
    ----------
    T : numpy.ndarray
        The time series or sequence

    P : numpy.ndarray
        Matrix Profile of `T`

    min_neighbors : int, default 1
        The minimum number of similar matches a subsequence needs to have in order
        to be considered a motif. This defaults to `1`, which means that a
        subsequence must have at least one similar match in order to be considered
        a motif.

    max_distance : float or function, default None
        For a candidate motif, `Q`, and a non-trivial subsequence, `S`,
        `max_distance` is the maximum distance allowed between `Q` and `S` so that
        `S` is considered a match of `Q`. If `max_distance` is a function, then it
        must be a function that accepts a single parameter, `D`, in its function