The Knuth-Morris-Pratt Algorithm for finding a pattern within a piece of text with complexity O(n + m) 1) Preprocess pattern to identify any suffixes that are identical to prefixes This tells us where to continue from if we get a mismatch between a character in our pat
(text: str, pattern: str)
| 2 | |
| 3 | |
| 4 | def knuth_morris_pratt(text: str, pattern: str) -> int: |
| 5 | """ |
| 6 | The Knuth-Morris-Pratt Algorithm for finding a pattern within a piece of text |
| 7 | with complexity O(n + m) |
| 8 | |
| 9 | 1) Preprocess pattern to identify any suffixes that are identical to prefixes |
| 10 | |
| 11 | This tells us where to continue from if we get a mismatch between a character |
| 12 | in our pattern and the text. |
| 13 | |
| 14 | 2) Step through the text one character at a time and compare it to a character in |
| 15 | the pattern updating our location within the pattern if necessary |
| 16 | |
| 17 | >>> kmp = "knuth_morris_pratt" |
| 18 | >>> all( |
| 19 | ... knuth_morris_pratt(kmp, s) == kmp.find(s) |
| 20 | ... for s in ("kn", "h_m", "rr", "tt", "not there") |
| 21 | ... ) |
| 22 | True |
| 23 | """ |
| 24 | |
| 25 | # 1) Construct the failure array |
| 26 | failure = get_failure_array(pattern) |
| 27 | |
| 28 | # 2) Step through text searching for pattern |
| 29 | i, j = 0, 0 # index into text, pattern |
| 30 | while i < len(text): |
| 31 | if pattern[j] == text[i]: |
| 32 | if j == (len(pattern) - 1): |
| 33 | return i - j |
| 34 | j += 1 |
| 35 | |
| 36 | # if this is a prefix in our pattern |
| 37 | # just go back far enough to continue |
| 38 | elif j > 0: |
| 39 | j = failure[j - 1] |
| 40 | continue |
| 41 | i += 1 |
| 42 | return -1 |
| 43 | |
| 44 | |
| 45 | def get_failure_array(pattern: str) -> list[int]: |
no test coverage detected