hub / github.com/TheAlgorithms/Python / jaro_winkler

Function jaro_winkler

strings/jaro_winkler.py:4–73

Jaro-Winkler distance is a string metric measuring an edit distance between two sequences. The output value is between 0.0 and 1.0. As the doctests in the listing show, identical strings score 1.0 and strings with no matching characters score 0.0, so a higher value means the strings are more similar.

(str1: str, str2: str)

Source from the content-addressed store, hash-verified

def jaro_winkler(str1: str, str2: str) -> float:
    """
    Jaro-Winkler distance is a string metric measuring an edit distance between two
    sequences.
    Output value is between 0.0 and 1.0.

    >>> jaro_winkler("martha", "marhta")
    0.9611111111111111
    >>> jaro_winkler("CRATE", "TRACE")
    0.7333333333333334
    >>> jaro_winkler("test", "dbdbdbdb")
    0.0
    >>> jaro_winkler("test", "test")
    1.0
    >>> jaro_winkler("hello world", "HeLLo W0rlD")
    0.6363636363636364
    >>> jaro_winkler("test", "")
    0.0
    >>> jaro_winkler("hello", "world")
    0.4666666666666666
    >>> jaro_winkler("hell**o", "*world")
    0.4365079365079365
    """

    def get_matched_characters(_str1: str, _str2: str) -> str:
        matched = []
        limit = min(len(_str1), len(_str2)) // 2
        for i, char in enumerate(_str1):
            left = int(max(0, i - limit))
            right = int(min(i + limit + 1, len(_str2)))
            if char in _str2[left:right]:
                matched.append(char)
                _str2 = (
                    f"{_str2[0 : _str2.index(char)]} {_str2[_str2.index(char) + 1 :]}"
                )

        return "".join(matched)

    # matching characters
    matching_1 = get_matched_characters(str1, str2)
    matching_2 = get_matched_characters(str2, str1)
    match_count = len(matching_1)

    # transposition
    transpositions = (
        len([(c1, c2) for c1, c2 in zip(matching_1, matching_2) if c1 != c2]) // 2
    )

    if not match_count:
        jaro = 0.0
    else:
        jaro = (
            1
            / 3
            * (
                match_count / len(str1)
                + match_count / len(str2)
                + (match_count - transpositions) / match_count
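The listing above stops inside the Jaro expression, before the Winkler prefix adjustment is visible. The following is a minimal, self-contained sketch of how the whole computation could fit together. It reuses the matching and transposition logic shown above, but the final step is an assumption based on the commonly described Jaro-Winkler formulation (a shared prefix of up to 4 characters, weighted by 0.1), not the remainder of this file. The names `jaro_winkler_sketch`, `prefix_weight`, and `max_prefix` are hypothetical.

```python
def get_matched_characters(s1: str, s2: str) -> str:
    """Collect characters of s1 that also occur in s2 within the match window."""
    matched = []
    limit = min(len(s1), len(s2)) // 2  # half the shorter length, as in the listing
    for i, char in enumerate(s1):
        left = max(0, i - limit)
        right = min(i + limit + 1, len(s2))
        if char in s2[left:right]:
            matched.append(char)
            # Blank out the first occurrence so it cannot be matched twice.
            pos = s2.index(char)
            s2 = f"{s2[:pos]} {s2[pos + 1:]}"
    return "".join(matched)


def jaro_winkler_sketch(
    str1: str,
    str2: str,
    prefix_weight: float = 0.1,  # assumed scaling factor, not taken from the listing
    max_prefix: int = 4,         # assumed prefix cap, not taken from the listing
) -> float:
    matching_1 = get_matched_characters(str1, str2)
    matching_2 = get_matched_characters(str2, str1)
    match_count = len(matching_1)
    if not match_count:
        return 0.0

    # Half the number of positions where the two matched sequences disagree.
    transpositions = sum(c1 != c2 for c1, c2 in zip(matching_1, matching_2)) // 2

    jaro = (
        1
        / 3
        * (
            match_count / len(str1)
            + match_count / len(str2)
            + (match_count - transpositions) / match_count
        )
    )

    # Assumed Winkler step: reward a common prefix of up to max_prefix characters.
    prefix_len = 0
    for c1, c2 in zip(str1[:max_prefix], str2[:max_prefix]):
        if c1 != c2:
            break
        prefix_len += 1

    return jaro + prefix_weight * prefix_len * (1 - jaro)


if __name__ == "__main__":
    print(jaro_winkler_sketch("martha", "marhta"))
    print(jaro_winkler_sketch("CRATE", "TRACE"))
```

Under these assumptions the sketch agrees with the first doctest values in the docstring, which suggests the prefix step is at least compatible with the original, though the actual tail of the function may differ in detail.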

Callers 1

jaro_winkler.py · File · 0.85

Calls 1

get_matched_characters · Function · 0.85

Tested by

no test coverage detected