Returns the similarity between string1 and string2 as a number between 0.0 and 1.0, based on the number of shared bigrams, e.g., "night" and "nacht" have one common bigram "ht".
(string1, string2)
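As a worked example of the description above: the score is twice the number of shared bigrams divided by the total number of distinct bigrams in both strings. "night" has the bigrams ni, ig, gh, ht and "nacht" has na, ac, ch, ht, so the score is 2 * 1 / (4 + 4) = 0.25. The sketch below restates the function so that it runs standalone; the set-comprehension form and the variable names x and y are illustrative choices, not the original wording.

```python
def dice_coefficient(string1, string2):
    """Dice similarity over character bigrams, in the range 0.0 to 1.0."""
    def bigrams(s):
        # Distinct adjacent character pairs; empty for strings shorter than 2.
        return {s[i:i + 2] for i in range(len(s) - 1)}
    x, y = bigrams(string1), bigrams(string2)
    # `or 1` avoids division by zero when neither string has any bigram.
    return 2.0 * len(x & y) / ((len(x) + len(y)) or 1)

print(dice_coefficient("night", "nacht"))  # 0.25
print(dice_coefficient("night", "night"))  # 1.0
print(dice_coefficient("a", "a"))          # 0.0
```

Note the last case: strings shorter than two characters have no bigrams, so even two identical single-character strings score 0.0.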
    return 1 - levenshtein(string1, string2) / float(max(len(string1), len(string2), 1.0))

def dice_coefficient(string1, string2):
    """ Returns the similarity between string1 and string2 as a number between 0.0 and 1.0,
        based on the number of shared bigrams, e.g., "night" and "nacht" have one common bigram "ht".
    """
    def bigrams(s):
        # Set of distinct adjacent character pairs in s.
        return set(s[i:i+2] for i in range(len(s)-1))
    nx = bigrams(string1)
    ny = bigrams(string2)
    nt = nx.intersection(ny)
    # "or 1" avoids division by zero when neither string has any bigrams.
    return 2.0 * len(nt) / ((len(nx) + len(ny)) or 1)

LEVENSHTEIN, DICE = "levenshtein", "dice"
def similarity(string1, string2, metric=LEVENSHTEIN):