hub / github.com/clips/pattern / dice_coefficient

Function dice_coefficient

pattern/metrics.py:276–285

Returns the similarity between string1 and string2 as a number between 0.0 and 1.0, based on the number of shared bigrams, e.g., "night" and "nacht" have one common bigram "ht".
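The "night" / "nacht" example can be checked by hand. The following is a minimal sketch, assuming Python 3 and a standalone helper named bigrams that mirrors the nested helper in the source; it is an illustration, not the library's packaged API.

```python
def bigrams(s):
    # All distinct adjacent character pairs in s.
    return set(s[i:i+2] for i in range(len(s) - 1))

night = bigrams("night")   # {"ni", "ig", "gh", "ht"}
nacht = bigrams("nacht")   # {"na", "ac", "ch", "ht"}
shared = night & nacht     # {"ht"}

# Dice coefficient: twice the shared count over the sum of both set sizes.
score = 2.0 * len(shared) / (len(night) + len(nacht))
print(score)  # 0.25
```

One shared bigram out of four on each side gives 2 * 1 / (4 + 4) = 0.25.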

(string1, string2)

Source from the content-addressed store, hash-verified

    return 1 - levenshtein(string1, string2) / float(max(len(string1), len(string2), 1.0))

def dice_coefficient(string1, string2):
    """ Returns the similarity between string1 and string2 as a number between 0.0 and 1.0,
        based on the number of shared bigrams, e.g., "night" and "nacht" have one common bigram "ht".
    """
    def bigrams(s):
        # Set of distinct adjacent character pairs in s.
        return set(s[i:i+2] for i in range(len(s)-1))
    nx = bigrams(string1)
    ny = bigrams(string2)
    nt = nx.intersection(ny)
    # "or 1" avoids division by zero when neither string has a bigram.
    return 2.0 * len(nt) / ((len(nx) + len(ny)) or 1)

LEVENSHTEIN, DICE = "levenshtein", "dice"
def similarity(string1, string2, metric=LEVENSHTEIN):
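A few boundary cases follow directly from the listing and may be worth keeping in mind. The sketch below assumes Python 3 and uses a standalone copy of the function for experimentation; the inputs are illustrative and not taken from the project's tests.

```python
def dice_coefficient(string1, string2):
    # Standalone copy of the listed function, for experimentation only.
    def bigrams(s):
        return set(s[i:i+2] for i in range(len(s) - 1))
    nx = bigrams(string1)
    ny = bigrams(string2)
    nt = nx.intersection(ny)
    return 2.0 * len(nt) / ((len(nx) + len(ny)) or 1)

print(dice_coefficient("night", "night"))  # 1.0: identical bigram sets
print(dice_coefficient("", ""))            # 0.0: no bigrams, "or 1" guards the division
print(dice_coefficient("a", "a"))          # 0.0: one-character strings have no bigrams
print(dice_coefficient("aaaa", "aa"))      # 1.0: sets ignore repeated bigrams
```

Because bigrams are collected into sets, repeated pairs count once, and strings shorter than two characters always score 0.0, even when they are equal.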

Callers 1

similarity · Function · 0.85

Calls 2

bigrams · Function · 0.85
len · Function · 0.85

Tested by

no test coverage detected
