hub / github.com/clips/pattern / dice_coefficient

Function dice_coefficient

pattern/metrics.py:276–285

Returns the similarity between string1 and string2 as a number between 0.0 and 1.0, based on the number of shared bigrams (pairs of adjacent characters). For example, "night" and "nacht" have one bigram in common, "ht".

(string1, string2)
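The return expression in the source matches the usual Sørensen-Dice coefficient applied to two sets of bigrams. The symbols X and Y below are notation introduced here for the bigram sets of string1 and string2; they do not appear in the source.

```latex
\[
\mathrm{dice}(X, Y) = \frac{2\,\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert}
\]
```

For "night" and "nacht", each string has four distinct bigrams and they share one, giving 2 · 1 / (4 + 4) = 0.25. When both sets are empty the code divides by 1 instead of 0, so the result is 0.0.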

Source from the content-addressed store, hash-verified

274	    return 1 - levenshtein(string1, string2) / float(max(len(string1), len(string2), 1.0))
275
276	def dice_coefficient(string1, string2):
277	    """ Returns the similarity between string1 and string1 as a number between 0.0 and 1.0,
278	        based on the number of shared bigrams, e.g., "night" and "nacht" have one common bigram "ht".
279	    """
280	    def bigrams(s):
281	        return set(s[i:i+2] for i in range(len(s)-1))
282	    nx = bigrams(string1)
283	    ny = bigrams(string2)
284	    nt = nx.intersection(ny)
285	    return 2.0 * len(nt) / ((len(nx) + len(ny)) or 1)
286
287	LEVENSHTEIN, DICE = "levenshtein", "dice"
288	def similarity(string1, string2, metric=LEVENSHTEIN):
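A minimal sketch of how the function behaves on a few inputs. The function body is a standalone copy of the listing so the snippet runs without installing pattern; the comments and the sample inputs other than "night" and "nacht" are additions made here for illustration.

```python
# Standalone copy of the listed function (comments added for this example).
def dice_coefficient(string1, string2):
    def bigrams(s):
        # Set of adjacent character pairs; repeated bigrams collapse to one.
        return set(s[i:i+2] for i in range(len(s) - 1))
    nx = bigrams(string1)
    ny = bigrams(string2)
    nt = nx.intersection(ny)
    # "or 1" avoids division by zero when both bigram sets are empty.
    return 2.0 * len(nt) / ((len(nx) + len(ny)) or 1)

print(dice_coefficient("night", "nacht"))  # 0.25 (one shared bigram out of 4 + 4)
print(dice_coefficient("night", "night"))  # 1.0
print(dice_coefficient("a", "a"))          # 0.0 (strings shorter than 2 have no bigrams)
print(dice_coefficient("aaaa", "aa"))      # 1.0 (both reduce to the set {"aa"})
```

Two consequences of the set-based design are visible here: strings with fewer than two characters always score 0.0, even when identical, and repeated bigrams are counted once, so strings of different lengths can still score 1.0.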

Callers 1

similarity · Function · 0.85

Calls 2

bigrams · Function · 0.85

len · Function · 0.85

Tested by

no test coverage detected
