A tiny, efficient fuzzy search that doesn't suck. This is my fuzzy 🐈. There are many like it, but this one is mine.¹
uFuzzy is a fuzzy search library designed to match a relatively short search phrase (needle) against a large list of short-to-medium phrases (haystack). It might be best described as a more forgiving String.includes(). Common applications include list filtering, auto-complete/suggest, and searches for titles, names, descriptions, filenames, and functions.
In uFuzzy's default MultiInsert mode, each match must contain all alpha-numeric characters from the needle in the same sequence;
in SingleError mode, single typos are tolerated in each term (Damerau–Levenshtein distance = 1).
Its .search() API can efficiently match out-of-order terms, supports multiple substring exclusions (e.g. fruit -green -melon), and exact terms with non-alphanum chars (e.g. "C++", "$100", "#hashtag").
When held just right, it can efficiently match against multiple object properties, too.
Array.sort() which gets access to each match's stats/counters. There's no composite, black box "score" to understand.uFuzzy is optimized for the Latin/Roman alphabet and relies internally on non-unicode regular expressions.
Support for more languages works by augmenting the built-in Latin regexps with additional chars or by using the slower, universal {unicode: true} variant.
A more simple, but less flexible {alpha: "..."} alternative replaces the A-Z and a-z parts of the built-in Latin regexps with chars of your choice (the letter case will be matched automatically during replacement).
The uFuzzy.latinize() util function may be used to strip common accents/diacritics from the haystack and needle prior to searching.
// Latin (default)
let opts = { alpha: "a-z" };
// OR
let opts = {
// case-sensitive regexps
interSplit: "[^A-Za-z\\d']+",
intraSplit: "[a-z][A-Z]",
intraBound: "[A-Za-z]\\d|\\d[A-Za-z]|[a-z][A-Z]",
// case-insensitive regexps
intraChars: "[a-z\\d']",
intraContr: "'[a-z]{1,2}\\b",
};
// Latin + Norwegian
let opts = { alpha: "a-zæøå" };
// OR
let opts = {
interSplit: "[^A-Za-zæøåÆØÅ\\d']+",
intraSplit: "[a-zæøå][A-ZÆØÅ]",
intraBound: "[A-Za-zæøåÆØÅ]\\d|\\d[A-Za-zæøåÆØÅ]|[a-zæøå][A-ZÆØÅ]",
intraChars: "[a-zæøå\\d']",
intraContr: "'[a-zæøå]{1,2}\\b",
};
// Latin + Russian
let opts = { alpha: "a-zа-яё" };
// OR
let opts = {
interSplit: "[^A-Za-zА-ЯЁа-яё\\d']+",
intraSplit: "[a-z][A-Z]|[а-яё][А-ЯЁ]",
intraBound: "[A-Za-zА-ЯЁа-яё]\\d|\\d[A-Za-zА-ЯЁа-яё]|[a-z][A-Z]|[а-яё][А-ЯЁ]",
intraChars: "[a-zа-яё\\d']",
intraContr: "'[a-z]{1,2}\\b",
};
// Unicode / universal (50%-75% slower)
let opts = {
unicode: true,
interSplit: "[^\\p{L}\\d']+",
intraSplit: "\\p{Ll}\\p{Lu}",
intraBound: "\\p{L}\\d|\\d\\p{L}|\\p{Ll}\\p{Lu}",
intraChars: "[\\p{L}\\d']",
intraContr: "'\\p{L}{1,2}\\b",
};
All searches are currently case-insensitive; it is not possible to do a case-sensitive search.
NOTE: The testdata.json file is a diverse 162,000 string/phrase dataset 4MB in size, so first load may be slow due to network transfer. Try refreshing once it's been cached by your browser.
First, uFuzzy in isolation to demonstrate its performance.
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=super%20ma
Now the same comparison page, booted with fuzzysort, QuickScore, and Fuse.js:
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,fuzzysort,QuickScore,Fuse&search=super%20ma
Here is the full library list but with a reduced dataset (just hearthstone_750, urls_and_titles_600) to avoid crashing your browser:
https://leeoniya.github.io/uFuzzy/demos/compare.html?lists=hearthstone_750,urls_and_titles_600&search=moo
Answers:
Else: https://github.com/leeoniya/uFuzzy/issues
npm i @leeoniya/ufuzzy
const uFuzzy = require('@leeoniya/ufuzzy');
<script src="https://github.com/leeoniya/uFuzzy/raw/1.0.19/dist/uFuzzy.iife.min.js"></script>
let haystack = [
'puzzle',
'Super Awesome Thing (now with stuff!)',
'FileName.js',
'/feeding/the/catPic.jpg',
];
let needle = 'feed cat';
let opts = {};
let uf = new uFuzzy(opts);
// pre-filter
let idxs = uf.filter(haystack, needle);
// idxs can be null when the needle is non-searchable (has no alpha-numeric chars)
if (idxs != null && idxs.length > 0) {
// sort/rank only when <= 1,000 items
let infoThresh = 1e3;
if (idxs.length <= infoThresh) {
let info = uf.info(idxs, haystack, needle);
// order is a double-indirection array (a re-order of the passed-in idxs)
// this allows corresponding info to be grabbed directly by idx, if needed
let order = uf.sort(info, haystack, needle);
// render post-filtered & ordered matches
for (let i = 0; i < order.length; i++) {
// using info.idx here instead of idxs because uf.info() may have
// further reduced the initial idxs based on prefix/suffix rules
console.log(haystack[info.idx[order[i]]]);
}
}
else {
// render pre-filtered but unordered matches
for (let i = 0; i < idxs.length; i++) {
console.log(haystack[idxs[i]]);
}
}
}
uFuzzy provides a uf.search(haystack, needle, outOfOrder = 0, infoThresh = 1e3) => [idxs, info, order] wrapper which combines the filter, info, sort steps above.
This method also implements efficient logic for matching search terms out of order and support for multiple substring exclusions, e.g. fruit -green -melon.
Get your ordered matches first:
let haystack = [
'foo',
'bar',
'cowbaz',
];
let needle = 'ba';
let u = new uFuzzy();
let idxs = u.filter(haystack, needle);
let info = u.info(idxs, haystack, needle);
let order = u.sort(info, haystack, needle);
Basic innerHTML highlighter (<mark>-wrapped ranges):
let innerHTML = '';
for (let i = 0; i < order.length; i++) {
let infoIdx = order[i];
innerHTML += uFuzzy.highlight(
haystack[info.idx[infoIdx]],
info.ranges[infoIdx],
) + '
';
}
console.log(innerHTML);
innerHTML highlighter with custom marking function (<b>-wrapped ranges):
let innerHTML = '';
const mark = (part, matched) => matched ? '<b>' + part + '</b>' : part;
for (let i = 0; i < order.length; i++) {
let infoIdx = order[i];
innerHTML += uFuzzy.highlight(
haystack[info.idx[infoIdx]],
info.ranges[infoIdx],
mark,
) + '
';
}
console.log(innerHTML);
DOM/JSX element highlighter with custom marking and append functions:
let domElems = [];
const mark = (part, matched) => {
let el = matched ? document.createElement('mark') : document.createElement('span');
el.textContent = part;
return el;
};
const append = (accum, part) => { accum.push(part); };
for (let i = 0; i < order.length; i++) {
let infoIdx = order[i];
let matchEl = document.createElement('div');
let parts = uFuzzy.highlight(
haystack[info.idx[infoIdx]],
info.ranges[infoIdx],
mark,
[],
append,
);
matchEl.append(...parts);
domElems.push(matchEl);
}
document.getElementById('matches').append(...domElems);
uFuzzy has two operational modes which differ in matching strategy:
example - exactexamplle - single insertion (addition)exemple - single substitution (replacement)exmaple - single transposition (swap)exmple - single deletion (omission)xamp - partialxmap - partial with transpositionThere are 3 phases to a search:
haystack with a fast RegExp compiled from your needle without doing any extra ops. It returns an array of matched indices in original order.needle into two more-expensive RegExps that can partition each match. Therefore, it should be run on a reduced subset of the haystack, usually returned by the Filter phase. The uFuzzy demo is gated at <= 1,000 filtered items, before moving ahead with this phase.Array.sort() to determine final result order, utilizing the info object returned from the previous phase. A custom sort function can be provided via a uFuzzy option: {sort: (info, haystack, needle) => idxsOrder}.A liberally-commented 200 LoC uFuzzy.d.ts file.
Options with an inter prefix apply to allowances in between search terms, while those with an intra prefix apply to allowances within each search term.
| Option | Description | Default | Examples |
|---|---|---|---|
intraMode |
How term matching should be performed | 0 |
0 MultiInsert
1 SingleError
See How It Works
|
intraIns |
Max number of extra chars allowed between each char within a term | Matches the value of intraMode (either 0 or 1) |
Searching "cat"...
0 can match: cat, scat, catch, vacate
1 also matches: cart, chapter, outcast
|
interIns |
Max number of extra chars allowed between terms | Infinity |
Searching "where is"...
Infinity can match: where is, where have blah wisdom
5 cannot match: where have blah wisdom
|
intraSub
intraTrn
intraDel
|
For intraMode: 1 only,
Error types to tolerate within terms
|
Matches the value of intraMode (either 0 or 1) |
0 No
1 Yes
|
intraChars |
Partial regexp for allowed insert chars between each char within a term | [a-z\d |
$ claude mcp add uFuzzy \
-- python -m otcore.mcp_server <graph>