MCPcopy
hub / github.com/leeoniya/uFuzzy

github.com/leeoniya/uFuzzy @1.0.19 sqlite

repository ↗ · DeepWiki ↗ · release 1.0.19 ↗
1,299 symbols 4,163 edges 31 files 13 documented · 1%
README

▒ μFuzzy

A tiny, efficient fuzzy search that doesn't suck. This is my fuzzy 🐈. There are many like it, but this one is mine.¹


Overview

uFuzzy is a fuzzy search library designed to match a relatively short search phrase (needle) against a large list of short-to-medium phrases (haystack). It might be best described as a more forgiving String.includes(). Common applications include list filtering, auto-complete/suggest, and searches for titles, names, descriptions, filenames, and functions.

In uFuzzy's default MultiInsert mode, each match must contain all alpha-numeric characters from the needle in the same sequence; in SingleError mode, single typos are tolerated in each term (Damerau–Levenshtein distance = 1). Its .search() API can efficiently match out-of-order terms, supports multiple substring exclusions (e.g. fruit -green -melon), and exact terms with non-alphanum chars (e.g. "C++", "$100", "#hashtag"). When held just right, it can efficiently match against multiple object properties, too.


Features

  • Junk-free, high quality results with any dataset. No need to fine-tune indexing options or boosting params to attain some arbitrary relevance score cut-off.
  • Precise fuzziness control that follows straightforward rules, without returning unexpected matches.
  • Sorting you can reason about and customize using a simple Array.sort() which gets access to each match's stats/counters. There's no composite, black box "score" to understand.
  • Concise set of options that don't interact in mysterious ways to drastically alter combined behavior.
  • Fast with low resource usage - there's no index to build, so startup is below 1ms with near-zero memory overhead. Searching a three-term phrase in a 162,000 phrase dataset takes 5ms with out-of-order terms.
  • Micro, with zero dependencies - currently ~7.5KB min

uFuzzy demo


Charsets, Alphabets, Diacritics

uFuzzy is optimized for the Latin/Roman alphabet and relies internally on non-unicode regular expressions.

Support for more languages works by augmenting the built-in Latin regexps with additional chars or by using the slower, universal {unicode: true} variant. A more simple, but less flexible {alpha: "..."} alternative replaces the A-Z and a-z parts of the built-in Latin regexps with chars of your choice (the letter case will be matched automatically during replacement).

The uFuzzy.latinize() util function may be used to strip common accents/diacritics from the haystack and needle prior to searching.

// Latin (default)
let opts = { alpha: "a-z" };
// OR
let opts = {
  // case-sensitive regexps
  interSplit: "[^A-Za-z\\d']+",
  intraSplit: "[a-z][A-Z]",
  intraBound: "[A-Za-z]\\d|\\d[A-Za-z]|[a-z][A-Z]",
  // case-insensitive regexps
  intraChars: "[a-z\\d']",
  intraContr: "'[a-z]{1,2}\\b",
};

// Latin + Norwegian
let opts = { alpha: "a-zæøå" };
// OR
let opts = {
  interSplit: "[^A-Za-zæøåÆØÅ\\d']+",
  intraSplit: "[a-zæøå][A-ZÆØÅ]",
  intraBound: "[A-Za-zæøåÆØÅ]\\d|\\d[A-Za-zæøåÆØÅ]|[a-zæøå][A-ZÆØÅ]",
  intraChars: "[a-zæøå\\d']",
  intraContr: "'[a-zæøå]{1,2}\\b",
};

// Latin + Russian
let opts = { alpha: "a-zа-яё" };
// OR
let opts = {
  interSplit: "[^A-Za-zА-ЯЁа-яё\\d']+",
  intraSplit: "[a-z][A-Z]|[а-яё][А-ЯЁ]",
  intraBound: "[A-Za-zА-ЯЁа-яё]\\d|\\d[A-Za-zА-ЯЁа-яё]|[a-z][A-Z]|[а-яё][А-ЯЁ]",
  intraChars: "[a-zа-яё\\d']",
  intraContr: "'[a-z]{1,2}\\b",
};

// Unicode / universal (50%-75% slower)
let opts = {
  unicode: true,
  interSplit: "[^\\p{L}\\d']+",
  intraSplit: "\\p{Ll}\\p{Lu}",
  intraBound: "\\p{L}\\d|\\d\\p{L}|\\p{Ll}\\p{Lu}",
  intraChars: "[\\p{L}\\d']",
  intraContr: "'\\p{L}{1,2}\\b",
};

All searches are currently case-insensitive; it is not possible to do a case-sensitive search.


Demos

NOTE: The testdata.json file is a diverse 162,000 string/phrase dataset 4MB in size, so first load may be slow due to network transfer. Try refreshing once it's been cached by your browser.

First, uFuzzy in isolation to demonstrate its performance.

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=super%20ma

Now the same comparison page, booted with fuzzysort, QuickScore, and Fuse.js:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,fuzzysort,QuickScore,Fuse&search=super%20ma

Here is the full library list but with a reduced dataset (just hearthstone_750, urls_and_titles_600) to avoid crashing your browser:

https://leeoniya.github.io/uFuzzy/demos/compare.html?lists=hearthstone_750,urls_and_titles_600&search=moo


Questions?

Answers:

  • https://news.ycombinator.com/item?id=33035580
  • https://old.reddit.com/r/javascript/comments/xtrszc/ufuzzyjs_a_tiny_efficient_fuzzy_search_that/

Else: https://github.com/leeoniya/uFuzzy/issues


Installation

Node

npm i @leeoniya/ufuzzy
const uFuzzy = require('@leeoniya/ufuzzy');

Browser

<script src="https://github.com/leeoniya/uFuzzy/raw/1.0.19/dist/uFuzzy.iife.min.js"></script>

Example

let haystack = [
    'puzzle',
    'Super Awesome Thing (now with stuff!)',
    'FileName.js',
    '/feeding/the/catPic.jpg',
];

let needle = 'feed cat';

let opts = {};

let uf = new uFuzzy(opts);

// pre-filter
let idxs = uf.filter(haystack, needle);

// idxs can be null when the needle is non-searchable (has no alpha-numeric chars)
if (idxs != null && idxs.length > 0) {
  // sort/rank only when <= 1,000 items
  let infoThresh = 1e3;

  if (idxs.length <= infoThresh) {
    let info = uf.info(idxs, haystack, needle);

    // order is a double-indirection array (a re-order of the passed-in idxs)
    // this allows corresponding info to be grabbed directly by idx, if needed
    let order = uf.sort(info, haystack, needle);

    // render post-filtered & ordered matches
    for (let i = 0; i < order.length; i++) {
      // using info.idx here instead of idxs because uf.info() may have
      // further reduced the initial idxs based on prefix/suffix rules
      console.log(haystack[info.idx[order[i]]]);
    }
  }
  else {
    // render pre-filtered but unordered matches
    for (let i = 0; i < idxs.length; i++) {
      console.log(haystack[idxs[i]]);
    }
  }
}

Integrated Search

uFuzzy provides a uf.search(haystack, needle, outOfOrder = 0, infoThresh = 1e3) => [idxs, info, order] wrapper which combines the filter, info, sort steps above. This method also implements efficient logic for matching search terms out of order and support for multiple substring exclusions, e.g. fruit -green -melon.


Match Highlighting

Get your ordered matches first:

let haystack = [
  'foo',
  'bar',
  'cowbaz',
];

let needle = 'ba';

let u = new uFuzzy();

let idxs = u.filter(haystack, needle);
let info = u.info(idxs, haystack, needle);
let order = u.sort(info, haystack, needle);

Basic innerHTML highlighter (<mark>-wrapped ranges):

let innerHTML = '';

for (let i = 0; i < order.length; i++) {
  let infoIdx = order[i];

  innerHTML += uFuzzy.highlight(
    haystack[info.idx[infoIdx]],
    info.ranges[infoIdx],
  ) + '

';
}

console.log(innerHTML);

innerHTML highlighter with custom marking function (<b>-wrapped ranges):

let innerHTML = '';

const mark = (part, matched) => matched ? '<b>' + part + '</b>' : part;

for (let i = 0; i < order.length; i++) {
  let infoIdx = order[i];

  innerHTML += uFuzzy.highlight(
    haystack[info.idx[infoIdx]],
    info.ranges[infoIdx],

    mark,
  ) + '

';
}

console.log(innerHTML);

DOM/JSX element highlighter with custom marking and append functions:

let domElems = [];

const mark = (part, matched) => {
  let el = matched ? document.createElement('mark') : document.createElement('span');
  el.textContent = part;
  return el;
};

const append = (accum, part) => { accum.push(part); };

for (let i = 0; i < order.length; i++) {
  let infoIdx = order[i];

  let matchEl = document.createElement('div');

  let parts = uFuzzy.highlight(
    haystack[info.idx[infoIdx]],
    info.ranges[infoIdx],

    mark,
    [],
    append,
  );

  matchEl.append(...parts);

  domElems.push(matchEl);
}

document.getElementById('matches').append(...domElems);

How It Works

uFuzzy has two operational modes which differ in matching strategy:

  • intraMode: 0 (default) requires all alpha-numeric characters in each search term to exist in the same sequence in all matches. For example, when searching for "cat", this mode is capable of matching the strings below. What is actually matched will depend on additonal fuzziness settings.
  • cat
  • coat
  • scratch
  • cantina
  • tractors are late
  • intraMode: 1 allows for a single error in each term of the search phrase, where an error is one of: substitution (replacement), transposition (swap), insertion (addition), or deletion (omission). The search strings with errors below can return matches containing "example". What is actually matched will depend on additonal fuzziness settings. In contrast to the previous mode, searching for "example" will never match "extra maple".
  • example - exact
  • examplle - single insertion (addition)
  • exemple - single substitution (replacement)
  • exmaple - single transposition (swap)
  • exmple - single deletion (omission)
  • xamp - partial
  • xmap - partial with transposition

There are 3 phases to a search:

  1. Filter filters the full haystack with a fast RegExp compiled from your needle without doing any extra ops. It returns an array of matched indices in original order.
  2. Info collects more detailed stats about the filtered matches, such as start offsets, fuzz level, prefix/suffix counters, etc. It also gathers substring match positions for range highlighting. Finally, it filters out any matches that don't conform to the desired prefix/suffix rules. To do all this it re-compiles the needle into two more-expensive RegExps that can partition each match. Therefore, it should be run on a reduced subset of the haystack, usually returned by the Filter phase. The uFuzzy demo is gated at <= 1,000 filtered items, before moving ahead with this phase.
  3. Sort does an Array.sort() to determine final result order, utilizing the info object returned from the previous phase. A custom sort function can be provided via a uFuzzy option: {sort: (info, haystack, needle) => idxsOrder}.

API

A liberally-commented 200 LoC uFuzzy.d.ts file.


Options

Options with an inter prefix apply to allowances in between search terms, while those with an intra prefix apply to allowances within each search term.

Option Description Default Examples
intraMode How term matching should be performed 0 0 MultiInsert 1 SingleError See How It Works
intraIns Max number of extra chars allowed between each char within a term Matches the value of intraMode (either 0 or 1) Searching "cat"... 0 can match: cat, scat, catch, vacate 1 also matches: cart, chapter, outcast
interIns Max number of extra chars allowed between terms Infinity Searching "where is"... Infinity can match: where is, where have blah wisdom 5 cannot match: where have blah wisdom
intraSub intraTrn intraDel For intraMode: 1 only, Error types to tolerate within terms Matches the value of intraMode (either 0 or 1) 0 No 1 Yes
intraChars Partial regexp for allowed insert chars between each char within a term [a-z\d

Core symbols most depended-on inside this repo

push
called by 427
demos/lib/search-index.js
n
called by 233
demos/lib/search-index.js
get
called by 158
demos/lib/search-index.js
set
called by 123
demos/lib/search-index.js
keys
called by 108
demos/lib/search-index.js
forEach
called by 98
demos/lib/search-index.js
next
called by 94
demos/lib/search-index.js
add
called by 92
demos/lib/search-index.js

Shape

Function 1,014
Method 231
Class 54

Languages

TypeScript100%

Modules by API surface

demos/lib/search-index.js315 symbols
demos/lib/itemsjs.min.js311 symbols
demos/lib/fuzzball.umd.min.js147 symbols
demos/lib/orama.iife.js137 symbols
demos/lib/fast-fuzzy.iife.min.js59 symbols
demos/lib/fzf.iife.min.js54 symbols
demos/lib/fuse.min.js44 symbols
demos/lib/FuzzySearch2.min.js31 symbols
demos/lib/minisearch.min.js26 symbols
demos/lib/quick-score.min.js23 symbols
src/uFuzzy.mjs16 symbols
demos/lib/m31-fuzzy-search.umd.js16 symbols

Dependencies from manifests, versioned

@rollup/plugin-terser0.4.4 · 1×
rollup4.47.1 · 1×

For agents

$ claude mcp add uFuzzy \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact