
github.com/CortexReach/memory-lancedb-pro @v1.0.32 sqlite

259 symbols · 649 edges · 17 files · 22 documented (8%)
README

🧠 memory-lancedb-pro · OpenClaw Plugin

Enhanced Long-Term Memory Plugin for OpenClaw

Hybrid Retrieval (Vector + BM25) · Cross-Encoder Rerank · Multi-Scope Isolation · Management CLI

OpenClaw Plugin · LanceDB · License: MIT



📺 Video Tutorial

Watch the full walkthrough — covers installation, configuration, and how hybrid retrieval works under the hood.

YouTube Video 🔗 https://youtu.be/MtukF1C8epQ

Bilibili Video 🔗 https://www.bilibili.com/video/BV1zUf2BGEgn/


Why This Plugin?

The built-in memory-lancedb plugin in OpenClaw provides basic vector search. memory-lancedb-pro takes it much further:

| Feature | Built-in memory-lancedb | memory-lancedb-pro |
|---|---|---|
| Vector search | ✅ | ✅ |
| BM25 full-text search | ❌ | ✅ |
| Hybrid fusion (Vector + BM25) | ❌ | ✅ |
| Cross-encoder rerank (Jina / custom endpoint) | ❌ | ✅ |
| Recency boost | ❌ | ✅ |
| Time decay | ❌ | ✅ |
| Length normalization | ❌ | ✅ |
| MMR diversity | ❌ | ✅ |
| Multi-scope isolation | ❌ | ✅ |
| Noise filtering | ❌ | ✅ |
| Adaptive retrieval | ❌ | ✅ |
| Management CLI | ❌ | ✅ |
| Session memory | ❌ | ✅ |
| Task-aware embeddings | ❌ | ✅ |
| Any OpenAI-compatible embedding | Limited | ✅ (OpenAI, Gemini, Jina, Ollama, etc.) |

Architecture

┌─────────────────────────────────────────────────────────┐
│                   index.ts (Entry Point)                │
│  Plugin Registration · Config Parsing · Lifecycle Hooks │
└────────┬───────────┬───────────┬───────────┬────────────┘
         │           │           │           │
    ┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
    │  store  │ │embedder │ │retriever│ │ scopes  │
    │   .ts   │ │   .ts   │ │   .ts   │ │   .ts   │
    └─────────┘ └─────────┘ └─────────┘ └─────────┘
         │                       │
    ┌────▼────┐           ┌──────▼──────────┐
    │ migrate │           │ noise-filter.ts │
    │   .ts   │           │ adaptive-       │
    └─────────┘           │ retrieval.ts    │
                          └─────────────────┘
    ┌─────────────┐   ┌──────────┐
    │  tools.ts   │   │  cli.ts  │
    │ (Agent API) │   │ (CLI)    │
    └─────────────┘   └──────────┘

File Reference

| File | Purpose |
|---|---|
| index.ts | Plugin entry point. Registers with the OpenClaw Plugin API, parses config, and mounts the before_agent_start (auto-recall), agent_end (auto-capture), and command:new (session memory) hooks |
| openclaw.plugin.json | Plugin metadata plus the full JSON Schema config declaration (with uiHints) |
| package.json | NPM package info. Depends on @lancedb/lancedb, openai, @sinclair/typebox |
| cli.ts | CLI commands: memory list/search/stats/delete/delete-bulk/export/import/reembed/migrate |
| src/store.ts | LanceDB storage layer: table creation, FTS indexing, vector search, BM25 search, CRUD, bulk delete, stats |
| src/embedder.ts | Embedding abstraction. Compatible with any OpenAI-API provider (OpenAI, Gemini, Jina, Ollama, etc.). Supports task-aware embedding (taskQuery/taskPassage) |
| src/retriever.ts | Hybrid retrieval engine: Vector + BM25 → RRF fusion → Jina Cross-Encoder Rerank → Recency Boost → Importance Weight → Length Norm → Time Decay → Hard Min Score → Noise Filter → MMR Diversity |
| src/scopes.ts | Multi-scope access control. Supports global, agent:<id>, custom:<name>, project:<id>, user:<id> |
| src/tools.ts | Agent tool definitions: memory_recall, memory_store, memory_forget (core) + memory_stats, memory_list (management) |
| src/noise-filter.ts | Noise filter. Filters out agent refusals, meta-questions, greetings, and low-quality content |
| src/adaptive-retrieval.ts | Adaptive retrieval. Determines whether a query needs memory retrieval (skips greetings, slash commands, simple confirmations, emoji) |
| src/migrate.ts | Migration tool. Migrates data from the built-in memory-lancedb plugin to Pro |

Core Features

1. Hybrid Retrieval

Query → embedQuery() ─┐
                      ├─→ RRF Fusion → Rerank → Recency Boost → Importance Weight → Filter
Query → BM25 FTS ─────┘
  • Vector Search: Semantic similarity via LanceDB ANN (cosine distance)
  • BM25 Full-Text Search: Exact keyword matching via LanceDB FTS index
  • Fusion Strategy: Vector score as base, BM25 hits get a 15% boost (tuned beyond traditional RRF)
  • Configurable Weights: vectorWeight, bm25Weight, minScore
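The fusion rule above can be sketched in TypeScript as follows. This is a minimal illustration, not the plugin's code: it assumes both result lists carry scores already normalized to [0, 1], caps boosted scores at 1, and lets keyword-only hits enter with their BM25 score scaled by bm25Weight (an assumption; the README does not say how BM25-only hits are scored). `fuseHybrid` and `Hit` are hypothetical names.

```typescript
// Hypothetical sketch of "vector score as base, BM25 hits get a 15% boost".
interface Hit {
  id: string;
  score: number; // assumed already normalized to [0, 1]
}

const BM25_HIT_BOOST = 1.15; // "BM25 hits get a 15% boost"

function fuseHybrid(vectorHits: Hit[], bm25Hits: Hit[], bm25Weight = 0.3): Hit[] {
  const bm25Scores = new Map<string, number>();
  for (const h of bm25Hits) bm25Scores.set(h.id, h.score);

  const fused = new Map<string, number>();
  for (const v of vectorHits) {
    // The vector score is the base; a keyword match multiplies it by 1.15, capped at 1.
    const score = bm25Scores.has(v.id) ? Math.min(1, v.score * BM25_HIT_BOOST) : v.score;
    fused.set(v.id, score);
  }
  // Assumption: keyword-only hits enter with their BM25 score scaled by bm25Weight.
  bm25Scores.forEach((s, id) => {
    if (!fused.has(id)) fused.set(id, s * bm25Weight);
  });

  const out: Hit[] = [];
  fused.forEach((score, id) => out.push({ id, score }));
  return out.sort((a, b) => b.score - a.score);
}

const ranked = fuseHybrid(
  [{ id: "a", score: 0.8 }, { id: "b", score: 0.6 }],
  [{ id: "b", score: 1.0 }, { id: "c", score: 0.9 }],
);
console.log(ranked.map((h) => `${h.id}:${h.score.toFixed(2)}`).join(" ")); // a:0.80 b:0.69 c:0.27
```

Unlike classic RRF, which only looks at rank positions, this keeps the raw vector similarity as the primary signal and treats a BM25 match as a confirmation bonus.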

2. Cross-Encoder Reranking

  • Reranker API: Jina, SiliconFlow, Pinecone, or any compatible endpoint (5s timeout protection)
  • Hybrid Scoring: 60% cross-encoder score + 40% original fused score
  • Graceful Degradation: Falls back to cosine similarity reranking on API failure
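A sketch of the reranking step under stated assumptions: `RerankFn` stands in for whatever client calls the Jina / SiliconFlow / Pinecone endpoint (hypothetical; it is assumed to return one relevance score in [0, 1] per document, in order), the 60/40 blend and 5 s timeout come from the bullets above, and the fallback simply substitutes query-to-memory cosine similarity for the cross-encoder score. `rerank`, `withTimeout` and `Candidate` are illustrative names, not the plugin's API.

```typescript
// Hypothetical sketch: 60/40 blend of cross-encoder and fused scores, a 5 s timeout,
// and a cosine-similarity fallback when the reranker fails or times out.
interface Candidate {
  id: string;
  fusedScore: number; // score coming out of the fusion stage
  vector: number[];   // stored embedding of the memory text
}

// Assumed client shape: resolves to one relevance score in [0, 1] per candidate, in order.
type RerankFn = (query: string, docs: Candidate[]) => Promise<number[]>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`rerank timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer); // no-op if the timer already fired
  }
}

async function rerank(
  query: string,
  queryVector: number[],
  candidates: Candidate[],
  rerankFn: RerankFn,
  timeoutMs = 5000,
): Promise<Array<{ id: string; score: number }>> {
  let relevance: number[];
  try {
    relevance = await withTimeout(rerankFn(query, candidates), timeoutMs);
    if (relevance.length !== candidates.length) throw new Error("reranker returned wrong number of scores");
  } catch {
    // Graceful degradation: cosine similarity stands in for the cross-encoder score.
    relevance = candidates.map((c) => cosine(queryVector, c.vector));
  }
  return candidates
    .map((c, i) => ({ id: c.id, score: 0.6 * relevance[i] + 0.4 * c.fusedScore }))
    .sort((a, b) => b.score - a.score);
}
```

Keeping 40% of the fused score means a reranker outage changes the ordering only partially rather than discarding the first-stage ranking.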

3. Multi-Stage Scoring Pipeline

| Stage | Formula | Effect |
|---|---|---|
| Recency Boost | exp(-ageDays / halfLife) * weight | Newer memories score higher (default: 14-day half-life, 0.10 weight) |
| Importance Weight | score *= (0.7 + 0.3 * importance) | importance=1.0 → ×1.0, importance=0.5 → ×0.85 |
| Length Normalization | score *= 1 / (1 + 0.5 * log2(len/anchor)) | Prevents long entries from dominating (anchor: 500 chars) |
| Time Decay | score *= 0.5 + 0.5 * exp(-ageDays / halfLife) | Old entries gradually lose weight, floor at 0.5× (60-day half-life) |
| Hard Min Score | Discard if score < threshold | Removes irrelevant results (default: 0.35) |
| MMR Diversity | Cosine similarity > 0.85 → demoted | Prevents near-duplicate results |
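The stages in the table can be chained as in the following sketch, which uses the table's default parameters. It is an illustration under assumptions, not the code in src/retriever.ts: the recency boost is treated as additive, the length ratio is clamped at 1 (the formula as written divides by zero when len = anchor / 4), and the diversity step is a simple threshold pass rather than the full λ-weighted MMR formula. `applyStages` and `ScoredMemory` are hypothetical names.

```typescript
// Hypothetical sketch of the post-rerank stages, using the table's default parameters.
interface ScoredMemory {
  id: string;
  score: number;      // score after fusion and rerank
  text: string;
  importance: number; // 0..1
  ageDays: number;
  vector: number[];   // embedding, used only by the diversity pass
}

const STAGE_DEFAULTS = {
  recencyHalfLifeDays: 14,
  recencyWeight: 0.1,
  lengthNormAnchor: 500,
  timeDecayHalfLifeDays: 60,
  hardMinScore: 0.35,
  diversityThreshold: 0.85,
};

function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function applyStages(items: ScoredMemory[], cfg = STAGE_DEFAULTS): ScoredMemory[] {
  const kept: ScoredMemory[] = [];
  for (const m of items) {
    let s = m.score;
    // Recency boost, assumed additive. As written, exp(-age / halfLife) is 1/e (not 1/2) at age = halfLife.
    s += Math.exp(-m.ageDays / cfg.recencyHalfLifeDays) * cfg.recencyWeight;
    // Importance weight: importance 1.0 -> x1.0, 0.5 -> x0.85.
    s *= 0.7 + 0.3 * m.importance;
    // Length normalization. Assumption: ratio clamped at 1, so only entries longer than the anchor are penalized.
    const ratio = Math.max(m.text.length / cfg.lengthNormAnchor, 1);
    s *= 1 / (1 + 0.5 * Math.log2(ratio));
    // Time decay with a floor of 0.5x.
    s *= 0.5 + 0.5 * Math.exp(-m.ageDays / cfg.timeDecayHalfLifeDays);
    // Hard min score.
    if (s >= cfg.hardMinScore) kept.push({ ...m, score: s });
  }
  kept.sort((a, b) => b.score - a.score);

  // Diversity pass (threshold simplification of MMR): anything too similar to an
  // already-accepted result is moved to the end of the list.
  const accepted: ScoredMemory[] = [];
  const demoted: ScoredMemory[] = [];
  for (const m of kept) {
    const nearDuplicate = accepted.some((a) => cosineSim(a.vector, m.vector) > cfg.diversityThreshold);
    if (nearDuplicate) demoted.push(m);
    else accepted.push(m);
  }
  return accepted.concat(demoted);
}
```

For example, a fresh, short memory with importance 1.0 and score 0.6 ends at 0.6 + 0.1 = 0.7, while a 60-day-old, 1000-character memory with importance 0.5 and score 0.5 falls to about 0.19 and is discarded by the 0.35 threshold.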

4. Multi-Scope Isolation

  • Built-in Scopes: global, agent:<id>, custom:<name>, project:<id>, user:<id>
  • Agent-Level Access Control: Configure per-agent scope access via scopes.agentAccess
  • Default Behavior: Each agent accesses global + its own agent:<id> scope
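The default access rule can be expressed as a small resolver. This is a hypothetical sketch, not the ScopeManager in src/scopes.ts: `accessibleScopes`, `canAccess` and the `ScopesConfig` shape are illustrative, and it assumes an entry in scopes.agentAccess replaces the default list rather than extending it (the example configuration lists both global and agent:discord-bot explicitly, which is consistent with that reading).

```typescript
// Hypothetical sketch of per-agent scope resolution.
interface ScopesConfig {
  agentAccess?: Record<string, string[]>;
}

function accessibleScopes(agentId: string, cfg: ScopesConfig): string[] {
  const explicit = cfg.agentAccess ? cfg.agentAccess[agentId] : undefined;
  // Assumption: an agentAccess entry replaces the default list instead of extending it.
  if (explicit) return explicit.slice();
  // Default: the shared scope plus the agent's own private scope.
  return ["global", `agent:${agentId}`];
}

function canAccess(agentId: string, scope: string, cfg: ScopesConfig): boolean {
  return accessibleScopes(agentId, cfg).indexOf(scope) !== -1;
}

const scopesCfg: ScopesConfig = {
  agentAccess: { "discord-bot": ["global", "agent:discord-bot"] },
};
console.log(accessibleScopes("main", scopesCfg));               // [ 'global', 'agent:main' ]
console.log(canAccess("discord-bot", "agent:main", scopesCfg)); // false
```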

5. Adaptive Retrieval

  • Skips queries that don't need memory (greetings, slash commands, simple confirmations, emoji)
  • Forces retrieval for memory-related keywords ("remember", "previously", "last time", etc.)
  • CJK-aware thresholds (Chinese: 6 chars vs English: 15 chars)
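A decision function along these lines can be sketched as follows. The README does not spell out the exact rules, so the word lists here are illustrative and the 6/15-character thresholds are interpreted as minimum query lengths below which retrieval is skipped (an assumption). `shouldRetrieve` is a hypothetical name, not the export of src/adaptive-retrieval.ts.

```typescript
// Hypothetical heuristic in the spirit of adaptive retrieval.
const FORCE_KEYWORDS = /\b(?:remember|previously|last time)\b/i;
const GREETING_OR_CONFIRM = /^(?:hi|hello|hey|thanks|thank you|ok|okay|yes|no|sure|got it)[\s!.]*$/i;
// Symbols, dingbats and the main emoji surrogate-pair ranges, plus joiners and variation selectors.
const EMOJI_ONLY = /^(?:[\u2600-\u27bf\ufe0f\u200d\s]|\ud83c[\udf00-\udfff]|\ud83d[\udc00-\udeff]|\ud83e[\udd00-\uddff])+$/;
// Hiragana/Katakana, CJK ideographs (incl. Extension A) and Hangul syllables.
const CJK = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

function shouldRetrieve(query: string): boolean {
  const q = query.trim();
  if (q.length === 0 || q.startsWith("/")) return false; // empty input or slash command
  if (FORCE_KEYWORDS.test(q)) return true;               // explicit memory cue wins
  if (GREETING_OR_CONFIRM.test(q) || EMOJI_ONLY.test(q)) return false;
  const minLength = CJK.test(q) ? 6 : 15;                // CJK packs more meaning per character
  return q.length >= minLength;
}

console.log(["hi", "/new", "👍", "remember?", "fix typo", "部署用的是哪个数据库"].map(shouldRetrieve));
// [ false, false, false, true, false, true ]
```

The keyword check runs before the length check so that a short but explicit cue such as "remember?" still triggers retrieval.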

6. Noise Filtering

Filters out low-quality content at both the auto-capture and tool-store stages:
  • Agent refusal responses ("I don't have any information")
  • Meta-questions ("do you remember")
  • Greetings ("hi", "hello", "HEARTBEAT")
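A pattern-based check for the three categories can be sketched like this. The regular expressions and the minimum length are illustrative assumptions, not the contents of src/noise-filter.ts, and `isNoise` is a hypothetical name.

```typescript
// Hypothetical noise check modelled on the three categories above.
const NOISE_PATTERNS: RegExp[] = [
  /\bI (?:don['\u2019]t|do not) have any (?:information|memories|records?)\b/i, // agent refusal
  /^\s*do you (?:remember|recall)\b/i,                                           // meta-question
  /^\s*(?:hi|hello|hey|HEARTBEAT)\b[\s!.]*$/i,                                   // greeting / heartbeat ping
];

function isNoise(text: string, minLength = 8): boolean {
  const t = text.trim();
  if (t.length < minLength) return true; // assumption: very short fragments count as low quality
  return NOISE_PATTERNS.some((p) => p.test(t));
}

console.log(isNoise("I don't have any information about that."), isNoise("User prefers tabs over spaces in Go files."));
// true false
```

Running the same check at capture time and in the store tool keeps refusals and small talk from ever becoming retrievable memories.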

7. Session Memory

  • Triggered on /new command — saves previous session summary to LanceDB
  • Disabled by default (OpenClaw already has native .jsonl session persistence)
  • Configurable message count (default: 15)
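As a rough illustration of what "message count" controls, the following hypothetical sketch assembles the text that could be saved when /new starts a fresh session. It keeps the last `messageCount` non-empty user/assistant turns as a plain transcript; the plugin may summarize differently, and `buildSessionSummary`, `ChatMessage` and the per-message character cap are assumptions.

```typescript
// Hypothetical sketch: assemble the text saved when /new starts a fresh session.
interface ChatMessage {
  role: "user" | "assistant" | "system" | "tool";
  content: string;
}

function buildSessionSummary(
  messages: ChatMessage[],
  messageCount = 15,        // mirrors the documented default
  maxCharsPerMessage = 300, // hypothetical cap to keep the stored entry compact
): string {
  // Keep only non-empty user/assistant turns; system and tool messages are skipped.
  const turns = messages.filter(
    (m) => (m.role === "user" || m.role === "assistant") && m.content.trim() !== "",
  );
  // Take the last `messageCount` turns (computing the start index avoids slice(-0) returning everything).
  const recent = turns.slice(Math.max(0, turns.length - messageCount));
  return recent.map((m) => `${m.role}: ${m.content.trim().slice(0, maxCharsPerMessage)}`).join("\n");
}
```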

8. Auto-Capture & Auto-Recall

  • Auto-Capture (agent_end hook): Extracts preference/fact/decision/entity from conversations, deduplicates, stores up to 3 per turn
  • Skips memory-management prompts (e.g. delete/forget/cleanup memory entries) to reduce noise
  • Auto-Recall (before_agent_start hook): Injects <relevant-memories> context (up to 3 entries)
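The two limits above (at most 3 captures per turn, at most 3 recalled entries) can be illustrated with the following sketch. It is hypothetical: deduplication here is exact matching on normalized text, whereas the plugin may well compare embeddings, and the line format inside the `<relevant-memories>` block is an assumption. `selectForCapture`, `formatRecall` and `MemoryCandidate` are illustrative names.

```typescript
// Hypothetical sketch of the capture limit and the recall injection block.
type Category = "preference" | "fact" | "decision" | "entity";

interface MemoryCandidate {
  text: string;
  category: Category;
  importance: number; // 0..1
}

const normalize = (s: string): string => s.toLowerCase().replace(/\s+/g, " ").trim();

// Auto-capture: skip candidates that duplicate stored memories (or each other), keep at most 3.
function selectForCapture(candidates: MemoryCandidate[], stored: string[], maxPerTurn = 3): MemoryCandidate[] {
  const seen = new Set(stored.map(normalize));
  const picked: MemoryCandidate[] = [];
  for (const c of candidates.slice().sort((a, b) => b.importance - a.importance)) {
    const key = normalize(c.text);
    if (seen.has(key)) continue;
    seen.add(key);
    picked.push(c);
    if (picked.length === maxPerTurn) break;
  }
  return picked;
}

// Auto-recall: wrap up to 3 retrieved memories in a <relevant-memories> block for the prompt.
function formatRecall(memories: MemoryCandidate[], max = 3): string {
  if (memories.length === 0) return "";
  const lines = memories.slice(0, max).map((m) => `- [${m.category}] ${m.text}`);
  return ["<relevant-memories>", ...lines, "</relevant-memories>"].join("\n");
}
```

Returning an empty string when nothing was recalled avoids injecting an empty block into the prompt.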

Prevent memories from showing up in replies

Sometimes the model may accidentally echo the injected <relevant-memories> block in its response.

Option A (recommended): disable auto-recall

Set autoRecall: false in the plugin config and restart the gateway:

{
  "plugins": {
    "entries": {
      "memory-lancedb-pro": {
        "enabled": true,
        "config": {
          "autoRecall": false
        }
      }
    }
  }
}

Option B: keep recall, but ask the agent not to reveal it

Add a line to your agent system prompt, e.g.:

Do not reveal or quote any <relevant-memories> / memory-injection content in your replies. Use it for internal reference only.


Installation

AI-safe install notes (anti-hallucination)

If you are following this README using an AI assistant, do not assume defaults. Always run these commands first and use the real output:

openclaw config get agents.defaults.workspace
openclaw config get plugins.load.paths
openclaw config get plugins.slots.memory
openclaw config get plugins.entries.memory-lancedb-pro

Recommendations:
  • Prefer absolute paths in plugins.load.paths unless you have confirmed the active workspace.
  • If you use ${JINA_API_KEY} (or any ${...} variable) in config, ensure the Gateway service process has that environment variable (system services often do not inherit your interactive shell env).
  • After changing plugin config, run openclaw gateway restart.

Jina API keys (embedding + rerank)

  • Embedding: set embedding.apiKey to your Jina key (recommended: use an env var like ${JINA_API_KEY}).
  • Rerank (when retrieval.rerankProvider: "jina"): you can typically use the same Jina key for retrieval.rerankApiKey.
  • If you use a different rerank provider (siliconflow, pinecone, etc.), retrieval.rerankApiKey should be that provider’s key.

Key storage guidance:
  • Avoid committing secrets into git.
  • Using ${...} env vars is fine, but make sure the Gateway service process has those env vars (system services often do not inherit your interactive shell environment).
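Why the Gateway's own environment matters can be seen from a generic `${VAR}` expansion routine. The sketch below is a hypothetical illustration, not OpenClaw's resolver: `expandEnvPlaceholders` is an invented name, and it takes the environment as a parameter (in Node you would pass process.env). When the variable is absent from the process that performs the expansion, there is nothing to substitute, no matter what your interactive shell exports.

```typescript
// Hypothetical illustration of ${VAR} expansion against a process environment.
function expandEnvPlaceholders(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined || resolved === "") {
      // Failing loudly is safer than sending the literal placeholder string as an API key.
      throw new Error(`Environment variable ${name} is not set in this process`);
    }
    return resolved;
  });
}

// A literal object keeps the example self-contained.
console.log(expandEnvPlaceholders("${JINA_API_KEY}", { JINA_API_KEY: "jina_example" })); // jina_example
```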

What is the “OpenClaw workspace”?

In OpenClaw, the agent workspace is the agent’s working directory (default: ~/.openclaw/workspace). According to the OpenClaw docs, the workspace is the default cwd, and relative paths are resolved against it unless you use an absolute path.

Note: OpenClaw configuration typically lives under ~/.openclaw/openclaw.json (separate from the workspace).

Common mistake: cloning the plugin somewhere else, while keeping a relative path like plugins.load.paths: ["plugins/memory-lancedb-pro"]. Relative paths can be resolved against different working directories depending on how the Gateway is started.

To avoid ambiguity, use an absolute path (Option B) or clone into <workspace>/plugins/ (Option A) and keep your config consistent.
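To see why relative entries are fragile, the following sketch resolves the same plugins.load.paths entry against two different base directories with Node's `path.posix.resolve`. The workspace path and the service working directory are hypothetical, and OpenClaw's own resolution logic is not shown here; the point is only that a relative path means nothing until a base directory is chosen.

```typescript
import * as path from "path";

// Illustration only: one relative entry, two possible base directories.
const entry = "plugins/memory-lancedb-pro";
const workspace = "/home/alice/.openclaw/workspace"; // hypothetical workspace
const serviceCwd = "/";                              // hypothetical cwd of a system service

console.log(path.posix.resolve(workspace, entry));  // /home/alice/.openclaw/workspace/plugins/memory-lancedb-pro
console.log(path.posix.resolve(serviceCwd, entry)); // /plugins/memory-lancedb-pro
// An absolute entry ignores the base directory entirely:
console.log(path.posix.resolve(serviceCwd, "/opt/openclaw/plugins/memory-lancedb-pro")); // /opt/openclaw/plugins/memory-lancedb-pro
```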

Option A (recommended): clone into plugins/ under your workspace

# 1) Go to your OpenClaw workspace (default: ~/.openclaw/workspace)
#    (You can override it via agents.defaults.workspace.)
cd /path/to/your/openclaw/workspace

# 2) Clone the plugin into workspace/plugins/
git clone https://github.com/win4r/memory-lancedb-pro.git plugins/memory-lancedb-pro

# 3) Install dependencies
cd plugins/memory-lancedb-pro
npm install

Then reference it with a relative path in your OpenClaw config:

{
  "plugins": {
    "load": {
      "paths": ["plugins/memory-lancedb-pro"]
    },
    "entries": {
      "memory-lancedb-pro": {
        "enabled": true,
        "config": {
          "embedding": {
            "apiKey": "${JINA_API_KEY}",
            "model": "jina-embeddings-v5-text-small",
            "baseURL": "https://api.jina.ai/v1",
            "dimensions": 1024,
            "taskQuery": "retrieval.query",
            "taskPassage": "retrieval.passage",
            "normalized": true
          }
        }
      }
    },
    "slots": {
      "memory": "memory-lancedb-pro"
    }
  }
}

Option B: clone anywhere, but use an absolute path

{
  "plugins": {
    "load": {
      "paths": ["/absolute/path/to/memory-lancedb-pro"]
    }
  }
}

Restart

openclaw gateway restart

Note: If you previously used the built-in memory-lancedb, disable it when enabling this plugin. Only one memory plugin can be active at a time.

Verify installation (recommended)

1) Confirm the plugin is discoverable/loaded:

openclaw plugins list
openclaw plugins info memory-lancedb-pro

2) If anything looks wrong, run the built-in diagnostics:

openclaw plugins doctor

3) Confirm the memory slot points to this plugin:

# Look for: plugins.slots.memory = "memory-lancedb-pro"
openclaw config get plugins.slots.memory

Configuration

Full Configuration Example

{
  "embedding": {
    "apiKey": "${JINA_API_KEY}",
    "model": "jina-embeddings-v5-text-small",
    "baseURL": "https://api.jina.ai/v1",
    "dimensions": 1024,
    "taskQuery": "retrieval.query",
    "taskPassage": "retrieval.passage",
    "normalized": true
  },
  "dbPath": "~/.openclaw/memory/lancedb-pro",
  "autoCapture": true,
  "autoRecall": false,
  "retrieval": {
    "mode": "hybrid",
    "vectorWeight": 0.7,
    "bm25Weight": 0.3,
    "minScore": 0.3,
    "rerank": "cross-encoder",
    "rerankApiKey": "${JINA_API_KEY}",
    "rerankModel": "jina-reranker-v3",
    "rerankEndpoint": "https://api.jina.ai/v1/rerank",
    "rerankProvider": "jina",
    "candidatePoolSize": 20,
    "recencyHalfLifeDays": 14,
    "recencyWeight": 0.1,
    "filterNoise": true,
    "lengthNormAnchor": 500,
    "hardMinScore": 0.35,
    "timeDecayHalfLifeDays": 60,
    "reinforcementFactor": 0.5,
    "maxHalfLifeMultiplier": 3
  },
  "enableManagementTools": false,
  "scopes": {
    "default": "global",
    "definitions": {
      "global": { "description": "Shared knowledge" },
      "agent:discord-bot": { "description": "Discord bot private" }
    },
    "agentAccess": {
      "discord-bot": ["global", "agent:discord-bot"]
    }
  },
  "sessionMemory": {
    "enabled": false,
    "messageCount": 15
  }
}
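If you generate or validate this configuration from code, a typed subset of the retrieval block helps catch typos and out-of-range values. The sketch below is hypothetical: `RetrievalSettings` and `resolveRetrieval` are invented names, the fallback values are copied from the example above (the plugin's real defaults are declared in openclaw.plugin.json and may differ), and the range checks are illustrative rather than the plugin's schema.

```typescript
// Hypothetical typed subset of the `retrieval` block, merged over the example's values.
interface RetrievalSettings {
  vectorWeight: number;
  bm25Weight: number;
  minScore: number;
  hardMinScore: number;
  candidatePoolSize: number;
  recencyHalfLifeDays: number;
  recencyWeight: number;
  timeDecayHalfLifeDays: number;
  lengthNormAnchor: number;
}

const EXAMPLE_RETRIEVAL: RetrievalSettings = {
  vectorWeight: 0.7, bm25Weight: 0.3, minScore: 0.3, hardMinScore: 0.35,
  candidatePoolSize: 20, recencyHalfLifeDays: 14, recencyWeight: 0.1,
  timeDecayHalfLifeDays: 60, lengthNormAnchor: 500,
};

function resolveRetrieval(user: Partial<RetrievalSettings> = {}): RetrievalSettings {
  // Shallow merge: keys present in `user` override the example values.
  const merged: RetrievalSettings = { ...EXAMPLE_RETRIEVAL, ...user };
  // Illustrative sanity checks: weights and score thresholds are treated as values in [0, 1].
  for (const key of ["vectorWeight", "bm25Weight", "minScore", "hardMinScore"] as const) {
    if (merged[key] < 0 || merged[key] > 1) throw new RangeError(`${key} must be within [0, 1]`);
  }
  if (merged.candidatePoolSize < 1) throw new RangeError("candidatePoolSize must be >= 1");
  return merged;
}

console.log(resolveRetrieval({ hardMinScore: 0.5 }).hardMinScore); // 0.5
```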

Access Reinforcement (1.0.26)

To make fre

Extension points: exported contracts (how you extend this code)

ScopeManager (interface) · src/scopes.ts · no doc · 2 implementers
PluginConfig (interface) · index.ts · no doc
CLIContext (interface) · cli.ts · no doc
ToolContext (interface) · src/tools.ts · no doc
ChunkMetadata (interface) · src/chunker.ts · no doc
RetrievalConfig (interface) · src/retriever.ts · no doc
MemoryEntry (interface) · src/store.ts · no doc
CacheEntry (interface) · src/embedder.ts · no doc

Core symbols: most depended-on inside this repo

get · called by 38 · src/embedder.ts
recordAccess · called by 31 · src/access-tracker.ts
getPendingUpdates · called by 23 · src/access-tracker.ts
parseAccessMetadata · called by 20 · src/access-tracker.ts
computeEffectiveHalfLife · called by 19 · src/access-tracker.ts
set · called by 12 · src/embedder.ts
buildUpdatedMetadata · called by 12 · src/access-tracker.ts
flush · called by 12 · src/access-tracker.ts

Shape

Function 124
Method 96
Interface 24
Class 15

Languages

TypeScript 94%
Python 6%

Modules by API surface

src/scopes.ts · 36 symbols
src/embedder.ts · 33 symbols
src/retriever.ts · 30 symbols
src/store.ts · 26 symbols
index.ts · 19 symbols
examples/new-session-distill/worker/lesson-extract-worker.mjs · 19 symbols
src/migrate.ts · 16 symbols
src/access-tracker.ts · 16 symbols
scripts/jsonl_distill.py · 15 symbols
src/tools.ts · 13 symbols
src/chunker.ts · 10 symbols
test/access-tracker.test.mjs · 7 symbols

Dependencies (from manifests, versioned)

@lancedb/lancedb 0.26.2 · 1×
@sinclair/typebox 0.34.48 · 1×
commander 14.0.0 · 1×
jiti 2.6.0 · 1×
openai 6.21.0 · 1×
typescript 5.9.3 · 1×

For agents

$ claude mcp add memory-lancedb-pro \
  -- python -m otcore.mcp_server <graph>
