
github.com/LearningCircuit/local-deep-research @v1.8.1 sqlite

54,390 symbols 179,912 edges 2,486 files 36,996 documented · 68%
README

Local Deep Research


AI-powered research assistant for deep, agentic research

Performs deep, agentic research using multiple LLMs and search engines with proper citations

🧪 First open-source project — fully-local on a single RTX 3090 (Qwen3.6-27B) — to report ~95% SimpleQA (n=500) and 77% xbench-DeepSearch (n=100) on local hardware. See the r/LocalLLaMA announcement and the benchmark dataset.

▶️ Watch Review by The Art Of The Terminal

🚀 What is Local Deep Research?

An AI research assistant you control. Run it locally for privacy, use any LLM, and build your own searchable knowledge base. You own your data and can see exactly how it works.

⚡ Quick Start

CPU requirement (x86-64): an AVX-capable CPU — Intel Sandy Bridge / AMD Bulldozer (2011) or newer. Several scientific Python dependencies (pandas, scikit-learn) ship wheels that crash with Illegal instruction on older CPUs. ARM64 (aarch64) is fully supported. Every release is smoke-tested against this floor, including AVX-without-AVX2 CPUs (#4480).
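To check the AVX floor described above before installing, a minimal sketch along these lines may help. It assumes an x86-64 Linux host where the kernel exposes CPU feature flags in `/proc/cpuinfo`; the helper names `cpu_flags` and `has_avx` are illustrative and not part of LDR.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # On x86 Linux the feature list lives on lines labelled "flags".
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text: str) -> bool:
    """True when the text advertises AVX (an exact match, so 'avx2' alone does not count)."""
    return "avx" in cpu_flags(cpuinfo_text)
```

On such a host, `has_avx(open("/proc/cpuinfo").read())` returning `False` suggests the CPU is below the stated floor. The check does not apply to ARM64, which has no AVX and is supported regardless.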

Option 1: Docker Run (Linux)

# Step 1: Pull and run Ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull gpt-oss:20b

# Step 2: Pull and run SearXNG for optimal search results
docker run -d -p 8080:8080 --name searxng searxng/searxng

# Step 3: Pull and run Local Deep Research
docker run -d -p 5000:5000 --network host \
  --name local-deep-research \
  --volume "deep-research:/data" \
  -e LDR_DATA_DIR=/data \
  localdeepresearch/local-deep-research

Mac / Windows / WSL2 users: --network host only works on native Linux. On Docker Desktop it silently fails to publish port 5000 and leaves localhost pointing at the LDR container itself (so it can't reach Ollama/SearXNG). Use Option 2 below, or see the Windows/WSL2 FAQ entry for a working docker run recipe.

Option 2: Docker Compose

CPU-only (all platforms):

curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml && docker compose up -d

With NVIDIA GPU (Linux):

curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml && \
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.gpu.override.yml && \
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d

Open http://localhost:5000 after ~30 seconds. For GPU setup, environment variables, and more, see the Docker Compose Guide.
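Rather than waiting a fixed ~30 seconds, a script can poll the web UI until it answers. The following is a minimal sketch using only the Python standard library; `wait_for_http` and its timeout defaults are hypothetical choices for illustration, not something LDR ships.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until any HTTP response arrives or `timeout_s` elapses."""
    # Ignore proxy settings: the target is expected to be a local service.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is up.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Not accepting connections yet: wait and retry.
            time.sleep(interval_s)
    return False
```

Calling `wait_for_http("http://localhost:5000")` after `docker compose up -d` returns `True` once the server responds and `False` if it stays unreachable for the whole timeout.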

Option 3: pip install

pip install local-deep-research
python -m local_deep_research.web.app   # starts the web UI on http://localhost:5000

You'll also need Ollama (or any OpenAI-compatible LLM endpoint) and SearXNG running; see the pip install guide for the full recipe. Works on Windows, macOS, and Linux. SQLCipher encryption is included via pre-built wheels, so no compilation is needed. PDF export on Windows requires Pango (setup guide). If you encounter issues with encryption, run export LDR_BOOTSTRAP_ALLOW_UNENCRYPTED=true before starting LDR to fall back to standard, unencrypted SQLite.

Detailed install guides: Docker · Docker Compose · pip · Unraid · full install reference

🏗️ How It Works

Research

You ask a complex question. LDR:

- Does the research for you automatically
- Searches across web, academic papers, and your own documents
- Synthesizes everything into a report with proper citations

Choose from 20+ research strategies for quick facts, deep analysis, or academic research.

LangGraph Agent Strategy — An autonomous agentic research mode where the LLM decides what to search, which specialized engines to use (arXiv, PubMed, Semantic Scholar, etc.), and when to synthesize. It adaptively switches between search engines based on what it finds and collects significantly more sources than pipeline-based strategies — this is the strategy behind the ~95% SimpleQA result above. Select langgraph-agent in Settings.
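The control flow described above (decide what to search, pick an engine, decide when to synthesize) can be sketched as a plain loop. This is a toy illustration only: it does not use LangGraph, it is not LDR's implementation, and every name in it (`State`, `run_agent`, `stub_decide`, the fake engines) is hypothetical. In a real agent, an LLM call would stand where the rule-based `decide` function is.

```python
from dataclasses import dataclass, field
from typing import Callable

Engine = Callable[[str], list[str]]  # query -> list of source identifiers


@dataclass
class State:
    question: str
    sources: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)  # engines used so far


def run_agent(
    question: str,
    engines: dict[str, Engine],
    decide: Callable[[State], str],
    max_steps: int = 5,
) -> State:
    """Let `decide` pick an engine (or 'synthesize') until it stops or steps run out."""
    state = State(question)
    for _ in range(max_steps):
        action = decide(state)
        if action == "synthesize":
            break
        state.steps.append(action)
        for source in engines[action](question):
            if source not in state.sources:  # keep each source once
                state.sources.append(source)
    return state


def stub_decide(state: State) -> str:
    """Rule-based stand-in for the LLM's decision."""
    if "arxiv" not in state.steps:
        return "arxiv"
    if len(state.sources) < 3 and "pubmed" not in state.steps:
        return "pubmed"
    return "synthesize"
```

With two fake engines that return fixed lists, `run_agent` first queries `arxiv`, then switches to `pubmed` because too few sources were collected, and then stops to synthesize. The `max_steps` cap bounds the loop even if the decision function never chooses to stop.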

Build Your Knowledge Base

flowchart LR
    R[Research] --> D[Download Sources]
    D --> L[(Library)]
    L --> I[Index & Embed]
    I --> S[Search Your Docs]
    S -.-> R

Every research session finds valuable sources. Download them directly into your encrypted library: academic papers from arXiv, PubMed articles, web pages. LDR extracts text, indexes everything, and makes it searchable. Next time you research, ask questions across your own documents and the live web together. Your knowledge compounds over time.
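To make the index-then-search step concrete, here is a deliberately small keyword index. It is a stand-in under stated assumptions: the flowchart says LDR embeds documents, whereas this sketch only counts term frequencies, and `TinyLibrary` with its methods is a hypothetical name, not an LDR class.

```python
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class TinyLibrary:
    """Toy inverted index: term -> how often each document contains it."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}
        self.index: dict[str, Counter] = defaultdict(Counter)

    def add(self, doc_id: str, text: str) -> None:
        """Store a document and index its terms (assumes each doc_id is added once)."""
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.index[term][doc_id] += 1

    def search(self, query: str, top_k: int = 3) -> list[str]:
        """Rank documents by the summed frequency of the query terms."""
        scores: Counter = Counter()
        for term in tokenize(query):
            for doc_id, count in self.index.get(term, {}).items():
                scores[doc_id] += count
        return [doc_id for doc_id, _ in scores.most_common(top_k)]
```

Exact-term matching misses paraphrases: a query for "encryption" will not match a document that only says "encrypts". That gap is the usual motivation for embedding-based search.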

🛡️ Security


flowchart LR
    U1[User A] --> D1[(Encrypted DB)]
    U2[User B] --> D2[(Encrypted DB)]

Your data stays yours. Each user gets their own isolated SQLCipher database encrypted with AES-256, with the key derived from your password. Your password is never stored — login works by attempting to decrypt your database, so the database files on their own are unusable to anyone who obtains them. Per-user LLM API keys live encrypted inside the same personal database rather than in a shared server-level store.
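The idea that login works by attempting to decrypt, with no stored password, can be illustrated with a standard-library toy. This is not SQLCipher and not LDR's code: the iteration count, the blob layout, and the hash-based keystream are assumptions made only so the sketch runs without third-party packages, and the construction must not be used to protect real data.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 200_000  # assumption for this sketch, not an LDR or SQLCipher setting


def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch the password into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from hashing key + nonce + counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(password: str, plaintext: bytes) -> bytes:
    """Return salt + nonce + tag + ciphertext; the password itself is never stored."""
    salt, nonce = os.urandom(16), os.urandom(16)
    key = derive_key(password, salt)
    cipher = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, salt + nonce + cipher, hashlib.sha256).digest()
    return salt + nonce + tag + cipher


def open_sealed(password: str, blob: bytes) -> Optional[bytes]:
    """'Login': return the plaintext, or None when the password yields the wrong key."""
    salt, nonce, tag, cipher = blob[:16], blob[16:32], blob[32:64], blob[64:]
    key = derive_key(password, salt)
    expected = hmac.new(key, salt + nonce + cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
```

A wrong password derives a different key, the integrity check fails, and `open_sealed` returns `None`. The blob holds only the salt, nonce, tag, and ciphertext, so it is of no use without the password.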

The Docker setup ships with cap_drop: ALL, no-new-privileges, and a non-root runtime, with images pinned by digest. Or run fully local with Ollama + SearXNG and nothing ever leaves your machine.

In-memory credentials: Like all applications that use secrets at runtime — including password managers, browsers, and API clients — credentials are held in plain text in process memory during active sessions. This is an industry-wide accepted reality, not specific to LDR: if an attacker can read process memory, they can also read any in-process decryption key. We mitigate this with session-scoped credential lifetimes and core dump exclusion. Ideas for further improvements are always welcome via GitHub Issues. See our Security Policy for details.

Supply Chain Security: Docker images are signed with Cosign using GitHub's keyless OIDC flow, include SLSA provenance attestations, and ship with attested SPDX SBOMs. Verify the image and its SBOM before running:

```bash
# 1. Verify image signature
cosign verify \
  --certificate-identity-regexp "^https://github.com/LearningCircuit/local-deep-research/.github/workflows/prerelease-docker.yml@.*$" \
  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
  --certificate-github-workflow-repository "LearningCircuit/local-deep-research" \
  localdeepresearch/local-deep-research:latest

# 2. Verify SBOM attestation (SPDX JSON) for YOUR platform
# SBOM attestations are stored per-architecture (amd64, arm64) on the
# per-arch image digest, not on the multi-arch manifest list. Resolve to
# your platform's digest first.
ARCH=$(uname -m | sed -e 's/^x86_64$/amd64/' -e 's/^aarch64$/arm64/')
PLATFORM_DIGEST=$(docker buildx imagetools inspect localdeepresearch/local-deep-research:latest --raw \
  | jq -r --arg arch "$ARCH" '.manifests[] | select(.platform.architecture==$arch) | .digest')
if [ -z "$PLATFORM_DIGEST" ]; then
  echo "No per-arch digest found for $ARCH — image may be single-arch or" \
       "from a pre-build-once-promote release. Skip step 2 in that case."
  exit 1
fi
cosign verify-attestation \
  --type spdxjson \
  --certi
```
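For readers who prefer not to depend on `jq`, the digest-resolution step can be sketched in Python. The sketch assumes the input is the JSON printed by `docker buildx imagetools inspect ... --raw` for a multi-arch image; the function names are hypothetical, and unlike the `jq` filter it returns only the first matching digest.

```python
import json
from typing import Optional

# Map common machine names to the architecture labels used in image manifests.
ARCH_ALIASES = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def normalize_arch(machine: str) -> str:
    """Translate a `uname -m` style name (e.g. x86_64) to a manifest label (amd64)."""
    name = machine.lower()
    return ARCH_ALIASES.get(name, name)


def platform_digest(manifest_list_json: str, arch: str) -> Optional[str]:
    """Return the digest of the first manifest built for `arch`, or None if absent."""
    doc = json.loads(manifest_list_json)
    for entry in doc.get("manifests", []):
        if entry.get("platform", {}).get("architecture") == arch:
            return entry.get("digest")
    return None
```

A `None` result corresponds to the empty `PLATFORM_DIGEST` case in the shell version: the image may be single-arch, so there is no per-arch digest to verify against.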

Core symbols most depended-on inside this repo

get · called by 1561 · src/local_deep_research/llm/llm_registry.py
get · called by 1250 · src/local_deep_research/web_search_engines/retriever_registry.py
add · called by 621 · tests/ui_tests/test_lib/test_results.js
skip · called by 521 · tests/ui_tests/test_lib/test_results.js
navigateTo · called by 339 · tests/ui_tests/test_lib/test_utils.js
to_bool · called by 323 · src/local_deep_research/utilities/type_utils.py
get_user_db_session · called by 302 · src/local_deep_research/database/session_context.py
_auth_client · called by 259 · tests/research_library/routes/_route_helpers_rag.py

Shape

Method 37,628
Class 8,976
Function 5,673
Route 2,113

Languages

Python 96%
TypeScript 4%

Modules by API surface

tests/news/test_base_recommender.py · 557 symbols
tests/database/test_alembic_migrations.py · 290 symbols
tests/database/backup/test_backup_service.py · 236 symbols
tests/security/test_ssrf_validator.py · 199 symbols
tests/web/routes/test_metrics_routes_coverage.py · 188 symbols
tests/security/test_egress_policy.py · 180 symbols
tests/news/test_scheduler.py · 171 symbols
tests/research_library/routes/test_rag_routes.py · 168 symbols
tests/settings/test_settings_manager.py · 165 symbols
tests/security/test_url_validator_extended.py · 165 symbols
tests/research_library/services/test_download_service_coverage.py · 165 symbols
tests/web/routes/test_settings_routes_coverage.py · 158 symbols

Dependencies from manifests, versioned

@fortawesome/fontawesome-free 7.3.0 · 1×
@lhci/cli 0.15.1 · 1×
@playwright/test 1.61.1 · 1×
@types/jest 30.0.0 · 1×
@vitest/coverage-v8 4.1.7 · 1×
bootstrap 5.3.8 · 1×
bootstrap-icons 1.13.1 · 1×
chai 6.2.2 · 1×
chart.js 4.5.1 · 1×
chartjs-adapter-date-fns 3.0.0 · 1×
chartjs-plugin-annotation 3.1.0 · 1×

Datastores touched

(mongodb) · Database · 1 repo
(mysql) · Database · 1 repo
db · Database · 1 repo
news · Database · 1 repo

For agents

$ claude mcp add local-deep-research \
  -- python -m otcore.mcp_server <graph>
