MCPcopy
hub / github.com/headroomlabs-ai/headroom

github.com/headroomlabs-ai/headroom @v0.28.0 sqlite

repository ↗ · DeepWiki ↗ · release v0.28.0 ↗
18,281 symbols 74,093 edges 1,126 files 10,446 documented · 57%
README
  ██╗  ██╗███████╗ █████╗ ██████╗ ██████╗  ██████╗  ██████╗ ███╗   ███╗
  ██║  ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║
  ███████║█████╗  ███████║██║  ██║██████╔╝██║   ██║██║   ██║██╔████╔██║
  ██╔══██║██╔══╝  ██╔══██║██║  ██║██╔══██╗██║   ██║██║   ██║██║╚██╔╝██║
  ██║  ██║███████╗██║  ██║██████╔╝██║  ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║
  ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═╝     ╚═╝
                  The context compression layer for AI agents

60–95% fewer tokens · library · proxy · MCP · content-aware compressors · local-first · reversible

CI codecov PyPI npm Model: Kompress-v2-base License: Apache 2.0 Docs

Docs · Install · Proof · Agents · Discord · llms.txt

AI agents / LLMs: read /llms.txt here, or fetch the live index / full docs blob.


chopratejas%2Fheadroom | Trendshift

Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

Headroom in action

Live: 10,144 → 1,260 tokens — same FATAL found.

What it does

  • Librarycompress(messages) in Python or TypeScript, inline in any app
  • Proxyheadroom proxy --port 8787, zero code changes, any language
  • Agent wrapheadroom wrap claude|codex|copilot|cursor|aider|opencode|cline|continue|goose|openhands|openclaw|vibe in one command; undo with headroom unwrap <tool>
  • MCP serverheadroom_compress, headroom_retrieve, headroom_stats for any MCP client
  • Cross-agent memory — shared store across Claude, Codex, Gemini, auto-dedup
  • headroom learn — mines failed sessions, writes corrections to CLAUDE.local.md (default, gitignored) or CLAUDE.md / AGENTS.md / GEMINI.md
  • Output token reduction — trims what the model writes back (not just what you send): drops ceremony/restated code and skips deep "thinking" on routine steps. See Output token reduction.
  • Reversible (CCR) — originals are cached for retrieval on demand

How it works (30 seconds)

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ────────────────────────────────────────────────  │
    │  CacheAligner  →  ContentRouter  →  CCR            │
    │                    ├─ SmartCrusher   (JSON)        │
    │                    ├─ CodeCompressor (AST)         │
    │                    └─ Kompress-v2-base (text, HF)  │
    │                                                    │
    │  Cross-agent memory  ·  headroom learn  ·  MCP     │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)
  • ContentRouter — detects content type, selects the right compressor
  • SmartCrusher / CodeCompressor / Kompress-v2-base — compress JSON, AST, or prose
  • CacheAligner — stabilizes prefixes so provider KV caches actually hit
  • CCR — stores originals locally; LLM calls headroom_retrieve if it needs them

Architecture · CCR reversible compression · Kompress-v2-base model card

Get started (60 seconds)

# 1 — Install
pip install "headroom-ai[all]"          # Python
npm install headroom-ai                 # Node / TypeScript

# 2 — Pick your mode
headroom wrap claude                    # wrap a coding agent
headroom proxy --port 8787              # drop-in proxy, zero code changes
# or: from headroom import compress      # inline library

# 3 — Verify setup and see the savings
headroom doctor                         # health check — confirms routing is working
headroom perf
headroom dashboard                      # live savings dashboard (proxy must be running)

Granular extras: [proxy], [mcp], [ml], [code], [memory], [vector] (optional HNSW backend — needs a C++ toolchain, not in [all]), [relevance], [image], [agno], [langchain], [evals], [pytorch-mps] (Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps). Requires Python 3.10+.

Proof

Savings on real agent workloads:

Workload Before After Savings
Code search (100 results) 17,765 1,408 92%
SRE incident debugging 65,694 5,118 92%
GitHub issue triage 54,174 14,761 73%
Codebase exploration 78,502 41,254 47%

Accuracy preserved on standard benchmarks:

Benchmark Category N Baseline Headroom Delta
GSM8K Math 100 0.870 0.870 ±0.000
TruthfulQA Factual 100 0.530 0.560 +0.030
SQuAD v2 QA 100 97% 19% compression
BFCL Tools 100 97% 32% compression

Reproduce: python -m headroom.evals suite --tier 1 · Full benchmarks & methodology

Output token reduction (cut what the model writes back)

Everything above shrinks the prompt you send. But you also pay for every token the model writes back — and on Opus-class models output costs 5× input. A lot of that output is waste: "Great, let me…" preambles, re-printing code you just showed it, and deep "thinking" on routine steps like reading a file.

Headroom can trim that too, from the proxy, without you changing any code:

  • Verbosity steering — appends a short "be terse, don't restate context" note to the end of the system prompt (so your prompt cache still hits).
  • Effort routing — when a turn is just the model resuming after a tool result (a file read, a passing test), it dials the model's thinking effort down. New questions and errors keep full effort.

Turn it on:

export HEADROOM_OUTPUT_SHAPER=1     # off by default
headroom proxy --port 8787

Already running a proxy? These switches are read live on every request, so a proxy that headroom wrap reused (rather than started) would not see a value you export afterwards — its environment was snapshotted at launch. headroom wrap now hot-syncs your current settings to the running proxy via a loopback POST /admin/runtime-env, so they take effect immediately with no restart (no cold start, no dropped requests, no lost caches). Set them before you wrap. On a shared proxy these overrides are global — the last explicit setting wins.

Learn the right terseness for you. People don't say how terse they want answers — they show it (they interrupt long replies, or move on before they could have read them). headroom learn --verbosity reads your past sessions and picks the level automatically:

headroom learn --verbosity            # preview what it found (dry run)
headroom learn --verbosity --apply    # save it; the proxy uses it from now on

See how many output tokens you saved. Output savings are counterfactual — we never see what the model would have written — so Headroom reports an honest estimate with a confidence range, never a made-up number:

headroom output-savings
# Reduction: 31.7%  (95% CI 27.7% … 35.7%)   [estimated]

Want a measured number instead of an estimate? Leave 10% of conversations unshaped as a control group: export HEADROOM_OUTPUT_HOLDOUT=0.1. The dashboard shows an Output Tokens Saved card next to input compression, labelled measured or estimated with the confidence band.

→ Full write-up incl. the measurement methodology: Output token reduction

Star History Chart

Agent compatibility matrix

Agent headroom wrap Notes
Claude Code --memory · --code-graph · --1m · --tool-search
Codex shares memory with Claude
Cursor Manual setup starts proxy and prints base URLs for Cursor settings
Aider starts proxy + launches
Copilot CLI starts proxy + launches
OpenClaw installs as ContextEngine plugin
OpenCode injects config · starts proxy + launches
Cline starts proxy + injects config
Continue starts proxy + injects config
Goose starts proxy + launches
OpenHands starts proxy + launches
Mistral Vibe starts proxy + launches
Cortex Code Library only 60–65% savings (library mode; no wrap)

Any OpenAI-compatible client works via headroom proxy. MCP-native: headroom mcp install. Undo durable wrapping with headroom unwrap <tool> (supports: claude, copilot, codex, opencode, openclaw).

GitHub Copilot CLI subscription mode

Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:

headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o

This lets Headroom intercept OpenAI-compatible Copilot CLI requests and apply the same proxy compression pipeline before forwarding to GitHub Copilot's hosted API. The wrapper exchanges Headroom's reusable GitHub OAuth token for Copilot's short-lived API token and prints the upstream endpoint as COPILOT_PROVIDER_API_URL=... during launch.

headroom copilot-auth login stores a Headroom-specific Copilot OAuth token. This avoids relying on generic GitHub or Copilot CLI tokens that can read Copilot account metadata but may still be rejected by Copilot's token-exchange endpoint.

For GitHub Enterprise Server or custom-domain Copilot deployments, set the deployment domain before launching:

export GITHUB_COPILOT_ENTERPRISE_DOMAIN=ghe.example.com

For GitHub.com Enterprise Cloud URLs such as github.com/enterprises/your-enterprise, do not set an enterprise-domain override. Headroom uses GitHub's normal token-exchange endpoint and the Copilot API endpoint advertised for the signed-in account.

Platform support note: macOS auth reuse via Copilot CLI Keychain storage has been smoke-tested. Windows Credential Manager, Linux Secret Service / secret-tool, and Docker/CI token-injection paths are implemented or planned as auth-discovery paths, but still need real OS validation before they should be considered fully vetted. For Docker and CI, prefer passing an explicit GITHUB_COPILOT_TOKEN or GITHUB_COPILOT_GITHUB_TOKEN rather than relying on host keychain access.

When to use · When to skip

Great fit if you… - run AI coding agents daily and want savings without changing your code - work across multiple agents and want shared memory - need reversible compression — originals are retrievable via CCR within the configured TTL

Skip it if you… - only use a single provider's native compaction and don't need cross-agent memory - work in a sandboxed environment where local processes can't run

Integrations — drop Headroom into any stack

Your setup Hook in with
Any Python

Extension points exported contracts — how you extend this code

HeadroomClientInterface (Interface)
(no doc) [21 implementers]
sdk/typescript/src/types.ts
HeadroomOpenCodePluginOptions (Interface)
(no doc)
plugins/opencode/src/plugin.ts
HeadroomEngineConfig (Interface)
(no doc)
plugins/openclaw/src/engine.ts
CodeBlockProps (Interface)
(no doc)
docs/components/code-block.tsx
CommunityStats (Interface)
(no doc)
docs/lib/telemetry.ts
LanguageModel (Interface)
* Minimal structural type for Vercel AI SDK language models. * Compatible with both LanguageModelV1 (@ai-sdk/provider <
sdk/typescript/src/adapters/vercel-ai.ts
HeadroomModelMapping (Interface)
(no doc)
plugins/opencode/src/provider.ts
OpenAIMessage (Interface)
(no doc)
plugins/openclaw/src/convert.ts

Core symbols most depended-on inside this repo

append
called by 2918
agent-evals/src/agent_evals/orchestrator.py
get
called by 1320
headroom/proxy/semantic_cache.py
get
called by 926
headroom/storage/base.py
info
called by 457
plugins/openclaw/src/proxy-manager.ts
items
called by 446
headroom/cache/backends/sqlite.py
invoke
called by 432
headroom/integrations/agno/model.py
get
called by 349
headroom/cache/semantic.py
debug
called by 337
plugins/openclaw/src/proxy-manager.ts

Shape

Method 8,778
Function 6,929
Class 2,318
Route 175
Interface 81

Languages

Python98%
TypeScript2%

Modules by API surface

tests/test_memory/test_traffic_learner.py205 symbols
headroom/cli/wrap.py149 symbols
tests/test_parser.py133 symbols
tests/test_memory_system.py131 symbols
headroom/providers/proxy_routes.py131 symbols
headroom/proxy/server.py130 symbols
tests/test_learn/test_analyzer.py116 symbols
headroom/proxy/helpers.py110 symbols
tests/test_compression_store.py102 symbols
tests/test_memory_handler_native_ops.py99 symbols
tests/test_memory/test_extraction.py95 symbols
tests/test_transforms/test_code_compressor.py92 symbols

Dependencies from manifests, versioned

@ai-sdk/anthropic3.0.64 · 1×
@ai-sdk/openai3.0.48 · 1×
@ai-sdk/provider1.0.0 · 1×
@anthropic-ai/sdk0.104.1 · 1×
@opencode-ai/plugin1.17.8 · 1×
@radix-ui/react-slot1.3.0 · 1×
@types/mdx2.0.13 · 1×
@types/node22.10.0 · 1×
@types/react19.2.14 · 1×
@types/react-dom19.2.3 · 1×
ai6.0.0 · 1×

Datastores touched

dbDatabase · 1 repos
appDatabase · 1 repos
devDatabase · 1 repos
mydbDatabase · 1 repos

For agents

$ claude mcp add headroom \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact