hub / github.com/headroomlabs-ai/headroom

github.com/headroomlabs-ai/headroom @v0.28.0 sqlite

repository ↗ · DeepWiki ↗ · release v0.28.0 ↗

18,281 symbols 74,093 edges 1,126 files 10,446 documented · 57%

README

  ██╗  ██╗███████╗ █████╗ ██████╗ ██████╗  ██████╗  ██████╗ ███╗   ███╗
  ██║  ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║
  ███████║█████╗  ███████║██║  ██║██████╔╝██║   ██║██║   ██║██╔████╔██║
  ██╔══██║██╔══╝  ██╔══██║██║  ██║██╔══██╗██║   ██║██║   ██║██║╚██╔╝██║
  ██║  ██║███████╗██║  ██║██████╔╝██║  ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║
  ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═╝     ╚═╝
                  The context compression layer for AI agents

60–95% fewer tokens · library · proxy · MCP · content-aware compressors · local-first · reversible

Docs · Install · Proof · Agents · Discord · llms.txt

_{AI agents / LLMs: read /llms.txt here, or fetch the live index / full docs blob.}

Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

Headroom in action

_{Live: 10,144 → 1,260 tokens — same FATAL found.}

What it does

Library — compress(messages) in Python or TypeScript, inline in any app
Proxy — headroom proxy --port 8787, zero code changes, any language
Agent wrap — headroom wrap claude|codex|copilot|cursor|aider|opencode|cline|continue|goose|openhands|openclaw|vibe in one command; undo with headroom unwrap <tool>
MCP server — headroom_compress, headroom_retrieve, headroom_stats for any MCP client
Cross-agent memory — shared store across Claude, Codex, Gemini, auto-dedup
headroom learn — mines failed sessions, writes corrections to CLAUDE.local.md (default, gitignored) or CLAUDE.md / AGENTS.md / GEMINI.md
Output token reduction — trims what the model writes back (not just what you send): drops ceremony/restated code and skips deep "thinking" on routine steps. See Output token reduction.
Reversible (CCR) — originals are cached for retrieval on demand

How it works (30 seconds)

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ────────────────────────────────────────────────  │
    │  CacheAligner  →  ContentRouter  →  CCR            │
    │                    ├─ SmartCrusher   (JSON)        │
    │                    ├─ CodeCompressor (AST)         │
    │                    └─ Kompress-v2-base (text, HF)  │
    │                                                    │
    │  Cross-agent memory  ·  headroom learn  ·  MCP     │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)

ContentRouter — detects content type, selects the right compressor
SmartCrusher / CodeCompressor / Kompress-v2-base — compress JSON, AST, or prose
CacheAligner — stabilizes prefixes so provider KV caches actually hit
CCR — stores originals locally; LLM calls headroom_retrieve if it needs them

→ Architecture · CCR reversible compression · Kompress-v2-base model card

Get started (60 seconds)

# 1 — Install
pip install "headroom-ai[all]"          # Python
npm install headroom-ai                 # Node / TypeScript

# 2 — Pick your mode
headroom wrap claude                    # wrap a coding agent
headroom proxy --port 8787              # drop-in proxy, zero code changes
# or: from headroom import compress      # inline library

# 3 — Verify setup and see the savings
headroom doctor                         # health check — confirms routing is working
headroom perf
headroom dashboard                      # live savings dashboard (proxy must be running)

Granular extras: [proxy], [mcp], [ml], [code], [memory], [vector] (optional HNSW backend — needs a C++ toolchain, not in [all]), [relevance], [image], [agno], [langchain], [evals], [pytorch-mps] (Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps). Requires Python 3.10+.

Proof

Savings on real agent workloads:

Workload	Before	After	Savings
Code search (100 results)	17,765	1,408	92%
SRE incident debugging	65,694	5,118	92%
GitHub issue triage	54,174	14,761	73%
Codebase exploration	78,502	41,254	47%

Accuracy preserved on standard benchmarks:

Benchmark	Category	N	Baseline	Headroom	Delta
GSM8K	Math	100	0.870	0.870	±0.000
TruthfulQA	Factual	100	0.530	0.560	+0.030
SQuAD v2	QA	100	—	97%	19% compression
BFCL	Tools	100	—	97%	32% compression

Reproduce: python -m headroom.evals suite --tier 1 · Full benchmarks & methodology

Output token reduction (cut what the model writes back)

Everything above shrinks the prompt you send. But you also pay for every token the model writes back — and on Opus-class models output costs 5× input. A lot of that output is waste: "Great, let me…" preambles, re-printing code you just showed it, and deep "thinking" on routine steps like reading a file.

Headroom can trim that too, from the proxy, without you changing any code:

Verbosity steering — appends a short "be terse, don't restate context" note to the end of the system prompt (so your prompt cache still hits).
Effort routing — when a turn is just the model resuming after a tool result (a file read, a passing test), it dials the model's thinking effort down. New questions and errors keep full effort.

Turn it on:

export HEADROOM_OUTPUT_SHAPER=1     # off by default
headroom proxy --port 8787

Already running a proxy? These switches are read live on every request, so a proxy that headroom wrap reused (rather than started) would not see a value you export afterwards — its environment was snapshotted at launch. headroom wrap now hot-syncs your current settings to the running proxy via a loopback POST /admin/runtime-env, so they take effect immediately with no restart (no cold start, no dropped requests, no lost caches). Set them before you wrap. On a shared proxy these overrides are global — the last explicit setting wins.

Learn the right terseness for you. People don't say how terse they want answers — they show it (they interrupt long replies, or move on before they could have read them). headroom learn --verbosity reads your past sessions and picks the level automatically:

headroom learn --verbosity            # preview what it found (dry run)
headroom learn --verbosity --apply    # save it; the proxy uses it from now on

See how many output tokens you saved. Output savings are counterfactual — we never see what the model would have written — so Headroom reports an honest estimate with a confidence range, never a made-up number:

headroom output-savings
# Reduction: 31.7%  (95% CI 27.7% … 35.7%)   [estimated]

Want a measured number instead of an estimate? Leave 10% of conversations unshaped as a control group: export HEADROOM_OUTPUT_HOLDOUT=0.1. The dashboard shows an Output Tokens Saved card next to input compression, labelled measured or estimated with the confidence band.

→ Full write-up incl. the measurement methodology: Output token reduction

Agent compatibility matrix

Agent	`headroom wrap`	Notes
Claude Code	✅	`--memory` · `--code-graph` · `--1m` · `--tool-search`
Codex	✅	shares memory with Claude
Cursor	Manual setup	starts proxy and prints base URLs for Cursor settings
Aider	✅	starts proxy + launches
Copilot CLI	✅	starts proxy + launches
OpenClaw	✅	installs as ContextEngine plugin
OpenCode	✅	injects config · starts proxy + launches
Cline	✅	starts proxy + injects config
Continue	✅	starts proxy + injects config
Goose	✅	starts proxy + launches
OpenHands	✅	starts proxy + launches
Mistral Vibe	✅	starts proxy + launches
Cortex Code	Library only	60–65% savings (library mode; no `wrap`)

Any OpenAI-compatible client works via headroom proxy. MCP-native: headroom mcp install. Undo durable wrapping with headroom unwrap <tool> (supports: claude, copilot, codex, opencode, openclaw).

GitHub Copilot CLI subscription mode

Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:

headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o

This lets Headroom intercept OpenAI-compatible Copilot CLI requests and apply the same proxy compression pipeline before forwarding to GitHub Copilot's hosted API. The wrapper exchanges Headroom's reusable GitHub OAuth token for Copilot's short-lived API token and prints the upstream endpoint as COPILOT_PROVIDER_API_URL=... during launch.

headroom copilot-auth login stores a Headroom-specific Copilot OAuth token. This avoids relying on generic GitHub or Copilot CLI tokens that can read Copilot account metadata but may still be rejected by Copilot's token-exchange endpoint.

For GitHub Enterprise Server or custom-domain Copilot deployments, set the deployment domain before launching:

export GITHUB_COPILOT_ENTERPRISE_DOMAIN=ghe.example.com

For GitHub.com Enterprise Cloud URLs such as github.com/enterprises/your-enterprise, do not set an enterprise-domain override. Headroom uses GitHub's normal token-exchange endpoint and the Copilot API endpoint advertised for the signed-in account.

Platform support note: macOS auth reuse via Copilot CLI Keychain storage has been smoke-tested. Windows Credential Manager, Linux Secret Service / secret-tool, and Docker/CI token-injection paths are implemented or planned as auth-discovery paths, but still need real OS validation before they should be considered fully vetted. For Docker and CI, prefer passing an explicit GITHUB_COPILOT_TOKEN or GITHUB_COPILOT_GITHUB_TOKEN rather than relying on host keychain access.

When to use · When to skip

Great fit if you… - run AI coding agents daily and want savings without changing your code - work across multiple agents and want shared memory - need reversible compression — originals are retrievable via CCR within the configured TTL

Skip it if you… - only use a single provider's native compaction and don't need cross-agent memory - work in a sandboxed environment where local processes can't run

Integrations — drop Headroom into any stack

Your setup	Hook in with
Any Python

Extension points exported contracts — how you extend this code

HeadroomClientInterface (Interface)

(no doc) [21 implementers]

sdk/typescript/src/types.ts

HeadroomOpenCodePluginOptions (Interface)

(no doc)

plugins/opencode/src/plugin.ts

HeadroomEngineConfig (Interface)

(no doc)

plugins/openclaw/src/engine.ts

CodeBlockProps (Interface)

(no doc)

docs/components/code-block.tsx

CommunityStats (Interface)

(no doc)

docs/lib/telemetry.ts

LanguageModel (Interface)

* Minimal structural type for Vercel AI SDK language models. * Compatible with both LanguageModelV1 (@ai-sdk/provider <

sdk/typescript/src/adapters/vercel-ai.ts

HeadroomModelMapping (Interface)

(no doc)

plugins/opencode/src/provider.ts

OpenAIMessage (Interface)

(no doc)

plugins/openclaw/src/convert.ts

Core symbols most depended-on inside this repo

append

called by 2918

agent-evals/src/agent_evals/orchestrator.py

get

called by 1320

headroom/proxy/semantic_cache.py

get

called by 926

headroom/storage/base.py

info

called by 457

plugins/openclaw/src/proxy-manager.ts

items

called by 446

headroom/cache/backends/sqlite.py

invoke

called by 432

headroom/integrations/agno/model.py

get

called by 349

headroom/cache/semantic.py

debug

called by 337

plugins/openclaw/src/proxy-manager.ts

Shape

Method 8,778

Function 6,929

Class 2,318

Route 175

Interface 81

Languages

Python98%

TypeScript2%

Modules by API surface

tests/test_memory/test_traffic_learner.py205 symbols

headroom/cli/wrap.py149 symbols

tests/test_parser.py133 symbols

tests/test_memory_system.py131 symbols

headroom/providers/proxy_routes.py131 symbols

headroom/proxy/server.py130 symbols

tests/test_learn/test_analyzer.py116 symbols

headroom/proxy/helpers.py110 symbols

tests/test_compression_store.py102 symbols

tests/test_memory_handler_native_ops.py99 symbols

tests/test_memory/test_extraction.py95 symbols

tests/test_transforms/test_code_compressor.py92 symbols

Dependencies from manifests, versioned

@ai-sdk/anthropic3.0.64 · 1×

@ai-sdk/openai3.0.48 · 1×

@ai-sdk/provider1.0.0 · 1×

@anthropic-ai/sdk0.104.1 · 1×

@opencode-ai/plugin1.17.8 · 1×

@radix-ui/react-slot1.3.0 · 1×

@tailwindcss/postcss4.2.2 · 1×

@types/mdx2.0.13 · 1×

@types/node22.10.0 · 1×

@types/react19.2.14 · 1×

@types/react-dom19.2.3 · 1×

ai6.0.0 · 1×

Datastores touched

dbDatabase · 1 repos

appDatabase · 1 repos

devDatabase · 1 repos

mydbDatabase · 1 repos

For agents

$ claude mcp add headroom \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact