One OpenAI-compatible endpoint. Sixteen free LLM providers. ~1.7B tokens per month.
Aggregate the free tiers from Google, Groq, Cerebras, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, HuggingFace, Z.ai (Zhipu), Ollama, Kilo, Pollinations, LLM7, OVH AI Endpoints, and OpenCode Zen — plus any custom OpenAI-compatible endpoint (llama.cpp, LM Studio, vLLM, local Ollama) — behind a single /v1/chat/completions endpoint. Keys are stored encrypted. A router picks the best available model for each request, falls over to the next provider when one is rate-limited, and tracks per-key usage so you stay under every free-tier cap.
freellmapi.co — browse the live model catalog

Every serious AI lab now offers a free tier — a few million tokens a month, a few thousand requests a day. On its own each tier is a toy. Stacked together, they add up to roughly 1.7 billion tokens per month of working inference capacity, across 100+ models from small-and-fast to reasonably capable.
The problem is that stacking them by hand is painful: seventeen different SDKs, seventeen different rate limits, seventeen places a request can fail. FreeLLMAPI collapses that into one OpenAI-compatible endpoint. Point any OpenAI client library at your local server, and it routes transparently across whichever providers you've added keys for.
Plus a custom provider — point at any OpenAI-compatible endpoint (llama.cpp, LM Studio, vLLM, a local Ollama, or a remote gateway) from the Keys page.
POST /v1/chat/completions and GET /v1/models work with the official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, Continue, Hermes, etc.). Just change base_url.POST /v1/responses (the wire format current Codex CLI versions require) is implemented as a translating shim over the same router, with full streaming events and tool calls.stream: true, JSON response otherwise. Every provider adapter implements both.tools / tool_choice requests are passed through, and assistant tool_calls + tool role follow-up messages round-trip across providers./v1/embeddings with family-based routing: failover only ever happens between providers serving the same model (vectors from different models are incompatible), never across models. See Embeddings.(platform, model, key) so the router always picks a key that's under its caps.freellmapi-… bearer token. You never expose upstream provider keys to your apps./api/* routes are gated behind an email + password account (scrypt-hashed, session-token auth), set on first run. The /v1 proxy keeps its own unified-key auth for apps.healthy, rate_limited, invalid, or error so the router skips dead ones automatically.FREELLMAPI_CONTEXT_HANDOFF=on_model_switch. See Context Handoff.The scope is deliberately narrow. If a feature isn't on this list and isn't below, assume it isn't there yet.
/v1/images/*)/v1/audio/*)/v1/completions) — only the chat endpoint is implemented/v1/moderations)n > 1 (multiple completions per request)PRs that add any of these are very welcome. See Contributing.
One-liner (Docker required — sets up ~/freellmapi, generates an encryption key, pulls the image, and starts the container):
curl -fsSL https://freellmapi.co/install.sh | bash
Prefer to read before you pipe to bash? The script is here. Re-running it is safe: your .env (and encryption key) is preserved and the container updates to :latest. Override the defaults with FREELLMAPI_DIR, PORT, or HOST_BIND env vars.
On Windows, the easiest path is the desktop .exe installer from Releases (below); the Docker steps work in WSL or any bash shell.
Or manually with Docker Compose. It runs the API and dashboard together on port 3001 and persists SQLite in a named volume.
Prerequisites: Docker, Docker Compose, OpenSSL.
git clone https://github.com/tashfeenahmed/freellmapi.git
cd freellmapi
# Generate an encryption key for at-rest key storage
ENCRYPTION_KEY="$(openssl rand -hex 32)"
printf "ENCRYPTION_KEY=%s\nPORT=3001\n" "$ENCRYPTION_KEY" > .env
docker compose up -d
Open http://localhost:3001, add your provider keys on the Keys page, reorder the Fallback Chain to taste, and grab your unified API key from the Keys page header. That unified key is what you point your OpenAI SDK at.
Reaching it from another machine? By default the container is published only on
127.0.0.1, sohttp://<server-ip>:3001won't load from another device (the page just hangs). To expose it on your LAN — e.g. a Raspberry Pi athttp://192.168.1.x:3001— start it withHOST_BIND=0.0.0.0:
bash HOST_BIND=0.0.0.0 docker compose up -dOnly do this on a trusted network: the proxy is single-user and guarded only by the unified API key.
Prerequisites: Node.js 20+, npm.
git clone https://github.com/tashfeenahmed/freellmapi.git
cd freellmapi
npm install
cp .env.example .env
ENCRYPTION_KEY="$(node -e 'console.log(require("crypto").randomBytes(32).toString("hex"))')"
printf "ENCRYPTION_KEY=%s\nPORT=3001\n" "$ENCRYPTION_KEY" > .env
npm run dev
ENCRYPTION_KEY is required for startup. The server only falls back to a
database-stored development key when DEV_MODE=true and NODE_ENV is not
production; do not use that fallback with real provider keys.
Request analytics are retained for 90 days or 100000 request rows by default,
whichever limit prunes first. Set REQUEST_ANALYTICS_RETENTION_DAYS=0 or
REQUEST_ANALYTICS_MAX_ROWS=0 in .env to disable either retention limit.
Open http://localhost:5173 (the Vite dev UI), add your provider keys on the Keys page, reorder the Fallback Chain to taste, and grab your unified API key from the Keys page header. That unified key is what you point your OpenAI SDK at.
Reaching the dev UI from another device on your LAN? Use
npm run dev:lan— it passes--hostthrough to Vite, which then prints aNetwork: http://<your-ip>:5173URL you can open from a phone or another machine. (Plainnpm run dev -- --hostdoes not work here: the rootdevscript is aconcurrentlywrapper, so the flag never reaches Vite.) API calls go through Vite's dev proxy, so no extra server config is needed.
For a production build without Docker:
npm run build
node server/dist/index.js # server + dashboard both served on :3001
FreeLLMAPI publishes a single production image that contains the Express server and the built React dashboard:
docker pull ghcr.io/tashfeenahmed/freellmapi:latest # or pin a release, e.g. :v1.2.3
The image is multi-arch (linux/amd64 + linux/arm64, so it runs on a Raspberry Pi). Published tags: latest (default branch), v*.*.* (git release tags), and sha-<commit>.
The included docker-compose.yml is the recommended install path:
docker compose up -d
docker compose logs -f freellmapi
By default the container's port is bound to 127.0.0.1 (localhost only). To reach the dashboard/API from another machine on your network, publish it on all interfaces with HOST_BIND=0.0.0.0 docker compose up -d — only on a trusted LAN, since the proxy is single-user.
SQLite data is stored in the freellmapi-data volume at /app/server/data. Keep the same .env ENCRYPTION_KEY and volume when upgrading, because provider keys are encrypted at rest.
More Docker operations and examples live in docker/README.md.
A native menu-bar app lives in desktop/: the entire router +
dashboard running locally from your tray, with a glass popover showing live
request stats.

Download from Releases — the macOS .dmg and the Windows .exe installer are built and attached to every release by the desktop-release workflow. Or build it from this repo in a few minutes:
npm install
npm run desktop:dist # macOS → desktop/dist-electron/FreeLLMAPI-…-arm64.dmg
npm run desktop:dist:win # Windows → "desktop/dist-electron/FreeLLMAPI Setup ….exe"
Locally built apps are unsigned, so Windows SmartScreen may warn on first run ("More info" → "Run anyway"); the macOS build launches without Gatekeeper prompts.
The router keeps its model catalog fresh on its own: it pulls a signed catalog from [freellmapi.co](https://freellmapi.c
$ claude mcp add freellmapi \
-- python -m otcore.mcp_server <graph>