github.com/tashfeenahmed/freellmapi @v0.4.1 sqlite

740 symbols · 2,202 edges · 156 files · 46 documented (6%)

README

FreeLLMAPI

One OpenAI-compatible endpoint. Seventeen free LLM providers. ~1.7B tokens per month.

Aggregate the free tiers from Google, Groq, Cerebras, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, HuggingFace, Z.ai (Zhipu), Ollama, Kilo, Pollinations, LLM7, OVH AI Endpoints, and OpenCode Zen, plus any custom OpenAI-compatible endpoint (llama.cpp, LM Studio, vLLM, local Ollama), behind a single /v1/chat/completions endpoint. Keys are stored encrypted. A router picks the best available model for each request, fails over to the next provider when one is rate-limited, and tracks per-key usage so you stay under every free-tier cap.

freellmapi.co — browse the live model catalog

Fallback chain with per-provider token budget

Why this exists
Supported providers
Features
Not yet supported
Quick start
Docker
Desktop app
Premium (live catalog)
Using the API
Screenshots
How it works
Context Handoff
Limitations
Contributing
Terms of Service review
Disclaimer

Why this exists

Every serious AI lab now offers a free tier — a few million tokens a month, a few thousand requests a day. On its own each tier is a toy. Stacked together, they add up to roughly 1.7 billion tokens per month of working inference capacity, across 100+ models from small-and-fast to reasonably capable.

The problem is that stacking them by hand is painful: seventeen different SDKs, seventeen different rate limits, seventeen places a request can fail. FreeLLMAPI collapses that into one OpenAI-compatible endpoint. Point any OpenAI client library at your local server, and it routes transparently across whichever providers you've added keys for.

Supported providers

Google Gemini 2.5 Flash · 3.x previews
Groq Llama 3.3, Llama 4, GPT-OSS, Qwen3
Cerebras Qwen3 235B
OpenCode Zen DeepSeek V4 Flash · Nemotron (promo)
Mistral Large 3 · Medium 3.5 · Codestral · Devstral
OpenRouter 21 free-tier models
GitHub Models GPT-4.1 · GPT-4o
Cloudflare Kimi K2 · GLM-4.7 · GPT-OSS · Granite 4
Cohere Command R+ · Command-A (trial)
Z.ai (Zhipu) GLM-4.5 · GLM-4.7 Flash
NVIDIA NIM · 40 RPM free (eval-only ToS)
HuggingFace Router → DeepSeek V4 · Kimi K2.6 · Qwen3
Ollama Cloud GLM-4.7 · Kimi K2 · gpt-oss · Qwen3
Kilo Gateway :free routes (anon ok)
Pollinations GPT-OSS 20B (anon ok)
LLM7 GPT-OSS · Llama 3.1 · GLM (anon ok)
OVH AI Endpoints Qwen3.5 397B · GPT-OSS · Llama 3.3 (anon ok)

Plus a custom provider — point at any OpenAI-compatible endpoint (llama.cpp, LM Studio, vLLM, a local Ollama, or a remote gateway) from the Keys page.

Features

OpenAI-compatible — POST /v1/chat/completions and GET /v1/models work with the official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, Continue, Hermes, etc.). Just change base_url.
Responses API — POST /v1/responses (the wire format current Codex CLI versions require) is implemented as a translating shim over the same router, with full streaming events and tool calls.
Streaming and non-streaming — Server-Sent Events for stream: true, JSON response otherwise. Every provider adapter implements both.
Tool calling — OpenAI-style tools / tool_choice requests are passed through, and assistant tool_calls + tool role follow-up messages round-trip across providers.
Embeddings — /v1/embeddings with family-based routing: failover only ever happens between providers serving the same model (vectors from different models are incompatible), never across models. See Embeddings.
Automatic failover — If the chosen provider returns a 429, 5xx, or times out, the router skips it, puts the key on a short cooldown, and retries on the next model in your fallback chain (up to 20 attempts).
Per-key rate tracking — RPM, RPD, TPM, and TPD counters per (platform, model, key) so the router always picks a key that's under its caps.
Sticky sessions — Multi-turn conversations keep talking to the same model for 30 minutes to avoid the hallucination spike that comes from mid-conversation model switches.
Encrypted key storage — API keys are encrypted with AES-256-GCM before hitting SQLite; decryption happens in-memory just before a request.
Unified API key — Clients authenticate to your proxy with a single freellmapi-… bearer token. You never expose upstream provider keys to your apps.
Dashboard login — The admin UI and all /api/* routes are gated behind an email + password account (scrypt-hashed, session-token auth), set on first run. The /v1 proxy keeps its own unified-key auth for apps.
Health checks — Periodic probes mark keys as healthy, rate_limited, invalid, or error so the router skips dead ones automatically.
Admin dashboard — React + Vite UI to manage keys, reorder the fallback chain, inspect analytics, and run prompts in a playground. Dark mode included.
Analytics — Per-request logging with latency, token counts, success rate, and per-provider breakdowns.
Context handoff on model switch — Optional. When a session fails over to a different model, the router injects one compact system message so the new model knows it is continuing an existing task. Disabled by default; enable with FREELLMAPI_CONTEXT_HANDOFF=on_model_switch. See Context Handoff.
Runs anywhere Node 20+ runs — Windows, macOS, Linux servers, or a small ARM SBC (Raspberry Pi included). ~40 MB RSS at idle behind PM2 / systemd / whatever supervisor you prefer.
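
The automatic failover item above can be sketched roughly as follows. This is an illustrative outline, not the project's router: `Candidate`, `Attempt`, `CooldownTracker`, `routeWithFailover`, `isRetryable`, and the 60-second cooldown are all assumptions made for the example (the repository has its own `isRetryableError` in server/src/lib/error-classify.ts, whose signature is not shown here). Only the retry conditions (429, 5xx, timeout) and the 20-attempt ceiling come from the feature list.

```typescript
// Hypothetical types for illustration; none of these names come from the FreeLLMAPI source.
type Candidate = { provider: string; model: string; keyId: string };
type Attempt =
  | { ok: true; text: string }
  | { ok: false; status: number | "timeout" };

const MAX_ATTEMPTS = 20;    // retry ceiling stated in the feature list
const COOLDOWN_MS = 60_000; // assumed value; the text only says "a short cooldown"

// 429, any 5xx, and timeouts are treated as retryable.
function isRetryable(status: number | "timeout"): boolean {
  return status === "timeout" || status === 429 || (status >= 500 && status <= 599);
}

// Remembers, per key, the time until which the key should be skipped.
class CooldownTracker {
  private until = new Map<string, number>();
  park(keyId: string, now: number): void {
    this.until.set(keyId, now + COOLDOWN_MS);
  }
  isCooling(keyId: string, now: number): boolean {
    return (this.until.get(keyId) ?? 0) > now;
  }
}

// Walks the fallback chain in order and returns the first successful answer.
async function routeWithFailover(
  chain: Candidate[],
  call: (c: Candidate) => Promise<Attempt>,
  cooldowns: CooldownTracker,
  now: () => number = Date.now,
): Promise<{ text: string; served: Candidate; attempts: number }> {
  let attempts = 0;
  for (const candidate of chain) {
    if (attempts >= MAX_ATTEMPTS) break;
    if (cooldowns.isCooling(candidate.keyId, now())) continue; // skip parked keys
    attempts++;
    const result = await call(candidate);
    if (result.ok) return { text: result.text, served: candidate, attempts };
    if (!isRetryable(result.status)) {
      throw new Error(`non-retryable upstream error: ${result.status}`);
    }
    cooldowns.park(candidate.keyId, now()); // park the key, then try the next entry
  }
  throw new Error(`all candidates exhausted after ${attempts} attempts`);
}
```

Passing the clock in as `now` keeps the cooldown logic deterministic in tests; a real router would also need to consult the per-key rate counters before picking a candidate, which this sketch leaves out.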

Not yet supported

The scope is deliberately narrow. If a feature is neither in the Features list above nor in the list below, assume it isn't there yet.

Image generation (/v1/images/*)
Audio / speech (/v1/audio/*)
Legacy completions (/v1/completions) — only the chat endpoint is implemented
Moderation (/v1/moderations)
n > 1 (multiple completions per request)
Per-user billing / multi-tenant auth — single-user by design

PRs that add any of these are very welcome. See Contributing.

Quick start

One-liner (Docker required — sets up ~/freellmapi, generates an encryption key, pulls the image, and starts the container):

curl -fsSL https://freellmapi.co/install.sh | bash

Prefer to read before you pipe to bash? The script is here. Re-running it is safe: your .env (and encryption key) is preserved and the container updates to :latest. Override the defaults with FREELLMAPI_DIR, PORT, or HOST_BIND env vars.

On Windows, the easiest path is the desktop .exe installer from Releases (below); the Docker steps work in WSL or any bash shell.

Or manually with Docker Compose. It runs the API and dashboard together on port 3001 and persists SQLite in a named volume.

Prerequisites: Docker, Docker Compose, OpenSSL.

git clone https://github.com/tashfeenahmed/freellmapi.git
cd freellmapi

# Generate an encryption key for at-rest key storage
ENCRYPTION_KEY="$(openssl rand -hex 32)"
printf "ENCRYPTION_KEY=%s\nPORT=3001\n" "$ENCRYPTION_KEY" > .env

docker compose up -d

Open http://localhost:3001, add your provider keys on the Keys page, reorder the Fallback Chain to taste, and grab your unified API key from the Keys page header. That unified key is what you point your OpenAI SDK at.
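
As a rough illustration of pointing a client at the proxy, the sketch below builds a chat-completions request for the local server. The helper `buildChatRequest` and its types are hypothetical, and the `model` value is a placeholder: the README excerpt does not say which model ids the proxy accepts, so list them with GET /v1/models first. Only the endpoint path, the port, and the bearer-token scheme come from the text.

```typescript
// Hypothetical helper for illustration; not part of FreeLLMAPI or the OpenAI SDK.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,     // e.g. "http://localhost:3001"
  unifiedKey: string,  // the unified key copied from the Keys page header
  model: string,       // placeholder; pick a real id from GET /v1/models
  messages: ChatMessage[],
): ChatRequest {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${root}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${unifiedKey}`,
      },
      // stream: false asks for a single JSON response instead of SSE
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}
```

On Node 20+ the result can be passed straight to the built-in `fetch(req.url, req.init)`. With the official OpenAI SDK the equivalent step is setting the client's base URL to `http://localhost:3001/v1` and its API key to the unified key.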

Reaching it from another machine? By default the container is published only on 127.0.0.1, so http://<server-ip>:3001 won't load from another device (the page just hangs). To expose it on your LAN — e.g. a Raspberry Pi at http://192.168.1.x:3001 — start it with HOST_BIND=0.0.0.0:

HOST_BIND=0.0.0.0 docker compose up -d

Only do this on a trusted network: the proxy is single-user and guarded only by the unified API key.

Local development

Prerequisites: Node.js 20+, npm.

git clone https://github.com/tashfeenahmed/freellmapi.git
cd freellmapi
npm install
cp .env.example .env
ENCRYPTION_KEY="$(node -e 'console.log(require("crypto").randomBytes(32).toString("hex"))')"
printf "ENCRYPTION_KEY=%s\nPORT=3001\n" "$ENCRYPTION_KEY" > .env
npm run dev

ENCRYPTION_KEY is required for startup. The server only falls back to a database-stored development key when DEV_MODE=true and NODE_ENV is not production; do not use that fallback with real provider keys.

Request analytics are retained for 90 days or 100,000 request rows by default, whichever limit prunes first. Set REQUEST_ANALYTICS_RETENTION_DAYS=0 or REQUEST_ANALYTICS_MAX_ROWS=0 in .env to disable the corresponding limit.
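
The retention rule can be read as two independent limits applied together. The sketch below is one plausible interpretation, not the server's pruning code: `RetentionConfig`, `configFromEnv`, and `pruneRequests` are hypothetical names, and how the real server parses the two environment variables is an assumption.

```typescript
// Hypothetical names for illustration; only the defaults and env var names come from the README.
interface RetentionConfig {
  retentionDays: number; // 0 disables the age limit
  maxRows: number;       // 0 disables the row-count limit
}

const MS_PER_DAY = 86_400_000;
const DEFAULT_RETENTION: RetentionConfig = { retentionDays: 90, maxRows: 100_000 };

// Falls back to the default when the variable is unset, blank, negative, or not a number.
function readLimit(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : fallback;
}

function configFromEnv(env: Record<string, string | undefined>): RetentionConfig {
  return {
    retentionDays: readLimit(env["REQUEST_ANALYTICS_RETENTION_DAYS"], DEFAULT_RETENTION.retentionDays),
    maxRows: readLimit(env["REQUEST_ANALYTICS_MAX_ROWS"], DEFAULT_RETENTION.maxRows),
  };
}

// Rows are assumed to be sorted oldest-first; timestamps are milliseconds since the epoch.
function pruneRequests<T extends { timestamp: number }>(
  rows: T[],
  cfg: RetentionConfig,
  now: number,
): T[] {
  let kept = rows;
  if (cfg.retentionDays > 0) {
    const cutoff = now - cfg.retentionDays * MS_PER_DAY;
    kept = kept.filter((r) => r.timestamp >= cutoff); // drop rows older than the window
  }
  if (cfg.maxRows > 0 && kept.length > cfg.maxRows) {
    kept = kept.slice(kept.length - cfg.maxRows); // keep only the newest maxRows
  }
  return kept;
}
```

Because both limits are applied on every pass, whichever one is tighter for the current data decides what gets pruned, which matches the "whichever limit prunes first" wording.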

Open http://localhost:5173 (the Vite dev UI), add your provider keys on the Keys page, reorder the Fallback Chain to taste, and grab your unified API key from the Keys page header. That unified key is what you point your OpenAI SDK at.

Reaching the dev UI from another device on your LAN? Use npm run dev:lan — it passes --host through to Vite, which then prints a Network: http://<your-ip>:5173 URL you can open from a phone or another machine. (Plain npm run dev -- --host does not work here: the root dev script is a concurrently wrapper, so the flag never reaches Vite.) API calls go through Vite's dev proxy, so no extra server config is needed.

For a production build without Docker:

npm run build
node server/dist/index.js     # server + dashboard both served on :3001

Docker

FreeLLMAPI publishes a single production image that contains the Express server and the built React dashboard:

docker pull ghcr.io/tashfeenahmed/freellmapi:latest   # or pin a release, e.g. :v1.2.3

The image is multi-arch (linux/amd64 + linux/arm64, so it runs on a Raspberry Pi). Published tags: latest (default branch), v*.*.* (git release tags), and sha-<commit>.

The included docker-compose.yml is the recommended install path:

docker compose up -d
docker compose logs -f freellmapi

By default the container's port is bound to 127.0.0.1 (localhost only). To reach the dashboard/API from another machine on your network, publish it on all interfaces with HOST_BIND=0.0.0.0 docker compose up -d — only on a trusted LAN, since the proxy is single-user.

SQLite data is stored in the freellmapi-data volume at /app/server/data. Keep the same .env ENCRYPTION_KEY and volume when upgrading, because provider keys are encrypted at rest.
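
To see why the ENCRYPTION_KEY has to stay the same across upgrades, here is a minimal AES-256-GCM round trip using Node's built-in crypto module. It is a sketch under assumptions: `encryptSecret`, `decryptSecret`, and the `iv:tag:ciphertext` hex layout are invented for the example and are not FreeLLMAPI's actual storage format. The 64-character hex key matches what `openssl rand -hex 32` produces.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helpers; the stored layout "ivHex:tagHex:ciphertextHex" is an assumption.
function encryptSecret(plaintext: string, keyHex: string): string {
  const key = Buffer.from(keyHex, "hex");
  if (key.length !== 32) throw new Error("key must be 32 bytes (64 hex characters)");
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // authentication tag, available after final()
  return [iv, tag, ciphertext].map((b) => b.toString("hex")).join(":");
}

function decryptSecret(stored: string, keyHex: string): string {
  const [ivHex, tagHex, dataHex] = stored.split(":");
  if (!ivHex || !tagHex || dataHex === undefined) throw new Error("malformed ciphertext");
  const decipher = createDecipheriv(
    "aes-256-gcm",
    Buffer.from(keyHex, "hex"),
    Buffer.from(ivHex, "hex"),
  );
  decipher.setAuthTag(Buffer.from(tagHex, "hex"));
  // final() throws if the key or the data does not match the authentication tag
  const plain = Buffer.concat([decipher.update(Buffer.from(dataHex, "hex")), decipher.final()]);
  return plain.toString("utf8");
}
```

Decrypting with any other key fails authentication instead of returning garbage, so a regenerated ENCRYPTION_KEY makes every previously stored provider key unreadable.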

More Docker operations and examples live in docker/README.md.

Desktop app

A native menu-bar app lives in desktop/: the entire router + dashboard running locally from your tray, with a glass popover showing live request stats.

FreeLLMAPI desktop app

Download from Releases — the macOS .dmg and the Windows .exe installer are built and attached to every release by the desktop-release workflow. Or build it from this repo in a few minutes:

npm install
npm run desktop:dist        # macOS  → desktop/dist-electron/FreeLLMAPI-…-arm64.dmg
npm run desktop:dist:win    # Windows → "desktop/dist-electron/FreeLLMAPI Setup ….exe"

Locally built apps are unsigned, so Windows SmartScreen may warn on first run ("More info" → "Run anyway"); the macOS build launches without Gatekeeper prompts.

Premium (live catalog)

The router keeps its model catalog fresh on its own: it pulls a signed catalog from [freellmapi.co](https://freellmapi.c

Extension points: exported contracts (how you extend this code)

MockModel (Interface)

(no doc)

client/dev/mockApi.ts

AuthStatus (Interface)

(no doc)

client/src/components/auth-gate.tsx

TodayStats (Interface)

WindowState (Interface)

(no doc)

server/src/middleware/rateLimit.ts

CopyButtonProps (Interface)

(no doc)

client/src/components/copy-button.tsx

DesktopConfig (Interface)

(no doc)

desktop/src/config.ts

Quirk (Interface)

(no doc)

shared/types.ts

Core symbols: most depended-on inside this repo

getDb

called by 210

server/src/db/index.ts

initDb

called by 49

server/src/db/index.ts

apiFetch

called by 47

client/src/lib/api.ts

isRetryableError

called by 47

server/src/lib/error-classify.ts

called by 44

client/src/lib/utils.ts

chatCompletion

called by 34

server/src/providers/cohere.ts

setSetting

called by 30

server/src/db/index.ts

contentToString

called by 28

server/src/lib/content.ts

Shape

Function 604

Interface 99

Method 22

Class 15

Languages

TypeScript 100%

Modules by API surface

server/src/services/router.ts: 39 symbols
server/src/db/migrations.ts: 38 symbols
client/src/pages/FallbackPage.tsx: 28 symbols
server/src/services/ratelimit.ts: 27 symbols
shared/types.ts: 25 symbols
server/src/services/fusion.ts: 23 symbols
server/src/providers/google.ts: 23 symbols
server/src/services/catalog-sync.ts: 18 symbols
server/src/routes/proxy.ts: 16 symbols
server/src/services/model-groups.ts: 15 symbols
server/src/services/embeddings.ts: 15 symbols
server/src/lib/proxy.ts: 15 symbols

Dependencies: from manifests, versioned

@base-ui/react 1.3.0 · 1×
@dnd-kit/core 6.3.1 · 1×
@dnd-kit/sortable 10.0.0 · 1×
@dnd-kit/utilities 3.2.2 · 1×
@electron/rebuild 3.7.0 · 1×
@eslint/js 9.39.4 · 1×
@fontsource-variable/geist 5.2.8 · 1×
@fontsource-variable/geist-mono 5.2.7 · 1×
@freellmapi/shared * · 1×
@tailwindcss/vite 4.2.2 · 1×
@tanstack/react-query 5.97.0 · 1×
@types/better-sqlite3 7.6.13 · 1×

For agents

$ claude mcp add freellmapi \
  -- python -m otcore.mcp_server <graph>
