<a href="https://awesome.re">
<img src="https://awesome.re/badge-flat2.svg" alt="Awesome">
</a>
LLM APIs with permanent free tiers for text inference.
All endpoints are OpenAI SDK-compatible unless noted. Each link points to the provider's API key page.
APIs run by the companies that train or fine-tune the models themselves.
Permanent free tier, no credit card required. 15 RPM, 20K tokens/day. Specialized for roleplay and storytelling.
Base URL: https://api.aionlabs.ai/v1
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Aion 2.5 | 128K | 32K | Text (roleplay) | 15 RPM, 20K TPD |
| Aion 2.0 | 128K | 32K | Text (roleplay) | 15 RPM, 20K TPD |
| Aion-RP 1.0 (8B) | 32K | ~8K | Text (roleplay) | 15 RPM, 20K TPD |
Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only.
Base URL: https://api.cohere.com/v2
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Command A+ (218B) | 128K | 4K | Text | 20 RPM |
| Command A (111B) | 256K | 4K | Text | 20 RPM |
| Command R+ | 128K | 4K | Text | 20 RPM |
| Command R | 128K | 4K | Text | 20 RPM |
| Command R7B | 128K | 4K | Text | 20 RPM |
Free tier unavailable in EU/UK/Switzerland. Free-tier prompts may be used by Google to improve products. [^1]
Base URL: https://generativelanguage.googleapis.com/v1beta
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Gemini 3.5 Flash | 1M | 64K | Text + Image + Audio + Video | 15 RPM, 1,500 RPD |
| Gemini 3.1 Flash-Lite | 1M | 65K | Text + Image + Audio + Video | 30 RPM, 1,500 RPD |
| Gemini 2.5 Flash | 1M | 65K | Text + Image + Audio + Video | 15 RPM, 1,500 RPD |
| Gemini 2.5 Pro | 2M | 65K | Text + Image + Audio + Video | 5 RPM, 50 RPD |
Free "Experiment" plan, no credit card. ~1B tokens/month. Prompts may be used to improve models.
Base URL: https://api.mistral.ai/v1
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Mistral Medium 3.5 (128B) | 256K | 256K | Text + Image + Code | ~1 RPS, 500K TPM |
| Mistral Small 4 | 256K | 256K | Text + Image + Code | ~1 RPS, 500K TPM |
| Mistral Large 3 | 256K | 256K | Text | ~1 RPS, 500K TPM |
| Mistral Nemo (12B) | 128K | 128K | Text | ~1 RPS, 500K TPM |
| Codestral | 256K | 256K | Code | ~1 RPS, 500K TPM |
| Pixtral Large | 128K | 128K | Text + Image | ~1 RPS, 500K TPM |
Permanent free models, no credit card required.
Base URL: https://open.bigmodel.cn/api/paas/v4
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| GLM-4.7-Flash | 200K | 128K | Text | 1 concurrent request |
| GLM-4.6V-Flash | 128K | ~4K | Text + Image | 1 concurrent request |
Third-party platforms that host open-weight models from various sources.
Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap. 8K context cap on free tier.
Base URL: https://api.cerebras.ai/v1
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| gpt-oss-120b | 128K (8K on free) | 8K | Text | 30 RPM, 14,400 RPD, 1M TPD |
| zai-glm-4.7 | 128K (8K on free) | 8K | Text | 10 RPM, 100 RPD, 1M TPD |
10,000 Neurons/day free. 50+ models available on free tier.
Base URL: https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
@cf/meta/llama-3.3-70b-instruct-fp8-fast |
131K | Shared w/ context | Text | 10K neurons/day (shared) |
@cf/meta/llama-4-scout-17b-16e-instruct |
Up to 10M | Shared w/ context | Multimodal | 10K neurons/day (shared) |
@cf/openai/gpt-oss-120b |
128K | Shared w/ context | Text | 10K neurons/day (shared) |
@cf/moonshotai/kimi-k2.7-code |
262K | Shared w/ context | Text (code) | 10K neurons/day (shared) |
@cf/google/gemma-4-26b-a4b-it |
256K | Shared w/ context | Text | 10K neurons/day (shared) |
@cf/zhipuai/glm-4.7-flash |
131K | Shared w/ context | Text | 10K neurons/day (shared) |
@cf/mistralai/mistral-small-3.1-24b-instruct |
128K | Shared w/ context | Text | 10K neurons/day (shared) |
@cf/deepseek-ai/deepseek-r1-distill-qwen-32b |
32K | Shared w/ context | Text (reasoning) | 10K neurons/day (shared) |
| + 42 more models | Varies | Varies | Text, Image, Audio, Embeddings | 10K neurons/day (shared) |
Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out).
Base URL: https://models.github.ai/inference
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| gpt-5 | 200K | 32K | Text | 10 RPM, 50 RPD |
| gpt-4.1 | 1M | 32K | Text | 10 RPM, 50 RPD |
| gpt-4.1-mini | 1M | 32K | Text | 15 RPM, 150 RPD |
| gpt-4o | 128K | 16K | Text + Vision | 10 RPM, 50 RPD |
| o4-mini | 200K | 100K | Text (reasoning) | 10 RPM, 50 RPD |
| Llama-4-Scout-17B-16E | 512K | ~4K | Text + Vision | 15 RPM, 150 RPD |
| Llama-4-Maverick-17B-128E | 256K | ~4K | Text + Vision | 10 RPM, 50 RPD |
| Meta-Llama-3.3-70B | 131K | ~4K | Text | 15 RPM, 150 RPD |
| DeepSeek-R1 | 64K | 8K | Text (reasoning) | 15 RPM, 150 RPD |
| Mistral-Small-3.1 | 128K | ~4K | Text + Vision | 15 RPM, 150 RPD |
| + 35 more models | Varies | Varies | Text / Image | Varies by tier |
Free tier, no credit card. Ultra-fast LPU inference. [^2]
Base URL: https://api.groq.com/openai/v1
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| llama-3.3-70b-versatile | 131K | 32K | Text | 30 RPM, 1,000 RPD |
| llama-3.1-8b-instant | 131K | 131K | Text | 30 RPM, 1,000 RPD |
| llama-4-scout-17b-16e-instruct | 131K | 8K | Text + Vision | 30 RPM, 1,000 RPD |
| qwen3-32b | 131K | 131K | Text | 30 RPM, 1,000 RPD |
| gpt-oss-120b | 131K | 32K | Text | 30 RPM, 1,000 RPD |
100K monthly Inference Provider credits for free users. Routes to Fireworks, Together, Hyperbolic, Nebius, Novita, DeepInfra and others. Thousands of models.
Base URL: https://router.huggingface.co/v1
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| Meta-Llama-3.1-8B-Instruct | 128K | ~4K | Text | Credit-metered |
| Mistral-7B-Instruct-v0.3 | 32K | ~4K | Text | Credit-metered |
| Mixtral-8x7B-Instruct-v0.1 | 32K | ~4K | Text | Credit-metered |
| Phi-3.5-mini-instruct | 128K | ~4K | Text | Credit-metered |
| Qwen2.5-7B-Instruct | 131K | ~4K | Text | Credit-metered |
| + thousands of community models | Varies | Varies | Text, Image, Audio, Embeddings | 100K credits/month free |
Free models with no credit card required. kilo-auto/free auto-router routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%). [^5]
Base URL: https://api.kilo.ai/api/gateway
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
x-ai/grok-code-fast-1:free |
256K | — | Text (code) | ~200 req/hr |
minimax/minimax-m2.5:free |
196K | 8K | Text | ~200 req/hr |
bytedance-seed/dola-seed-2.0-pro:free |
— | — | Text | ~200 req/hr |
nvidia/nemotron-3-super-120b-a12b:free |
262K | 32K | Text | ~200 req/hr |
arcee-ai/trinity-large-thinking:free |
— | — | Text (reasoning) | ~200 req/hr |
openrouter/free |
Varies | Varies | Text | ~200 req/hr |
Zero-friction API gateway. No registration needed for basic access. 30+ models. GDPR-compliant.
Base URL: https://api.llm7.io/v1
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
| deepseek-r1-0528 | — | — | Text (reasoning) | 30 RPM (120 with token) |
| deepseek-v3-0324 | — | — | Text | 30 RPM (120 with token) |
| gemini-2.5-flash-lite | — | — | Text + Vision | 30 RPM (120 with token) |
| gpt-4o-mini | — | — | Text + Vision | 30 RPM (120 with token) |
| mistral-small-3.1-24b | 32K | — | Text | 30 RPM (120 with token) |
| qwen2.5-coder-32b | — | — | Text (code) | 30 RPM (120 with token) |
| + ~24 more models | Varies | Varies | Text | 30 RPM (120 with token) |
Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification. [^6]
Base URL: https://api-inference.modelscope.cn/v1
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
Qwen/Qwen3.5-35B-A3B |
— | — | Text | 2,000 RPD total; <=500 RPD/model (dynamic) |
Qwen/Qwen3.5-27B |
— | — | Text | 2,000 RPD total; <=500 RPD/model (dynamic) |
| + API-Inference-enabled models | Varies | Varies | LLM, MLLM | Dynamic quotas + dynamic concurrency |
Free with NVIDIA Developer Program membership. 100+ models. Rate-limited (no daily token cap).
Base URL: https://integrate.api.nvidia.com/v1
| Model Name | Context | Max Output | Modality | Rate Limit |
|---|---|---|---|---|
deepseek-ai/deepseek-r1 |
128K | ~163K | Text (reasoning) | ~40 RPM |
nvidia/nemotron-3-super-120b-a12b |
262K | 262K | Text | ~40 RPM |
nvidia/nemotron-3-nano-30b-a3b |
128K | 32K | Text | ~40 RPM |
nvidia/llama-3.1-nemotron-ultra-253b-v1 |
128K | 4K | Text | ~40 RPM |
meta/llama-3.1-405b-instruct |
128K | 4K |
$ claude mcp add awesome-free-llm-apis \
-- python -m otcore.mcp_server <graph>