MCPcopy
hub / github.com/mnfst/awesome-free-llm-apis

github.com/mnfst/awesome-free-llm-apis @main sqlite

repository ↗ · DeepWiki ↗
6 symbols 11 edges 1 files 0 documented · 0%
README

Awesome Free LLM APIs

<a href="https://awesome.re">
    <img src="https://awesome.re/badge-flat2.svg" alt="Awesome">
</a>

LLM APIs with permanent free tiers for text inference.

All endpoints are OpenAI SDK-compatible unless noted. Each link points to the provider's API key page.

Contents

Provider APIs

APIs run by the companies that train or fine-tune the models themselves.

Aion Labs 🇮🇱

Permanent free tier, no credit card required. 15 RPM, 20K tokens/day. Specialized for roleplay and storytelling.

Base URL: https://api.aionlabs.ai/v1

Model Name Context Max Output Modality Rate Limit
Aion 2.5 128K 32K Text (roleplay) 15 RPM, 20K TPD
Aion 2.0 128K 32K Text (roleplay) 15 RPM, 20K TPD
Aion-RP 1.0 (8B) 32K ~8K Text (roleplay) 15 RPM, 20K TPD

Cohere 🇨🇦

Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only.

Base URL: https://api.cohere.com/v2

Model Name Context Max Output Modality Rate Limit
Command A+ (218B) 128K 4K Text 20 RPM
Command A (111B) 256K 4K Text 20 RPM
Command R+ 128K 4K Text 20 RPM
Command R 128K 4K Text 20 RPM
Command R7B 128K 4K Text 20 RPM

Google Gemini 🇺🇸

Free tier unavailable in EU/UK/Switzerland. Free-tier prompts may be used by Google to improve products. [^1]

Base URL: https://generativelanguage.googleapis.com/v1beta

Model Name Context Max Output Modality Rate Limit
Gemini 3.5 Flash 1M 64K Text + Image + Audio + Video 15 RPM, 1,500 RPD
Gemini 3.1 Flash-Lite 1M 65K Text + Image + Audio + Video 30 RPM, 1,500 RPD
Gemini 2.5 Flash 1M 65K Text + Image + Audio + Video 15 RPM, 1,500 RPD
Gemini 2.5 Pro 2M 65K Text + Image + Audio + Video 5 RPM, 50 RPD

Mistral AI 🇫🇷

Free "Experiment" plan, no credit card. ~1B tokens/month. Prompts may be used to improve models.

Base URL: https://api.mistral.ai/v1

Model Name Context Max Output Modality Rate Limit
Mistral Medium 3.5 (128B) 256K 256K Text + Image + Code ~1 RPS, 500K TPM
Mistral Small 4 256K 256K Text + Image + Code ~1 RPS, 500K TPM
Mistral Large 3 256K 256K Text ~1 RPS, 500K TPM
Mistral Nemo (12B) 128K 128K Text ~1 RPS, 500K TPM
Codestral 256K 256K Code ~1 RPS, 500K TPM
Pixtral Large 128K 128K Text + Image ~1 RPS, 500K TPM

Z AI (Zhipu AI) 🇨🇳

Permanent free models, no credit card required.

Base URL: https://open.bigmodel.cn/api/paas/v4

Model Name Context Max Output Modality Rate Limit
GLM-4.7-Flash 200K 128K Text 1 concurrent request
GLM-4.6V-Flash 128K ~4K Text + Image 1 concurrent request

Inference providers

Third-party platforms that host open-weight models from various sources.

Cerebras 🇺🇸

Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap. 8K context cap on free tier.

Base URL: https://api.cerebras.ai/v1

Model Name Context Max Output Modality Rate Limit
gpt-oss-120b 128K (8K on free) 8K Text 30 RPM, 14,400 RPD, 1M TPD
zai-glm-4.7 128K (8K on free) 8K Text 10 RPM, 100 RPD, 1M TPD

Cloudflare Workers AI 🇺🇸

10,000 Neurons/day free. 50+ models available on free tier.

Base URL: https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run

Model Name Context Max Output Modality Rate Limit
@cf/meta/llama-3.3-70b-instruct-fp8-fast 131K Shared w/ context Text 10K neurons/day (shared)
@cf/meta/llama-4-scout-17b-16e-instruct Up to 10M Shared w/ context Multimodal 10K neurons/day (shared)
@cf/openai/gpt-oss-120b 128K Shared w/ context Text 10K neurons/day (shared)
@cf/moonshotai/kimi-k2.7-code 262K Shared w/ context Text (code) 10K neurons/day (shared)
@cf/google/gemma-4-26b-a4b-it 256K Shared w/ context Text 10K neurons/day (shared)
@cf/zhipuai/glm-4.7-flash 131K Shared w/ context Text 10K neurons/day (shared)
@cf/mistralai/mistral-small-3.1-24b-instruct 128K Shared w/ context Text 10K neurons/day (shared)
@cf/deepseek-ai/deepseek-r1-distill-qwen-32b 32K Shared w/ context Text (reasoning) 10K neurons/day (shared)
+ 42 more models Varies Varies Text, Image, Audio, Embeddings 10K neurons/day (shared)

GitHub Models 🇺🇸

Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out).

Base URL: https://models.github.ai/inference

Model Name Context Max Output Modality Rate Limit
gpt-5 200K 32K Text 10 RPM, 50 RPD
gpt-4.1 1M 32K Text 10 RPM, 50 RPD
gpt-4.1-mini 1M 32K Text 15 RPM, 150 RPD
gpt-4o 128K 16K Text + Vision 10 RPM, 50 RPD
o4-mini 200K 100K Text (reasoning) 10 RPM, 50 RPD
Llama-4-Scout-17B-16E 512K ~4K Text + Vision 15 RPM, 150 RPD
Llama-4-Maverick-17B-128E 256K ~4K Text + Vision 10 RPM, 50 RPD
Meta-Llama-3.3-70B 131K ~4K Text 15 RPM, 150 RPD
DeepSeek-R1 64K 8K Text (reasoning) 15 RPM, 150 RPD
Mistral-Small-3.1 128K ~4K Text + Vision 15 RPM, 150 RPD
+ 35 more models Varies Varies Text / Image Varies by tier

Groq 🇺🇸

Free tier, no credit card. Ultra-fast LPU inference. [^2]

Base URL: https://api.groq.com/openai/v1

Model Name Context Max Output Modality Rate Limit
llama-3.3-70b-versatile 131K 32K Text 30 RPM, 1,000 RPD
llama-3.1-8b-instant 131K 131K Text 30 RPM, 1,000 RPD
llama-4-scout-17b-16e-instruct 131K 8K Text + Vision 30 RPM, 1,000 RPD
qwen3-32b 131K 131K Text 30 RPM, 1,000 RPD
gpt-oss-120b 131K 32K Text 30 RPM, 1,000 RPD

Hugging Face 🇺🇸

100K monthly Inference Provider credits for free users. Routes to Fireworks, Together, Hyperbolic, Nebius, Novita, DeepInfra and others. Thousands of models.

Base URL: https://router.huggingface.co/v1

Model Name Context Max Output Modality Rate Limit
Meta-Llama-3.1-8B-Instruct 128K ~4K Text Credit-metered
Mistral-7B-Instruct-v0.3 32K ~4K Text Credit-metered
Mixtral-8x7B-Instruct-v0.1 32K ~4K Text Credit-metered
Phi-3.5-mini-instruct 128K ~4K Text Credit-metered
Qwen2.5-7B-Instruct 131K ~4K Text Credit-metered
+ thousands of community models Varies Varies Text, Image, Audio, Embeddings 100K credits/month free

Kilo Code 🇺🇸

Free models with no credit card required. kilo-auto/free auto-router routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%). [^5]

Base URL: https://api.kilo.ai/api/gateway

Model Name Context Max Output Modality Rate Limit
x-ai/grok-code-fast-1:free 256K Text (code) ~200 req/hr
minimax/minimax-m2.5:free 196K 8K Text ~200 req/hr
bytedance-seed/dola-seed-2.0-pro:free Text ~200 req/hr
nvidia/nemotron-3-super-120b-a12b:free 262K 32K Text ~200 req/hr
arcee-ai/trinity-large-thinking:free Text (reasoning) ~200 req/hr
openrouter/free Varies Varies Text ~200 req/hr

LLM7.io 🇬🇧

Zero-friction API gateway. No registration needed for basic access. 30+ models. GDPR-compliant.

Base URL: https://api.llm7.io/v1

Model Name Context Max Output Modality Rate Limit
deepseek-r1-0528 Text (reasoning) 30 RPM (120 with token)
deepseek-v3-0324 Text 30 RPM (120 with token)
gemini-2.5-flash-lite Text + Vision 30 RPM (120 with token)
gpt-4o-mini Text + Vision 30 RPM (120 with token)
mistral-small-3.1-24b 32K Text 30 RPM (120 with token)
qwen2.5-coder-32b Text (code) 30 RPM (120 with token)
+ ~24 more models Varies Varies Text 30 RPM (120 with token)

ModelScope 🇨🇳

Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification. [^6]

Base URL: https://api-inference.modelscope.cn/v1

Model Name Context Max Output Modality Rate Limit
Qwen/Qwen3.5-35B-A3B Text 2,000 RPD total; <=500 RPD/model (dynamic)
Qwen/Qwen3.5-27B Text 2,000 RPD total; <=500 RPD/model (dynamic)
+ API-Inference-enabled models Varies Varies LLM, MLLM Dynamic quotas + dynamic concurrency

NVIDIA NIM 🇺🇸

Free with NVIDIA Developer Program membership. 100+ models. Rate-limited (no daily token cap).

Base URL: https://integrate.api.nvidia.com/v1

Model Name Context Max Output Modality Rate Limit
deepseek-ai/deepseek-r1 128K ~163K Text (reasoning) ~40 RPM
nvidia/nemotron-3-super-120b-a12b 262K 262K Text ~40 RPM
nvidia/nemotron-3-nano-30b-a3b 128K 32K Text ~40 RPM
nvidia/llama-3.1-nemotron-ultra-253b-v1 128K 4K Text ~40 RPM
meta/llama-3.1-405b-instruct 128K 4K

Core symbols most depended-on inside this repo

alignTable
called by 2
scripts/generate-readme.js
formatRow
called by 1
scripts/generate-readme.js
formatModelName
called by 1
scripts/generate-readme.js
buildTable
called by 1
scripts/generate-readme.js
pad
called by 0
scripts/generate-readme.js
buildProviderSection
called by 0
scripts/generate-readme.js

Shape

Function 6

Languages

TypeScript100%

Modules by API surface

scripts/generate-readme.js6 symbols

For agents

$ claude mcp add awesome-free-llm-apis \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact