hub / github.com/mnfst/awesome-free-llm-apis

github.com/mnfst/awesome-free-llm-apis @main sqlite

6 symbols 11 edges 1 files 0 documented · 0%

README

<a href="https://awesome.re">
    <img src="https://awesome.re/badge-flat2.svg" alt="Awesome">
</a>

LLM APIs with permanent free tiers for text inference.

_{All endpoints are OpenAI SDK-compatible unless noted. Each link points to the provider's API key page.}

Provider APIs
Inference providers
Glossary

Provider APIs

APIs run by the companies that train or fine-tune the models themselves.

Aion Labs 🇮🇱

Permanent free tier, no credit card required. 15 RPM, 20K tokens/day. Specialized for roleplay and storytelling.

Base URL: https://api.aionlabs.ai/v1

Model Name	Context	Max Output	Modality	Rate Limit
Aion 2.5	128K	32K	Text (roleplay)	15 RPM, 20K TPD
Aion 2.0	128K	32K	Text (roleplay)	15 RPM, 20K TPD
Aion-RP 1.0 (8B)	32K	~8K	Text (roleplay)	15 RPM, 20K TPD

Cohere 🇨🇦

Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only.

Base URL: https://api.cohere.com/v2

Model Name	Context	Max Output	Modality	Rate Limit
Command A+ (218B)	128K	4K	Text	20 RPM
Command A (111B)	256K	4K	Text	20 RPM
Command R+	128K	4K	Text	20 RPM
Command R	128K	4K	Text	20 RPM
Command R7B	128K	4K	Text	20 RPM

Google Gemini 🇺🇸

Free tier unavailable in EU/UK/Switzerland. Free-tier prompts may be used by Google to improve products. [^1]

Base URL: https://generativelanguage.googleapis.com/v1beta

Model Name	Context	Max Output	Modality	Rate Limit
Gemini 3.5 Flash	1M	64K	Text + Image + Audio + Video	15 RPM, 1,500 RPD
Gemini 3.1 Flash-Lite	1M	65K	Text + Image + Audio + Video	30 RPM, 1,500 RPD
Gemini 2.5 Flash	1M	65K	Text + Image + Audio + Video	15 RPM, 1,500 RPD
Gemini 2.5 Pro	2M	65K	Text + Image + Audio + Video	5 RPM, 50 RPD

Mistral AI 🇫🇷

Free "Experiment" plan, no credit card. ~1B tokens/month. Prompts may be used to improve models.

Base URL: https://api.mistral.ai/v1

Model Name	Context	Max Output	Modality	Rate Limit
Mistral Medium 3.5 (128B)	256K	256K	Text + Image + Code	~1 RPS, 500K TPM
Mistral Small 4	256K	256K	Text + Image + Code	~1 RPS, 500K TPM
Mistral Large 3	256K	256K	Text	~1 RPS, 500K TPM
Mistral Nemo (12B)	128K	128K	Text	~1 RPS, 500K TPM
Codestral	256K	256K	Code	~1 RPS, 500K TPM
Pixtral Large	128K	128K	Text + Image	~1 RPS, 500K TPM

Z AI (Zhipu AI) 🇨🇳

Permanent free models, no credit card required.

Base URL: https://open.bigmodel.cn/api/paas/v4

Model Name	Context	Max Output	Modality	Rate Limit
GLM-4.7-Flash	200K	128K	Text	1 concurrent request
GLM-4.6V-Flash	128K	~4K	Text + Image	1 concurrent request

Inference providers

Third-party platforms that host open-weight models from various sources.

Cerebras 🇺🇸

Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap. 8K context cap on free tier.

Base URL: https://api.cerebras.ai/v1

Model Name	Context	Max Output	Modality	Rate Limit
gpt-oss-120b	128K (8K on free)	8K	Text	30 RPM, 14,400 RPD, 1M TPD
zai-glm-4.7	128K (8K on free)	8K	Text	10 RPM, 100 RPD, 1M TPD

Cloudflare Workers AI 🇺🇸

10,000 Neurons/day free. 50+ models available on free tier.

Base URL: https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run

Model Name	Context	Max Output	Modality	Rate Limit
`@cf/meta/llama-3.3-70b-instruct-fp8-fast`	131K	Shared w/ context	Text	10K neurons/day (shared)
`@cf/meta/llama-4-scout-17b-16e-instruct`	Up to 10M	Shared w/ context	Multimodal	10K neurons/day (shared)
`@cf/openai/gpt-oss-120b`	128K	Shared w/ context	Text	10K neurons/day (shared)
`@cf/moonshotai/kimi-k2.7-code`	262K	Shared w/ context	Text (code)	10K neurons/day (shared)
`@cf/google/gemma-4-26b-a4b-it`	256K	Shared w/ context	Text	10K neurons/day (shared)
`@cf/zhipuai/glm-4.7-flash`	131K	Shared w/ context	Text	10K neurons/day (shared)
`@cf/mistralai/mistral-small-3.1-24b-instruct`	128K	Shared w/ context	Text	10K neurons/day (shared)
`@cf/deepseek-ai/deepseek-r1-distill-qwen-32b`	32K	Shared w/ context	Text (reasoning)	10K neurons/day (shared)
+ 42 more models	Varies	Varies	Text, Image, Audio, Embeddings	10K neurons/day (shared)

GitHub Models 🇺🇸

Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out).

Base URL: https://models.github.ai/inference

Model Name	Context	Max Output	Modality	Rate Limit
gpt-5	200K	32K	Text	10 RPM, 50 RPD
gpt-4.1	1M	32K	Text	10 RPM, 50 RPD
gpt-4.1-mini	1M	32K	Text	15 RPM, 150 RPD
gpt-4o	128K	16K	Text + Vision	10 RPM, 50 RPD
o4-mini	200K	100K	Text (reasoning)	10 RPM, 50 RPD
Llama-4-Scout-17B-16E	512K	~4K	Text + Vision	15 RPM, 150 RPD
Llama-4-Maverick-17B-128E	256K	~4K	Text + Vision	10 RPM, 50 RPD
Meta-Llama-3.3-70B	131K	~4K	Text	15 RPM, 150 RPD
DeepSeek-R1	64K	8K	Text (reasoning)	15 RPM, 150 RPD
Mistral-Small-3.1	128K	~4K	Text + Vision	15 RPM, 150 RPD
+ 35 more models	Varies	Varies	Text / Image	Varies by tier

Groq 🇺🇸

Free tier, no credit card. Ultra-fast LPU inference. [^2]

Base URL: https://api.groq.com/openai/v1

Model Name	Context	Max Output	Modality	Rate Limit
llama-3.3-70b-versatile	131K	32K	Text	30 RPM, 1,000 RPD
llama-3.1-8b-instant	131K	131K	Text	30 RPM, 1,000 RPD
llama-4-scout-17b-16e-instruct	131K	8K	Text + Vision	30 RPM, 1,000 RPD
qwen3-32b	131K	131K	Text	30 RPM, 1,000 RPD
gpt-oss-120b	131K	32K	Text	30 RPM, 1,000 RPD

Hugging Face 🇺🇸

100K monthly Inference Provider credits for free users. Routes to Fireworks, Together, Hyperbolic, Nebius, Novita, DeepInfra and others. Thousands of models.

Base URL: https://router.huggingface.co/v1

Model Name	Context	Max Output	Modality	Rate Limit
Meta-Llama-3.1-8B-Instruct	128K	~4K	Text	Credit-metered
Mistral-7B-Instruct-v0.3	32K	~4K	Text	Credit-metered
Mixtral-8x7B-Instruct-v0.1	32K	~4K	Text	Credit-metered
Phi-3.5-mini-instruct	128K	~4K	Text	Credit-metered
Qwen2.5-7B-Instruct	131K	~4K	Text	Credit-metered
+ thousands of community models	Varies	Varies	Text, Image, Audio, Embeddings	100K credits/month free

Kilo Code 🇺🇸

Free models with no credit card required. kilo-auto/free auto-router routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%). [^5]

Base URL: https://api.kilo.ai/api/gateway

Model Name	Context	Max Output	Modality	Rate Limit
`x-ai/grok-code-fast-1:free`	256K	—	Text (code)	~200 req/hr
`minimax/minimax-m2.5:free`	196K	8K	Text	~200 req/hr
`bytedance-seed/dola-seed-2.0-pro:free`	—	—	Text	~200 req/hr
`nvidia/nemotron-3-super-120b-a12b:free`	262K	32K	Text	~200 req/hr
`arcee-ai/trinity-large-thinking:free`	—	—	Text (reasoning)	~200 req/hr
`openrouter/free`	Varies	Varies	Text	~200 req/hr

LLM7.io 🇬🇧

Zero-friction API gateway. No registration needed for basic access. 30+ models. GDPR-compliant.

Base URL: https://api.llm7.io/v1

Model Name	Context	Max Output	Modality	Rate Limit
deepseek-r1-0528	—	—	Text (reasoning)	30 RPM (120 with token)
deepseek-v3-0324	—	—	Text	30 RPM (120 with token)
gemini-2.5-flash-lite	—	—	Text + Vision	30 RPM (120 with token)
gpt-4o-mini	—	—	Text + Vision	30 RPM (120 with token)
mistral-small-3.1-24b	32K	—	Text	30 RPM (120 with token)
qwen2.5-coder-32b	—	—	Text (code)	30 RPM (120 with token)
+ ~24 more models	Varies	Varies	Text	30 RPM (120 with token)

ModelScope 🇨🇳

Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification. [^6]

Base URL: https://api-inference.modelscope.cn/v1

Model Name	Context	Max Output	Modality	Rate Limit
`Qwen/Qwen3.5-35B-A3B`	—	—	Text	2,000 RPD total; <=500 RPD/model (dynamic)
`Qwen/Qwen3.5-27B`	—	—	Text	2,000 RPD total; <=500 RPD/model (dynamic)
+ API-Inference-enabled models	Varies	Varies	LLM, MLLM	Dynamic quotas + dynamic concurrency

NVIDIA NIM 🇺🇸

Free with NVIDIA Developer Program membership. 100+ models. Rate-limited (no daily token cap).

Base URL: https://integrate.api.nvidia.com/v1

Model Name	Context	Max Output	Modality	Rate Limit
`deepseek-ai/deepseek-r1`	128K	~163K	Text (reasoning)	~40 RPM
`nvidia/nemotron-3-super-120b-a12b`	262K	262K	Text	~40 RPM
`nvidia/nemotron-3-nano-30b-a3b`	128K	32K	Text	~40 RPM
`nvidia/llama-3.1-nemotron-ultra-253b-v1`	128K	4K	Text	~40 RPM
`meta/llama-3.1-405b-instruct`	128K	4K

Core symbols most depended-on inside this repo

alignTable

called by 2

scripts/generate-readme.js

formatRow

called by 1

scripts/generate-readme.js

formatModelName

called by 1

scripts/generate-readme.js

buildTable

called by 1

scripts/generate-readme.js

pad

called by 0

scripts/generate-readme.js

buildProviderSection

called by 0

scripts/generate-readme.js

Shape

Function 6

Languages

TypeScript100%

Modules by API surface

scripts/generate-readme.js6 symbols

For agents

$ claude mcp add awesome-free-llm-apis \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact

github.com/mnfst/awesome-free-llm-apis @main sqlite

Contents

Provider APIs

Aion Labs 🇮🇱

Cohere 🇨🇦

Google Gemini 🇺🇸

Mistral AI 🇫🇷

Z AI (Zhipu AI) 🇨🇳

Inference providers

Cerebras 🇺🇸

Cloudflare Workers AI 🇺🇸

GitHub Models 🇺🇸

Groq 🇺🇸

Hugging Face 🇺🇸

Kilo Code 🇺🇸

LLM7.io 🇬🇧

ModelScope 🇨🇳

NVIDIA NIM 🇺🇸

Core symbols most depended-on inside this repo

Shape

Languages

Modules by API surface

For agents