<a href="https://kiln.tech">
<img width="205" alt="Kiln AI Logo" src="https://github.com/user-attachments/assets/4ca9b69f-1c90-43a4-8d2e-13de4eb2ee9c">
</a>
Kiln is a workbench for the full AI development loop: evals, optimization, prompts, RAG, fine-tuning, synthetic data, agents, and tools, all working together. The desktop app lets your whole team contribute: PMs, subject-matter experts, and QA can rate outputs and add data without writing code. The MIT-licensed Python library ships the same tasks to production. Kiln runs locally, so you can bring your own API keys or go fully offline with Ollama.
Get started in minutes with a one-click install.
Download Kiln Desktop for macOS, Windows, or Linux, then follow the 5-minute quickstart to run your first task.
Prefer to start in code? See the Python library quickstart.
Watch a 2-minute overview, or our end-to-end project demo (20 minutes).
Most AI tooling forces a tradeoff: either a code-only framework that covers one slice (orchestration, evals, or RAG), or a paid SaaS that locks in your data and can't be extended. Kiln is a free, local-first workbench where a single task and dataset flow through evals, prompt optimization, fine-tuning, RAG, agents, and synthetic data, all in one tool.
One dataset, every technique. Define a task once. Evaluate it, optimize its prompt, fine-tune a model, generate synthetic data, and add RAG, all against the same dataset, so results compound across stages.
Track every axis. Move fast. Don't regress. Keeping agents running well is hard: a prompt change quietly regresses behavior three steps downstream, and a model upgrade improves five things but breaks two. Kiln tracks quality across every dimension you care about, so you can iterate without breaking what already works.
Optimization, not just evaluation. Other tools tell you how a prompt scores, but not how to fix it. Kiln's Auto-Optimize searches across hundreds of prompt mutations and models to find what works best for every eval dimension.
GUI for the whole team, library for engineers. Kiln's desktop app lets PMs rate outputs, SMEs add training examples, and QA flag regressions — without a terminal. Engineers ship the same tasks via an MIT-licensed Python library. Data scientists can use the library in notebooks and experiments.
Local-first. Most AI platforms are SaaS-only; Kiln runs entirely on your machine. Bring your own API keys, or go fully offline with Ollama. Your data never leaves your control, and team sync runs over Git infrastructure you already own.
190+ models tested across every provider. Skip the guesswork: we've tested each model's capabilities across all major providers, including OpenAI, Anthropic, Gemini, Bedrock, Ollama, OpenRouter, Fireworks, Groq, any OpenAI-compatible endpoint, and more. Swap models with confidence.
Build AI tasks in the app. Deploy with the open-source library. Same engine, same project files, no rewrite. The MIT-licensed kiln-ai library is the same library used in the app. Load Kiln projects, run tasks, build fine-tunes, work in notebooks, integrate Pandas/Polars dataframes, and more.
```bash
pip install kiln-ai
```
📚 Library docs · REST API · PyPI
Full documentation is at docs.kiln.tech.
See CONTRIBUTING.md for development setup and contribution guidelines.
Kiln's core Python library and REST server are MIT-licensed. The desktop app is source-available, free to use, and follows the fair-code model, so Kiln stays free for individuals while the project remains sustainable.
Datasets use an open JSON format, so you own and control your data.
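Because the data is plain JSON, ordinary tools can read it without Kiln installed. Below is a minimal sketch using only the Python standard library, assuming your dataset files sit somewhere under a local project folder. The folder path and file suffix are placeholders (check your own project for its actual layout and extension), and `load_json_records` is a hypothetical helper, not part of the kiln-ai API.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterable


def load_json_records(
    root: str | Path, suffixes: Iterable[str] = (".json",)
) -> list[Any]:
    """Recursively parse every file under `root` whose suffix is in `suffixes`.

    Hypothetical helper, not part of kiln-ai: `root` and `suffixes` are
    placeholders for wherever your dataset files live and the extension
    they actually use.
    """
    wanted = {s.lower() for s in suffixes}  # case-insensitive suffix match
    records: list[Any] = []
    # sorted() makes the load order deterministic across platforms
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            with path.open(encoding="utf-8") as f:
                records.append(json.load(f))
    return records
```

Calling `load_json_records("path/to/project")` returns one parsed object per matching file, which you can then pass to a dataframe library or any other tool you already use.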
Kiln Pro is our service that adds the AI Assistant, Auto-Optimize, and the Eval Builder. It's opt-in, and the core Kiln app remains fully functional without it.
The Kiln name and logos are trademarks of Chesterfield Laboratories Inc.
Copyright 2024 — Chesterfield Laboratories Inc.