A project-based course on building the environment, state management, verification, and control mechanisms that make AI coding agents work reliably.
🌍 This course is available in 15 languages: English, 简体中文, 繁體中文, 日本語, 한국어, Español, Français, Русский, Deutsch, العربية, Tiếng Việt, Oʻzbekcha, Türkçe, Portuguese (BR), Українська. Choose your language from the badges above.
Learn Harness Engineering is a course dedicated to the engineering of AI coding agents. We have deeply studied and synthesized the most advanced Harness Engineering theories and practices in the industry. Our core references include:
Quick start? The skills/harness-creator/skill can help you scaffold a production-grade harness (AGENTS.md, feature lists, init.sh, verification workflows) for your own project in minutes.
A comprehensive course outline and introduction to core philosophies, providing a clear path to get started.

Deep dives into real-world pain points and hands-on projects (like Project 01) for an immersive learning experience.

Templates and reference configurations designed to solve common pitfalls in multi-turn AI agent development, such as context loss and premature task completion.

The repository now includes a PDF build pipeline for the course content.
- Run `npm run pdf:build` to generate the currently configured PDF coursebooks locally.
- Generated PDFs are written to `artifacts/pdfs/`.
- Run `npm run screenshots:readme` if you want to refresh the README preview images.
- The `release-course-pdfs.yml` workflow can build the PDFs and publish them to GitHub Releases.

There's a hard truth most people learn the hard way: the strongest model in the world will still fail on real engineering tasks if you don't build a proper environment around it.
You've probably seen this yourself. You give Claude or GPT a task in your repo. It starts well — reads files, writes code, looks productive. Then something goes wrong. It skips a step. It breaks a test. It says "done" but nothing actually works. You spend more time cleaning up than if you'd done it yourself.
This isn't a model problem. It's a harness problem.
The evidence is clear. Anthropic ran a controlled experiment: same model (Opus 4.5), same prompt ("build a 2D retro game editor"). Without a harness, it spent $9 in 20 minutes and produced something that didn't work. With a full harness (planner + generator + evaluator), it spent $200 in 6 hours and built a game you could actually play. The model didn't change. The harness did.
OpenAI reported the same thing with Codex: in a well-harnessed repository, the same model goes from "unreliable" to "reliable." Not a marginal improvement — a qualitative shift.
This course teaches you how to build that environment.
THE HARNESS PATTERN
====================

You --> give task --> Agent reads harness files --> Agent executes
                                                          |
                                             harness governs every step:
                                                          |
                +--> Instructions: what to do, in what order
                +--> Scope: one feature at a time, no overreach
                +--> State: progress log, feature list, git history
                +--> Verification: tests, lint, type-check, smoke runs
                +--> Lifecycle: init at start, clean state at end
                                                          |
                                                          v
                                                Agent stops only when
                                                 verification passes
Harness engineering is about building a complete working environment around the model so it produces reliable results. It's not about writing better prompts. It's about designing the system the model operates inside.
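The rule that the agent stops only when verification passes can be sketched as a small gate. This is a minimal Python illustration under stated assumptions, not the course's implementation: `run_checks` and `may_claim_done` are hypothetical names, and the check commands stand in for whatever your project uses for tests, lint, and type-checking.

```python
import subprocess
import sys
from typing import List, Sequence


def run_checks(checks: Sequence[Sequence[str]]) -> List[str]:
    """Run each check command and return the ones that exited non-zero."""
    failed = []
    for cmd in checks:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    return failed


def may_claim_done(checks: Sequence[Sequence[str]]) -> bool:
    """The task counts as done only if every verification command passes."""
    return not run_checks(checks)


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere (hypothetical); replace
    # them with your project's real test, lint, and type-check commands.
    demo_checks = [
        [sys.executable, "-c", "raise SystemExit(0)"],  # a passing check
        [sys.executable, "-c", "raise SystemExit(1)"],  # a failing check
    ]
    print("may claim done:", may_claim_done(demo_checks))
```

The design choice worth noting is that "done" is computed from exit codes rather than from what the agent reports about its own work.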
A harness has five subsystems:
┌────────────────────────────────────────────────────────────────┐
│                          THE HARNESS                           │
│                                                                │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │ Instructions │  │ State        │  │ Verification       │    │
│  │              │  │              │  │                    │    │
│  │ AGENTS.md    │  │ progress.md  │  │ tests + lint       │    │
│  │ CLAUDE.md    │  │ feature_list │  │ type-check         │    │
│  │ feature_list │  │ git log      │  │ smoke runs         │    │
│  │ docs/        │  │ session hand │  │ e2e pipeline       │    │
│  └──────────────┘  └──────────────┘  └────────────────────┘    │
│                                                                │
│  ┌──────────────┐  ┌──────────────────────────────────────┐    │
│  │ Scope        │  │ Session Lifecycle                    │    │
│  │              │  │                                      │    │
│  │ one feature  │  │ init.sh at start                     │    │
│  │ at a time    │  │ clean-state checklist at end         │    │
│  │ definition   │  │ handoff note for next session        │    │
│  │ of done      │  │ commit only when safe to resume      │    │
│  └──────────────┘  └──────────────────────────────────────┘    │
│                                                                │
└────────────────────────────────────────────────────────────────┘
The MODEL decides what code to write.
The HARNESS governs when, where, and how it writes it.
The harness doesn't make the model smarter.
It makes the model's output reliable.
Each subsystem has one job:
The question isn't "can models write code?" They can. The question is: can they reliably complete real engineering tasks inside real repositories, over multiple sessions, without constant human supervision?
Right now, the answer is: not without a harness.
WITHOUT HARNESS                               WITH HARNESS
===============                               ============
Session 1: agent writes code                  Session 1: agent reads instructions
           agent breaks tests                            agent runs init.sh
           agent says "done"                             agent works on one feature
           you fix it manually                           agent verifies before claiming done
                                                         agent updates progress log
Session 2: agent starts fresh                            agent commits clean state
           agent has no memory
           of what happened before            Session 2: agent reads progress log
           agent re-does work                            agent picks up exactly where it left off
           or does something else entirely               agent continues the unfinished feature
           you fix it again                              you review, not rescue
Result: you spend more time                   Result: agent does the work,
        cleaning up than if you                       you verify the result
        did it yourself
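What lets a second session pick up where the first left off is state that lives in files rather than in the model's memory. Here is a minimal Python sketch of that idea. It assumes a hypothetical schema in which `feature_list.json` is a JSON array of objects with `name` and `status` fields; the course's real template may differ. `next_unfinished` and `log_progress` are illustrative names, not part of any tool.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def next_unfinished(feature_file: Path) -> Optional[str]:
    """Return the first feature whose status is not "done", or None.

    Assumed (hypothetical) schema: [{"name": "...", "status": "..."}, ...]
    """
    features = json.loads(feature_file.read_text(encoding="utf-8"))
    for feature in features:
        if feature.get("status") != "done":
            return feature["name"]
    return None


def log_progress(progress_file: Path, note: str) -> None:
    """Append one bullet to the progress log so the next session can read it."""
    with progress_file.open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")


if __name__ == "__main__":
    # Demo in a throwaway directory; the feature names are made up.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        features = [
            {"name": "login", "status": "done"},
            {"name": "search", "status": "in_progress"},
        ]
        (root / "feature_list.json").write_text(json.dumps(features), encoding="utf-8")
        print("resume with:", next_unfinished(root / "feature_list.json"))
        log_progress(root / "progress.md", "search: index built, query parser pending")
        print((root / "progress.md").read_text(encoding="utf-8"), end="")
```

Because both functions read and write plain files inside the repo, a fresh session needs no memory of the previous one to find the unfinished feature.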
The questions this course actually cares about:
For the full course materials, please visit the Documentation Website.
The curriculum is divided into three parts:
Templates (AGENTS.md, feature_list.json, init.sh, etc.) to use in your own repositories today.

You don't need to read all 12 lectures before you start getting value. If you're already using a coding agent on a real project, here's how to improve it right now.
The idea is simple: instead of just writing prompts, give your agent a set of structured files that define what to do, what's been done, and how to verify the work. These files live inside your repo, so every session starts from the same state.
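One way to make "every session starts from the same state" checkable is a small preflight script. The sketch below is a Python illustration under assumptions: the file names are the ones this README uses (`AGENTS.md`, `init.sh`, `feature_list.json`, `claude-progress.md`), treating all four as required is an assumption, and `missing_harness_files` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Sequence

# File names as used in this README; treating all of them as required
# is an assumption made for the sketch.
EXPECTED_FILES = ("AGENTS.md", "init.sh", "feature_list.json", "claude-progress.md")


def missing_harness_files(root: Path, expected: Sequence[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected harness files that do not exist under root."""
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_harness_files(Path("."))
    if missing:
        print("missing harness files:", ", ".join(missing))
    else:
        print("all harness files present")
```

Run from the project root, it reports which of the expected files are absent before an agent session begins.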
```text
YOUR PROJECT ROOT
├── AGENTS.md            <-- the agent's operating manual
├── CLAUDE.md            <-- (alternative, if using Claude Code)
├── init.sh              <-- runs install + verify + start
├── feature_list.json    <-- what features exist, which are done
├── claude-progress.md   <-- what happen
```
$ claude mcp add learn-harness-engineering \
-- python -m otcore.mcp_server <graph>