
github.com/walkinglabs/learn-harness-engineering @main


2,485 symbols · 5,187 edges · 505 files · 415 documented (17%) · updated 5d ago · ★ 9,857 · 12 open issues

README

Learn Harness Engineering

A project-based course on building the environment, state management, verification, and control mechanisms that make AI coding agents work reliably.

🌍 This course is available in 15 languages: English, 简体中文, 繁體中文, 日本語, 한국어, Español, Français, Русский, Deutsch, العربية, Tiếng Việt, Oʻzbekcha, Türkçe, Portuguese (BR), Українська. Choose your language from the badges above.

Learn Harness Engineering is a course on the engineering of AI coding agents. It synthesizes leading Harness Engineering theory and practice from across the industry. Our core references include:

Quick start? The skills/harness-creator/ skill can help you scaffold a production-grade harness (AGENTS.md, feature lists, init.sh, verification workflows) for your own project in minutes.

✨ Visual Preview
What Harness Engineering Actually Means
Quick Start: Improve Your Agent Today
Capstone Project: A Real App
Learning Path
Syllabus
Skills
Other Courses

✨ Visual Preview

🏠 Course Homepage

A comprehensive course outline and introduction to core philosophies, providing a clear path to get started.

Course homepage preview

📖 Immersive Lectures

Deep dives into real-world pain points and hands-on projects (like Project 01) for an immersive learning experience.

Course lecture preview

🗂️ Ready-to-Use Resource Library

Templates and reference configurations designed to solve common pitfalls in multi-turn AI agent development, such as context loss and premature task completion.

Resource library preview

PDF Coursebooks

The repository now includes a PDF build pipeline for the course content.

Run npm run pdf:build to generate the currently configured PDF coursebooks locally.
Output files are written to artifacts/pdfs/.
Run npm run screenshots:readme if you want to refresh the README preview images.
GitHub Actions workflow release-course-pdfs.yml can build the PDFs and publish them to GitHub Releases.

The Model Is Smart, The Harness Makes It Reliable

Most people learn this the hard way: even the strongest model in the world will still fail on real engineering tasks if you don't build a proper environment around it.

You've probably seen this yourself. You give Claude or GPT a task in your repo. It starts well — reads files, writes code, looks productive. Then something goes wrong. It skips a step. It breaks a test. It says "done" but nothing actually works. You spend more time cleaning up than if you'd done it yourself.

This isn't a model problem. It's a harness problem.

The evidence is clear. Anthropic ran a controlled experiment: same model (Opus 4.5), same prompt ("build a 2D retro game editor"). Without a harness, it spent $9 in 20 minutes and produced something that didn't work. With a full harness (planner + generator + evaluator), it spent $200 in 6 hours and built a game you could actually play. The model didn't change. The harness did.

OpenAI reported the same thing with Codex: in a well-harnessed repository, the same model goes from "unreliable" to "reliable." Not a marginal improvement — a qualitative shift.

This course teaches you how to build that environment.

                    THE HARNESS PATTERN
                    ====================

    You --> give task --> Agent reads harness files --> Agent executes
                                                        |
                                              harness governs every step:
                                              |
                                              +--> Instructions: what to do, in what order
                                              +--> Scope:        one feature at a time, no overreach
                                              +--> State:        progress log, feature list, git history
                                              +--> Verification: tests, lint, type-check, smoke runs
                                              +--> Lifecycle:    init at start, clean state at end
                                              |
                                              v
                                         Agent stops only when
                                         verification passes
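
The rule that the agent stops only when verification passes can be sketched as a loop. This is a minimal illustration under stated assumptions, not code from the course. `Check`, `verify`, and `runUntilVerified` are hypothetical names, and each check's `run` function stands in for a real command such as a test runner, linter, or type-checker.

```typescript
// Hypothetical types: a check stands in for tests, lint, type-check, or a smoke run.
interface Check {
  name: string;
  run: () => boolean; // true when the check passes
}

interface Outcome {
  done: boolean;    // true only if every check passed
  attempts: number; // how many agent attempts were made
  failed: string[]; // checks still failing at the end
}

// Run every check and collect the names of the ones that fail.
function verify(checks: Check[]): string[] {
  return checks.filter((check) => !check.run()).map((check) => check.name);
}

// The agent may only report "done" once verification passes.
// If it runs out of attempts, it reports failure instead of claiming success.
function runUntilVerified(
  attempt: (n: number, failing: string[]) => void,
  checks: Check[],
  maxAttempts = 3,
): Outcome {
  let failing: string[] = [];
  for (let n = 1; n <= maxAttempts; n++) {
    attempt(n, failing); // the agent edits code, told which checks failed last time
    failing = verify(checks);
    if (failing.length === 0) {
      return { done: true, attempts: n, failed: [] };
    }
  }
  return { done: false, attempts: maxAttempts, failed: failing };
}

// Usage: a simulated agent that fixes the failing tests on its second attempt.
let testsFixed = false;
const checks: Check[] = [
  { name: "lint", run: () => true },
  { name: "tests", run: () => testsFixed },
];
const outcome = runUntilVerified((n) => {
  if (n === 2) testsFixed = true;
}, checks);
console.log(outcome); // { done: true, attempts: 2, failed: [] }
```

The loop has only two exits: a verified success, or an explicit failure after `maxAttempts`. There is no path where the agent reports "done" while a check is still failing.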

What Harness Engineering Actually Means

Harness engineering is about building a complete working environment around the model so it produces reliable results. It's not about writing better prompts. It's about designing the system the model operates inside.

A harness has five subsystems:

    ┌────────────────────────────────────────────────────────────────┐
    │                          THE HARNESS                           │
    │                                                                │
    │   ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐   │
    │   │ Instructions │  │    State     │  │   Verification     │   │
    │   │              │  │              │  │                    │   │
    │   │ AGENTS.md    │  │ progress.md  │  │ tests + lint       │   │
    │   │ CLAUDE.md    │  │ feature_list │  │ type-check         │   │
    │   │ feature_list │  │ git log      │  │ smoke runs         │   │
    │   │ docs/        │  │ session hand │  │ e2e pipeline       │   │
    │   └──────────────┘  └──────────────┘  └────────────────────┘   │
    │                                                                │
    │   ┌──────────────┐  ┌──────────────────────────────────────┐   │
    │   │    Scope     │  │         Session Lifecycle            │   │
    │   │              │  │                                      │   │
    │   │ one feature  │  │ init.sh at start                     │   │
    │   │ at a time    │  │ clean-state checklist at end         │   │
    │   │ definition   │  │ handoff note for next session        │   │
    │   │ of done      │  │ commit only when safe to resume      │   │
    │   └──────────────┘  └──────────────────────────────────────┘   │
    │                                                                │
    └────────────────────────────────────────────────────────────────┘

    The MODEL decides what code to write.
    The HARNESS governs when, where, and how it writes it.
    The harness doesn't make the model smarter.
    It makes the model's output reliable.

Each subsystem has one job:

Instructions — Tell the agent what to do, in what order, and what to read before starting. Not one giant file; a progressive disclosure structure the agent navigates on demand.
State — Track what's been done, what's in progress, and what's next. Persisted to disk so the next session picks up exactly where the last one left off.
Verification — Only a passing test suite counts as evidence. The agent cannot declare victory without runnable proof.
Scope — Constrain the agent to one feature at a time. No overreach. No half-finishing three things. No rewriting the feature list to hide unfinished work.
Session Lifecycle — Initialize at the start. Clean up at the end. Leave a clean restart path for the next session.
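
State and Scope can be sketched together. The sketch assumes each entry in `feature_list.json` has the shape `{ id, description, passes }`. That schema, the example data, and the function names are hypothetical and are not taken from the course's templates. `nextFeature` keeps the agent on one feature at a time. `findHiddenWork` catches the "rewrite the feature list to hide unfinished work" failure mode by comparing the list before and after a session.

```typescript
// Assumed shape of one entry in feature_list.json (hypothetical schema).
interface Feature {
  id: string;
  description: string;
  passes: boolean; // set to true only after verification succeeds
}

// Scope: the agent works on exactly one feature, the first one not yet passing.
function nextFeature(features: Feature[]): Feature | undefined {
  return features.find((f) => !f.passes);
}

// Guard against hidden unfinished work: report features that were deleted
// from the list, or flipped to passing without being verified this session.
function findHiddenWork(
  before: Feature[],
  after: Feature[],
  verifiedIds: Set<string>,
): string[] {
  const afterById = new Map<string, Feature>(
    after.map((f): [string, Feature] => [f.id, f]),
  );
  const problems: string[] = [];
  for (const old of before) {
    const current = afterById.get(old.id);
    if (!current) {
      problems.push(`${old.id}: removed from feature list`);
    } else if (!old.passes && current.passes && !verifiedIds.has(old.id)) {
      problems.push(`${old.id}: marked passing without verification`);
    }
  }
  return problems;
}

// Usage with hypothetical example data.
const before: Feature[] = [
  { id: "F1", description: "Create a note", passes: true },
  { id: "F2", description: "Search notes", passes: false },
  { id: "F3", description: "Export notes", passes: false },
];
const after: Feature[] = [
  { id: "F1", description: "Create a note", passes: true },
  { id: "F2", description: "Search notes", passes: true },
];
console.log(nextFeature(before)?.id); // F2
console.log(findHiddenWork(before, after, new Set()));
```

Here the second call reports two problems: F2 was marked passing with no verification recorded, and F3 disappeared from the list. Passing `new Set(["F2"])` as the verified set would leave only the F3 report.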

Why This Course Exists

The question isn't "can models write code?" They can. The question is: can they reliably complete real engineering tasks inside real repositories, over multiple sessions, without constant human supervision?

Right now, the answer is: not without a harness.

    WITHOUT HARNESS                            WITH HARNESS
    ==============                             ============

    Session 1: agent writes code               Session 1: agent reads instructions
               agent breaks tests                         agent runs init.sh
               agent says "done"                          agent works on one feature
               you fix it manually                        agent verifies before claiming done
                                                          agent updates progress log
    Session 2: agent starts fresh                         agent commits clean state
               agent has no memory
               of what happened before         Session 2: agent reads progress log
               agent re-does work                         agent picks up exactly where it left off
               or does something else entirely            agent continues the unfinished feature
               you fix it again                           you review, not rescue

    Result: you spend more time                Result: agent does the work,
            cleaning up than if you                    you verify the result
            did it yourself
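
The Session 2 handoff on the right depends on a progress log that the next session can read back. The sketch below assumes a one-line-per-session entry format for `claude-progress.md`. The format and the `Handoff`, `formatHandoff`, `parseLog`, and `resumePoint` names are hypothetical.

```typescript
// Hypothetical handoff record an agent appends to its progress log at session end.
interface Handoff {
  session: number;
  featureId: string;
  status: "done" | "in-progress";
  nextStep: string;
}

// Render one entry as a single markdown list line (assumed format).
function formatHandoff(h: Handoff): string {
  return `- session ${h.session} | ${h.featureId} | ${h.status} | next: ${h.nextStep}`;
}

// Parse the log back into records; lines that don't match the format are ignored.
function parseLog(log: string): Handoff[] {
  const pattern = /^- session (\d+) \| (\S+) \| (done|in-progress) \| next: (.*)$/;
  const entries: Handoff[] = [];
  for (const line of log.split("\n")) {
    const m = pattern.exec(line.trim());
    if (m) {
      entries.push({
        session: Number(m[1]),
        featureId: m[2],
        status: m[3] as Handoff["status"],
        nextStep: m[4],
      });
    }
  }
  return entries;
}

// At session start: resume the feature from the latest entry if it is unfinished.
// Returns undefined when the last session finished cleanly (pick the next feature instead).
function resumePoint(log: string): Handoff | undefined {
  const entries = parseLog(log);
  const last = entries[entries.length - 1];
  return last && last.status === "in-progress" ? last : undefined;
}

// Usage: two sessions of history, the second one left F2 unfinished.
const log = [
  "# Progress",
  formatHandoff({ session: 1, featureId: "F1", status: "done", nextStep: "start F2" }),
  formatHandoff({ session: 2, featureId: "F2", status: "in-progress", nextStep: "add search tests" }),
].join("\n");
console.log(resumePoint(log)?.nextStep); // add search tests
```

A strict line format keeps the handoff unambiguous. Free-form notes still help human reviewers, but they force the next agent to guess where to resume.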

The questions this course actually cares about:

Which harness designs improve task completion rates?
Which designs reduce rework and incorrect completions?
Which mechanisms keep long-running tasks progressing steadily?
Which structures keep the system maintainable after multiple agent runs?

Course Curriculum & Documentation

For the full course materials, please visit the Documentation Website.

The curriculum is divided into three parts:

Lectures: 12 conceptual units explaining the theory behind harness engineering.
Projects: 6 hands-on projects where you build an agentic workspace from scratch.
Resource Library: Copy-ready templates (AGENTS.md, feature_list.json, init.sh, etc.) to use in your own repositories today.

Quick Start: Improve Your Agent Today

You don't need to read all 12 lectures before you start getting value. If you're already using a coding agent on a real project, here's how to improve it right now.

The idea is simple: instead of just writing prompts, give your agent a set of structured files that define what to do, what's been done, and how to verify the work. These files live inside your repo, so every session starts from the same state.
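
Because these files live in the repository, a session can confirm they exist before it starts work. A session-start script such as `init.sh` might run this kind of preflight check. The sketch below is hypothetical. The file names come from the project-root listing below, `missingHarnessFiles` is an illustrative name, and the `exists` callback is injected so the function does no I/O of its own.

```typescript
// Files the sketch expects at the project root (names taken from this section;
// the full listing may contain more entries).
const REQUIRED = ["init.sh", "feature_list.json", "claude-progress.md"];
// Either instructions file is acceptable: AGENTS.md, or CLAUDE.md for Claude Code.
const INSTRUCTION_FILES = ["AGENTS.md", "CLAUDE.md"];

// Return a list of problems; an empty list means the harness files are in place.
function missingHarnessFiles(exists: (path: string) => boolean): string[] {
  const problems = REQUIRED.filter((file) => !exists(file)).map((file) => `missing ${file}`);
  if (!INSTRUCTION_FILES.some((file) => exists(file))) {
    problems.unshift(`missing ${INSTRUCTION_FILES.join(" or ")}`);
  }
  return problems;
}

// Usage with an in-memory fixture instead of the real filesystem.
const present = new Set(["CLAUDE.md", "init.sh", "feature_list.json"]);
console.log(missingHarnessFiles((p) => present.has(p))); // [ 'missing claude-progress.md' ]
```

To check a real checkout, you could pass `(p) => existsSync(join(root, p))`, using `existsSync` from `node:fs` and `join` from `node:path`.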

```text
YOUR PROJECT ROOT
├── AGENTS.md            <-- the agent's operating manual
├── CLAUDE.md            <-- (alternative, if using Claude Code)
├── init.sh              <-- runs install + verify + start
├── feature_list.json    <-- what features exist, which are done
├── claude-progress.md   <-- what happen…
```

Extension points: exported contracts (how you extend this code)

Window (Interface, no doc): projects/project-01/solution/src/renderer/types.d.ts
Window (Interface, no doc): projects/project-04/solution/src/renderer/types.d.ts
Window (Interface, no doc): projects/project-02/solution/src/renderer/types.d.ts
Window (Interface, no doc): projects/shared/src/renderer/types.d.ts
Window (Interface, no doc): projects/project-05/solution/plan-gen-eval/src/renderer/types.d.ts
Window (Interface, no doc): projects/project-06/solution/src/renderer/types.d.ts
Window (Interface, no doc): projects/project-03/solution/src/renderer/types.d.ts
PipelineStep (Interface, no doc): docs/ko/lectures/lecture-10-why-end-to-end-testing-changes-results/code/e2e-runner.ts

Core symbols: most depended-on inside this repo

log: called by 4,111 (projects/project-05/starter/src/services/logger.ts)
error: called by 130 (projects/project-05/starter/src/services/logger.ts)
info: called by 41 (projects/project-05/solution/plan-gen-eval/src/services/logger.ts)
info: called by 41 (projects/project-05/solution/single-role/src/services/logger.ts)
info: called by 41 (projects/project-05/solution/gen-eval/src/services/logger.ts)
info: called by 41 (projects/project-05/starter/src/services/logger.ts)
info: called by 38 (projects/project-04/solution/src/services/logger.ts)
pad: called by 34 (docs/uz/lectures/lecture-12-why-every-session-must-leave-a-clean-state/code/benchmark-runner.ts)

Shape

Function 1,104 · Interface 632 · Method 594 · Class 148 · Enum 7

Languages

TypeScript 99% · Python 1%

Modules by API surface

skills/harness-creator/scripts/lib/harness-utils.mjs: 28 symbols
projects/project-06/starter/src/services/logger.ts: 20 symbols
projects/project-06/solution/src/services/logger.ts: 20 symbols
projects/project-05/starter/src/services/logger.ts: 20 symbols
projects/project-05/solution/single-role/src/services/logger.ts: 20 symbols
projects/project-05/solution/plan-gen-eval/src/services/logger.ts: 20 symbols
projects/project-05/solution/gen-eval/src/services/logger.ts: 20 symbols
projects/project-04/solution/src/services/logger.ts: 20 symbols
projects/project-06/starter/src/services/persistence-service.ts: 16 symbols
projects/project-06/solution/src/services/persistence-service.ts: 16 symbols
projects/shared/src/services/persistence-service.ts: 15 symbols
projects/project-05/starter/src/services/persistence-service.ts: 15 symbols

Dependencies (from manifests, versioned)

@types/react 18.3.12 · 1×
@types/react-dom 18.3.1 · 1×
@types/uuid 9.0.7 · 1×
@vitejs/plugin-react 4.3.4 · 1×
electron 33.2.0 · 1×
github-slugger 2.0.0 · 1×
mermaid 11.14.0 · 1×
pdf-lib 1.17.1 · 1×
playwright 1.59.1 · 1×
react 18.3.1 · 1×
react-dom 18.3.1 · 1×
tsx 4.19.0 · 1×

For agents

$ claude mcp add learn-harness-engineering \
  -- python -m otcore.mcp_server <graph>
