
github.com/harvard-edge/cs249r_book @vol2-v0.2.0 sqlite

32,557 symbols · 146,232 edges · 1,690 files · 6,392 documented (20%) · 50 cross-repo links
README

Machine Learning Systems

Principles and Practices of Engineering Artificially Intelligent Systems


📚 Hardcopy edition coming 2026 with MIT Press.


Mission

The world is rushing to build AI systems. It is not engineering them.

That gap is what we mean by AI engineering.

AI engineering is the discipline of building efficient, reliable, safe, and robust intelligent systems that operate in the real world, not just models in isolation. Our mission is to establish AI engineering as a foundational discipline alongside software engineering and computer engineering, by teaching how to design, build, and evaluate end-to-end intelligent systems.

Our goal: Help 100,000 learners master ML Systems this year, and reach 1 million by 2030.


Why One Repository

I designed this as a single integrated curriculum, not a collection of independent projects. The textbook teaches the theory. TinyTorch makes you build the internals. The hardware kits force you to confront real constraints. The simulator lets you reason about infrastructure you can't afford to rent. Each piece exists because I found that students who only read don't internalize, and students who only code don't generalize.

The repository is the curriculum.

A growing community of contributors helps improve every part of it: fixing errors, sharpening explanations, testing on new hardware. Their work makes this better for everyone, and I'm grateful for every pull request.


The Curriculum

Every component connects. The textbook gives you the mental models. The labs let you reason through trade-offs interactively, powered by MLSys·im — a modeling engine for infrastructure you can't physically access, and a standalone tool in its own right. TinyTorch makes you build the machinery yourself. The hardware kits put you face-to-face with real deployment constraints. StaffML tests whether you actually understand it. Socratiq adds AI-guided reading, contextual quizzes, and spaced repetition inside the learning experience. And the instructor hub, slides, and newsletter give educators everything they need to bring this into a classroom.

[Figure: Curriculum map showing how the textbook, labs, TinyTorch, hardware kits, MLSys·im, and StaffML connect]

For Students

| Component | Role in the Curriculum | Link |
| --- | --- | --- |
| 📖 Textbook | Two-volume MIT Press textbook. The theory, the mental models, and the quantitative reasoning that everything else builds on. | Vol I · Vol II |
| 🔬 Labs | Interactive Marimo notebooks where you explore trade-offs from the textbook: change a parameter, see what breaks, build intuition. Powered by MLSys·im under the hood. | Launch labs · Repo guide |
| 🔥 Tiny🔥Torch | Build your own ML framework from scratch across 20 progressive modules. You don't understand a system until you've built one. | Get started |
| 🛠️ Hardware Kits | Deploy ML to Arduino, Seeed, Grove, and Raspberry Pi devices. Real memory limits, real power budgets, real latency. | Browse labs |
| 🔮 MLSys·im | Calculate memory bottlenecks, network saturation, and scheduling limits at infrastructure scales you can't physically access. | Use simulator · Repo guide |
| 💼 StaffML | Physics-grounded interview questions for ML systems roles. Vault, practice drills, mock interviews, and progress tracking. | Practice · Repo guide |
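
The kind of bottleneck calculation described for MLSys·im can be illustrated with a roofline-style estimate. This is a minimal sketch and not the MLSys·im API: the function name `roofline_estimate`, the 7B-parameter FP16 model, and the accelerator figures (100 TFLOP/s peak compute, 1 TB/s memory bandwidth) are all assumptions chosen for illustration.

```python
def roofline_estimate(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Return (seconds, limiting resource) for one step of work.

    Hypothetical helper: the step can finish no faster than the slower of
    the compute time and the memory-transfer time.
    """
    compute_s = flops / peak_flops
    memory_s = bytes_moved / peak_bandwidth
    bound = "memory" if memory_s > compute_s else "compute"
    return max(compute_s, memory_s), bound


# Assumed workload: decoding one token with a 7B-parameter model in FP16.
params = 7e9
flops = 2 * params        # roughly one multiply-add per parameter
bytes_moved = 2 * params  # every FP16 weight (2 bytes) is read once

# Assumed accelerator: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
step_s, bound = roofline_estimate(
    flops, bytes_moved, peak_flops=100e12, peak_bandwidth=1e12
)
print(f"{step_s * 1e3:.1f} ms per token, {bound}-bound")
```

Under these assumptions the memory transfer takes about 14 ms while the arithmetic takes about 0.14 ms, so single-stream decoding is limited by bandwidth rather than compute.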

For Educators

| Component | What It Provides | Link |
| --- | --- | --- |
| 🎓 Instructor Hub | The AI Engineering Blueprint: two 16-week syllabi, pedagogy guide, assessment rubrics, and a TA handbook. | View hub · Repo guide |
| 🎬 Lecture Slides | Beamer slide decks for every chapter, with four theme variants. Drop into your course and teach. | Browse decks · Repo guide |
| 📬 Newsletter | Updates on the curriculum, new chapters, and what the community is building. | Subscribe |

Choose Your Path

The pieces are designed to work together, but you do not need to adopt everything at once.

| If you are... | Start here | Then go deeper |
| --- | --- | --- |
| A student or self-learner | Read Volume I and try Lab 00 | Build TinyTorch, use MLSys·im, and practice with StaffML |
| An instructor | Open The AI Engineering Blueprint | Use the course map, slides, rubrics, and TA guide |
| A contributor | Pick the component you use most | Improve chapters, labs, tests, examples, hardware notes, simulator models, or assessment content |

The learning loop is: Read → Explore → Build → Model → Deploy → Practice → Teach.

Adjacent and Experimental Work

Some projects are intentionally earlier-stage than the main curriculum:

  • Socratiq explores AI-guided reading, contextual quizzes, and spaced repetition for static learning sites.
  • MLPerf EDU is an under-construction pedagogical benchmark suite aligned with MLCommons MLPerf.
  • ML Systems Design Grammar is an experimental framework for reasoning from stable primitives, constraints, and rewrite rules.

What You Will Learn

This textbook teaches you to think at the intersection of machine learning and systems engineering. Each chapter bridges algorithmic concepts with the infrastructure that makes them work in practice.

| You know... | You will learn... |
| --- | --- |
| How to train a model | How training scales across GPU clusters |
| That quantization shrinks models | How INT8 math maps to silicon |
| What a transformer is | Why KV-cache dominates memory at inference |
| Models run on GPUs | How schedulers balance latency vs throughput |
| Edge devices have limits | How to co-design models and hardware |
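
The point about the KV-cache dominating inference memory can be checked with simple arithmetic. The sketch below is not taken from the textbook: the function name `kv_cache_bytes` and the model shape (32 layers, 32 KV heads, head dimension 128, FP16 values, roughly a 7B-parameter decoder) are assumptions for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer and token.

    Hypothetical helper; the leading factor of 2 counts one tensor for
    keys and one for values.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Assumed model shape: 32 layers, 32 KV heads, head_dim 128, FP16 (2 bytes).
per_token = kv_cache_bytes(32, 32, 128, seq_len=1, batch_size=1)
total = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=8)

print(per_token // 2**10, "KiB per token")       # 512 KiB per token
print(total // 2**30, "GiB for 8 x 4096 tokens")  # 16 GiB for 8 x 4096 tokens
```

For comparison, 7 billion FP16 weights occupy about 13 GiB, so under these assumptions a batch of eight 4096-token sequences already needs more memory for the cache than for the model itself, and the cache keeps growing linearly with batch size and sequence length.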

Book Structure

The textbook follows the Hennessy & Patterson pedagogical model across two volumes:

| Volume | Theme | Scope |
| --- | --- | --- |
| 📗 Volume I | Build, Optimize, Deploy | Single-machine ML systems (1–8 GPUs). Foundations, optimization, and deployment on one node. |
| 📘 Volume II | Scale, Distribute, Govern | Distributed systems at production scale. Multi-machine infrastructure, fault tolerance, and governance. |

FAQ

Who is this for, and what should I know first?

This is for anyone who wants to engineer intelligent systems, not only train models: students, working engineers moving into ML infrastructure, and educators building a course. We assume you can program in Python and have met basic machine learning ideas, but the book builds the systems concepts from the ground up. You do not need a background in computer architecture, distributed systems, or datacenter ope

Extension points: exported contracts (how you extend this code)

  • ConfigEntry (Interface) · book/vscode-ext/src/utils/chapters.ts · Parsed entry from config: relative path (e.g. "frontmatter/dedication.qmd") and order index.
  • TitoModuleRecord (Interface) · tinytorch/vscode-ext/src/utils/modules.ts · Raw module record returned by `tito module list --json`
  • DropdownItem (Interface) · interviews/staffml/src/components/EcosystemBar.tsx · MLSysBook ecosystem navbar: identical structure and appearance to the Quarto Bootstrap navbar, but rendered with i
  • CommandRunRecord (Interface) · labs/vscode-ext/src/types.ts · (no doc)
  • CommandRunRecord (Interface) · kits/vscode-ext/src/types.ts · (no doc)
  • Env (Interface) · interviews/staffml-vault-worker/src/types.ts · (no doc)
  • ChainRef (Interface) · interviews/staffml-vault-types/index.ts · (no doc)
  • CommandRunRecord (Interface) · mlsysim/vscode-ext/src/types.ts · (no doc)

Core symbols: most depended-on inside this repo

  • append · called by 6079 · book/tools/scripts/socratiQ/bundle.js
  • len · called by 3986 · book/tools/scripts/socratiQ/bundle.js
  • push · called by 3722 · book/tools/scripts/socratiQ/bundle.js
  • push · called by 3369 · book/quarto/tools/scripts/socratiQ/bundle.js
  • b · called by 2195 · book/tools/scripts/socratiQ/bundle.js
  • b · called by 2195 · book/quarto/tools/scripts/socratiQ/bundle.js
  • get · called by 1995 · interviews/vault-cli/src/vault_cli/chains/embeddings.py
  • join · called by 1969 · book/tools/scripts/socratiQ/bundle.js

Shape

Function 18,771
Method 11,906
Class 1,709
Interface 151
Route 20

Languages

TypeScript 66%
Python 34%

Modules by API surface

book/tools/scripts/socratiQ/bundle.js · 8,587 symbols
book/quarto/tools/scripts/socratiQ/bundle.js · 8,587 symbols
book/quarto/tools/scripts/socratiQ/collaborative-widget-bridge.js · 645 symbols
book/quarto/tools/scripts/socratiQ/collaborative-widget-bridge.umd.cjs · 636 symbols
mlsysim/tests/test_fmt.py · 220 symbols
book/cli/commands/validate.py · 216 symbols
mlsysim/tests/test_solver_suite.py · 191 symbols
mlsysim/tests/test_formulas.py · 142 symbols
book/tools/scripts/margin_figures/generate_margin_figures.py · 141 symbols
book/tools/scripts/quizzes/_legacy/quizzes_reference.py · 123 symbols
mlsysim/mlsysim/fmt.py · 116 symbols
book/cli/commands/layout.py · 99 symbols

Dependencies from manifests, versioned

@cloudflare/workers-types 4.20260518.1 · 1×
@leeoniya/ufuzzy 1.0.17 · 1×
@playwright/test 1.42.0 · 1×
@react-sigma/core 5.0.6 · 1×
@testing-library/react 16.3.2 · 1×
@types/js-quantities 1.6.6 · 1×
@types/js-yaml 4.0.9 · 1×
@types/katex 0.16.8 · 1×
@types/node 25.9.0 · 1×
@types/react 19 · 1×

For agents

$ claude mcp add cs249r_book \
  -- python -m otcore.mcp_server <graph>
