<img alt="Sandcastle" src="https://res.cloudinary.com/total-typescript/image/upload/v1775033787/readme-sandcastle-onlight_2x.png" height="200" style="margin-bottom: 20px;">
A TypeScript library for orchestrating AI coding agents in isolated sandboxes, invoked through `sandcastle.run()`.

Sandcastle is provider-agnostic: it ships with built-in providers for Docker, Podman, and Vercel, and you can create your own. It is well suited to parallelizing multiple AFK agents, creating review pipelines, or simply orchestrating your own agents.
The Vercel provider relies on `@vercel/sandbox`. To build your own provider, use `createBindMountSandboxProvider` or `createIsolatedSandboxProvider`.

1. Install the package: `npm install --save-dev @ai-hero/sandcastle`
2. Run `npx @ai-hero/sandcastle init`. This scaffolds a `.sandcastle` directory with all the files needed.
3. Copy the example env file with `cp .sandcastle/.env.example .sandcastle/.env`, then edit `.sandcastle/.env` and fill in your default values for `CLAUDE_CODE_OAUTH_TOKEN` (run `claude setup-token` on your host to get one). To use an Anthropic API key instead, uncomment and fill in `ANTHROPIC_API_KEY`.
4. Run the `.sandcastle/main.ts` (or `main.mts`) file with `npx tsx`: `npx tsx .sandcastle/main.ts`
```typescript
// 3. Run the agent via the JS API
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker(), // or podman(), vercel(), or your own provider
  promptFile: ".sandcastle/prompt.md",
});
```
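The `.env` file described above offers two credentials: `CLAUDE_CODE_OAUTH_TOKEN` or, as an alternative, `ANTHROPIC_API_KEY`. A script that wraps `main.ts` might want to confirm that one of them is set before starting a run. The following is a minimal sketch of such a check. `pickCredential` is a hypothetical helper, not part of Sandcastle, and it assumes the variables have already been loaded into a plain object.

```typescript
// Hypothetical pre-flight check; not part of the Sandcastle API.
type Env = Record<string, string | undefined>;

type Credential =
  | { kind: "oauth-token"; variable: "CLAUDE_CODE_OAUTH_TOKEN" }
  | { kind: "api-key"; variable: "ANTHROPIC_API_KEY" };

// Returns which credential is configured, preferring the OAuth token
// (an assumption of this sketch), or undefined if neither variable is set.
function pickCredential(env: Env): Credential | undefined {
  if (env.CLAUDE_CODE_OAUTH_TOKEN?.trim()) {
    return { kind: "oauth-token", variable: "CLAUDE_CODE_OAUTH_TOKEN" };
  }
  if (env.ANTHROPIC_API_KEY?.trim()) {
    return { kind: "api-key", variable: "ANTHROPIC_API_KEY" };
  }
  return undefined;
}

const credential = pickCredential({ ANTHROPIC_API_KEY: "sk-example" });
console.log(credential?.variable ?? "no credential configured");
```

Whitespace-only values are treated as unset, so a placeholder line left blank in `.env` does not count as a configured credential.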
Sandcastle uses a `SandboxProvider` to create isolated environments. The `sandbox` option on `run()`, `interactive()`, and `createSandbox()` accepts any provider, including `noSandbox()`, which opts in to running the agent directly on the host when container isolation is undesired. Built-in providers:
| Provider | Import path | Type | Accepted by |
|---|---|---|---|
| Docker | `@ai-hero/sandcastle/sandboxes/docker` | Bind-mount | `run()`, `createSandbox()`, `interactive()` |
| Podman | `@ai-hero/sandcastle/sandboxes/podman` | Bind-mount | `run()`, `createSandbox()`, `interactive()` |
| Vercel | `@ai-hero/sandcastle/sandboxes/vercel` | Isolated | `run()`, `createSandbox()`, `interactive()` |
| No-sandbox | `@ai-hero/sandcastle/sandboxes/no-sandbox` | None | `run()`, `createSandbox()`, `interactive()` |
Worktree methods (wt.run(), wt.interactive(), wt.createSandbox()) accept the same providers as their top-level counterparts. wt.interactive() defaults to noSandbox() when no sandbox is specified.
```typescript
// run and claudeCode come from the main entry point, as in the other examples;
// interactive is assumed to be exported from the same place.
import { run, interactive, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
import { podman } from "@ai-hero/sandcastle/sandboxes/podman";
import { vercel } from "@ai-hero/sandcastle/sandboxes/vercel";
import { noSandbox } from "@ai-hero/sandcastle/sandboxes/no-sandbox";

// Docker, Podman, and Vercel are interchangeable in run() and createSandbox():
await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker(),
  prompt: "...",
});

// No-sandbox runs the agent directly on the host. It is accepted by run(),
// createSandbox(), and interactive(), and skips container isolation entirely:
await interactive({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: noSandbox(),
  prompt: "...", // optional: omit to launch the TUI with no initial prompt
  cwd: "/path/to/other-repo", // optional: defaults to process.cwd()
});
```
You can also create your own provider using createBindMountSandboxProvider or createIsolatedSandboxProvider.
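The Type column in the provider table separates bind-mount providers (Docker, Podman) from isolated ones (Vercel), and the two factory functions for custom providers follow the same split. One place the distinction shows up is the default `branchStrategy`, which the `run()` options document as `{ type: "head" }` for bind-mount and `{ type: "merge-to-head" }` for isolated providers. The sketch below models that rule in plain TypeScript. `ProviderKind`, `defaultBranchStrategy`, and `resolveBranchStrategy` are illustrative names and not Sandcastle exports.

```typescript
// Illustrative model only; these names are not exported by Sandcastle.
type ProviderKind = "bind-mount" | "isolated";

type BranchStrategy =
  | { type: "head" }
  | { type: "merge-to-head" }
  | { type: "branch"; branch: string };

// Mirrors the documented defaults: "head" for bind-mount providers,
// "merge-to-head" for isolated providers.
function defaultBranchStrategy(kind: ProviderKind): BranchStrategy {
  return kind === "bind-mount" ? { type: "head" } : { type: "merge-to-head" };
}

// An explicit strategy, when given, takes the place of the default.
function resolveBranchStrategy(
  kind: ProviderKind,
  explicit?: BranchStrategy,
): BranchStrategy {
  return explicit ?? defaultBranchStrategy(kind);
}

console.log(resolveBranchStrategy("isolated").type);
console.log(
  resolveBranchStrategy("bind-mount", { type: "branch", branch: "agent/fix-42" }).type,
);
```

The no-sandbox provider is left out of `ProviderKind` on purpose, because the documented defaults only cover bind-mount and isolated providers.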
Sandcastle exports a programmatic run() function for use in scripts, CI pipelines, or custom tooling. The examples below use docker(), but any SandboxProvider works in its place.
```typescript
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

const result = await run({
  agent: claudeCode("claude-opus-4-8"),
  sandbox: docker(),
  promptFile: ".sandcastle/prompt.md",
});

console.log(result.iterations.length); // number of iterations executed
console.log(result.iterations); // per-iteration results with optional sessionId
console.log(result.commits); // array of { sha } for commits created
console.log(result.branch); // target branch name
```
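The fields logged above are enough to build a short run summary, for example for a CI job log. The sketch below assumes only the result shape shown in the example (`iterations` with an optional `sessionId`, `commits` as `{ sha }` objects, and `branch`). `RunResultLike` and `summarizeRun` are hypothetical names, and the real result type may carry more fields.

```typescript
// Hypothetical stand-in for the fields used in the example above.
interface RunResultLike {
  iterations: { sessionId?: string }[];
  commits: { sha: string }[];
  branch: string;
}

// Builds a one-line summary with abbreviated commit SHAs.
function summarizeRun(result: RunResultLike): string {
  const shas = result.commits.map((commit) => commit.sha.slice(0, 7));
  const commitPart = shas.length > 0 ? shas.join(", ") : "no commits";
  return `${result.branch}: ${result.iterations.length} iteration(s), ${commitPart}`;
}

const exampleResult: RunResultLike = {
  iterations: [{ sessionId: "session-1" }, {}],
  commits: [{ sha: "abc1234def5678" }],
  branch: "agent/fix-42",
};

console.log(summarizeRun(exampleResult)); // agent/fix-42: 2 iteration(s), abc1234
```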
```typescript
import { run, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

const result = await run({
  // Agent provider (required). Pass a model string to claudeCode().
  // Optional second arg for provider-specific options like effort level.
  agent: claudeCode("claude-opus-4-8", { effort: "high" }),

  // Sandbox provider (required). Any SandboxProvider works (docker, podman, vercel, or custom).
  // Provider-specific config (like imageName, mounts) lives inside the provider factory call.
  sandbox: docker({
    imageName: "sandcastle:local",
    // Optional: override the UID/GID used for --user flag (defaults to host UID/GID).
    // Must match the UID baked into the image. Pre-flight check catches mismatches.
    // containerUid: 1000,
    // containerGid: 1000,
    // Optional: mount host directories into the sandbox (e.g. package manager caches)
    // hostPath supports absolute, tilde-expanded (~), and relative paths (resolved from cwd).
    // sandboxPath supports absolute and relative paths (resolved from the sandbox repo directory).
    mounts: [
      { hostPath: "~/.npm", sandboxPath: "/home/agent/.npm", readonly: true },
      { hostPath: "data", sandboxPath: "data" }, // mounts <cwd>/data → <sandbox-repo>/data
    ],
    // Optional: SELinux volume label: "z" (default, shared), "Z" (private), or false (none).
    // No-op on non-SELinux systems (Docker Desktop on macOS/Windows, Linux without SELinux).
    selinuxLabel: "z",
    // Optional: provider-level env vars merged at launch time
    env: { DOCKER_SPECIFIC: "value" },
    // Optional: attach container to Docker network(s), given as string or string[]
    network: "my-network",
    // Optional: add the container user to supplementary groups via --group-add.
    // Accepts group names or numeric GIDs (e.g. for a bind-mounted Docker socket).
    groups: ["docker", 999],
    // Optional: expose host devices via --device. Each entry is a full device
    // spec in host[:container[:permissions]] form (e.g. "/dev/kvm").
    devices: ["/dev/kvm"],
    // Optional: limit CPU resources via --cpus. Fractional values allowed (e.g. 1.5).
    // cpus: 2,
  }),

  // Host repo directory: replaces process.cwd() as the anchor for
  // .sandcastle/ artifacts (worktrees, logs, env, patches) and git operations.
  // Relative paths resolve against process.cwd(). Defaults to process.cwd().
  cwd: "../other-repo",

  // Branch strategy: controls how the agent's changes relate to branches.
  // Defaults to { type: "head" } for bind-mount and { type: "merge-to-head" } for isolated providers.
  branchStrategy: { type: "branch", branch: "agent/fix-42" },

  // Prompt source: provide one of these, not both.
  // Note: promptFile resolves against process.cwd(), NOT cwd.
  promptFile: ".sandcastle/prompt.md", // path to a prompt file
  // prompt: "Fix issue #42 in this repo", // OR an inline prompt string

  // Values substituted for {{KEY}} placeholders in the prompt.
  promptArgs: {
    ISSUE_NUMBER: "42",
  },

  // Maximum number of agent iterations to run before stopping. Default: 1
  maxIterations: 5,

  // Display name for this run, shown as a prefix in log output.
  name: "fix-issue-42",

  // Lifecycle hooks grouped by where they run: host or sandbox.
  hooks: {
    host: {
      onWorktreeReady: [{ command: "cp .env.example .env" }],
      onSandboxReady: [{ command: "echo setup done" }],
    },
    sandbox: {
      onSandboxReady: [{ command: "npm install" }],
    },
  },

  // Host-relative file paths to copy into the sandbox before the container starts.
  // Not supported with branchStrategy: { type: "head" }.
  copyToWorktree: [".env"],

  // Override default timeouts for built-in lifecycle steps.
  // Unset keys keep their defaults.
  timeouts: {
    copyToWorktreeMs: 120_000, // default: 60_000
    gitSetupMs: 30_000, // default: 10_000
    commitCollectionMs: 60_000, // default: 30_000
    mergeToHostMs: 60_000, // default: 30_000
  },

  // How to record progress. Default: write to a file under .sandcastle/logs/
  logging: {
    type: "file",
    path: ".sandcastle/logs/my-run.log",
    // Optional: forward the agent's output stream to your own observability system.
    // Fires for each text chunk, tool call, and raw stdout line the agent
    // produces. Errors thrown by the callback are swallowed so a broken
    // forwarder cannot kill the run.
    onAgentStreamEvent: (event) => {
      // event is { type: "text" | "toolCall" | "raw", iteration, timestamp, ... }
      myLogger.info(event);
    },
    // Optional: append every raw stdout line the agent emits to the same
    // log file, interleaved with the human-readable output. Includes lines
    // the provider's stream parser would otherwise drop. Intended for
    // debugging stuck or unexpected agent behaviour.
    verbose: true,
  },
  // logging: { type: "stdout", verbose: true }, // OR terminal mode (verbose: raw lines to stdout)

  // String (or array of strings) the agent emits to end the iteration loop early.
  // Default: "<promise>COMPLETE</promise>"
  completionSignal: "<promise>COMPLETE</promise>",

  // Idle timeout in seconds: resets whenever the agent produces output. Default: 600 (10 minutes)
  idleTimeoutSeconds: 600,

  // Grace window in seconds after the agent emits a completion signal but
  // before its process has exited (a "hanging process", typically a spawned
  // `gh`/git child or MCP server keeping stdout open). Resets on every
  // subsequent output line so trailing data is still captured. Default: 60
  completionTimeoutSeconds: 60,

  // Structured output: extract a typed payload from the agent's stdout.
  // Requires maxIterations === 1 and the tag must appear in the prompt.
  // output: Output.object({ tag: "result", schema: z.object({ answer: z.number() }) }),
  // output: Output.string({ tag: "summary" }),
});

console.log(result.iterations.length); // number of iterations executed
console.log(result.completionSignal); // matched signal string, or undefined if none fired
console.log(result.commits); // array of { sha } for commits created
console.log(result.branch); // target branch name
```
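Three of the options above work together: `promptArgs` fills `{{KEY}}` placeholders in the prompt, `maxIterations` caps the loop, and `completionSignal` ends it early. The sketch below shows one way such a loop could be written. It is an illustration under stated assumptions, not Sandcastle's implementation: `renderPrompt`, `runLoop`, and the `AgentFn` callback are hypothetical, the agent is simulated by a plain function, and unknown placeholders are left untouched as a choice of this sketch.

```typescript
// Hypothetical sketch; none of these names come from Sandcastle.
type AgentFn = (prompt: string, iteration: number) => Promise<string>;

// Substitutes {{KEY}} placeholders; unknown keys are left as-is (sketch choice).
function renderPrompt(template: string, args: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(args, key)
      ? args[key]
      : undefined;
    return value ?? match;
  });
}

interface LoopResult {
  iterations: number;
  completionSignal: string | undefined;
}

// Runs the agent up to maxIterations times, stopping early when the
// output contains one of the completion signals.
async function runLoop(
  agent: AgentFn,
  prompt: string,
  maxIterations: number,
  signals: string[],
): Promise<LoopResult> {
  for (let i = 1; i <= maxIterations; i++) {
    const output = await agent(prompt, i);
    const matched = signals.find((signal) => output.includes(signal));
    if (matched !== undefined) {
      return { iterations: i, completionSignal: matched };
    }
  }
  return { iterations: maxIterations, completionSignal: undefined };
}

// Simulated agent that reports completion on its second iteration.
const fakeAgent: AgentFn = async (_prompt, iteration) =>
  iteration < 2 ? "still working" : "done <promise>COMPLETE</promise>";

const renderedPrompt = renderPrompt("Fix issue #{{ISSUE_NUMBER}}", {
  ISSUE_NUMBER: "42",
});

runLoop(fakeAgent, renderedPrompt, 5, ["<promise>COMPLETE</promise>"]).then(
  (loopResult) => console.log(renderedPrompt, loopResult),
);
```

With the simulated agent, the loop stops after two of the five allowed iterations. With `maxIterations` set to 1 it would stop after one iteration and report no matched signal.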
## `createSandbox()`: reusable sandbox

Use `createSandbox()` when you need to run multiple agents (or multiple rounds of the same agent) inside a single sandbox. It creates the sandbox once, and you call `sandbox.run()` as many times as you need. This avoids repeated container startup costs and keeps all runs on the same branch.

Use run() instead when you only need a single one-shot invocation — it handles sandbox lifecycle automatically.
```typescript
import { createSandbox, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await using sandbox = await createSandbox({
  branch: "agent/fix-42",
  sandbox: docker(),
});

const result = await sandbox.run({
  agent: claudeCode("claude-opus-4-8"),
  prompt: "Fix issue #42 in this repo.",
});

console.log(result.commits); // [{ sha: "abc123" }]
```
```typescript
import { createSandbox, claudeCode } from "@ai-hero/sandcastle";
import { docker } from "@ai-hero/sandcastle/sandboxes/docker";

await using sandbox = await createSandbox({
  branch: "agent/fix-42",
  sandbox: docker(),
  hooks: { sandbox: { onSandboxReady: [{ command: "npm install" }] } },
});

// Step 1: implement
const implResult = await sandbox.run({
  agent: claudeCode("claude-opus-4-8"),
  promptFile: ".sandcastle/implement.md",
  maxIterations: 5,
});

// Step 2: review on the same branch, same container
const reviewResult = await sandbox.run({
  agent: claudeCode("claude-sonnet-4-6"),
  prompt: "Review the changes and fix any issues.",
});
```
Commits from all run() calls accumulate on the same branch. The sandbox container stays alive between runs, so installed dependencies and build artifacts persist.
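Because each `sandbox.run()` call returns its own result, a multi-step pipeline often wants the combined list of commits at the end. The sketch below sequences steps against anything that has a `run()` method returning `commits`. `SandboxLike`, `StepOptions`, and `runSteps` are hypothetical names, and the in-memory `fakeSandbox` only stands in for a real sandbox so the example can run on its own.

```typescript
// Hypothetical stand-ins; the real Sandcastle types may differ.
interface StepOptions {
  name: string;
  prompt: string;
}

interface SandboxLike {
  run(options: StepOptions): Promise<{ commits: { sha: string }[] }>;
}

// Runs the steps one after another in the same sandbox and returns
// the commit SHAs from all steps, in order.
async function runSteps(
  sandbox: SandboxLike,
  steps: StepOptions[],
): Promise<string[]> {
  const shas: string[] = [];
  for (const step of steps) {
    const stepResult = await sandbox.run(step);
    shas.push(...stepResult.commits.map((commit) => commit.sha));
  }
  return shas;
}

// In-memory fake that "creates" one commit per run.
let commitCounter = 0;
const fakeSandbox: SandboxLike = {
  async run(options) {
    commitCounter += 1;
    return { commits: [{ sha: `${options.name}-${commitCounter}` }] };
  },
};

runSteps(fakeSandbox, [
  { name: "implement", prompt: "Fix issue #42 in this repo." },
  { name: "review", prompt: "Review the changes and fix any issues." },
]).then((shas) => console.log(shas.join(", ")));
```

Running the steps sequentially rather than in parallel matches the shared-branch model described above, where the review step needs to see the commits made by the implement step.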
sandbox.exec() lets the harness run shell commands directly in the same warm sandbox — handy for gating an implement step on a quick verification before kicking off the review:
```typescript
await using sandbox = await createSandbox({
  branch: "agent/fix-42",
  sandbox: docker(),
  hooks: { sandbox: { onSandboxReady: [{ command: "npm install" }] } },
});

await sandbox.run({
  agent: claudeCode("claude-opus-4-8"),
  promptFile: ".sandcastle/implemen
$ claude mcp add sandcastle \
-- python -m otcore.mcp_server <graph>