Turn Your Coding Models into State-of-the-Art Browser Agents
Webwright gives an LLM a terminal from which it can launch multiple browser sessions to inspect pages and complete a web task. It captures and inspects page screenshots and state only when needed. It requires each web task to be completed end-to-end within a re-runnable Python script, i.e. your web agent's browsing history is a single code file. No multi-agent system, no graph engine, no plugin layer, no hidden orchestration: just a terminal, a browser, and a model.
Already have a favorite agent and wondering how to make Claude Code, Codex, Hermes, or OpenClaw more capable at browser tasks? Consider adding the Webwright plugin/skills!
/plugin install webwright@webwright. OpenClaw and Hermes Agent integrations shipped; the same skills/webwright/ folder now loads across Claude Code, Codex, OpenClaw, and Hermes.
💡 Motivation: Beyond Step-by-Step Web Interaction in a Stateful Browser
Most web agents today treat the browser session itself as the workspace: at each step the model receives the current page state and predicts a single next operation — a click, a type, a DOM selector, or a short tool call. Whatever the format, the agent is locked into predicting one web action at a time inside a predefined interaction loop. That harness was useful when LLMs were weaker. As models get stronger at writing and debugging code, the same harness becomes a bottleneck.
Webwright takes a different stance: separate the agent from the browser, and treat the browser as something the agent can launch, inspect, and discard while developing a program. The persistent artifact is not the browser session — it's the code and logs in the local workspace.
🌟 Why Webwright
Most web agent frameworks bury the actual agent loop under layers of abstractions. Webwright takes the opposite stance:
httpx, pydantic, playwright, and typer.
If you want a minimal, easy-to-debug starting point for browser-using agents instead of another heavyweight platform, this is it.
🆚 How Webwright Differs From Other Browser-Agent Repos
How they differ at the architectural level:
| | Stagehand (Browserbase) | agent-browser (Vercel) | browser-use | Webwright |
|---|---|---|---|---|
| Paradigm | Hybrid: code + NL primitives (act / extract / agent) | CLI tool that another agent (Claude Code, Codex, etc.) calls | Autonomous LLM agent loop over DOM/AX snapshots | Coding agent with a terminal; browser is just an environment it spawns |
| Action space | Playwright code, or NL → LLM-translated Playwright | Discrete subcommands (open, click @e2, snapshot, eval) | Indexed click/type actions selected by the LLM | Free-form Python (writes Playwright scripts itself) |
| What is "state"? | The browser session | The browser session (held by daemon across CLI calls) | The browser session | The local workspace: code, screenshots, logs. Browser is disposable. |
| Loop shape | Imperative; agent() does multi-step when needed | One CLI invocation per micro-step | observe → predict next action → execute → repeat | write code → execute → inspect screenshots → repair (code-as-action) |
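The code-as-action loop shape listed for Webwright above (write code, execute, inspect, repair) can be sketched with the standard library alone. This is a minimal illustration, not Webwright's actual agent loop: `repair` is a hypothetical callback standing in for the model, the per-attempt log file names are invented for the example, and screenshot inspection is left out. The only name taken from this README is final_script.py.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable


def run_script(script_path: Path, timeout_s: float = 60.0) -> tuple[int, str]:
    """Execute a script in a fresh interpreter; return (exit code, combined log)."""
    proc = subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return proc.returncode, proc.stdout + proc.stderr


def code_as_action_loop(
    first_draft: str,
    repair: Callable[[str, str], str],  # hypothetical: (source, log) -> new source
    workspace: Path,
    max_attempts: int = 3,
) -> tuple[bool, Path]:
    """Write -> execute -> inspect the log -> repair, until the script exits cleanly."""
    workspace.mkdir(parents=True, exist_ok=True)
    script_path = workspace / "final_script.py"
    source = first_draft
    for attempt in range(1, max_attempts + 1):
        # The script and its logs live in the workspace, not in a browser session.
        script_path.write_text(source, encoding="utf-8")
        exit_code, log = run_script(script_path)
        (workspace / f"attempt_{attempt}.log").write_text(log, encoding="utf-8")
        if exit_code == 0:
            return True, script_path
        # Hand the failing source and its log back to the (hypothetical) model.
        source = repair(source, log)
    return False, script_path
```

Because every attempt rewrites one file and leaves a log next to it, the workspace alone is enough to replay or debug a run, which is the point of treating the browser as disposable.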
https://github.com/user-attachments/assets/4ed94cd5-11be-4daa-b2d7-1260a803baca
State-of-the-art on two real-website benchmarks with a 100-step budget — see the blog post for full details.

webwright/
├── pyproject.toml # package: webwright
├── src/webwright/
│ ├── run/cli.py # CLI entrypoint (`webwright`)
│ ├── agents/default.py # core agent loop
│ ├── environments/ # Playwright browser workspace
│ ├── tools/ # image_qa, self_reflection
│ ├── models/ # openai_model, anthropic_model, base
│ ├── config/ # base.yaml, model_openai.yaml, model_claude.yaml
│ └── utils/
├── assets/
│ └── task_showcase/ # tiny Flask dashboard for repeatable runs
│ ├── app.py
│ ├── templates/ # dashboard.html, task.html
│ └── tasks/<short_id>/ # task.json + report.json per task
├── tests/
└── outputs/ # run artifacts (trajectories, screenshots)
A tiny Flask app under assets/task_showcase/ consolidates
Webwright runs for repeatable odyssey tasks (deals, inventory, listings,
job boards, weather, etc.) into a single dashboard. Each task ships only two
files — task.json (metadata) and report.json (curated, structured output:
sources + result sections like tables, lists, summaries) — and the templates
render them generically, so adding a new task is just dropping a new folder
in assets/task_showcase/tasks/.
pip install flask
python assets/task_showcase/app.py # http://127.0.0.1:5005
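To make the two-file layout concrete, here is a rough sketch of how a tasks directory could be scanned. It is not the code in app.py: the `ShowcaseTask` class and `load_tasks` function are hypothetical names, and the sketch makes no assumption about the fields inside task.json or report.json beyond both being JSON objects.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ShowcaseTask:
    short_id: str  # folder name under tasks/
    meta: dict     # parsed task.json (metadata)
    report: dict   # parsed report.json (curated output)


def load_tasks(tasks_dir: Path) -> list[ShowcaseTask]:
    """Collect every task folder that contains both task.json and report.json."""
    tasks = []
    for folder in sorted(p for p in tasks_dir.iterdir() if p.is_dir()):
        task_file = folder / "task.json"
        report_file = folder / "report.json"
        if not (task_file.is_file() and report_file.is_file()):
            continue  # skip incomplete folders instead of failing the whole scan
        tasks.append(
            ShowcaseTask(
                short_id=folder.name,
                meta=json.loads(task_file.read_text(encoding="utf-8")),
                report=json.loads(report_file.read_text(encoding="utf-8")),
            )
        )
    return tasks
```

With a loader of this shape, adding a task needs no code change: a new folder with the two files shows up on the next scan.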
To have Webwright produce a renderer-ready task folder at runtime, stack the Task Showcase overlay:
python -m webwright.run.cli \
-c base.yaml -c model_openai.yaml -c task_showcase.yaml \
-t "<repeatable web task>" \
--task-id my_repeatable_task \
-o outputs/default
Note: report.json is only generated when -c task_showcase.yaml is included. A plain base.yaml run produces trajectory.json and debug artifacts but no report.json.
The run writes task_showcase/tasks/<short_id>/task.json and report.json
inside the output workspace. Render those generated files without copying them
back into the repo:
python assets/task_showcase/app.py \
--tasks-dir outputs/default/<run>/task_showcase/tasks
pip install -e .
playwright install chromium
Export credentials for the configured backend (for example, OPENAI_API_KEY
with model_openai.yaml or ANTHROPIC_API_KEY with model_claude.yaml). The
image_qa and self_reflection tools use the same configured model by default,
so an Anthropic run does not require an OpenAI key. Then:
python -m webwright.run.cli \
-c base.yaml -c model_openai.yaml \
-t "Search for flights from SEA to JFK on 2026-08-15 to 2026-08-20" \
--start-url https://www.google.com/flights \
--task-id demo_openai \
-o outputs/default
| Flag | Description |
|---|---|
| -c | Config file(s) from src/webwright/config/ (stackable). |
| -t | Task instruction. |
| --start-url | Initial page. |
| --task-id | Output subfolder name. |
| -o | Output directory. |
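If you launch runs from another Python program, the flags above can be assembled programmatically. The sketch below is illustrative only: `missing_credentials` and `build_command` are hypothetical helpers, not part of Webwright, and the config-to-key mapping covers just the two pairings named in this README. It treats --start-url as optional, on the assumption that the Task Showcase example (which omits it) is a valid invocation.

```python
import os
import sys

# Pairings stated in this README; other config overlays are not covered here.
REQUIRED_KEY = {
    "model_openai.yaml": "OPENAI_API_KEY",
    "model_claude.yaml": "ANTHROPIC_API_KEY",
}


def missing_credentials(configs, env=None):
    """Return the env var names required by the chosen configs but not set."""
    env = os.environ if env is None else env
    return [
        REQUIRED_KEY[c]
        for c in configs
        if c in REQUIRED_KEY and not env.get(REQUIRED_KEY[c])
    ]


def build_command(configs, task, task_id, output_dir, start_url=None):
    """Build the argv list for `python -m webwright.run.cli` from documented flags."""
    cmd = [sys.executable, "-m", "webwright.run.cli"]
    for config in configs:  # -c is stackable, so emit one pair per config file
        cmd += ["-c", config]
    cmd += ["-t", task]
    if start_url is not None:
        cmd += ["--start-url", start_url]
    cmd += ["--task-id", task_id, "-o", output_dir]
    return cmd
```

A caller would typically check `missing_credentials(...)` first and only then pass the list to `subprocess.run(cmd, check=True)`; using a list rather than a shell string avoids quoting problems in long task instructions.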
Webwright ships plugin manifests for both Claude Code (.claude-plugin/plugin.json) and OpenAI Codex (.codex-plugin/plugin.json), with the shared skill at skills/webwright/ and slash commands at skills/webwright/commands/. The host agent drives the Webwright loop natively — no extra LLM API key or cost beyond your host subscription. Hosts that read PNG screenshots natively skip the image_qa / self_reflection tools.
Common runtime deps (install once after either path):
pip install -e .
playwright install chromium
Claude Code
Install through the bundled marketplace inside Claude Code:
# 1. Add this repo as a Claude Code plugin marketplace
/plugin marketplace add microsoft/Webwright
# 2. Install the plugin from that marketplace
/plugin install webwright@webwright
Prefer a local checkout? Point the marketplace command at the cloned repo instead:
/plugin marketplace add /absolute/path/to/Webwright
/plugin install webwright@webwright
Start a new Claude Code session after installing — plugins are loaded at session start and won't appear until you restart.
You can either ask Claude Code in plain English (the skill auto-activates from its description), or use one of the slash commands:
/webwright:run search Google Flights for flights from SEA to JFK on 2026-08-15 to 2026-08-20
/webwright:craft search a ticket on Google Flights from LAX to SFO depart June 7 return June 14
/webwright:run (or any plain prompt) produces a one-shot final_script.py for the literal task values.
/webwright:craft produces a reusable CLI tool: final_script.py becomes one parameterized function with a Google-style Args: docstring and an argparse wrapper whose flags default to the concrete task values, so you can rerun it later with different arguments, e.g. python final_script.py --origin JFK --destination LAX --depart-date 2026-07-01.
In both modes Claude Cod
$ claude mcp add Webwright \
-- python -m otcore.mcp_server <graph>
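For orientation, the shape that /webwright:craft is described as producing (one parameterized function with a Google-style Args: docstring, wrapped by argparse with task-value defaults) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name `search_flights`, the placeholder body that echoes its inputs instead of driving Playwright, and the year in the default departure date. Only the three flag names come from the rerun example above.

```python
import argparse


def search_flights(origin: str, destination: str, depart_date: str) -> dict:
    """Search for flights (placeholder body; a crafted script would drive a browser here).

    Args:
        origin: Departure airport code.
        destination: Arrival airport code.
        depart_date: Departure date as YYYY-MM-DD.

    Returns:
        A dict echoing the query; a real script would return scraped results.
    """
    return {"origin": origin, "destination": destination, "depart_date": depart_date}


def build_parser() -> argparse.ArgumentParser:
    """Expose each function parameter as a flag that defaults to the original task value."""
    parser = argparse.ArgumentParser(description="Rerunnable flight search (sketch).")
    parser.add_argument("--origin", default="LAX")
    parser.add_argument("--destination", default="SFO")
    # The year is assumed for illustration; the original task only says "June 7".
    parser.add_argument("--depart-date", default="2026-06-07")
    return parser


def main(argv=None) -> dict:
    """Parse flags (or an explicit argv list) and call the parameterized function."""
    args = build_parser().parse_args(argv)
    return search_flights(args.origin, args.destination, args.depart_date)
```

Calling `main([])` reproduces the original task, while `main(["--origin", "JFK", "--destination", "LAX", "--depart-date", "2026-07-01"])` reruns it with new values; a script meant for the command line would invoke `main()` under an `if __name__ == "__main__":` guard.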