
# ClawWork: OpenClaw as Your AI Coworker
<img src="https://img.shields.io/badge/python-≥3.10-blue" alt="Python">
<img src="https://img.shields.io/badge/license-MIT-green" alt="License">
<img src="https://img.shields.io/badge/dataset-GDPVal%20220%20tasks-orange" alt="GDPVal">
<img src="https://img.shields.io/badge/benchmark-economic%20survival-red" alt="Benchmark">
<a href="https://github.com/HKUDS/nanobot"><img src="https://img.shields.io/badge/nanobot-integration-C5EAB4?style=flat&logo=github&logoColor=white" alt="nanobot"></a>
<a href="https://github.com/HKUDS/.github/blob/main/profile/README.md"><img src="https://img.shields.io/badge/Feishu-Group-E9DBFC?style=flat&logo=feishu&logoColor=white" alt="Feishu"></a>
<a href="https://github.com/HKUDS/.github/blob/main/profile/README.md"><img src="https://img.shields.io/badge/WeChat-Group-C5EAB4?style=flat&logo=wechat&logoColor=white" alt="WeChat"></a>
💰 $19K in 8 Hours — AI Coworker for 44+ Professions

Domains covered: Technology & Engineering · Business & Finance · Healthcare & Social Services · Legal, Media & Operations

| Rank | Agent | Starter | Balance | Income | Cost | Pay Rate | Avg Quality |
|:----:|-------|--------:|--------:|-------:|-----:|---------:|------------:|
| 🥇 | **ATIC + Qwen3.5-Plus** | $10.00 | $19,915.68 | $19,914.38 | $8.70 | $2,285.31/hr | 61.6% |
| 🥈 | **Gemini 3.1 Pro Preview** | $10.00 | $15,661.71 | $15,757.48 | $105.76 | $1,287.47/hr | 43.3% |
| 🥉 | **Qwen3.5-Plus** | $10.00 | $15,268.13 | $15,264.92 | $6.78 | $1,390.42/hr | 41.6% |
| 4 | **GLM-4.7** | $10.00 | $11,497.05 | $11,503.49 | $16.44 | $877.80/hr | 40.6% |
| 5 | **ATIC-DEEPSEEK** | $10.00 | $10,877.01 | $10,870.52 | $3.52 | $2,579.16/hr | 66.8% |
| 6 | **Qwen3-Max** | $10.00 | $10,782.80 | $10,781.06 | $8.26 | $1,072.14/hr | 37.9% |
| 7 | **Kimi-K2.5** | $10.00 | $10,471.21 | $10,483.20 | $21.99 | $858.62/hr | 36.6% |

Agent data on the site is periodically synced to this repo. For the most up-to-date experience, clone locally and run `./start_dashboard.sh` (the dashboard reads directly from local files for immediate updates).

---

### 🚀 AI Assistant → AI Coworker Evolution
Transforms AI assistants into true AI coworkers that complete real work tasks and create genuine economic value.
### 💰 Real-World Economic Benchmark
Real-world economic testing system where AI agents must earn income by completing professional tasks from the [GDPVal](https://openai.com/index/gdpval/) dataset, pay for their own token usage, and maintain economic solvency.
### 📊 Production AI Validation
Measures what truly matters in production environments: **work quality**, **cost efficiency**, and **long-term survival** - not just technical benchmarks.
### 🤖 Multi-Model Competition Arena
Supports different AI models (GLM, Kimi, Qwen, etc.) competing head-to-head to determine the ultimate "AI worker champion" through actual work performance.

---
## 📢 News
- **2026-02-21 🔄 ClawMode + Frontend + Agents Update**: Updated ClawMode to support ClawWork-specific tools; improved the frontend dashboard (untapped potential visualization); added more agents: Claude Sonnet 4.6, Gemini 3.1 Pro, and Qwen-3.5-Plus.
- **2026-02-20 💰 Improved Cost Tracking**: Token costs are now read directly from the API responses (including thinking tokens) instead of being estimated. OpenRouter's reported cost is used verbatim when available.
- **2026-02-19 📊 Agent Results Updated**: Added Qwen3-Max, Kimi-K2.5, and GLM-4.7 through Feb 19. Frontend overhaul: wall-clock timing is now sourced from `task_completions.jsonl`.
- **2026-02-17 🔧 Enhanced Nanobot Integration**: New `/clawwork` command for on-demand paid tasks. Features automatic classification across 44 occupations with BLS wage pricing and unified credentials. Try locally: `python -m clawmode_integration.cli agent`.
- **2026-02-16 🎉 ClawWork Launch**: ClawWork is now officially available! You are welcome to explore it.

---
## ✨ ClawWork's Key Features
- **💼 Real Professional Tasks**: 220 GDP validation tasks spanning 44 occupations across sectors such as Manufacturing, Finance, and Healthcare, drawn from the GDPVal dataset to test real-world work capability.
- **💸 Extreme Economic Pressure**: Agents start with just $10 and pay for every token generated. One bad task or careless search can wipe the balance. Income only comes from completing quality work.
- **🧠 Strategic Work + Learn Choices**: Agents face daily decisions: work for immediate income or invest in learning to improve future performance — mimicking real career trade-offs.
- **📊 React Dashboard**: Visualization of balance changes, task completions, learning progress, and survival metrics from real-life tasks — watch the economic drama unfold.
- **🪶 Ultra-Lightweight Architecture**: Built on Nanobot — your strong AI coworker with minimal infrastructure. Single pip install + config file = fully deployed economically-accountable agent.
- **🏆 End-to-End Professional Benchmark**: i) Complete workflow: Task Assignment → Execution → Artifact Creation → LLM Evaluation → Payment; ii) The strongest models achieve $1,500+/hr equivalent salary — surpassing typical human white-collar productivity.
- **🔗 Drop-in OpenClaw/Nanobot Integration**: ClawMode wrapper transforms any live Nanobot gateway into a money-earning coworker with economic tracking.
- **⚖️ Rigorous LLM Evaluation**: Quality scoring via GPT-5.2 with category-specific rubrics for each of the 44 GDPVal occupations, for a more accurate professional assessment.

---
## 💼 Real-life Professional Earning Test

- 🎯 ClawWork provides comprehensive evaluation of AI agents across 220 professional tasks spanning 44 occupations.
- 🏢 4 Domains: Technology & Engineering, Business & Finance, Healthcare & Social Services, and Legal, Media & Operations.
- ⚖️ Performance is measured on three critical dimensions: work quality, cost efficiency, and economic sustainability.
- 🚀 Top agents achieve $1,500+/hr equivalent earnings, exceeding typical human white-collar productivity.

---
## 🏗️ Architecture

---
## 🚀 Quick Start
### Mode 1: Standalone Simulation
Get up and running in 3 commands:
```bash
# Terminal 1: start the dashboard (backend API + React frontend)
./start_dashboard.sh

# Terminal 2: run the agent
./run_test_agent.sh

# Open browser → http://localhost:3000
```
Watch your agent make decisions, complete GDP validation tasks, and earn income in real time.
**Example console output:**
```
============================================================
📅 ClawWork Daily Session: 2025-01-20
============================================================
📋 Task: Buyers and Purchasing Agents — Manufacturing
   Task ID: 1b1ade2d-f9f6-4a04-baa5-aa15012b53be
   Max payment: $247.30
🔄 Iteration 1/15
   📞 decide_activity → work
   📞 submit_work → Earned: $198.44
============================================================
📊 Daily Summary - 2025-01-20
   Balance: $11.98 | Income: $198.44 | Cost: $0.03
   Status: 🟢 thriving
============================================================
```
### Mode 2: OpenClaw/Nanobot Integration (ClawMode)
Make your live Nanobot instance economically aware — every conversation costs tokens, and Nanobot earns income by completing real work tasks.
> See [full integration setup](#-nanobot-integration-clawmode) below.
---
## 📦 Install
### Clone
```bash
git clone https://github.com/HKUDS/ClawWork.git
cd ClawWork
```
### Python Environment (Python 3.10+)
```bash
# With conda (recommended)
conda create -n clawwork python=3.10
conda activate clawwork

# Or with venv
python3.10 -m venv venv
source venv/bin/activate
```
### Install Dependencies
```bash
pip install -r requirements.txt
```
### Frontend (for Dashboard)
```bash
cd frontend && npm install && cd ..
```
### Environment Variables
Copy the provided **`.env.example`** to `.env` and fill in your keys:
```bash
cp .env.example .env
```
| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | **Required** | OpenAI API key — used for the GPT-4o agent and LLM-based task evaluation |
| `CODE_SANDBOX_PROVIDER` | Optional | `"e2b"` (default) or `"boxlite"` — selects code sandbox backend for `execute_code_sandbox` |
| `E2B_API_KEY` | Conditional | [E2B](https://e2b.dev) API key — required when sandbox provider is `"e2b"` (default) |
| `WEB_SEARCH_API_KEY` | Optional | API key for web search (Tavily default, or Jina AI) — needed if the agent uses `search_web` |
| `WEB_SEARCH_PROVIDER` | Optional | `"tavily"` (default) or `"jina"` — selects the search provider |
> **Note**: `OPENAI_API_KEY` is required. Code sandbox defaults to E2B (`e2b-code-interpreter` + `E2B_API_KEY`). BoxLite sync (`boxlite[sync]`) is available as an experimental local backend via `CODE_SANDBOX_PROVIDER=boxlite`.
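
As a rough pre-flight check, the rules in the table above can be expressed in a few lines of Python. This is only a sketch: `missing_settings` is a hypothetical helper that is not part of ClawWork, and it covers only the two keys the table marks as required or conditional.

```python
import os
from typing import Mapping


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the variables the table marks as needed that are unset or empty."""
    missing = []
    if not env.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")
    # The sandbox provider defaults to "e2b", which needs its own API key.
    # Stripping quotes is an assumption, in case the value was copied with them.
    provider = env.get("CODE_SANDBOX_PROVIDER", "e2b").strip('"').lower()
    if provider == "e2b" and not env.get("E2B_API_KEY"):
        missing.append("E2B_API_KEY")
    return missing


problems = missing_settings(os.environ)
print("Missing:", ", ".join(problems) if problems else "nothing")
```

An empty result means the required keys are present. `WEB_SEARCH_API_KEY` is left out on purpose, since it is only needed when the agent calls `search_web`.
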
---
## 📊 GDPVal Benchmark Dataset
ClawWork uses the **[GDPVal](https://openai.com/index/gdpval/)** dataset — 220 real-world professional tasks across 44 occupations, originally designed to estimate AI's contribution to GDP.
| Sector | Example Occupations |
|--------|-------------------|
| Manufacturing | Buyers & Purchasing Agents, Production Supervisors |
| Professional Services | Financial Analysts, Compliance Officers |
| Information | Computer & Information Systems Managers |
| Finance & Insurance | Financial Managers, Auditors |
| Healthcare | Social Workers, Health Administrators |
| Government | Police Supervisors, Administrative Managers |
| Retail | Customer Service Representatives, Counter Clerks |
| Wholesale | Sales Supervisors, Purchasing Agents |
| Real Estate | Property Managers, Appraisers |
### Task Types
Tasks require real deliverables: Word documents, Excel spreadsheets, PDFs, data analysis, project plans, technical specs, research reports, and process designs.
### Payment System
Payment is based on **real economic value** — not a flat cap:
```
Payment = quality_score × (estimated_hours × BLS_hourly_wage)
```
| Metric | Value |
|--------|-------|
| Task range | $82.78 – $5,004.00 |
| Average task value | $259.45 |
| Quality score range | 0.0 – 1.0 |
| Total tasks | 220 |
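
To make the payment formula concrete, here is a minimal sketch in Python. The function name, the rounding to cents, and the example hours and wage are assumptions for illustration; only the formula itself comes from this section.

```python
def task_payment(quality_score: float, estimated_hours: float, bls_hourly_wage: float) -> float:
    """Payment = quality_score x (estimated_hours x BLS_hourly_wage)."""
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must be between 0.0 and 1.0")
    max_payment = estimated_hours * bls_hourly_wage
    # Rounding to cents is an assumption made for display purposes.
    return round(quality_score * max_payment, 2)


# Hypothetical task: 5 estimated hours at a $49.46/hr wage (maximum $247.30).
for quality in (1.0, 0.8, 0.0):
    print(f"quality {quality:.1f} -> ${task_payment(quality, 5, 49.46):.2f}")
```

Because the quality score scales the full task value, a zero-quality submission earns nothing while its token cost has already been deducted.
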
---
## ⚙️ Configuration
Agent configuration lives in `livebench/configs/`:
```json
{
  "livebench": {
    "date_range": {
      "init_date": "2025-01-20",
      "end_date": "2025-01-31"
    },
    "economic": {
      "initial_balance": 10.0,
      "task_values_path": "./scripts/task_value_estimates/task_values.jsonl",
      "token_pricing": {
        "input_per_1m": 2.5,
        "output_per_1m": 10.0
      }
    },
    "agents": [
      {
        "signature": "gpt-4o-agent",
        "basemodel": "gpt-4o",
        "enabled": true,
        "tasks_per_day": 1,
        "supports_multimodal": true
      }
    ],
    "evaluation": {
      "use_llm_evaluation": true,
      "meta_prompts_dir": "./eval/meta_prompts"
    }
  }
}
```
### Running Multiple Agents
```json
"agents": [
  {"signature": "gpt4o-run", "basemodel": "gpt-4o", "enabled": true},
  {"signature": "claude-run", "basemodel": "claude-sonnet-4-5-20250929", "enabled": true}
]
```
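
Before starting a multi-agent run, it can help to confirm which agents a config would actually enable. The sketch below parses an inline config with the same shape as the examples above. The `enabled_agents` helper is hypothetical, and treating a missing `enabled` field as disabled is an assumption rather than documented ClawWork behavior.

```python
import json

# Inline stand-in for a file under livebench/configs/ (second agent disabled here).
CONFIG_TEXT = """
{
  "livebench": {
    "agents": [
      {"signature": "gpt4o-run", "basemodel": "gpt-4o", "enabled": true},
      {"signature": "claude-run", "basemodel": "claude-sonnet-4-5-20250929", "enabled": false}
    ]
  }
}
"""


def enabled_agents(config: dict) -> list[tuple[str, str]]:
    """Return (signature, basemodel) pairs for agents whose 'enabled' flag is true."""
    agents = config.get("livebench", {}).get("agents", [])
    return [(a["signature"], a["basemodel"]) for a in agents if a.get("enabled", False)]


config = json.loads(CONFIG_TEXT)
for signature, basemodel in enabled_agents(config):
    print(f"{signature}: {basemodel}")
```
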
---
## 💰 Economic System
### Starting Conditions
- **Initial balance**: **$10** — tight by design. Every token counts.
- **Token costs**: deducted automatically after each LLM call
- **API costs**: web search ($0.0008/call Tavily, $0.05/1M tokens Jina)
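
The deductions can be estimated by hand from token counts and the listed prices. The sketch below is a hypothetical estimator, not ClawWork's tracker (which, per the News section, reads costs from API responses when available). The default prices are the `token_pricing` values from the configuration example, and the token counts match the sample record under Cost Tracking.

```python
def llm_cost(input_tokens: int, output_tokens: int,
             input_per_1m: float = 2.5, output_per_1m: float = 10.0) -> float:
    """Estimate LLM cost in dollars from token counts and per-million-token prices."""
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000


def tavily_cost(calls: int, per_call: float = 0.0008) -> float:
    """Estimate web search cost in dollars at the per-call Tavily rate listed above."""
    return calls * per_call


llm = llm_cost(4500, 900)   # 4,500 input tokens and 900 output tokens
search = tavily_cost(2)     # two search calls
total = llm + search
print(f"LLM ${llm:.5f} + search ${search:.4f} = ${total:.5f}")
print(f"Balance after costs, before any income: ${10.0 - total:.5f}")
```
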
### Cost Tracking (per task)
One consolidated record per task in `token_costs.jsonl`:
```json
{
  "task_id": "abc-123",
  "date": "2025-01-20",
  "llm_usage": {
    "total_input_tokens": 4500,
    "total_output_tokens": 900,
    "total_cost": 0.02025
  },
  "api_usage": {
    "search_api_cost": 0.0016
  },
  "cost_summary": {
    "total_cost": 0.02185
  },
  "balance_after": 11.98