github.com/aiming-lab/AutoResearchClaw @v0.5.0 sqlite

5,875 symbols · 23,188 edges · 405 files · 2,223 documented (38%)

README

Chat an Idea. Get a Paper. Autonomous, Collaborative & Self-Evolving.

Just chat with OpenClaw: "Research X" → done.

📄 Our paper is on arXiv — come read it! AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

AutoResearchClaw Framework

🇨🇳 中文 · 🇯🇵 日本語 · 🇰🇷 한국어 · 🇫🇷 Français · 🇩🇪 Deutsch · 🇪🇸 Español · 🇧🇷 Português · 🇷🇺 Русский · 🇸🇦 العربية

🏆 Paper Showcase · 🧑‍✈️ Co-Pilot Guide · 📖 Integration Guide · 💬 Discord Community

🏆 Generated Paper Showcase: 8 papers across 8 domains (math, statistics, biology, computing, NLP, RL, vision, robustness), generated fully autonomously or with Human-in-the-Loop co-pilot guidance.

🧪 We're looking for testers! Try the pipeline with your own research idea — from any field — and tell us what you think. Your feedback directly shapes the next version. → Testing Guide | → 中文测试指南 | → 日本語テストガイド

🔥 News

[05/19/2026] v0.5.0 — Multi-Domain Experiment Agents + ARC-Bench — Two headline updates. (1) Domain-specialist execution agents: the experiment stage (Stages 10–13) now routes beyond the default ML sandbox to specialist agents per field — high-energy physics (ColliderAgent: Lagrangian → FeynRules → MadGraph5 → Delphes via the Magnus cloud), biology (COBRApy genome-scale metabolic modelling), and statistics (simulation-study agent), with a generic Docker executor covering chemistry/materials. The pipeline auto-selects the right executor from the research domain. (2) ARC-Bench: a 55-topic open-ended autonomous-research benchmark spanning ML (25), HEP (10), quantum (10), biology (7), and statistics (3) — each topic ships a manifest (research question + conditions + metrics + datasets) and a rubric for graded scoring, all under experiments/arc_bench/. → Domain Integration Guide
[04/01/2026] v0.4.0 — Human-in-the-Loop Co-Pilot System — AutoResearchClaw is no longer purely autonomous. New HITL system adds 6 intervention modes (full-auto, gate-only, checkpoint, step-by-step, co-pilot, custom), per-stage policies, and deep human-AI collaboration. Includes: Idea Workshop for hypothesis co-creation, Baseline Navigator for experiment design review, Paper Co-Writer for collaborative drafting, SmartPause (confidence-driven dynamic intervention), ALHF intervention learning, anti-hallucination claim verification, cost budget guardrails, pipeline branching for parallel hypothesis exploration, and CLI commands (attach/status/approve/reject/guide). → Full HITL Guide
[03/30/2026] Flexible Skill Loading — AutoResearchClaw now supports loading open-source and custom skills from any discipline to further enhance your research experience. 20 pre-loaded skills are included as ready-to-use references, covering scientific writing, experiment design, chemistry, biology, and more — including an A-Evolve agentic evolution skill contributed by the community. Load your own via researchclaw skills install or drop a SKILL.md into .claude/skills/. See Skills Library.
[03/22/2026] v0.3.2 — Cross-Platform Support + Major Stability — AutoResearchClaw now runs on any ACP-compatible agent backend (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI) and supports messaging platforms (Discord, Telegram, Lark, WeChat) via OpenClaw bridge. New CLI-agent code generation backend delegates Stages 10 & 13 to external CLI agents with budget control and timeout management. Also includes anti-fabrication system (VerifiedRegistry + experiment diagnosis & repair loop), 100+ bug fixes, modular executor refactoring, --resume auto-detection, LLM retry hardening, and community-reported fixes.

Earlier releases

[03/18/2026] v0.3.1 — OpenCode Beast Mode + Community Contributions — New "Beast Mode" routes complex code generation to OpenCode with automatic complexity scoring and graceful fallback. Added Novita AI provider support, thread-safety hardening, improved LLM output parsing robustness, and 20+ bug fixes from community PRs and internal audit.
[03/17/2026] v0.3.0 — MetaClaw Integration — AutoResearchClaw now supports MetaClaw cross-run learning: pipeline failures → structured lessons → reusable skills, injected into all 23 stages. +18.3% robustness in controlled experiments. Opt-in (metaclaw_bridge.enabled: true), fully backward-compatible. See Integration Guide.
[03/16/2026] v0.2.0 — Three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent), hardened Docker sandbox with network-policy-aware execution, 4-round paper quality audit (AI-slop detection, 7-dim review scoring, NeurIPS checklist), and 15+ bug fixes from production runs.
[03/15/2026] v0.1.0 — We release AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention required.

⚡ One Command. One Paper.

# Fully autonomous — no human intervention
pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve

# Co-Pilot mode — collaborate with AI at key decision points
researchclaw run --topic "Your research idea here" --mode co-pilot
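For scripted use, the two invocations above could be assembled programmatically. This is a minimal sketch, not part of the project: `build_run_command` is a hypothetical helper, and it only uses the flags that appear in this README (`--config`, `--topic`, `--mode`, `--auto-approve`). How the CLI treats combinations of these flags is not covered here.

```python
import shlex


def build_run_command(topic, config=None, mode=None, auto_approve=False):
    """Assemble the argv list for a `researchclaw run` invocation.

    Hypothetical helper: it only strings together flags shown in the README.
    """
    argv = ["researchclaw", "run"]
    if config is not None:
        argv += ["--config", config]
    argv += ["--topic", topic]
    if mode is not None:
        argv += ["--mode", mode]
    if auto_approve:
        argv.append("--auto-approve")
    return argv


# Fully autonomous run and Co-Pilot run, mirroring the commands above.
autonomous = build_run_command("Your research idea here", auto_approve=True)
copilot = build_run_command("Your research idea here", mode="co-pilot")

# shlex.join quotes the topic so the line can be pasted into a shell.
print(shlex.join(autonomous))
print(shlex.join(copilot))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell-quoting problems when the topic contains spaces or quotes.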

🤔 What Is This?

You think it. AutoResearchClaw writes it. You guide the key decisions.

Drop a research topic — get back a full academic paper with real literature from OpenAlex, Semantic Scholar & arXiv, hardware-aware sandbox experiments (GPU/MPS/CPU auto-detected), statistical analysis, multi-agent peer review, and conference-ready LaTeX targeting NeurIPS/ICML/ICLR. Run it fully autonomous, or use Co-Pilot mode to guide the AI at critical decision points — choose research directions, review experiment designs, and co-write the paper. No hallucinated references.

📄	`paper_draft.md`	Full academic paper (Introduction, Related Work, Method, Experiments, Results, Conclusion)
📐	`paper.tex`	Conference-ready LaTeX (NeurIPS / ICLR / ICML templates)
📚	`references.bib`	Real BibTeX references from OpenAlex, Semantic Scholar and arXiv — auto-pruned to match inline citations
🔍	`verification_report.json`	4-layer citation integrity + relevance verification (arXiv, CrossRef, DataCite, LLM)
🧪	`experiment runs/`	Generated code + sandbox results + structured JSON metrics
📊	`charts/`	Auto-generated condition comparison charts with error bars and confidence intervals
📝	`reviews.md`	Multi-agent peer review with methodology-evidence consistency checks
🧬	`evolution/`	Self-learning lessons extracted from each run
📦	`deliverables/`	All final outputs in one folder — compile-ready for Overleaf

The pipeline runs end-to-end — fully autonomous or with human-in-the-loop collaboration. When experiments fail, it self-heals. When hypotheses don't hold, it pivots. When citations are fake, it kills them. When you want to steer, it pauses and listens.

🌍 Run it anywhere. AutoResearchClaw isn't locked to a single platform. Use it standalone via CLI, plug it into OpenClaw, or wire it up through any ACP-compatible agent — 🤖 Claude Code, 💻 Codex CLI, 🐙 Copilot CLI, ♊ Gemini CLI, 🌙 Kimi CLI, you name it. And because OpenClaw bridges to messaging platforms, you can kick off a full research run from 💬 Discord, ✈️ Telegram, 🐦 Lark (飞书), 💚 WeChat, or wherever your team already hangs out. One topic in, one paper out — no matter where you type it.

🚀 Quick Start

# 1. Clone & install
git clone https://github.com/aiming-lab/AutoResearchClaw.git
cd AutoResearchClaw
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

# 2. Setup (interactive — installs OpenCode beast mode, checks Docker/LaTeX)
researchclaw setup

# 3. Configure
researchclaw init          # Interactive: choose LLM provider, creates config.arc.yaml
# Or manually: cp config.researchclaw.example.yaml config.arc.yaml

# 4. Run
export OPENAI_API_KEY="sk-..."
researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve

Output → artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/ — compile-ready LaTeX, BibTeX, experiment code, charts.
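Because each run writes to its own timestamped folder, a small helper can locate the newest `deliverables/` directory. The sketch below assumes only the `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/` layout shown above; `latest_deliverables` and the example run names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_deliverables(artifacts_root: Path) -> Optional[Path]:
    """Return the deliverables/ folder of the most recent run, or None.

    Relies on the rc-YYYYMMDD-HHMMSS-<hash> naming: with that fixed-width
    timestamp, sorting by name is the same as sorting by time.
    """
    run_dirs = sorted(
        d for d in artifacts_root.glob("rc-*") if (d / "deliverables").is_dir()
    )
    return run_dirs[-1] / "deliverables" if run_dirs else None


# Demonstration against a throwaway directory with two made-up runs.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("rc-20260101-090000-aaaa", "rc-20260519-120000-bbbb"):
        (root / name / "deliverables").mkdir(parents=True)
    newest = latest_deliverables(root)
    print(newest.parent.name)
```

In a real checkout the call would be `latest_deliverables(Path("artifacts"))`, run from the directory where the pipeline was started.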

📝 Minimum required config

project:
  name: "my-research"

research:
  topic: "Your research topic here"

llm:
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
  primary_model: "gpt-4o"
  fallback_models: ["gpt-4o-mini"]

experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"
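As a rough illustration of how the minimum config above could be sanity-checked before a run, the sketch below walks an already-parsed config dictionary and reports missing keys. `REQUIRED_PATHS` and `missing_keys` are hypothetical, inferred from the example rather than taken from the project's own validation. In practice the dictionary would come from something like `yaml.safe_load` applied to `config.arc.yaml`.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Assumption: every leaf shown in the minimum config is treated as required.
REQUIRED_PATHS: List[Tuple[str, ...]] = [
    ("project", "name"),
    ("research", "topic"),
    ("llm", "base_url"),
    ("llm", "api_key_env"),
    ("llm", "primary_model"),
    ("experiment", "mode"),
    ("experiment", "sandbox", "python_path"),
]


def missing_keys(
    config: Dict[str, Any],
    required: Sequence[Tuple[str, ...]] = REQUIRED_PATHS,
) -> List[str]:
    """Return dotted paths from `required` that are absent in `config`."""
    missing = []
    for path in required:
        node: Any = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# The minimum config from above, written as the dict a YAML parser would return.
config = {
    "project": {"name": "my-research"},
    "research": {"topic": "Your research topic here"},
    "llm": {
        "base_url": "https://api.openai.com/v1",
        "api_key_env": "OPENAI_API_KEY",
        "primary_model": "gpt-4o",
        "fallback_models": ["gpt-4o-mini"],
    },
    "experiment": {"mode": "sandbox", "sandbox": {"python_path": ".venv/bin/python"}},
}

print(missing_keys(config))                     # complete config
print(missing_keys({"project": {"name": "x"}}))  # mostly empty config
```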

🧠 What Makes It Different

Capability	How It Works
🧑‍✈️ Co-Pilot Mode	6 intervention modes — from fully autonomous to step-by-step. Guide the AI at critical decisions (hypotheses, baselines, paper writing) or let it run free. SmartPause auto-detects when human input would help.
🔄 PIVOT / REFINE Loop	Stage 15 autonomously decides: PROCEED, REFINE (tweak params), or PIVOT (new direction). Artifacts auto-versioned.
🤖 Multi-Agent Debate	Hypothesis generation, result analysis, and peer review each use structured multi-perspective debate.
🧬 Self-Learning	Lessons extracted per run (decision rationale, runtime warnings, metric anomalies) with 30-day time-decay. Future runs learn from past mistakes.
📚 Knowledge Base	Every run builds structured KB across 6 categories (decisions, experiments, findings, literature, questions, reviews).
🛡️ Sentinel Watchdog	Background quality monitor: NaN/Inf detection, paper-evidence consistency, citation relevance scoring, anti-fabrication guard.
🔍 Claim Verification	Inline fact-checking: extracts claims from AI-generated text and cross-references against collected literature. Flags ungrounded citations and fabricated numbers.
🌿 Branch Exploration	Fork the pipeline to explore multiple research directions simultaneously, compare results side-by-side, and merge the best path forward.
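The Self-Learning row mentions a 30-day time-decay without saying how it is computed. One plausible reading is an exponential decay with a 30-day half-life, sketched below. The half-life interpretation, the `Lesson` class, `lesson_weight`, `rank_lessons`, the `min_weight` cutoff, and the sample lesson texts are all assumptions made for illustration, not the project's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Lesson:
    text: str
    age_days: float  # days since the lesson was recorded


def lesson_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the weight halves every `half_life_days`."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def rank_lessons(
    lessons: List[Lesson], min_weight: float = 0.05
) -> List[Tuple[float, Lesson]]:
    """Drop lessons whose weight fell below `min_weight`; newest first."""
    weighted = [(lesson_weight(lesson.age_days), lesson) for lesson in lessons]
    kept = [pair for pair in weighted if pair[0] >= min_weight]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Made-up lessons, only to show the ranking behaviour.
lessons = [
    Lesson("Loss became NaN at a high learning rate", age_days=60),
    Lesson("Baseline needed three seeds for stable variance", age_days=0),
    Lesson("Dataset download timed out in the sandbox", age_days=200),
]

for weight, lesson in rank_lessons(lessons):
    print(f"{weight:.2f}  {lesson.text}")
```

Under this reading a lesson from today carries full weight, one from 60 days ago carries a quarter, and very old lessons fall below the cutoff and are no longer injected into new runs.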

🦞 OpenClaw Integration

**AutoResearchClaw is an [OpenClaw](https://github.com/openclaw/openclaw)-compatible service.** Install it in OpenClaw and launch autonomous research with a single message —

Core symbols most depended-on inside this repo

Symbol	Called by	File
get	1558	researchclaw/memory/store.py
append	1142	researchclaw/metaclaw_bridge/skill_feedback.py
get	466	researchclaw/project/idea_pool.py
get	359	researchclaw/agents/code_searcher/cache.py
append	209	researchclaw/adapters.py
(unnamed)	104	researchclaw/web/search.py
run	94	researchclaw/experiment/sandbox.py
resolve	88	researchclaw/overleaf/conflict.py

Shape

Method 3,480

Function 1,460

Class 869

Route 66

Languages

Python 99%

TypeScript 1%

Modules by API surface

tests/test_rc_executor.py 241 symbols

tests/test_rc_templates.py 103 symbols

researchclaw/config.py 84 symbols

tests/test_memory_system.py 78 symbols

tests/test_decision_agent.py 73 symbols

tests/test_web_platform.py 72 symbols

tests/test_skills_library.py 64 symbols

tests/test_rc_literature.py 64 symbols

tests/test_figure_agent.py 62 symbols

tests/test_rc_runner.py 60 symbols

tests/test_benchmark_agent.py 60 symbols

tests/test_ssh_and_colab_sandbox.py 59 symbols

Dependencies from manifests, versioned

arxiv 2.1 · 1×

numpy 1.24 · 1×

pyyaml 6.0 · 1×

rich 13.0 · 1×

For agents

$ claude mcp add AutoResearchClaw \
  -- python -m otcore.mcp_server <graph>
