
github.com/HKUDS/AI-Researcher @main sqlite

1,108 symbols 4,737 edges 182 files 483 documented · 44%

README


"AI-Researcher: Autonomous Scientific Innovation"

Welcome to AI-Researcher 🤗! AI-Researcher is a system for automated scientific discovery 🔬 that aims to reshape the traditional research paradigm. The platform offers researchers:

🎯 Full Autonomy: Complete end-to-end research automation
🔄 Seamless Orchestration: From concept to publication
🧠 Advanced AI Integration: Powered by cutting-edge AI agents
🚀 Research Acceleration: Streamlined scientific innovation

✨ The AI-Researcher system accepts user input queries at two distinct levels ✨

Level 1: Detailed Idea Description

At this level, users provide comprehensive descriptions of their specific research ideas. The system processes these detailed inputs to develop implementation strategies based on the user's explicit requirements.

Level 2: Reference-Based Ideation

This simpler level involves users submitting reference papers without a specific idea in mind. The user query typically follows the format: "I have some reference papers, please come up with an innovative idea and implement it with these papers." The system then analyzes the provided references to generate and develop novel research concepts.
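The two input levels can be sketched as a small query builder. This is only an illustration, not the project's own code: the function name `build_query` is hypothetical, and mapping Level 1 and Level 2 to the `TASK_LEVEL` values `task1` and `task2` from the configuration template is an assumption. The Level 2 wording is the format quoted above, and the Level 1 prefix is taken from the example prompt later in this README.

```python
from typing import Optional

# Level 1 prefix, copied from the example prompt later in this README.
LEVEL1_PREFIX = (
    "I have some reference papers, please implement the following idea "
    "with these papers:"
)
# Level 2 query, copied from the format quoted above.
LEVEL2_QUERY = (
    "I have some reference papers, please come up with an innovative idea "
    "and implement it with these papers."
)


def build_query(task_level: str, idea: Optional[str] = None) -> str:
    """Compose a user query for the given level (hypothetical helper).

    Assumption: "task1" corresponds to Level 1 and "task2" to Level 2.
    """
    if task_level == "task1":
        # Level 1 needs a detailed idea description from the user.
        if idea is None or not idea.strip():
            raise ValueError("Level 1 requires a detailed idea description")
        return f"{LEVEL1_PREFIX}\n\n{idea.strip()}"
    if task_level == "task2":
        # Level 2 carries no idea; the references drive the ideation.
        return LEVEL2_QUERY
    raise ValueError(f"unknown task level: {task_level!r}")
```

For instance, `build_query("task2")` returns the fixed Level 2 sentence, while `build_query("task1", idea)` prepends the Level 1 prefix to the user's description.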

🌟Core Capabilities & Integration

AI-Researcher delivers a Comprehensive Research Ecosystem through seamless integration of critical components:

🚀 Primary Research Functions

📚 Literature Review: Conducts comprehensive analysis and synthesis of existing research.
📊 Idea Generation: Systematically gathers, organizes, and formulates novel research directions.
🧪 Algorithm Design and Implementation: Develops methodologies and transforms ideas into functional implementations.
💻 Algorithm Validation and Refinement: Automates testing, performance evaluation, and iterative optimization.
📈 Result Analysis: Delivers advanced interpretation of experimental data and insights.
✍️ Manuscript Creation: Automatically generates polished, full-length academic papers.

Quick Overview of AI-Researcher.

🔥 News

[2025.09]: 🎯🎯📢📢 Exciting News! We are thrilled to announce that our 🌟AI-Researcher🌟 has been accepted as a Spotlight paper at NeurIPS 2025! 🎉🎉 Thanks to all the team members 🤗

[2025.05]: 🎉🎉 Major Release! AI-Researcher Comprehensive Upgrade! 🚀

We are excited to announce a significant milestone for AI-Researcher:

📄 Academic Paper Release: Detailed exposition of our innovative methods and experimental results
📊 Benchmark Suite: Comprehensive evaluation framework and datasets
🖥️ Web GUI Interface: User-friendly graphical interface making research more convenient

🤝 Join Us! We welcome researchers, developers, and AI enthusiasts to contribute together and advance AI research development. Whether it's code contributions, bug reports, feature suggestions, or documentation improvements, every contribution is valuable!

💡 Let's build a smarter AI research assistant together!

[2025, Mar 04]: 🎉🎉 We've launched AI-Researcher! The release includes the complete framework, datasets, benchmark construction pipeline, and much more. Stay tuned; there's plenty more to come! 🚀

⚡ Quick Start

Installation

AI Installation

Using uv

We recommend using uv to manage packages in this project (it is much faster than conda).

# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc

# clone the project
git clone https://github.com/HKUDS/AI-Researcher.git
cd AI-Researcher

# install and activate the environment
uv venv --python 3.11
source ./.venv/bin/activate
uv pip install -e .
playwright install

Docker Installation

We use Docker to containerize the agent's interactive environment, so make sure Docker is installed on your system before proceeding. The research agent runs in the Docker image 'tjbtech1/airesearcher:v1', which you can pull with the following command:

docker pull tjbtech1/airesearcher:v1

Alternatively, you can build the Docker image from the provided Dockerfile:

cd ./docker && docker build -t tjbtech1/airesearcher:v1 .

API Keys Setup

Create an environment variable file based on the provided '.env.template' file. In this file, set the configuration values, including the API keys and the category and instance ID of the test case.


# ================ container configuration ================
# workplace of the research agent
DOCKER_WORKPLACE_NAME=workplace_paper
# base image of the research agent
BASE_IMAGES=tjbtech1/airesearcher:v1
# completion model name, configuration details see: https://docs.litellm.ai/docs/
COMPLETION_MODEL=openrouter/google/gemini-2.5-pro-preview-05-20
# cheap model name (the variable itself is spelled CHEEP_MODEL), configuration details see: https://docs.litellm.ai/docs/
CHEEP_MODEL=openrouter/google/gemini-2.5-pro-preview-05-20
# specific gpu of the research agent, can be: 
# '"device=0"' using the first gpu
# '"device=0,1"' using the first and second gpu
# '"all"' using all gpus
# None for no gpu
GPUS='"device=0"'
# name of the container
CONTAINER_NAME=paper_eval
# name of the workplace
WORKPLACE_NAME=workplace
# path of the cache
CACHE_PATH=cache
# port of the research agent
PORT=7020
# platform of the research agent
PLATFORM=linux/amd64

# ================ llm configuration ================
# github ai token of the research agent
GITHUB_AI_TOKEN=your_github_ai_token
# openrouter api key of the research agent
OPENROUTER_API_KEY=your_openrouter_api_key
# openrouter api base url of the research agent
OPENROUTER_API_BASE=https://openrouter.ai/api/v1

# ================ task configuration ================
# category of the research agent, based on: ./benchmark/final. Can be: 
# diffu_flow
# gnn
# reasoning
# recommendation
# vq
# example: ./benchmark/final/vq
CATEGORY=vq
# instance id of the research agent, example: ./benchmark/final/vq/one_layer_vq.json
INSTANCE_ID=one_layer_vq
# task level of the research agent, can be: 
# task1
# task2
TASK_LEVEL=task1
# maximum iteration times of the research agent
MAX_ITER_TIMES=0
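
Before launching a run, it can help to sanity-check the task settings. The sketch below is not part of the repository: `parse_env` and `instance_path` are hypothetical helpers that use only the standard library. It reads KEY=VALUE lines like those in the template above, checks `CATEGORY` and `TASK_LEVEL` against the values listed in the comments, and builds the benchmark file path following the `./benchmark/final/vq/one_layer_vq.json` example. How the project itself loads this file is not shown here.

```python
from pathlib import PurePosixPath

# Allowed values, as listed in the template comments above.
CATEGORIES = {"diffu_flow", "gnn", "reasoning", "recommendation", "vq"}
TASK_LEVELS = {"task1", "task2"}


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one layer of matching outer quotes, so that
        # GPUS='"device=0"' keeps its inner double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def instance_path(env: dict) -> PurePosixPath:
    """Validate the task settings and return the benchmark JSON path."""
    if env["CATEGORY"] not in CATEGORIES:
        raise ValueError(f"unknown CATEGORY: {env['CATEGORY']!r}")
    if env["TASK_LEVEL"] not in TASK_LEVELS:
        raise ValueError(f"unknown TASK_LEVEL: {env['TASK_LEVEL']!r}")
    # Path layout assumed from the template's example comment.
    return (
        PurePosixPath("benchmark/final")
        / env["CATEGORY"]
        / f"{env['INSTANCE_ID']}.json"
    )
```

With the template values shown above, `instance_path` yields `benchmark/final/vq/one_layer_vq.json`, relative to the repository root.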

🔥 Web GUI

We provide a web GUI based on Gradio. Launch it with the following command:

python web_ai_researcher.py

You can configure the environment variables in the following tab:

Select the following example to run our AI-Researcher:

⬇️ Examples

⚠️ ALERT: The GIFs below are large files and may take some time to load. Please be patient while they render completely.

Example 1 (Vector Quantized)

Input: Prompt

I have some reference papers, please implement the following idea with these papers:

The proposed model designed in this paper is designed to improve the performance of Vector Quantized Variational AutoEncoders (VQ-VAEs) by addressing issues with gradient propagation through the non-differentiable vector quantization layer.

The core methodologies utilized include:
- Rotation and Rescaling Transformation: A linear transformation that alters the encoder output to align it with the nearest codebook vector without changing the forward pass output.
- Gradient Propagation Method: The proposed model ensures that gradients flow from the decoder to the encoder while preserving the angle between the gradient and codebook vector.
- Codebook Management: Utilizes the connection between the encoder output and the corresponding codebook vectors to mitigate codebook collapse and improve utilization.
The primary functions of these components are:
- The rotation and rescaling transformation modifies how the encoder output is quantized and how information is retained during backpropagation, enabling gradients to reflect the true positioning of the encoder output relative to the codebook vectors.
- The gradient propagation method redefines how gradients are transported back to the encoder, allowing for an enhanced and nuanced movement through the quantization layer, which leads to a better performance during training.
- Codebook management practices help in maintaining a diverse set of codebook vectors throughout training, avoiding scenarios where multiple vectors become redundant or unused.
Implementation details for each component:
- Key Parameters:
  - Codebook size should be configured based on the complexity of the dataset (e.g., 1024 or 8192).
  - Commitment loss coefficient (β) is typically set within [0.25, 2].
- Input/Output Specifications:
  - Input to the encoder is a continuous high-dimensional vector, while the output is a corresponding quantized vector from the codebook.
  - The output for reconstruction is generated using the decoder applied to the transformed codebook vectors.
- Important Constraints:
  - Ensure that the codebook is updated correctly with an exponential moving average procedure, and treat both rotation and rescaling during the forward pass as constants with respect to the gradient.
Step-by-Step Integration of Components:
- Step 1: Input the data vector into the encoder to obtain the continuous representation.
- Step 2: Identify the nearest codebook vector to the encoder output.
- Step 3: Compute the rotation matrix that aligns the encoder output to the

Core symbols most depended-on inside this repo

info · called by 370 · research_agent/inno/logger.py

get · called by 232 · research_agent/inno/memory/rag_memory.py

run · called by 47 · research_agent/inno/core.py

(name not shown) · called by 38 · research_agent/inno/environment/browser_env.py

step · called by 28 · research_agent/inno/environment/browser_env.py

write_temp_log · called by 25 · paper_agent/section_composer.py

chat · called by 24 · benchmark_collection/utils/openai_utils.py

step · called by 22 · examples/gnn_nodeformer/project/experiments/run_enhanced_experiments.py

Shape

Method 500

Function 451

Class 147

Route 10

Languages

Python 100%

Modules by API surface

research_agent/inno/environment/mdconvert.py · 57 symbols

research_agent/inno/environment/markdown_browser/mdconvert.py · 56 symbols

web_ai_researcher.py · 32 symbols

research_agent/inno/tools/inno_tools/web_tools.py · 30 symbols

research_agent/inno/workflow/flowgraph.py · 28 symbols

benchmark_collection/utils/pdf_utils.py · 26 symbols

research_agent/inno/environment/markdown_browser/requests_markdown_browser.py · 22 symbols

research_agent/inno/tools/web_tools.py · 21 symbols

benchmark_collection/0_crawl_paper.py · 21 symbols

research_agent/inno/logger.py · 20 symbols

research_agent/inno/util.py · 19 symbols

research_agent/inno/tools/terminal_tools.py · 19 symbols

Dependencies from manifests, versioned

PyYAML 5.3.1 · 1×

datasets 2.17.1 · 1×

easydict 1.9 · 1×

fastapi 0.115.12 · 1×

ipdb 0.13.3 · 1×

lmdb 1.0.0 · 1×

numpy 1.19.0 · 1×

openai 1.59.8 · 1×

pillow 10.2.0 · 1×

pyautogui 0.9.54 · 1×

pydantic 2.6.1 · 1×

pyflakes 2.2.0 · 1×

For agents

$ claude mcp add AI-Researcher \
  -- python -m otcore.mcp_server <graph>
