github.com/ruc-datalab/DeepAnalyze @main sqlite

7,027 symbols 31,592 edges 1,023 files 1,398 documented · 20%

README

DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Authors: Shaolei Zhang, Ju Fan*, Meihao Fan, Guoliang Li, Xiaoyong Du

Renmin University of China, Tsinghua University

DeepAnalyze is the first agentic LLM for autonomous data science. It can autonomously complete a wide range of data-centric tasks without human intervention, supporting:

- 🛠 Entire data science pipeline: Automatically perform any data science task, such as data preparation, analysis, modeling, visualization, and report generation.
- 🔍 Open-ended data research: Conduct deep research on diverse data sources, including structured data (databases, CSV, Excel), semi-structured data (JSON, XML, YAML), and unstructured data (TXT, Markdown), and produce analyst-grade research reports.
- 📊 Fully open-source: The model, code, training data, and demo of DeepAnalyze are all open-sourced, allowing you to deploy or extend your own data analysis assistant.

🔥 News

[2026.07]: We look forward to releasing DeepPrep, a data-preparation companion to DeepAnalyze that turns raw tables into analysis-ready data.

More about DeepPrep

DeepPrep is an LLM-powered agentic system for autonomous data preparation. It constructs data-preparation pipelines through execution-grounded interaction with intermediate table states and runtime feedback, helping clean, transform, and standardize raw data before downstream analysis.

▶️ Demo:

https://github.com/user-attachments/assets/6b94927f-5c0c-4cfe-bc33-de56b8e459cd

[2026.06.15]: We release CoDA-Bench, a benchmark for evaluating whether code agents can handle data-intensive analytical tasks, closely aligned with DeepAnalyze's target scenarios.

More about CoDA-Bench

CoDA-Bench evaluates agents in a Linux sandbox with hundreds of data files. Given a natural-language question, an agent must discover relevant data, write executable code, and produce the final answer. It provides a benchmark setting for the same type of data discovery and code-execution challenges targeted by DeepAnalyze.

▶️ Demo:

https://github.com/user-attachments/assets/34e50a62-744b-4079-8988-6a8bbfe166a0

[2026.05.31]: DA-Studio, the system behind DeepAnalyze WebUI v2 (demo/chat_v2), has been accepted to the VLDB 2026 Demonstration Track.
[2026.03.16]: DeepAnalyze WebUI v2 has been updated, featuring a smoother UI, support for the HeyWhale API, and support for Docker-based sandboxed code execution. More details in the README.
[2026.01.31]: 🎉🎉🎉DeepAnalyze served as the official agent supporting the 2026年(第19届)中国大学生计算机设计大赛大数据主题赛 (2026 (19th) China Collegiate Computer Design Contest – Big Data Track).
[2025.12.28] ANNOUNCEMENT: DeepAnalyze API Keys Are Now Available 🎉🎉🎉 You can now apply for your API key via this Google Form or this Feishu Form. For full details and usage instructions, please refer to the Guide or the Feishu Wiki.
[2025.11.13]: DeepAnalyze now supports OpenAI-style API endpoints and is accessible through the command-line terminal UI. Thanks to the contributor @LIUyizheSDU.
[2025.11.08]: DeepAnalyze is now accessible through the JupyterUI, built on jupyter-mcp-server. Thanks to the contributor @ChengJiale150.
[2025.10.28]: We welcome all contributions, including improving DeepAnalyze and sharing use cases (see CONTRIBUTION.md). Authors of all merged PRs will be listed as contributors.
[2025.10.27]: DeepAnalyze has attracted widespread attention, gaining 1K+ GitHub stars and 200K+ Twitter views within a week.
[2025.10.21]: DeepAnalyze's paper, code, model, and training data are released!

🖥 Demo

WebUI

https://github.com/user-attachments/assets/04184975-7ee7-4ae0-8761-7a7550c5c8fe

Upload your data, and DeepAnalyze can perform data-oriented deep research 🔍 and any data-centric task 🛠

- Clone this repo and download DeepAnalyze-8B.
- Deploy DeepAnalyze-8B via vllm: `vllm serve DeepAnalyze-8B`
- Run these scripts to launch the API and interface, then interact through the browser (http://localhost:4000):

```bash
cd demo/chat/frontend
npm install
cd ..
bash start.sh

# stop the api and interface
bash stop.sh

# stop vllm if needed
```

- If you want to deploy under a specific IP, replace localhost with your IP address in ./demo/chat/backend.py and ./demo/chat/frontend/lib/config.ts

WebUI v2

https://github.com/user-attachments/assets/2dd1d2aa-6fb9-4202-bc8d-cbe874844725

Upload your data, and DeepAnalyze can perform data-oriented deep research 🔍 and any data-centric task 🛠

- A more streamlined UI.
- Added support for HeyWhale API keys.
- Added support for a Docker-based sandboxed code execution environment.
- Usage is the same as for WebUI.

```bash
cd demo/chat_v2/frontend
npm install
cd ..
cp .env.example .env
bash start.sh

# stop the api and interface
bash stop.sh

# stop vllm if needed
```

JupyterUI

https://github.com/user-attachments/assets/a2335f45-be0e-4787-a4c1-e93192891c5f

Familiar with Jupyter Notebook? Try DeepAnalyze through the JupyterUI!

This demo uses JupyterLab as the frontend: it creates a new notebook, converts <Analyze|Understand|Answer> segments to Markdown cells, converts <Code> segments to code cells, and executes them, with the output serving as <Execute>.
Go to demo/jupyter to see more and try!
👏Thanks a lot to the contributor @ChengJiale150.
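
The tag-to-cell mapping described above can be sketched in a few lines of Python. This is an illustration only, not the code in demo/jupyter: it assumes each segment is wrapped in paired tags such as `<Analyze>...</Analyze>`, and it attaches recorded `<Execute>` text as cell output instead of running the code in a live kernel. The function names `response_to_notebook` and `save_notebook` are hypothetical.

```python
import json
import re

# Assumption: segments use paired tags like <Analyze>...</Analyze>.
# Only the tag names themselves are taken from the README.
TAG_PATTERN = re.compile(
    r"<(Analyze|Understand|Answer|Code|Execute)>(.*?)</\1>", re.DOTALL
)
MARKDOWN_TAGS = {"Analyze", "Understand", "Answer"}


def response_to_notebook(response: str) -> dict:
    """Convert a tagged model response into an nbformat-4-style dict."""
    cells = []
    for tag, body in TAG_PATTERN.findall(response):
        body = body.strip()
        if tag in MARKDOWN_TAGS:
            # Reasoning and answers become Markdown cells.
            cells.append(
                {"cell_type": "markdown", "metadata": {},
                 "source": f"**{tag}**\n\n{body}"}
            )
        elif tag == "Code":
            # Code segments become (not yet executed) code cells.
            cells.append(
                {"cell_type": "code", "metadata": {}, "execution_count": None,
                 "outputs": [], "source": body}
            )
        elif tag == "Execute" and cells and cells[-1]["cell_type"] == "code":
            # Recorded execution text is attached to the preceding code cell.
            cells[-1]["outputs"].append(
                {"output_type": "stream", "name": "stdout", "text": body}
            )
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def save_notebook(notebook: dict, path: str) -> None:
    """Write the notebook dict to disk as a .ipynb JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1)
```

Calling `save_notebook(response_to_notebook(text), "analysis.ipynb")` would produce a file that JupyterLab can open; the real demo drives a running Jupyter instance through jupyter-mcp-server instead.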

CLI

https://github.com/user-attachments/assets/018acae5-b979-4143-ae1e-5b74da453c1d

Try DeepAnalyze through the command-line interface

- Deploy DeepAnalyze-8B via vllm: `vllm serve DeepAnalyze-8B`
- Start the API server and launch the CLI interface:

```bash
cd API
python start_server.py  # In one terminal

cd demo/cli
python api_cli.py  # In another terminal (English)
# or
python api_cli_ZH.py  # In another terminal (Chinese)
```

- The CLI provides a Rich-based interface with file upload support and real-time streaming responses.
- Supports both English and Chinese interfaces.

[!TIP]

Clone this repository to deploy DeepAnalyze locally as your data analyst, completing any data science tasks without any workflow or closed-source APIs.

🔥 The UI of the demo is an initial version. You are welcome to develop it further, and we will include you as a contributor.

🚀 Quick Start

🔑 Use the DeepAnalyze API

API keys are now available!

To request your key, please fill out one of the following application forms:

- Primary Form (Google)
- Alternative Form (Feishu)

📚 For comprehensive usage instructions, please refer to the API guide:

Documentation
Feishu Wiki

Model Download

Download the model from RUC-DataLab/DeepAnalyze-8B · Hugging Face or DeepAnalyze-8B · Model Library

📊 Memory Configuration Recommended Parameters Table

| GPU Memory | Model Type | Recommended max-model-len | Use FP8 KV Cache |
|---|---|---|---|
| 16GB | 8-bit Quantized | 8192 | ✓ |
| 16GB | 4-bit Quantized | 49152 | ✓ |
| 24GB | Original Model | 16384 | ✓ |
| 24GB | 8-bit Quantized | 98304 | ✓ |
| 24GB | 4-bit Quantized | 131072 | ✓ |
| 40GB | Original Model | 131072 | ✓ |
| 40GB | 8-bit Quantized | 131072 | |
| 80GB | Original Model | 131072 | |

To obtain the quantized model, you can use ./quantize.py.

🚀 vLLM Launch Command Template

General Command Template

python -m vllm.entrypoints.openai.api_server \
  --model <model_path> \
  --served-model-name DeepAnalyze-8B \
  --max-model-len <select_from_table_above> \
  --gpu-memory-utilization 0.95 \
  --port 8000 \
  <add_fp8_if_required> \
  --trust-remote-code

Command Examples by Scenario

Scenario 1: 16GB GPU Memory Users (Recommended: 4-bit Quantized Version)

python -m vllm.entrypoints.openai.api_server \
  --model /path/to/deepanalyze/4bit \
  --served-model-name DeepAnalyze-8B \
  --max-model-len 49152 \
  --gpu-memory-utilization 0.95 \
  --port 8000 \
  --kv-cache-dtype fp8 \
  --trust-remote-code

Scenario 2: 24GB GPU Memory Users (For Maximum Context Length)

python -m vllm.entrypoints.openai.api_server \
  --model /path/to/deepanalyze/4bit \
  --served-model-name DeepAnalyze-8B \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.95 \
  --port 8000 \
  --kv-cache-dtype fp8 \
  --trust-remote-code

Scenario 3: 80GB GPU Memory Users (Best Performance)

python -m vllm.entrypoints.openai.api_server \
  --model /path/to/original/model \
  --served-model-name DeepAnalyze-8B \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.95 \
  --port 8000 \
  --trust-remote-code

Quick Selection Guide

Limited Memory (<24GB): Use 4-bit Quantized Version + FP8 KV Cache
Balanced Configuration (24-40GB): Choose model type based on requirements
Sufficient Memory (≥40GB): Use Original Model for best precision
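
The memory table and the general command template above can also be combined programmatically. The following is a minimal sketch, assuming that a blank "Use FP8 KV Cache" cell means the flag is not needed; `RECOMMENDED` and `build_vllm_command` are hypothetical names introduced for illustration and are not part of the repository.

```python
import shlex

# (gpu_memory_gb, model_type) -> (max_model_len, use_fp8_kv_cache).
# Values are copied from the memory configuration table; a blank FP8 cell
# is interpreted as "not required" (an assumption).
RECOMMENDED = {
    (16, "8-bit"): (8192, True),
    (16, "4-bit"): (49152, True),
    (24, "original"): (16384, True),
    (24, "8-bit"): (98304, True),
    (24, "4-bit"): (131072, True),
    (40, "original"): (131072, True),
    (40, "8-bit"): (131072, False),
    (80, "original"): (131072, False),
}


def build_vllm_command(
    model_path: str, gpu_memory_gb: int, model_type: str, port: int = 8000
) -> list[str]:
    """Fill in the general command template for one row of the table."""
    max_model_len, use_fp8 = RECOMMENDED[(gpu_memory_gb, model_type)]
    command = [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model_path,
        "--served-model-name", "DeepAnalyze-8B",
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", "0.95",
        "--port", str(port),
    ]
    if use_fp8:
        command += ["--kv-cache-dtype", "fp8"]
    command.append("--trust-remote-code")
    return command


if __name__ == "__main__":
    # Reproduces Scenario 1 (16GB GPU, 4-bit quantized model) as one line.
    print(shlex.join(build_vllm_command("/path/to/deepanalyze/4bit", 16, "4-bit")))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues; `shlex.join` is only used to print a copy-pasteable line.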

After launching, the API service can be accessed via http://localhost:8000/v1/completions.
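
As a rough illustration of calling that endpoint, the sketch below sends an OpenAI-style completion request using only the Python standard library. The URL and the served model name come from the commands above; the prompt, the default sampling parameters, and the helper names `build_completion_request` and `complete` are assumptions, and the prompt format DeepAnalyze expects is not covered here.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/completions"  # endpoint stated above
MODEL_NAME = "DeepAnalyze-8B"  # matches --served-model-name


def build_completion_request(
    prompt: str, max_tokens: int = 512, temperature: float = 0.0
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style completion request."""
    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,      # illustrative default
        "temperature": temperature,    # illustrative default
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the vLLM server running, `complete("...")` returns the generated text; without a server, `build_completion_request` can still be used to inspect the payload that would be sent.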

Requirements

Install packages: `torch`, `transformers`, `vllm>=0.8.5`

```bash
conda create -n deepanalyze python=3.12 -y
conda activate deepanalyze
pip install -r requirements.txt

# For training
(cd ./deepanalyze/ms-swift/ && pip install -e .)
(cd ./deepanalyze/SkyRL/ && pip install -e .)
```

- [`requirements.txt`](requirements.txt) lists the minimal dependencies required for DeepAnalyze inference. For

Extension points: exported contracts (how you extend this code)

- FileIconProps (Interface), no doc: demo/chat/frontend/types/react-file-icon.d.ts
- FileIconProps (Interface), no doc: demo/chat_v2/frontend/types/react-file-icon.d.ts
- Message (Interface), no doc: demo/chat/frontend/components/three-panel-interface.tsx
- Message (Interface), no doc: demo/chat_v2/frontend/components/three-panel-interface.tsx
- FileAttachment (Interface), no doc: demo/chat/frontend/components/three-panel-interface.tsx
- FileAttachment (Interface), no doc: demo/chat_v2/frontend/components/three-panel-interface.tsx
- WorkspaceFile (Interface), no doc: demo/chat/frontend/components/three-panel-interface.tsx
- WorkspaceFile (Interface), no doc: demo/chat_v2/frontend/components/three-panel-interface.tsx

Core symbols: most depended-on inside this repo

| Symbol | Called by | Defined in |
|---|---|---|
| get | 1263 | deepanalyze/SkyRL/skyrl-train/skyrl_train/utils/ppo_utils.py |
| append | 1018 | deepanalyze/ms-swift/swift/utils/io_utils.py |
| | 274 | deepanalyze/SkyRL/skyrl-train/skyrl_train/training_batch.py |
| element | 253 | deepanalyze/ms-swift/swift/ui/base.py |
| | 229 | demo/chat/frontend/lib/utils.ts |
| | 229 | demo/chat_v2/frontend/lib/utils.ts |
| append | 222 | deepanalyze/SkyRL/skyrl-train/skyrl_train/dataset/replay_buffer.py |
| register_model | 194 | deepanalyze/ms-swift/swift/llm/model/register.py |

Shape

Function 3,246

Method 2,779

Class 896

Route 86

Interface 20

Languages

Python 90%

TypeScript 10%

Modules by API surface

deepanalyze/SkyRL/skyrl-train/import_utils.py: 189 symbols

deepanalyze/ms-swift/swift/llm/template/base.py: 91 symbols

deepanalyze/ms-swift/swift/llm/dataset/dataset/mllm.py: 83 symbols

deepanalyze/ms-swift/swift/llm/dataset/dataset/llm.py: 65 symbols

demo/chat_v2/frontend/components/three-panel-interface.tsx: 64 symbols

deepanalyze/ms-swift/swift/trainers/rlhf_trainer/grpo_trainer.py: 61 symbols

deepanalyze/SkyRL/skyrl-train/skyrl_train/workers/worker.py: 59 symbols

deepanalyze/ms-swift/swift/llm/template/template/qwen.py: 57 symbols

deepanalyze/ms-swift/tests/test_align/test_template/test_vision.py: 53 symbols

demo/chat/frontend/components/three-panel-interface.tsx: 51 symbols

demo/chat/backend.py: 47 symbols

deepanalyze/ms-swift/examples/train/grpo/plugin/plugin.py: 47 symbols

Dependencies: from manifests, versioned

@hookform/resolvers 3.10.0 · 1×

@monaco-editor/react 4.7.0 · 1×

@radix-ui/react-accordion 1.2.2 · 1×

@radix-ui/react-alert-dialog 1.1.4 · 1×

@radix-ui/react-aspect-ratio 1.1.1 · 1×

@radix-ui/react-avatar 1.1.2 · 1×

@radix-ui/react-checkbox 1.1.3 · 1×

@radix-ui/react-collapsible 1.1.2 · 1×

@radix-ui/react-context-menu 2.2.4 · 1×

@radix-ui/react-dialog 1.1.4 · 1×

@radix-ui/react-dropdown-menu 2.1.4 · 1×

@radix-ui/react-hover-card 1.1.4 · 1×

For agents

$ claude mcp add DeepAnalyze \
  -- python -m otcore.mcp_server <graph>
