
github.com/HKUDS/Paper2Slides @main sqlite

456 symbols 1,699 edges 68 files 338 documented · 74%
README


Paper2Slides: From Paper to Presentation in One Click


Never Build Slides from Scratch Again

| 📄 Universal File Support  |  🎯 RAG-Powered Precision  |  🎨 Custom Styling  |  ⚡ Lightning Speed |


🎯 What is Paper2Slides?

Paper2Slides turns research papers, reports, and documents into professional slides and posters in minutes.

✨ Key Features

  • 📄 Universal Document Support

Process PDF, Word, Excel, PowerPoint, Markdown, and other file formats, including several files in a single run.

  • 🎯 Comprehensive Content Extraction

A RAG-powered retrieval step helps capture the critical insights, figures, and data points in your documents.

  • 🔗 Source-Linked Accuracy

Maintains direct traceability between generated content and original sources, eliminating information drift.

  • 🎨 Custom Styling Freedom

Choose from professional built-in themes or describe your vision in natural language for custom styling.

  • Lightning-Fast Generation

Fast mode (--fast) skips RAG indexing, enabling quick previews and rapid iteration.

  • 💾 Seamless Session Management

A checkpoint system saves progress after each stage, so you can pause, resume, or switch styles without re-running earlier stages.

  • Professional-Grade Visuals

Deliver polished, presentation-ready slides and posters with publication-quality design standards.

⚡ Easy as One Command

# One command to generate slides from a paper
python -m paper2slides --input paper.pdf --output slides --style doraemon --length medium --fast --parallel 2

🔥 News

  • [2025.12.09] Added parallel slide generation (--parallel) for faster processing
  • [2025.12.08] Paper2Slides is now open source!

🎨 Custom Styling Showcase

doraemon · academic · custom

✨ Multiple styles are available; just change the --style parameter.

Examples from DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

💡 Custom Style Example: Totoro Theme

--style "Studio Ghibli anime style with warm whimsical aesthetic. Use soft watercolor Morandi tones with light cream background, muted sage green and dusty pink accents. Totoro character can appear as a friendly guide relating to the content, with nature elements like soft clouds or leaves."

🌐 Paper2Slides Web Interface




🏃 Quick Start

1. Environment Setup

# Clone repository
git clone https://github.com/HKUDS/Paper2Slides.git
cd Paper2Slides

# Create and activate conda environment
conda create -n paper2slides python=3.12 -y
conda activate paper2slides

# Install dependencies
pip install -r requirements.txt

[!NOTE] Create a .env file in the paper2slides/ directory with your API keys. See paper2slides/.env.example for the required variables.

2. Command Line Usage

# Basic usage - generate slides from a paper
python -m paper2slides --input paper.pdf --output slides --length medium

# Generate poster with custom style
python -m paper2slides --input paper.pdf --output poster --style "minimalist with blue theme" --density medium

# Fast mode
python -m paper2slides --input paper.pdf --output slides --fast

# Enable parallel generation (2 workers by default)
python -m paper2slides --input paper.pdf --output slides --parallel 2

# List all processed outputs
python -m paper2slides --list
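Conceptually, --parallel N fans slide rendering out over N workers. Image generation mostly waits on a remote API, so threads are a reasonable fit. The following is a hypothetical sketch: generate_slides and the render callback are illustrative names, not the project's API. ThreadPoolExecutor.map returns results in input order, so slide_01, slide_02, ... stay in sequence even when later slides finish first.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def generate_slides(plans: List[str], render: Callable[[int, str], T], workers: int = 1) -> List[T]:
    """Hypothetical: render one image per slide plan; workers=1 mimics running without --parallel."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    indices = range(1, len(plans) + 1)  # 1-based, like slide_01.png
    if workers == 1:
        # Sequential path: one render call at a time.
        return [render(i, plan) for i, plan in zip(indices, plans)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, not completion order.
        return list(pool.map(render, indices, plans))
```

In a real run, render would call the image provider and write slide_NN.png. Here it can be any function that takes (index, plan).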

CLI Options:

| Option | Description | Default |
|--------|-------------|---------|
| --input, -i | Input file(s) or directory | Required |
| --output | Output type: slides or poster | poster |
| --content | Content type: paper or general | paper |
| --style | Style: academic, doraemon, or custom | doraemon |
| --length | Slides length: short, medium, long | short |
| --density | Poster density: sparse, medium, dense | medium |
| --fast | Fast mode: skip RAG indexing | false |
| --parallel | Enable parallel slide generation: --parallel uses 2 workers, --parallel N uses N workers | 1 (sequential without this option) |
| --from-stage | Force restart from stage: rag, summary, plan, generate | Auto-detect |
| --debug | Enable debug logging | false |
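The option table maps naturally onto Python's argparse. The sketch below is a hypothetical reconstruction from the table, not the project's actual main.py. build_parser and parse are illustrative names. It shows how the optional-value form of --parallel produces 2 workers when the flag is given bare, N when given a number, and 1 when omitted, and how --from-stage defaults to None to mean "auto-detect". It also accepts --list without --input, following the usage examples.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the project's main.py)."""
    p = argparse.ArgumentParser(prog="paper2slides")
    p.add_argument("--input", "-i", nargs="+", help="Input file(s) or directory")
    p.add_argument("--output", choices=["slides", "poster"], default="poster")
    p.add_argument("--content", choices=["paper", "general"], default="paper")
    # A built-in name (academic, doraemon) or any free-text style description.
    p.add_argument("--style", default="doraemon")
    p.add_argument("--length", choices=["short", "medium", "long"], default="short")
    p.add_argument("--density", choices=["sparse", "medium", "dense"], default="medium")
    p.add_argument("--fast", action="store_true", help="Skip RAG indexing")
    # Bare --parallel -> const=2 workers; --parallel N -> N; omitted -> default=1 (sequential).
    p.add_argument("--parallel", type=int, nargs="?", const=2, default=1)
    # None stands for "auto-detect from existing checkpoints".
    p.add_argument("--from-stage", choices=["rag", "summary", "plan", "generate"], default=None)
    p.add_argument("--debug", action="store_true")
    p.add_argument("--list", action="store_true", help="List processed outputs")
    return p


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # --input is needed for generation, but not for --list.
    if not args.list and not args.input:
        parser.error("--input is required unless --list is given")
    return args
```

For example, parse(["-i", "paper.pdf", "--output", "slides", "--parallel"]) yields parallel == 2 and from_stage is None.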

💾 Checkpoint & Resume:

Paper2Slides intelligently saves your progress at every key stage, allowing you to:

| Scenario | Command |
|----------|---------|
| Resume after interruption | Just run the same command again; it auto-detects and continues |
| Change style only | Add --from-stage plan to skip re-parsing |
| Regenerate images | Add --from-stage generate to keep the same plan |
| Full restart | Add --from-stage rag to start from scratch |

[!TIP] Checkpoints are auto-saved. Just run the same command to resume. Use --from-stage only to force restart from a specific stage.
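One way to picture auto-detection is to resume at the first stage whose checkpoint file is missing. The sketch below assumes that rule. It uses the checkpoint file names and locations from the output directory layout in the Configuration section. completed_checkpoints and detect_start_stage are hypothetical helpers, and the project also tracks progress in state.json, which this sketch ignores.

```python
from pathlib import Path
from typing import Optional, Set

STAGES = ("rag", "summary", "plan", "generate")
# Checkpoint file that marks each resumable stage as complete (names from the docs).
CHECKPOINT_FILES = {
    "rag": "checkpoint_rag.json",
    "summary": "checkpoint_summary.json",
    "plan": "checkpoint_plan.json",
}


def completed_checkpoints(mode_dir: Path, config_dir: Path) -> Set[str]:
    """Collect existing checkpoints; the plan checkpoint lives in the per-config directory."""
    done = set()
    for stage in ("rag", "summary"):
        if (mode_dir / CHECKPOINT_FILES[stage]).exists():
            done.add(CHECKPOINT_FILES[stage])
    if (config_dir / CHECKPOINT_FILES["plan"]).exists():
        done.add(CHECKPOINT_FILES["plan"])
    return done


def detect_start_stage(done: Set[str], from_stage: Optional[str] = None) -> str:
    """An explicit --from-stage wins; otherwise resume at the first stage without a checkpoint."""
    if from_stage is not None:
        if from_stage not in STAGES:
            raise ValueError(f"unknown stage: {from_stage!r}")
        return from_stage
    for stage in ("rag", "summary", "plan"):
        if CHECKPOINT_FILES[stage] not in done:
            return stage
    return "generate"
```

Under this rule, changing only the style points at a new <config_name>/ directory with no plan checkpoint, so detection lands on plan. That matches the "Change style only" row above.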

3. Web Interface

Launch both backend and frontend services:

./scripts/start.sh

Or start services independently:

# Terminal 1: Start backend API
./scripts/start_backend.sh

# Terminal 2: Start frontend
./scripts/start_frontend.sh

Access the web interface at http://localhost:5173 (default)


🏗️ Paper2Slides Framework

Paper2Slides transforms documents through a 4-stage pipeline designed for reliability and efficiency:

| Stage | Description | Checkpoint | Output |
|-------|-------------|------------|--------|
| 🔍 RAG | Parse documents and build a retrieval index for RAG | checkpoint_rag.json | Searchable knowledge base |
| 📊 Analysis | Extract document structure; identify key figures, tables, and content hierarchy | checkpoint_summary.json | Structured content map |
| 📋 Planning | Generate the content layout and slide/poster organization strategy | checkpoint_plan.json | Presentation blueprint |
| 🎨 Creation | Render the final slides and poster visuals | Output directory | Polished presentation materials |

💾 Smart Recovery System

Each stage saves a checkpoint when it completes, so an interrupted run resumes from the last completed stage instead of starting over.
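The checkpoint-per-stage design can be sketched as a loop that reloads the output of the last finished stage and writes a JSON checkpoint after every stage it runs. This runner is simplified and hypothetical (run_from and the stage callables are illustrative). It keeps every checkpoint in one directory and writes one after the final stage too. In the real layout, checkpoint_plan.json is stored per configuration and the Creation stage writes images instead.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical stage signature: previous stage's result in, JSON-serializable dict out.
StageFn = Callable[[dict], dict]


def run_from(stages: List[Tuple[str, StageFn]], workdir: Path, start: str) -> dict:
    names = [name for name, _ in stages]
    idx = names.index(start)  # raises ValueError for an unknown stage name
    data: dict = {}
    if idx > 0:
        # Resume: reload the output of the stage just before `start`.
        data = json.loads((workdir / f"checkpoint_{names[idx - 1]}.json").read_text())
    for name, fn in stages[idx:]:
        data = fn(data)
        # Persist right away so an interruption loses at most the current stage.
        (workdir / f"checkpoint_{name}.json").write_text(json.dumps(data))
    return data
```

Restarting with start="plan" skips rag and summary entirely and begins from checkpoint_summary.json.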

Fast Mode vs Normal Mode

| Mode | Processing Pipeline | Use Cases |
|------|---------------------|-----------|
| Normal | Complete RAG indexing with deep document analysis | Complex research papers, lengthy documents, multi-section content |
| Fast | Skip RAG indexing, direct LLM query | Short documents, instant previews, quick revisions |

Use --fast when:
  • The document (text + figures) is short enough to fit in the LLM context
  • You need a quick preview or iteration
  • You don't want to wait for RAG indexing

Use normal mode (default) when:
  • The document is long or has many figures
  • Multiple files are processed together
  • Retrieval is needed for better context selection
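This guidance can be condensed into a rough heuristic. Every number here is an assumption: about 4 characters per token for English text, a flat per-figure token cost, a 128k-token context window with 20% headroom, and a cap on figure count. suggest_fast_mode and InputDoc are hypothetical names. The CLI itself leaves the choice to the --fast flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputDoc:
    text: str
    num_figures: int


def suggest_fast_mode(
    docs: List[InputDoc],
    context_tokens: int = 128_000,  # assumed LLM context window
    tokens_per_figure: int = 1_000,  # assumed cost of one figure
    max_figures: int = 20,  # assumed "many figures" cutoff
) -> bool:
    """Return True when --fast looks appropriate under the assumptions above."""
    if len(docs) != 1:
        return False  # several files (or none): normal mode with RAG indexing
    doc = docs[0]
    if doc.num_figures > max_figures:
        return False  # many figures: retrieval helps select the relevant ones
    # Rough estimate: ~4 characters per token for English text.
    est_tokens = len(doc.text) // 4 + doc.num_figures * tokens_per_figure
    # Keep 20% headroom for prompts and the model's output.
    return est_tokens <= int(context_tokens * 0.8)
```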


⚙️ Configuration

Output Directory Structure

outputs/
├── <project_name>/
│   ├── <content_type>/                   # paper or general
│   │   ├── <mode>/                       # fast or normal
│   │   │   ├── checkpoint_rag.json       # RAG query results & parsed file paths
│   │   │   ├── checkpoint_summary.json   # Extracted content, figures, tables
│   │   │   ├── summary.md                # Human-readable summary
│   │   │   └── <config_name>/            # e.g., slides_doraemon_medium
│   │   │       ├── state.json            # Current pipeline state
│   │   │       ├── checkpoint_plan.json  # Content plan for slides/poster
│   │   │       └── <timestamp>/          # Generated outputs
│   │   │           ├── slide_01.png
│   │   │           ├── slide_02.png
│   │   │           ├── ...
│   │   │           └── slides.pdf        # Final PDF output
│   │   └── rag_output/                   # RAG index storage
│   └── ...
└── ...

Checkpoint Files:

| File | Description | Reusable When |
|------|-------------|---------------|
| checkpoint_rag.json | Parsed document content | Same input files |
| checkpoint_summary.json | Figures, tables, structure | Same input files |
| checkpoint_plan.json | Content layout plan | Same style & length/density |
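The directory layout can be expressed as a small path builder. The example config name slides_doraemon_medium suggests the pattern <output>_<style>_<length or density>, which this sketch assumes. The hashing of custom style descriptions, the timestamp format, and the names config_name and run_dir are hypothetical. Because the plan checkpoint sits under the config directory, changing the style or length yields a new config directory. That is consistent with checkpoint_plan.json being reusable only for the same style and length/density.

```python
import hashlib
from datetime import datetime
from pathlib import Path
from typing import Optional

BUILTIN_STYLES = {"academic", "doraemon"}


def config_name(output: str, style: str, length: str = "short", density: str = "medium") -> str:
    """Build e.g. 'slides_doraemon_medium'. Pattern inferred from the example; hypothetical."""
    size = length if output == "slides" else density  # slides use --length, posters --density
    if style in BUILTIN_STYLES:
        style_key = style
    else:
        # Assumption: a free-text style is keyed by a short hash of its description,
        # so different custom styles never share a plan checkpoint.
        style_key = "custom_" + hashlib.sha1(style.encode("utf-8")).hexdigest()[:8]
    return f"{output}_{style_key}_{size}"


def run_dir(root: Path, project: str, content: str, fast: bool, cfg: str,
            when: Optional[datetime] = None) -> Path:
    """outputs/<project_name>/<content_type>/<mode>/<config_name>/<timestamp>/"""
    mode = "fast" if fast else "normal"
    stamp = (when or datetime.now()).strftime("%Y%m%d_%H%M%S")  # format is an assumption
    return root / project / content / mode / cfg / stamp
```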

Style Configuration

| Style | Description |
|-------|-------------|
| academic | Clean, professional academic presentation style |
| doraemon | Colorful, friendly style with illustrations |
| custom | Any text description for an LLM-generated style |

Image Generation Providers

  • Set IMAGE_GEN_PROVIDER in paper2slides/.env to choose the backend:
    • openrouter (default): uses IMAGE_GEN_API_KEY, IMAGE_GEN_BASE_URL, and IMAGE_GEN_MODEL (default google/gemini-3-pro-image-preview)
    • google: uses the official Gemini API at GOOGLE_GENAI_BASE_URL (default https://generativelanguage.googleapis.com/v1beta), IMAGE_GEN_API_KEY, IMAGE_GEN_MODEL (default models/gemini-3-pro-image-preview, must be image-capable), and IMAGE_GEN_RESPONSE_MIME_TYPE (default text/plain; use text types if your model does not support image responses)
  • Reference figures are sent as inline data when supported (Google) or as image_url attachments (OpenRouter).
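Provider selection amounts to reading a few environment variables and falling back to the defaults listed above. The sketch below resolves them into one config object, assuming the .env values are already loaded into the process environment. ImageGenConfig and load_image_gen_config are hypothetical names. Leaving IMAGE_GEN_BASE_URL unset (None) for OpenRouter is also an assumption, because no default is documented.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ImageGenConfig:
    provider: str
    api_key: Optional[str]
    base_url: Optional[str]
    model: str
    response_mime_type: Optional[str] = None  # only used by the google provider


def load_image_gen_config(env: Mapping[str, str] = os.environ) -> ImageGenConfig:
    """Resolve the documented IMAGE_GEN_* variables; hypothetical helper."""
    provider = env.get("IMAGE_GEN_PROVIDER", "openrouter").strip().lower()
    api_key = env.get("IMAGE_GEN_API_KEY")
    if provider == "openrouter":
        return ImageGenConfig(
            provider=provider,
            api_key=api_key,
            base_url=env.get("IMAGE_GEN_BASE_URL"),  # no documented default
            model=env.get("IMAGE_GEN_MODEL", "google/gemini-3-pro-image-preview"),
        )
    if provider == "google":
        return ImageGenConfig(
            provider=provider,
            api_key=api_key,
            base_url=env.get("GOOGLE_GENAI_BASE_URL", "https://generativelanguage.googleapis.com/v1beta"),
            model=env.get("IMAGE_GEN_MODEL", "models/gemini-3-pro-image-preview"),
            response_mime_type=env.get("IMAGE_GEN_RESPONSE_MIME_TYPE", "text/plain"),
        )
    raise ValueError(f"unsupported IMAGE_GEN_PROVIDER: {provider!r}")
```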

Image Generation Notes

[!TIP] By default, Paper2Slides uses gemini-3-pro-image-preview through OpenRouter for image generation. Set IMAGE_GEN_PROVIDER=google to call an image-capable model (default models/gemini-3-pro-image-preview) through the official Gemini API instead. Key findings:

  • Mood Keywords: Words like "warm", "elegant", "vibrant" strongly influence the overall color palette
  • Layout vs Style: Fine-grained layout instructions ground well; fine-grained element styling does not
  • Prompt Length: Simple prompts generally outperform detailed ones
  • Multi-slide Generation: Native multi-image output is story-like; for consistent slides, we use iterative single-image generation

📁 Code Structure

| Module | Description |
|--------|-------------|
| paper2slides/core/ | Pipeline orchestration, 4-stage execution |
| paper2slides/raganything/ | Document parsing & RAG indexing |
| paper2slides/summary/ | Content extraction: figures, tables, paper structure |
| paper2slides/generator/ | Content planning & image generation |
| api/ | FastAPI backend for web interface |
| frontend/ | React frontend (Vite + TailwindCSS) |

Full project structure:

Paper2Slides/
├── paper2slides/                 # Core library
│   ├── main.py                   # CLI entry point
│   ├── core/
│   │   ├── pipeline.py           # Main pipeline orchestration
│   │   ├── state.py              # Checkpoint state management
│   │   └── stages/
│   │       ├── rag_stage.py      # Stage 1: Parse & index
│   │       ├── summary_stage.py  # Stage 2: Extract content
│   │       ├── plan_stage.py     # Stage 3: Plan layout
│   │       └── generate_stage.py # Stage 4: Generate images
│   │
│   ├── raganything/
│   │   ├── raganything.py        # RAG processor
│   │   └── parser.py             # Document parser
│   │
│   ├── summary/
│   │   ├── paper.py              # Paper structure extraction
│   │   └── extractors/           # Figure/table extractors
│   │
│   ├── generator/
│   │   ├── content_planner.py    # Slide/poster planning
│   │   └── image_generator.py    # Image generation
│   │
│   ├── prompts/                  # LLM prompt templates
│   └── utils/                    # Utilities
│
├── api/server.py                 # FastAPI backend
├── frontend/src/                 # React frontend
└── scripts/                      # Shell scripts (start/stop)

🙏 Related Open-Sourced Projects



Core symbols most depended-on inside this repo

| Symbol | Called by | File |
|--------|-----------|------|
| generateId | 13 | frontend/src/components/ChatWindow.jsx |
| _get_rag | 11 | paper2slides/rag/client.py |
| _ensure_lightrag_initialized | 9 | paper2slides/raganything/batch.py |
| save_state | 8 | paper2slides/core/state.py |
| check_installation | 6 | paper2slides/raganything/parser.py |
| aquery | 6 | paper2slides/raganything/query.py |
| load_json | 5 | paper2slides/utils/file_utils.py |
| get_project_name | 5 | paper2slides/utils/path_utils.py |

Shape

Method 239
Function 155
Class 54
Route 8

Languages

Python 85%
TypeScript 15%

Modules by API surface

paper2slides/raganything/modalprocessors.py · 47 symbols
paper2slides/raganything/parser.py · 35 symbols
paper2slides/rag/client.py · 27 symbols
api/server.py · 27 symbols
paper2slides/raganything/processor.py · 25 symbols
paper2slides/generator/image_generator.py · 21 symbols
paper2slides/generator/content_planner.py · 20 symbols
paper2slides/raganything/query.py · 17 symbols
paper2slides/summary/models.py · 15 symbols
paper2slides/raganything/raganything.py · 15 symbols
paper2slides/raganything/enhanced_markdown.py · 13 symbols
frontend/src/components/ChatWindow.jsx · 13 symbols

Dependencies from manifests, versioned

@vitejs/plugin-react 4.0.0 · 1×
autoprefixer 10.4.14 · 1×
axios 1.4.0 · 1×
lucide-react 0.263.1 · 1×
postcss 8.4.24 · 1×
react 18.2.0 · 1×
react-dom 18.2.0 · 1×
tailwindcss 3.3.2 · 1×
vite 4.3.9 · 1×
Pillow 10.0.0 · 1×
fastapi 0.122 · 1×
openai 1.0.0 · 1×

For agents

$ claude mcp add Paper2Slides \
  -- python -m otcore.mcp_server <graph>
