
🍌 Edit Banana


Universal Content Re-Editor: Make the Uneditable, Editable

Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.

Try It Now!

Visit https://www.editbanana.net/ to try Edit Banana online! Upload an image and get an editable DrawIO (XML) file in seconds.

[!WARNING] Please note: Our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using our web platform.

💬 Join WeChat Group

You are welcome to join our WeChat group to discuss and exchange ideas! Scan the QR code below to join:

WeChat Group QR Code

Scan to join the Edit Banana community

[!TIP] If the QR code has expired, please submit an Issue to request an updated one.

👨‍🏫 Leaders

Guoren Wang

Professor · Doctoral Supervisor

Database Systems · Uncertain Data Management · Multimedia Data Management · Distributed Query Processing

Homepage →

Ye Yuan

Professor · Doctoral Supervisor

Big Data Management · Graph Data Management · Spatio-temporal Data · Distributed Computing

Homepage →

Chengliang Chai

Associate Professor · Doctoral Supervisor

Data-centric AI · Large Language Models · Data Lakes · Database Systems

Homepage →

📮 Contact Us

For academic cooperation, technical collaboration, commercial licensing, project customization, and other business inquiries, please contact us by email:

E-mail: ccl@bit.edu.cn

📸 Demonstration

High-Definition Input-Output Comparison (4 Typical Scenarios)

To demonstrate the fidelity of the conversion, we provide one-to-one comparisons between "original static formats" and their "editable reconstruction results" across 4 scenarios. Every element can be individually dragged, restyled, and modified.

Scenario 1: Figures to DrawIO

🔒 Original Static Diagram (Input · Non-editable)	🔓 DrawIO Reconstruction Result (Output · Fully Editable)

Example 1: Basic Flowchart
Original Diagram 1 | ✨ Editable Flowchart

Example 2: Multi-level Architecture
Original Diagram 2 | ✨ Editable Architecture

Example 3: Technical Schematic
Original Diagram 3 | ✨ Editable Schematic

Example 4: Scientific Formula
Original Diagram 4 | ✨ Editable Formula

Scenario 2: Human-in-the-Loop Modification

✨ Manual repair

✨ Save locally

[!NOTE] ✨ Conversion Highlights:
1. Preserves the layout logic, color scheme, and element hierarchy of the original diagram.
2. Restores shape stroke/fill and arrow styles (dashed lines, line thickness) 1:1.
3. Recognizes text accurately, so it can be edited and reformatted directly afterward.
4. Makes every element independently selectable, supporting native DrawIO template replacement and layout optimization.

🚀 Key Features

- Advanced Segmentation: Uses our fine-tuned SAM 3 (Segment Anything Model 3) to segment diagram elements.
- Fixed Multi-Round VLM Scanning: An extraction process guided by multimodal LLMs.
- Text Recognition:
  - Local OCR for text localization; easy to install and runs offline.
  - Pix2Text for mathematical formula recognition and LaTeX conversion.
  - Crop-Guided Strategy: Extracts text/formula regions and sends high-resolution crops to the formula engine.
- User System:
  - Registration: New users receive 10 free credits.
  - Credit System: A pay-per-use model prevents resource abuse.
- Multi-User Concurrency: Built-in support for concurrent user sessions. A global lock makes GPU access thread-safe, and a Least Recently Used (LRU) cache keeps image embeddings across requests, ensuring high performance and stability.
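The concurrency design above (a global lock around the GPU plus an LRU cache of image embeddings) can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the server's actual code: the names `EmbeddingLRUCache`, `GPU_LOCK`, and `get_embedding`, keying the cache by a SHA-256 hash of the image bytes, and the `encode` callable standing in for the SAM3 image encoder are all hypothetical.

```python
import hashlib
import threading
from collections import OrderedDict
from typing import Any, Callable, Optional


class EmbeddingLRUCache:
    """Thread-safe LRU cache mapping an image key to its embedding (hypothetical)."""

    def __init__(self, capacity: int = 32) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, Any]" = OrderedDict()
        self._lock = threading.Lock()  # protects the OrderedDict itself

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def put(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict the least recently used entry


# Hypothetical process-wide lock: only one thread runs the GPU encoder at a time.
GPU_LOCK = threading.Lock()


def get_embedding(image_bytes: bytes, cache: EmbeddingLRUCache,
                  encode: Callable[[bytes], Any]) -> Any:
    key = hashlib.sha256(image_bytes).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: no GPU work needed
    with GPU_LOCK:
        # Re-check: another thread may have encoded this image while we waited.
        cached = cache.get(key)
        if cached is None:
            cached = encode(image_bytes)
            cache.put(key, cached)
    return cached
```

Re-checking the cache after acquiring the lock means concurrent requests for the same image trigger only one GPU encode, while the capacity bound keeps memory use predictable.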

🛠️ Architecture Pipeline

1. Input: Image (PNG/JPG/BMP/TIFF/WebP).
2. Segmentation (SAM3): Uses our fine-tuned SAM3 mask decoder.
3. Text Extraction (parallel):
   - Local OCR (Tesseract) detects text bounding boxes.
   - High-res crops of text/formula regions are sent to Pix2Text for LaTeX conversion.
4. DrawIO XML Generation: Merges the spatial data from SAM3 with the OCR results.
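To make the final step concrete, here is a minimal sketch that writes already-merged shape and text boxes into the uncompressed DrawIO XML layout (`mxfile` → `diagram` → `mxGraphModel` → `root` → `mxCell`). The `Element` dataclass, its default style string, and the assumption that merging has already produced plain bounding boxes with labels are hypothetical; the repository's own XML merging code may work quite differently.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical merged element: a SAM3 shape or an OCR text box."""
    x: float
    y: float
    w: float
    h: float
    label: str = ""
    style: str = "rounded=0;whiteSpace=wrap;html=1;"


def to_drawio_xml(elements: List[Element]) -> str:
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    # DrawIO expects two structural cells: id 0 (root) and id 1 (default layer).
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")
    for i, el in enumerate(elements, start=2):
        cell = ET.SubElement(root, "mxCell", id=str(i), value=el.label,
                             style=el.style, vertex="1", parent="1")
        # "as" is a Python keyword, so it is passed through a dict.
        ET.SubElement(cell, "mxGeometry", x=str(el.x), y=str(el.y),
                      width=str(el.w), height=str(el.h), **{"as": "geometry"})
    return ET.tostring(mxfile, encoding="unicode")
```

Saving the returned string as a `.drawio` file should open in DrawIO; text boxes can reuse the same structure with a text style such as `text;html=1;`.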

📂 Project Structure


```text
Edit-Banana/
├── config/                 # Configuration files (copy config.yaml.example → config.yaml)
├── flowchart_text/         # OCR & Text Extraction Module (standalone entry)
│   ├── src/
│   └── main.py             # OCR-only entry point
├── input/                  # [Manual] Input images directory
├── models/                 # [Manual] Model weights (SAM3) and optional BPE vocab
├── output/                 # [Manual] Results directory
├── sam3/                   # SAM3 library (see Installation: install from facebookresearch/sam3)
├── sam3_service/           # SAM3 HTTP service (optional, for multi-process deployment)
├── scripts/                # Setup and utility scripts
│   ├── setup_sam3.sh       # Install SAM3 lib and copy BPE to models/
│   ├── setup_rmbg.py       # Download RMBG model from ModelScope
│   └── merge_xml.py        # XML merge utilities
├── main.py                 # CLI entry (modular pipeline)
├── server_pa.py            # FastAPI backend server
└── requirements.txt        # Python dependencies
```

📦 Installation & Setup

Follow these core phases to set up the project locally.

Phase 1: Environment & Base Setup

Configure your base environment and directory structure.

1. Prerequisites & Environment

Python 3.10+ and a CUDA-capable GPU (highly recommended).
Install PyTorch with CUDA support (e.g., for CUDA 11.8):

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```

2. Clone Repository & Init Directories

```bash
git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana
mkdir -p input output sam3_output
```

Phase 2: Models & Core Dependencies

Next, install the required packages and download necessary model weights (which should be placed in models/ and not committed).

1. Base Dependencies

```bash
pip install -r requirements.txt
```

2. SAM3 & Model Assets

SAM3 Library & BPE: Run bash scripts/setup_sam3.sh to install the library and copy the BPE vocab to models/. Verify with:

```bash
python -c "from sam3.model_builder import build_sam3_image_model; print('OK')"
```

SAM3 Weights: Download sam3.pt from ModelScope or Hugging Face and place it under models/sam3_ms.
Text Local OCR (Tesseract):

```bash
sudo apt install tesseract-ocr tesseract-ocr-chi-sim
```

🧩 Optional Capabilities (OCR Engine, Formula, RMBG)

PaddleOCR (alternative engine, better for mixed text): Use paddlepaddle==3.2.2 to avoid a bug in 3.3.0.

```bash
pip install paddlepaddle==3.2.2 paddleocr
```

Formula (Pix2Text):

```bash
pip install pix2text onnxruntime-gpu
```

Background Removal (RMBG): pip install onnxruntime modelscope then run python scripts/setup_rmbg.py.

Phase 3: Configuration & Troubleshooting

1. Final Configuration

Copy the example config and adjust the asset paths:

```bash
cp config/config.yaml.example config/config.yaml
```

Edit config.yaml to ensure sam3.checkpoint_path and sam3.bpe_path match your models/ locations.
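Before the first run, a small pre-flight check can confirm that the two paths named above resolve to real files. This is a sketch, assuming config.yaml nests them as `sam3: {checkpoint_path: ..., bpe_path: ...}` and that relative paths are resolved from the repository root; the helper names are hypothetical, and `load_config` needs PyYAML.

```python
from pathlib import Path
from typing import Any, Dict, List

# The two keys this README asks you to set; the nesting under "sam3" is assumed.
REQUIRED_PATHS = ("sam3.checkpoint_path", "sam3.bpe_path")


def lookup(cfg: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted key such as "sam3.bpe_path" through nested dicts."""
    node: Any = cfg
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_assets(cfg: Dict[str, Any], base: Path = Path(".")) -> List[str]:
    """Return a human-readable problem for each unset or non-existent path."""
    problems = []
    for key in REQUIRED_PATHS:
        value = lookup(cfg, key)
        if value is None:
            problems.append(f"{key}: not set")
        elif not (base / str(value)).exists():
            problems.append(f"{key}: file not found at {value}")
    return problems


def load_config(path: str = "config/config.yaml") -> Dict[str, Any]:
    import yaml  # PyYAML, imported lazily so the checks above run without it
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f) or {}
```

From the repository root, `missing_assets(load_config())` returns an empty list when both assets are in place.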

🛠️ Before First Run Checklist & Troubleshooting

Checklist:

[ ] Config file copied and model paths set in config.yaml
[ ] SAM3 weights (sam3.pt) and BPE vocab placed under models/
[ ] SAM3 library installed via scripts/setup_sam3.sh
[ ] Tesseract or PaddleOCR installed

Common Issues:

"no kernel image is available...": GPU architecture mismatch. Upgrade PyTorch or set sam3.device: "cpu".
"Model file not found at ...rmbg/...": RMBG is optional. To enable it, download the model with python scripts/setup_rmbg.py.
"PaddleOCR inference failed...": Use paddlepaddle==3.2.2 or fall back to Tesseract.

🔤 Usage

Command Line Interface (CLI)

Supports image files (PNG, JPG, BMP, TIFF, WebP). To process a single image:

```bash
python main.py -i input/test_diagram.png
```

The output XML will be saved in the output/ directory. For batch processing, put images in input/ and run python main.py without -i.
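If you want to process only some of the images in input/, or keep going when one image fails, a small wrapper around the documented `python main.py -i <image>` call is enough. This is a sketch: the helper names are hypothetical, and accepting the .jpeg and .tif spellings in addition to the formats listed above is an assumption.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, Iterable, List

# Formats listed in this README; ".jpeg" and ".tif" are assumed aliases.
SUPPORTED = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".webp"}


def find_images(folder: Path) -> List[Path]:
    """Supported image files directly inside `folder`, sorted by path."""
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in SUPPORTED)


def run_each(images: Iterable[Path], main_py: Path = Path("main.py")) -> Dict[Path, int]:
    """Run `python main.py -i <image>` per file and record each exit code."""
    results = {}
    for image in images:
        proc = subprocess.run([sys.executable, str(main_py), "-i", str(image)])
        results[image] = proc.returncode  # non-zero means that image failed
    return results
```

For example, `run_each(find_images(Path("input")))` processes every supported image and returns a map of exit codes, so a single failure does not stop the batch.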

Run and test locally

One-time setup

```bash
git clone https://github.com/BIT-DataLab/Edit-Banana.git && cd Edit-Banana
python3 -m venv .venv && source .venv/bin/activate   # Linux/macOS; Windows: .venv\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118   # or CPU build
pip install -r requirements.txt
sudo apt install tesseract-ocr tesseract-ocr-chi-sim   # OCR (or equivalent on your OS)
```

Install the SAM3 library and download model weights + BPE. Then:

```bash
mkdir -p input output
cp config/config.yaml.example config/config.yaml
# Edit config/config.yaml: set sam3.checkpoint_path and sam3.bpe_path to your models/ paths
```

Test with CLI

```bash
# Put a diagram image in input/, e.g. input/test.png
python main.py -i input/test.png
# Output appears under output/<image_stem>/ (DrawIO XML and intermediates)
```

Optional: test the web API

```bash
python server_pa.py
# In another terminal:
curl -X POST http://localhost:8000/convert -F "file=@input/test.png"
# Or open http://localhost:8000/docs and use the /convert endpoint with a file upload
```
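The same request can be made from Python without extra dependencies. This sketch mirrors the curl call above (a multipart POST to /convert with a form field named file). Because this README does not document the response format, `convert` simply returns the raw response bytes; the helper names and the 300-second timeout are assumptions.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(field: str, path: Path) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data (what curl -F does)."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + path.read_bytes() + tail, f"multipart/form-data; boundary={boundary}"


def build_request(image: Path, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    body, content_type = build_multipart("file", image)
    return urllib.request.Request(f"{base_url}/convert", data=body, method="POST",
                                  headers={"Content-Type": content_type})


def convert(image: Path, base_url: str = "http://localhost:8000") -> bytes:
    """POST the image to /convert and return the raw response body."""
    with urllib.request.urlopen(build_request(image, base_url), timeout=300) as resp:
        return resp.read()
```

With server_pa.py running, `convert(Path("input/test.png"))` returns whatever the endpoint sends back; inspect it (or the /docs page) to see whether it is XML or JSON.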

⚙️ Configuration

Customize the pipeline behavior in config/config.yaml:

sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, and the maximum number of iteration loops.
paths: Set input/output directories.
dominant_color: Fine-tune color extraction se
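To see what the score and NMS thresholds trade off, here is a generic greedy non-maximum suppression over bounding boxes. It is illustrative only: the repository's actual config key names, defaults, and whether it suppresses overlapping boxes or masks are not stated here, so treat `filter_detections` and its parameters as hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets: Sequence[Tuple[Box, float]], score_threshold: float = 0.5,
                      nms_threshold: float = 0.5) -> List[Tuple[Box, float]]:
    """Drop low scores, then greedily keep boxes that don't overlap a better one too much."""
    ranked = sorted((d for d in dets if d[1] >= score_threshold),
                    key=lambda d: d[1], reverse=True)
    kept: List[Tuple[Box, float]] = []
    for box, score in ranked:
        if all(iou(box, other) <= nms_threshold for other, _ in kept):
            kept.append((box, score))
    return kept
```

Raising the score threshold drops low-confidence detections before NMS runs; raising the NMS threshold lets more overlapping detections survive, which can help with nested diagram elements at the cost of more duplicates.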
