hub / github.com/PaddlePaddle/FastDeploy

github.com/PaddlePaddle/FastDeploy @v2.5.0 sqlite

repository ↗ · DeepWiki ↗ · release v2.5.0 ↗

12,004 symbols 47,008 edges 1,004 files 6,255 documented · 52%

README

<a href=""><img src="https://img.shields.io/badge/python-3.10-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/FastDeploy?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/FastDeploy?color=3af"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/FastDeploy?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/FastDeploy?color=ccf"></a>








 <a href="https://trendshift.io/repositories/4046" target="_blank"><img src="https://trendshift.io/api/badge/repositories/4046" alt="PaddlePaddle%2FFastDeploy | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>


<a href="https://paddlepaddle.github.io/FastDeploy/get_started/installation/nvidia_gpu/"><b> Installation </b></a>
|
<a href="https://paddlepaddle.github.io/FastDeploy/get_started/quick_start"><b> Quick Start </b></a>
|
<a href="https://paddlepaddle.github.io/FastDeploy/supported_models/"><b> Supported Models </b></a>

FastDeploy : Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

News

[2025-11] FastDeploy v2.3 is newly released! It adds deployment support for two major models, ERNIE-4.5-VL-28B-A3B-Thinking and PaddleOCR-VL-0.9B, across multiple hardware platforms. It further optimizes comprehensive inference performance and brings more deployment features and usability enhancements. For all the upgrade details, refer to the v2.3 Release Note.

[2025-09] FastDeploy v2.2: It now offers compatibility with models in the HuggingFace ecosystem, has further optimized performance, and newly adds support for baidu/ERNIE-21B-A3B-Thinking!

About

FastDeploy is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle. It delivers production-ready, out-of-the-box deployment solutions with core acceleration technologies:

🚀 Load-Balanced PD Disaggregation: Industrial-grade solution featuring context caching and dynamic instance role switching. Optimizes resource utilization while balancing SLO compliance and throughput.
🔄 Unified KV Cache Transmission: Lightweight high-performance transport library with intelligent NVLink/RDMA selection.
🤝 OpenAI API Server and vLLM Compatible: One-command deployment with vLLM interface compatibility.
🧮 Comprehensive Quantization Format Support: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more.
⏩ Advanced Acceleration Techniques: Speculative decoding, Multi-Token Prediction (MTP) and Chunked Prefill.
🖥️ Multi-Hardware Support: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Iluvatar GPU, Enflame GCU, MetaX GPU, Intel Gaudi etc.

Requirements

OS: Linux
Python: 3.10 ~ 3.12

Installation

FastDeploy supports inference deployment on NVIDIA GPUs, Kunlunxin XPUs, Iluvatar GPUs, Enflame GCUs, Hygon DCUs and other hardware. For detailed installation instructions:

Get Started

Learn how to use FastDeploy through our documentation: - 10-Minutes Quick Deployment - ERNIE-4.5 Large Language Model Deployment - ERNIE-4.5-VL Multimodal Model Deployment - Offline Inference Development - Online Service Deployment - Best Practices

Supported Models

Learn how to download models, enable using the torch format, and more: - Full Supported Models List

Advanced Usage

Acknowledgement

FastDeploy is licensed under the Apache-2.0 open-source license. During development, portions of vLLM code were referenced and incorporated to maintain interface compatibility, for which we express our gratitude.

Extension points exported contracts — how you extend this code

TokenizerClient (Interface)

Abstract remote tokenizer interface [1 implementers]

fastdeploy/golang_router/internal/scheduler/handler/tokenizer.go

ManagerAPI (Interface)

(no doc) [2 implementers]

fastdeploy/golang_router/internal/common/interface.go

SelectStrategyFunc (FuncType)

(no doc)

fastdeploy/golang_router/internal/scheduler/common/types.go

PromptExtractor (FuncType)

(no doc)

fastdeploy/golang_router/internal/gateway/completions.go

Core symbols most depended-on inside this repo

called by 1827

fastdeploy/trace/trace_logger.py

info

called by 1053

fastdeploy/engine/resource_manager.py

get

called by 1042

fastdeploy/engine/request.py

get

called by 368

fastdeploy/inter_communicator/fmq.py

split

called by 287

fastdeploy/model_executor/layers/normalization.py

sleep

called by 264

fastdeploy/worker/gpu_worker.py

pop

called by 252

fastdeploy/worker/input_batch.py

create

called by 227

fastdeploy/inter_communicator/fmq.py

Shape

Method 7,627

Function 2,400

Class 1,595

Route 350

Struct 25

TypeAlias 3

FuncType 2

Interface 2

Languages

Python98%

Go2%

Modules by API surface

tests/entrypoints/openai/test_run_batch.py122 symbols

tests/scheduler/test_splitwise_scheduler.py113 symbols

fastdeploy/config.py112 symbols

tests/entrypoints/test_engine_client.py110 symbols

tests/model_executor/test_paddleformers_base.py97 symbols

fastdeploy/engine/request.py97 symbols

tests/cache_manager/test_prefix_cache_manager.py95 symbols

tests/metrics/test_trace.py90 symbols

tests/input/test_paddleocr_vl_processor.py87 symbols

tests/engine/test_common_engine.py87 symbols

tests/input/v1/test_paddleocr_vl_processor.py86 symbols

fastdeploy/utils.py84 symbols

Dependencies from manifests, versioned

github.com/beorn7/perksv1.0.1 · 1×

github.com/bytedance/sonicv1.9.1 · 1×

github.com/cespare/xxhash/v2v2.3.0 · 1×

github.com/chenzhuoyu/base64xv0.0.0-2022111506244 · 1×

github.com/davecgh/go-spewv1.1.1 · 1×

github.com/gabriel-vasile/mimetypev1.4.2 · 1×

github.com/gin-contrib/ssev0.1.0 · 1×

github.com/gin-gonic/ginv1.9.1 · 1×

github.com/go-playground/localesv0.14.1 · 1×

github.com/go-playground/universal-translatorv0.18.1 · 1×

github.com/go-playground/validator/v10v10.14.0 · 1×

github.com/goccy/go-jsonv0.10.2 · 1×

For agents

$ claude mcp add FastDeploy \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact