MCPcopy Index your code
hub / github.com/PaddlePaddle/FastDeploy

github.com/PaddlePaddle/FastDeploy @v2.5.0 sqlite

repository ↗ · DeepWiki ↗ · release v2.5.0 ↗
12,004 symbols 47,008 edges 1,004 files 6,255 documented · 52%
README

English | 简体中文

<a href=""><img src="https://img.shields.io/badge/python-3.10-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/FastDeploy?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/FastDeploy?color=3af"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/FastDeploy?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/FastDeploy?color=ccf"></a>








 <a href="https://trendshift.io/repositories/4046" target="_blank"><img src="https://trendshift.io/api/badge/repositories/4046" alt="PaddlePaddle%2FFastDeploy | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>


<a href="https://paddlepaddle.github.io/FastDeploy/get_started/installation/nvidia_gpu/"><b> Installation </b></a>
|
<a href="https://paddlepaddle.github.io/FastDeploy/get_started/quick_start"><b> Quick Start </b></a>
|
<a href="https://paddlepaddle.github.io/FastDeploy/supported_models/"><b> Supported Models </b></a>

FastDeploy : Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

News

[2025-11] FastDeploy v2.3 is newly released! It adds deployment support for two major models, ERNIE-4.5-VL-28B-A3B-Thinking and PaddleOCR-VL-0.9B, across multiple hardware platforms. It further optimizes comprehensive inference performance and brings more deployment features and usability enhancements. For all the upgrade details, refer to the v2.3 Release Note.

[2025-09] FastDeploy v2.2: It now offers compatibility with models in the HuggingFace ecosystem, has further optimized performance, and newly adds support for baidu/ERNIE-21B-A3B-Thinking!

About

FastDeploy is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle. It delivers production-ready, out-of-the-box deployment solutions with core acceleration technologies:

  • 🚀 Load-Balanced PD Disaggregation: Industrial-grade solution featuring context caching and dynamic instance role switching. Optimizes resource utilization while balancing SLO compliance and throughput.
  • 🔄 Unified KV Cache Transmission: Lightweight high-performance transport library with intelligent NVLink/RDMA selection.
  • 🤝 OpenAI API Server and vLLM Compatible: One-command deployment with vLLM interface compatibility.
  • 🧮 Comprehensive Quantization Format Support: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more.
  • Advanced Acceleration Techniques: Speculative decoding, Multi-Token Prediction (MTP) and Chunked Prefill.
  • 🖥️ Multi-Hardware Support: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Iluvatar GPU, Enflame GCU, MetaX GPU, Intel Gaudi etc.

Requirements

  • OS: Linux
  • Python: 3.10 ~ 3.12

Installation

FastDeploy supports inference deployment on NVIDIA GPUs, Kunlunxin XPUs, Iluvatar GPUs, Enflame GCUs, Hygon DCUs and other hardware. For detailed installation instructions:

Get Started

Learn how to use FastDeploy through our documentation: - 10-Minutes Quick Deployment - ERNIE-4.5 Large Language Model Deployment - ERNIE-4.5-VL Multimodal Model Deployment - Offline Inference Development - Online Service Deployment - Best Practices

Supported Models

Learn how to download models, enable using the torch format, and more: - Full Supported Models List

Advanced Usage

Acknowledgement

FastDeploy is licensed under the Apache-2.0 open-source license. During development, portions of vLLM code were referenced and incorporated to maintain interface compatibility, for which we express our gratitude.

Extension points exported contracts — how you extend this code

TokenizerClient (Interface)
Abstract remote tokenizer interface [1 implementers]
fastdeploy/golang_router/internal/scheduler/handler/tokenizer.go
ManagerAPI (Interface)
(no doc) [2 implementers]
fastdeploy/golang_router/internal/common/interface.go
SelectStrategyFunc (FuncType)
(no doc)
fastdeploy/golang_router/internal/scheduler/common/types.go
PromptExtractor (FuncType)
(no doc)
fastdeploy/golang_router/internal/gateway/completions.go

Core symbols most depended-on inside this repo

print
called by 1827
fastdeploy/trace/trace_logger.py
info
called by 1053
fastdeploy/engine/resource_manager.py
get
called by 1042
fastdeploy/engine/request.py
get
called by 368
fastdeploy/inter_communicator/fmq.py
split
called by 287
fastdeploy/model_executor/layers/normalization.py
sleep
called by 264
fastdeploy/worker/gpu_worker.py
pop
called by 252
fastdeploy/worker/input_batch.py
create
called by 227
fastdeploy/inter_communicator/fmq.py

Shape

Method 7,627
Function 2,400
Class 1,595
Route 350
Struct 25
TypeAlias 3
FuncType 2
Interface 2

Languages

Python98%
Go2%

Modules by API surface

tests/entrypoints/openai/test_run_batch.py122 symbols
tests/scheduler/test_splitwise_scheduler.py113 symbols
fastdeploy/config.py112 symbols
tests/entrypoints/test_engine_client.py110 symbols
tests/model_executor/test_paddleformers_base.py97 symbols
fastdeploy/engine/request.py97 symbols
tests/cache_manager/test_prefix_cache_manager.py95 symbols
tests/metrics/test_trace.py90 symbols
tests/input/test_paddleocr_vl_processor.py87 symbols
tests/engine/test_common_engine.py87 symbols
tests/input/v1/test_paddleocr_vl_processor.py86 symbols
fastdeploy/utils.py84 symbols

Dependencies from manifests, versioned

github.com/beorn7/perksv1.0.1 · 1×
github.com/cespare/xxhash/v2v2.3.0 · 1×
github.com/chenzhuoyu/base64xv0.0.0-2022111506244 · 1×
github.com/gin-contrib/ssev0.1.0 · 1×
github.com/go-playground/localesv0.14.1 · 1×
github.com/go-playground/universal-translatorv0.18.1 · 1×

For agents

$ claude mcp add FastDeploy \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact