
github.com/ModelTC/LightLLM @v1.1.0 sqlite

3,497 symbols · 14,196 edges · 644 files · 265 documented (8%)
README


LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance. LightLLM harnesses the strengths of numerous well-regarded open-source implementations, including but not limited to FasterTransformer, TGI, vLLM, and FlashAttention.

English Docs | Chinese Docs | Blogs

News

  • [2025/09] 🔥 LightLLM v1.1.0 release!
  • [2025/08] Pre$^3$ received an Outstanding Paper Award at ACL 2025.
  • [2025/05] LightLLM's paper on constrained decoding (Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation) was accepted by ACL 2025. For an accessible overview of the research, with key insights and examples, see our blog post: LightLLM Blog
  • [2025/04] LightLLM's paper on request scheduling (Past-Future Scheduler for LLM Serving under SLA Guarantees) was published at ASPLOS’25.
  • [2025/02] 🔥 LightLLM v1.0.0 released, achieving the fastest DeepSeek-R1 serving performance on a single H200 machine.

Get started

Performance

Learn more in the release blogs: v1.0.0 blog.

FAQ

Please refer to the FAQ for more information.

Projects using LightLLM

We welcome any cooperation and contributions. If your project requires LightLLM's support, please contact us via email or create a pull request.

Projects based on LightLLM or using LightLLM components:

  • LazyLLM
  • LoongServe, Peking University
  • OmniKV, Ant Group
  • vLLM (uses some LightLLM kernels)
  • SGLang (uses some LightLLM kernels)
  • ParrotServe, Microsoft
  • Aphrodite (uses some LightLLM kernels)
  • S-LoRA

Also, LightLLM's pure-Python design and token-level KV cache management make it easy to use as the basis for research projects.
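To make "token-level KV cache management" concrete, here is a minimal sketch under stated assumptions: a fixed pool of per-token cache slots is preallocated, and every prompt or generated token of a request claims exactly one slot. The class `TokenSlotPool` and its methods are hypothetical illustrations, not LightLLM's API.

```python
from typing import Dict, List


class TokenSlotPool:
    """Hypothetical token-level KV cache allocator (illustration only).

    Each token of each request owns exactly one slot in a preallocated
    pool, so memory is reserved and released one token at a time.
    """

    def __init__(self, capacity: int) -> None:
        self.free_slots: List[int] = list(range(capacity))
        self.req_slots: Dict[int, List[int]] = {}

    def alloc(self, req_id: int, num_tokens: int) -> List[int]:
        # Check before popping so a failed request leaves the pool unchanged.
        if num_tokens > len(self.free_slots):
            raise MemoryError("KV cache pool exhausted")
        slots = [self.free_slots.pop() for _ in range(num_tokens)]
        self.req_slots.setdefault(req_id, []).extend(slots)
        return slots

    def free(self, req_id: int) -> None:
        # Return every slot held by a finished request to the pool.
        self.free_slots.extend(self.req_slots.pop(req_id, []))

    @property
    def num_free(self) -> int:
        return len(self.free_slots)


pool = TokenSlotPool(capacity=8)
pool.alloc(req_id=0, num_tokens=5)  # prefill: one slot per prompt token
pool.alloc(req_id=0, num_tokens=1)  # decode: one slot per new token
pool.alloc(req_id=1, num_tokens=2)
print(pool.num_free)  # 0
pool.free(0)
print(pool.num_free)  # 6
```

Because slots are per token, a request that finishes early gives back exactly the memory it used. A real system would typically also pass the slot indices to the attention kernels so they can gather each request's K/V entries.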

Academic work based on or using parts of LightLLM:

  • ParrotServe (OSDI’24)
  • SLoRA (MLSys’24)
  • LoongServe (SOSP’24)
  • ByteDance’s CXL (EuroSys’24)
  • VTC (OSDI’24)
  • OmniKV (ICLR’25)
  • CaraServe, LoRATEE, FastSwitch ...

Community

For further information and discussion, join our Discord server. You are welcome to become a member, and we look forward to your contributions!

License

This repository is released under the Apache-2.0 license.

Acknowledgement

We learned a lot from the following projects when developing LightLLM:

  • FasterTransformer
  • Text Generation Inference
  • vLLM
  • SGLang
  • FlashInfer
  • FlashAttention 1 & 2
  • OpenAI Triton

Citation

We have published a number of papers on components and features of LightLLM. If you use LightLLM in your work, please consider citing the relevant paper.

Constrained decoding: accepted by ACL 2025, where it received an Outstanding Paper Award:

@inproceedings{
anonymous2025pre,
title={Pre\${\textasciicircum}3\$: Enabling Deterministic Pushdown Automata for Faster Structured {LLM} Generation},
author={Anonymous},
booktitle={Submitted to ACL Rolling Review - February 2025},
year={2025},
url={https://openreview.net/forum?id=g1aBeiyZEi},
note={under review}
}
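The constrained-decoding paper cited above builds on deterministic pushdown automata (DPDA). As a toy illustration of the general idea only (this is not the Pre$^3$ algorithm), a DPDA for balanced brackets can decide which vocabulary tokens are legal next: a token is allowed if feeding its characters through the automaton from the current stack does not reject. The grammar, the vocabulary, and the `<eos>` token below are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Stack = Tuple[str, ...]
EOS = "<eos>"  # hypothetical end-of-sequence token
CLOSER_TO_OPENER: Dict[str, str] = {")": "(", "]": "["}
OPENERS = set(CLOSER_TO_OPENER.values())


def step(stack: Stack, ch: str) -> Optional[Stack]:
    """One deterministic DPDA transition; None means the input is rejected."""
    if ch in OPENERS:
        return stack + (ch,)  # push the opener
    if ch in CLOSER_TO_OPENER:
        if stack and stack[-1] == CLOSER_TO_OPENER[ch]:
            return stack[:-1]  # pop the matching opener
        return None  # closer does not match the top of the stack
    if ch.isalpha():
        return stack  # letters leave the stack unchanged
    return None  # any other character is outside the toy grammar


def run(stack: Stack, text: str) -> Optional[Stack]:
    """Feed every character of `text` through the DPDA."""
    state: Optional[Stack] = stack
    for ch in text:
        state = step(state, ch)
        if state is None:
            return None
    return state


def allowed_tokens(stack: Stack, vocab: List[str]) -> List[str]:
    """Token mask: which vocabulary entries may be sampled next."""
    allowed = []
    for tok in vocab:
        if tok == EOS:
            if not stack:  # stopping is legal only when every bracket is closed
                allowed.append(tok)
        elif run(stack, tok) is not None:  # () is a valid, empty stack
            allowed.append(tok)
    return allowed


VOCAB = ["(", ")", "[", "]", "ab", ")]", "])", "x)", EOS]
state = run((), "([a")  # stack after decoding the prefix "([a"
print(state)  # ('(', '[')
print(allowed_tokens(state, VOCAB))  # ['(', '[', ']', 'ab', '])']
```

A decoder would apply such a mask to the logits before sampling. Re-checking every vocabulary entry at every step, as done here, is the straightforward but costly baseline; making structured generation faster is the problem the cited paper targets.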

Request scheduler: accepted by ASPLOS’25:

@inproceedings{gong2025past,
  title={Past-Future Scheduler for LLM Serving under SLA Guarantees},
  author={Gong, Ruihao and Bai, Shihao and Wu, Siyu and Fan, Yunqian and Wang, Zaijun and Li, Xiuhong and Yang, Hailong and Liu, Xianglong},
  booktitle={Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
  pages={798--813},
  year={2025}
}
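The scheduler paper cited above, as we understand it, estimates the peak memory a running batch will need at future decode steps rather than looking only at current usage. A simplified sketch of that estimate, assuming each request's eventual output length is already predicted (the paper derives this from the historical output-length distribution; here it is simply given), and assuming a finished request frees its KV cache immediately. `ReqState`, `future_peak_tokens`, and `can_admit` are hypothetical names, and this is not the paper's exact algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReqState:
    prompt_len: int        # tokens in the prompt
    generated: int         # tokens generated so far
    predicted_output: int  # assumed total output length (given, not learned here)


def future_peak_tokens(batch: List[ReqState]) -> int:
    """Peak KV cache usage, in tokens, over all future decode steps.

    At future step t, an unfinished request holds
    prompt_len + generated + t tokens; finished requests hold none.
    """
    remaining = [max(r.predicted_output - r.generated, 0) for r in batch]
    horizon = max(remaining, default=0)
    peak = 0
    for t in range(horizon + 1):
        usage = sum(
            r.prompt_len + r.generated + t
            for r, rem in zip(batch, remaining)
            if t <= rem
        )
        peak = max(peak, usage)
    return peak


def can_admit(batch: List[ReqState], new_req: ReqState, capacity: int) -> bool:
    # Admit only if the estimated future peak still fits in the cache.
    return future_peak_tokens(batch + [new_req]) <= capacity


batch = [ReqState(100, 10, 50), ReqState(20, 0, 200)]
print(future_peak_tokens(batch))                              # 220
print(can_admit(batch, ReqState(30, 0, 10), capacity=256))  # True
print(can_admit(batch, ReqState(30, 0, 10), capacity=200))  # False
```

Current usage here is only 160 tokens, so a check based on present occupancy would admit the new request under a 200-token budget, yet the batch is expected to grow to 220 tokens later. Looking ahead avoids admitting requests that would later force evictions.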

Core symbols most depended on inside this repo

  • load, called by 583 (lightllm/utils/petrel_helper.py)
  • get, called by 272 (lightllm/common/quantization/registry.py)
  • cuda, called by 254 (lightllm/models/vit/model.py)
  • empty, called by 224 (lightllm/common/basemodel/layer_infer/cache_tensor_manager.py)
  • init_logger, called by 161 (lightllm/utils/log_utils.py)
  • get, called by 159 (lightllm/server/core/objs/out_token_circlequeue.py)
  • alloc_tensor, called by 90 (lightllm/common/basemodel/layer_infer/base_layer_infer.py)
  • get_env_start_args, called by 84 (lightllm/utils/envs_utils.py)

Shape

Method 2,041
Function 945
Class 483
Route 28

Languages

Python 100%

Modules by API surface

format_out/grammer/core.py: 55 symbols
lightllm/models/llama/layer_infer/transformer_layer_infer.py: 42 symbols
lightllm/server/router/model_infer/infer_batch.py: 41 symbols
lightllm/server/router/dynamic_prompt/radix_cache.py: 38 symbols
lightllm/server/core/objs/sampling_params.py: 36 symbols
lightllm/common/basemodel/basemodel.py: 35 symbols
lightllm/server/core/objs/req.py: 34 symbols
lightllm/server/api_http.py: 33 symbols
lightllm/common/basemodel/layer_weights/meta_weights/mm_weight/rowmm_weight.py: 32 symbols
test/test_api/test_openai_api.py: 30 symbols
lightllm/server/function_call_parser.py: 30 symbols
format_out/grammer/dpda.py: 30 symbols

Dependencies from manifests, versioned

Brotli 1.0.9 · 1×
Jinja2 3.1.2 · 1×
MarkupSafe 2.1.3 · 1×
Pillow 10.4.0 · 1×
PySocks 1.7.1 · 1×
PyYAML 6.0.1 · 1×
anyio 3.7.1 · 1×
atomics 1.0.3 · 1×
black 23.12.0 · 1×
boltons 23.0.0 · 1×
boto3 1.28.7 · 1×
botocore 1.31.7 · 1×

For agents

$ claude mcp add LightLLM \
  -- python -m otcore.mcp_server <graph>
