github.com/ModelTC/LightLLM @v1.1.0 sqlite

3,497 symbols 14,196 edges 644 files 265 documented · 8%

README

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance. LightLLM harnesses the strengths of numerous well-regarded open-source implementations, including but not limited to FasterTransformer, TGI, vLLM, and FlashAttention.

English Docs | Chinese Docs | Blogs

News

[2025/09] 🔥 LightLLM v1.1.0 release!
[2025/08] Pre$^3$ received the Outstanding Paper Award at ACL 2025.
[2025/05] LightLLM paper on constrained decoding accepted by ACL 2025 (Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation). For a more accessible overview of the research, with key insights and examples, see our blog post: LightLLM Blog
[2025/04] LightLLM paper on the request scheduler published in ASPLOS’25 (Past-Future Scheduler for LLM Serving under SLA Guarantees)
[2025/02] 🔥 LightLLM v1.0.0 release, achieving the fastest DeepSeek-R1 serving performance on a single H200 machine.

Get started

Performance

Learn more in the release blogs: v1.0.0 blog.

FAQ

Please refer to the FAQ for more information.

Projects using LightLLM

We welcome any cooperation and contribution. If your project requires LightLLM's support, please contact us via email or create a pull request.

Projects based on LightLLM or referencing LightLLM components:
- LazyLLM
- LoongServe, Peking University
- OmniKV, Ant Group
- vLLM (uses some LightLLM kernels)
- SGLang (uses some LightLLM kernels)
- ParrotServe, Microsoft
- Aphrodite (uses some LightLLM kernels)
- S-LoRA

LightLLM's pure-Python design and token-level KV cache management also make it easy to use as the basis for research projects.
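To illustrate what token-level management means, here is a minimal sketch of a KV cache allocator that hands out one slot per token rather than fixed-size blocks. It is a toy under stated assumptions, not LightLLM's implementation: the class `TokenSlotAllocator`, its methods, and the request IDs are all hypothetical names chosen for this example.

```python
from typing import Dict, List


class TokenSlotAllocator:
    """Toy bookkeeping for a KV cache managed one token slot at a time.

    Hypothetical illustration only; it does not mirror LightLLM's classes.
    """

    def __init__(self, num_slots: int) -> None:
        # Every slot holds the key/value entries of exactly one token.
        self.free_slots: List[int] = list(range(num_slots))
        self.req_to_slots: Dict[str, List[int]] = {}

    def alloc(self, req_id: str, num_tokens: int) -> List[int]:
        """Reserve one slot per token for the given request."""
        if num_tokens > len(self.free_slots):
            raise MemoryError("not enough free KV cache slots")
        taken = self.free_slots[:num_tokens]
        del self.free_slots[:num_tokens]
        self.req_to_slots.setdefault(req_id, []).extend(taken)
        return taken

    def release(self, req_id: str) -> None:
        """Return all slots held by a finished request to the free pool."""
        self.free_slots.extend(self.req_to_slots.pop(req_id, []))


allocator = TokenSlotAllocator(num_slots=8)
prompt_slots = allocator.alloc("req-a", 3)  # prefill: three prompt tokens
decode_slot = allocator.alloc("req-a", 1)   # decode: one new token
other_slots = allocator.alloc("req-b", 2)   # a second request shares the pool
allocator.release("req-a")                  # req-a finishes, its slots are reusable
```

Because slots are granted per token, a request never holds more cache than it has tokens, which is the property that makes this style of bookkeeping convenient to experiment with.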

Academic works that build on or use parts of LightLLM:
- ParrotServe (OSDI’24)
- S-LoRA (MLSys’24)
- LoongServe (SOSP’24)
- ByteDance’s CXL (EuroSys’24)
- VTC (OSDI’24)
- OmniKV (ICLR’25)
- CaraServe, LoRATEE, FastSwitch ...

Community

For further information and discussion, join our Discord server. New members are welcome, and we look forward to your contributions!

License

This repository is released under the Apache-2.0 license.

Acknowledgement

We learned a lot from the following projects when developing LightLLM:
- Faster Transformer
- Text Generation Inference
- vLLM
- SGLang
- flashinfer
- Flash Attention 1&2
- OpenAI Triton

Citation

We have published a number of papers on components and features of LightLLM. If you use LightLLM in your work, please consider citing the relevant paper.

Constrained decoding: accepted by ACL 2025, where it received the Outstanding Paper Award:

@inproceedings{
anonymous2025pre,
title={Pre\${\textasciicircum}3\$: Enabling Deterministic Pushdown Automata for Faster Structured {LLM} Generation},
author={Anonymous},
booktitle={Submitted to ACL Rolling Review - February 2025},
year={2025},
url={https://openreview.net/forum?id=g1aBeiyZEi},
note={under review}
}

Request scheduler: accepted by ASPLOS’25:

@inproceedings{gong2025past,
  title={Past-Future Scheduler for LLM Serving under SLA Guarantees},
  author={Gong, Ruihao and Bai, Shihao and Wu, Siyu and Fan, Yunqian and Wang, Zaijun and Li, Xiuhong and Yang, Hailong and Liu, Xianglong},
  booktitle={Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
  pages={798--813},
  year={2025}
}

Core symbols most depended-on inside this repo

load · called by 583 · lightllm/utils/petrel_helper.py

get · called by 272 · lightllm/common/quantization/registry.py

cuda · called by 254 · lightllm/models/vit/model.py

empty · called by 224 · lightllm/common/basemodel/layer_infer/cache_tensor_manager.py

init_logger · called by 161 · lightllm/utils/log_utils.py

get · called by 159 · lightllm/server/core/objs/out_token_circlequeue.py

alloc_tensor · called by 90 · lightllm/common/basemodel/layer_infer/base_layer_infer.py

get_env_start_args · called by 84 · lightllm/utils/envs_utils.py

Shape

Method 2,041

Function 945

Class 483

Route 28

Languages

Python 100%

Modules by API surface

format_out/grammer/core.py · 55 symbols

lightllm/models/llama/layer_infer/transformer_layer_infer.py · 42 symbols

lightllm/server/router/model_infer/infer_batch.py · 41 symbols

lightllm/server/router/dynamic_prompt/radix_cache.py · 38 symbols

lightllm/server/core/objs/sampling_params.py · 36 symbols

lightllm/common/basemodel/basemodel.py · 35 symbols

lightllm/server/core/objs/req.py · 34 symbols

lightllm/server/api_http.py · 33 symbols

lightllm/common/basemodel/layer_weights/meta_weights/mm_weight/rowmm_weight.py · 32 symbols

test/test_api/test_openai_api.py · 30 symbols

lightllm/server/function_call_parser.py · 30 symbols

format_out/grammer/dpda.py · 30 symbols

Dependencies from manifests, versioned

Brotli 1.0.9 · 1×

Jinja2 3.1.2 · 1×

MarkupSafe 2.1.3 · 1×

Pillow 10.4.0 · 1×

PySocks 1.7.1 · 1×

PyYAML 6.0.1 · 1×

anyio 3.7.1 · 1×

atomics 1.0.3 · 1×

black 23.12.0 · 1×

boltons 23.0.0 · 1×

boto3 1.28.7 · 1×

botocore 1.31.7 · 1×

For agents

$ claude mcp add LightLLM \
  -- python -m otcore.mcp_server <graph>