DeepSpeed hosts regular office hours on the last Tuesday of each month at 12:00 America/New_York to discuss development plans, features, etc. This meeting is public for anyone to join and ask questions. The meeting is hosted on Zoom and can be joined here.
[2026/05] Using Muon Optimizer with DeepSpeed
[2026/05] System DMA (SDMA) for ZeRO-3: offload collectives off compute units on AMD GPUs for better overlap
[2026/03] DeepSpeed Team gave a tutorial at ASPLOS 2026 titled "Building Efficient Large-Scale Model Systems with DeepSpeed: From Open-Source Foundations to Emerging Research"
[2026/03] Our SuperOffload work received an Honorable Mention for the ASPLOS 2026 Best Paper Award
[2025/12] DeepSpeed Core API updates: PyTorch-style backward and low-precision master states
[2025/10] We hosted the Ray x DeepSpeed Meetup at Anyscale. We shared our most recent work on SuperOffload, ZenFlow, Muon Optimizer Support, Arctic Long Sequence Training and DeepCompile. Please find the meetup slides here.
[2025/10] SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
[2025/10] Study of ZenFlow and ZeRO offload performance with DeepSpeed CPU core binding
[2025/08] ZenFlow: Stall-Free Offloading Engine for LLM Training
[2025/06] DeepNVMe: Affordable I/O scaling for Deep Learning Applications
More news
DeepSpeed enabled the world's most powerful language models (at the time of this writing), such as MT-530B and BLOOM. DeepSpeed offers a confluence of system innovations that have made large-scale DL training effective and efficient, greatly improved its ease of use, and redefined the DL training landscape in terms of the scale that is possible. These innovations include ZeRO, ZeRO-Infinity, 3D-Parallelism, Ulysses Sequence Parallelism, DeepSpeed-MoE, etc.
DeepSpeed was an important part of Microsoft’s AI at Scale initiative to enable next-generation AI capabilities at scale; you can find more information here.
DeepSpeed has been used to train many different large-scale models. Below is a list of several examples that we are aware of (if you'd like to include your model, please submit a PR):
DeepSpeed has been integrated with several different popular open-source DL frameworks such as:
| Documentation |
|---|
| Transformers with DeepSpeed |
| Accelerate with DeepSpeed |
| Lightning with DeepSpeed |
| MosaicML with DeepSpeed |
| Determined with DeepSpeed |
| MMEngine with DeepSpeed |
| Description | Status |
|---|---|
| NVIDIA | |
| AMD | |
| CPU | |
| Intel Gaudi | |
| Intel XPU | |
| Integrations | |
| Misc | |
| Huawei Ascend NPU | |
The quickest way to get started with DeepSpeed is via pip; this will install the latest release of DeepSpeed, which is not tied to specific PyTorch or CUDA versions. DeepSpeed includes several C++/CUDA extensions that we commonly refer to as our 'ops'. By default, all of these extensions/ops are built just-in-time (JIT) using torch's JIT C++ extension loader, which relies on ninja to build and dynamically link them at runtime.
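Because JIT compilation depends on PyTorch's extension loader and on ninja, it can help to confirm both are visible before the first op build. The sketch below is a minimal, hypothetical pre-flight check (the helper name `jit_build_prerequisites` is our own, not a DeepSpeed API); it only looks for an importable `torch` package and a `ninja` executable on `PATH`, and it does not verify compilers or CUDA toolkits.

```python
import importlib.util
import shutil


def jit_build_prerequisites() -> dict:
    """Hypothetical helper: report whether the tools JIT op builds rely on are visible.

    Only checks that `torch` is importable and that a `ninja` executable is on PATH;
    it does not compile anything and does not check for a C++/CUDA compiler.
    """
    return {
        # The JIT C++ extension loader ships inside PyTorch (torch.utils.cpp_extension).
        "torch": importlib.util.find_spec("torch") is not None,
        # ninja can come from a system package or from `pip install ninja`.
        "ninja": shutil.which("ninja") is not None,
    }


if __name__ == "__main__":
    for tool, found in jit_build_prerequisites().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

If either entry reports `missing`, the first call into a JIT-built op is likely to fail at build time rather than at import time.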
| Contributor | Hardware | Accelerator Name | Contributor validated | Upstream validated |
|---|---|---|---|---|
| Huawei | Huawei Ascend NPU | npu | Yes | No |
| Intel | Intel(R) Gaudi(R) 2 AI accelerator | hpu | Yes | Yes |
| Intel | Intel(R) Xeon(R) Processors | cpu | Yes | Yes |
| Intel | Intel(R) Data Center GPU Max series | xpu | Yes | Yes |
| Tecorigin | Scalable Data Analytics Accelerator | sdaa | Yes | No |
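The "Accelerator Name" column in the table above is the short identifier used to refer to each backend. As an illustration only, the sketch below maps those names to the hardware listed in the table and reads a requested backend from the `DS_ACCELERATOR` environment variable. We assume that variable is the override DeepSpeed consults instead of auto-detection (DeepSpeed's own entry point for this is `deepspeed.accelerator.get_accelerator()`); the helper `requested_accelerator` is hypothetical and only validates the value against this table.

```python
import os

# Accelerator names copied from the contributed-hardware table above.
CONTRIBUTED_ACCELERATORS = {
    "npu": "Huawei Ascend NPU",
    "hpu": "Intel(R) Gaudi(R) 2 AI accelerator",
    "cpu": "Intel(R) Xeon(R) Processors",
    "xpu": "Intel(R) Data Center GPU Max series",
    "sdaa": "Scalable Data Analytics Accelerator",
}


def requested_accelerator(env=None):
    """Hypothetical helper: return (name, hardware) for DS_ACCELERATOR, or None if unset.

    Assumption: DS_ACCELERATOR overrides DeepSpeed's automatic accelerator detection.
    `hardware` is None when the name is not one of the contributed entries above.
    """
    env = os.environ if env is None else env
    raw = env.get("DS_ACCELERATOR")
    if raw is None:
        return None
    name = raw.strip().lower()
    return name, CONTRIBUTED_ACCELERATORS.get(name)


if __name__ == "__main__":
    print(requested_accelerator())
```

Keeping the lookup as plain data makes it easy to extend when new contributed backends are added to the table.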
We regularly push releases to PyPI and encourage users to install from there in most cases.
```bash
pip install deepspeed
```
After installation, you can validate your install and see which extensions/ops your machine is compatible with via the DeepSpeed environment report.
```bash
ds_report
```
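The same checks can be scripted, for example in a CI job. This is a minimal sketch under our own assumptions: it reads the installed version from package metadata (the distribution name `deepspeed` matches the pip command above) and runs the `ds_report` command only if it is found on `PATH`. Both helper names are hypothetical.

```python
import shutil
import subprocess
from importlib import metadata


def deepspeed_version():
    """Hypothetical helper: installed DeepSpeed version string, or None if not installed."""
    try:
        return metadata.version("deepspeed")
    except metadata.PackageNotFoundError:
        return None


def run_ds_report(timeout=300):
    """Hypothetical helper: run the `ds_report` CLI and return its stdout, or None if absent."""
    exe = shutil.which("ds_report")
    if exe is None:
        return None
    # check=False: return whatever the report printed even if it exits non-zero.
    result = subprocess.run([exe], capture_output=True, text=True, timeout=timeout, check=False)
    return result.stdout


if __name__ == "__main__":
    version = deepspeed_version()
    print("DeepSpeed", version if version else "is not installed")
    report = run_ds_report()
    if report is not None:
        print(report)
```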
If you would like to