github.com/deepspeedai/DeepSpeed @v0.19.2 sqlite

repository ↗ · DeepWiki ↗ · release v0.19.2 ↗
10,482 symbols 42,289 edges 1,030 files 1,920 documented · 18%
README

License Apache 2.0 PyPI version Downloads Build OpenSSF Best Practices Twitter Japanese Twitter Chinese Zhihu Slack

Office Hours

DeepSpeed hosts regular office hours on the last Tuesday of each month at 12:00 America/New_York to discuss development plans, features, etc. This meeting is public for anyone to join and ask questions. The meeting is hosted on Zoom and can be joined here.

Extreme Speed and Scale for DL Training

DeepSpeed enabled the world's most powerful language models (at the time of this writing) such as MT-530B and BLOOM. DeepSpeed offers a confluence of system innovations that have made large-scale DL training effective and efficient, greatly improved ease of use, and redefined the DL training landscape in terms of the scale that is possible. These innovations include ZeRO, ZeRO-Infinity, 3D-Parallelism, Ulysses Sequence Parallelism, DeepSpeed-MoE, etc.


DeepSpeed Adoption

DeepSpeed was an important part of Microsoft’s AI at Scale initiative to enable next-generation AI capabilities at scale; you can find more information here.

DeepSpeed has been used to train many different large-scale models. Below is a list of several examples that we are aware of (if you'd like to include your model, please submit a PR):

DeepSpeed has been integrated with several different popular open-source DL frameworks such as:

Documentation
Transformers with DeepSpeed
Accelerate with DeepSpeed
Lightning with DeepSpeed
MosaicML with DeepSpeed
Determined with DeepSpeed
MMEngine with DeepSpeed

Build Pipeline Status

Description         Status
NVIDIA              nv-pre-compile-ops, aws-torch-latest
AMD                 amd-mi200
CPU                 torch-latest-cpu
Intel Gaudi         hpu-gaudi2
Intel XPU           xpu-max1100
Integrations        aws-accelerate
Misc                Formatting, pages-build-deployment, Documentation Status, python
Huawei Ascend NPU   Huawei Ascend NPU

Installation

The quickest way to get started with DeepSpeed is via pip. This installs the latest release of DeepSpeed, which is not tied to specific PyTorch or CUDA versions. DeepSpeed includes several C++/CUDA extensions that we commonly refer to as our 'ops'. By default, all of these extensions/ops are built just-in-time (JIT) using torch's JIT C++ extension loader, which relies on ninja to build and dynamically link them at runtime.
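Because the ops are compiled on first use, a missing ninja tends to surface only at runtime. The following is a minimal sketch, not part of DeepSpeed, that checks up front whether a ninja executable can be run. The helper name `ninja_available` is hypothetical, and the check only confirms that ninja starts; it says nothing about whether a particular op will build.

```python
import shutil
import subprocess


def ninja_available(executable="ninja"):
    """Return the ninja version string if `executable` runs, else None.

    Hypothetical helper for illustration; it is not a DeepSpeed API.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = ninja_available()
    if version is None:
        print("ninja not found on PATH; JIT builds of ops would likely fail")
    else:
        print(f"ninja {version} found")
```

Running this before the first training job is one way to catch a missing build tool early, rather than during the first JIT compilation.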

Requirements

  • PyTorch must be installed before installing DeepSpeed.
  • For full feature support we recommend a version of PyTorch that is >= 2.0 and ideally the latest PyTorch stable release.
  • A CUDA or ROCm compiler, such as nvcc or hipcc, is needed to compile C++/CUDA/HIP extensions.
  • Specific GPUs we develop and test against are listed below. This doesn't mean your GPU will not work if it doesn't fall into this category; it's just that DeepSpeed is most well tested on the following:
      • NVIDIA: Pascal, Volta, Ampere, and Hopper architectures
      • AMD: MI100 and MI200
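The requirements above can be checked before installing DeepSpeed. The sketch below is an illustration under stated assumptions: the function names and the report keys are hypothetical, the PyTorch version is read from package metadata without importing torch, and the compiler check only looks for `nvcc` or `hipcc` on PATH.

```python
import re
import shutil
from importlib import metadata


def version_at_least(version, minimum):
    """Compare the leading 'major.minor' of a version string to a (major, minor) tuple."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


def check_requirements():
    """Collect a small report; the keys are illustrative, not a DeepSpeed format."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None  # PyTorch must be installed before DeepSpeed
    return {
        "torch_version": torch_version,
        "torch_ok": torch_version is not None
        and version_at_least(torch_version, (2, 0)),
        "nvcc": shutil.which("nvcc"),    # CUDA compiler, if on PATH
        "hipcc": shutil.which("hipcc"),  # ROCm compiler, if on PATH
    }


if __name__ == "__main__":
    for key, value in check_requirements().items():
        print(f"{key}: {value}")
```

A `None` for both compilers does not necessarily block installation, but ops that need compiling would not be buildable on that machine.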

Contributed HW support

  • DeepSpeed now supports various HW accelerators.
Contributor  Hardware                              Accelerator Name  Contributor validated  Upstream validated
Huawei       Huawei Ascend NPU                     npu               Yes                    No
Intel        Intel(R) Gaudi(R) 2 AI accelerator    hpu               Yes                    Yes
Intel        Intel(R) Xeon(R) Processors           cpu               Yes                    Yes
Intel        Intel(R) Data Center GPU Max series   xpu               Yes                    Yes
Tecorigin    Scalable Data Analytics Accelerator   sdaa              Yes                    No
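To see which accelerator a given machine resolves to, the accelerator name can be looked up against the table above. This is a hedged sketch: the `ACCELERATORS` mapping and both helper functions are hypothetical, and the import path `deepspeed.accelerator` together with calling `get_accelerator().device_name()` without arguments are assumptions about the installed package, so the code falls back to `None` when DeepSpeed cannot be imported.

```python
# Accelerator names and hardware copied from the contributed-hardware table.
ACCELERATORS = {
    "npu": "Huawei Ascend NPU",
    "hpu": "Intel(R) Gaudi(R) 2 AI accelerator",
    "cpu": "Intel(R) Xeon(R) Processors",
    "xpu": "Intel(R) Data Center GPU Max series",
    "sdaa": "Scalable Data Analytics Accelerator",
}


def describe(name):
    """Map an accelerator name to the hardware listed in the table."""
    return ACCELERATORS.get(name, f"{name} (not in the contributed-hardware table)")


def detect_accelerator_name():
    """Return the active accelerator name, or None if DeepSpeed is not importable."""
    try:
        # Assumed import path; adjust if your installation differs.
        from deepspeed.accelerator import get_accelerator
    except ImportError:
        return None
    return get_accelerator().device_name()


if __name__ == "__main__":
    name = detect_accelerator_name()
    if name is None:
        print("DeepSpeed is not installed; cannot detect an accelerator")
    else:
        print(f"{name}: {describe(name)}")
```

Names outside the table, such as the one a stock NVIDIA setup would report, are passed through with a note rather than treated as errors.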

PyPI

We regularly push releases to PyPI and encourage users to install from there in most cases.

pip install deepspeed

After installation, you can validate your install and see which extensions/ops your machine is compatible with via the DeepSpeed environment report.

ds_report
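The report can also be captured from a script, for example to attach it to a bug report. The following is a minimal sketch that assumes only that `ds_report` is an executable on PATH after installation; the wrapper name `run_ds_report` is hypothetical, and the report text is returned unparsed because its format is not described here.

```python
import shutil
import subprocess


def run_ds_report(executable="ds_report"):
    """Run the environment report and return its text, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # DeepSpeed is not installed or its scripts are not on PATH
    result = subprocess.run([path], capture_output=True, text=True)
    # Keep both streams, since tools often split warnings and results.
    return result.stdout + result.stderr


if __name__ == "__main__":
    report = run_ds_report()
    print(report if report is not None else "ds_report not found on PATH")
```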

If you would like to

Core symbols most depended-on inside this repo

get_accelerator · called by 1649 · accelerator/real_accelerator.py
append · called by 1248 · deepspeed/utils/comms_logging.py
to · called by 856 · deepspeed/ops/fp_quantizer/quantize.py
numel · called by 519 · deepspeed/runtime/swap_tensor/optimizer_utils.py
parameters · called by 423 · deepspeed/inference/v2/checkpoint/base_engine.py
device_name · called by 357 · accelerator/hpu_accelerator.py
initialize · called by 318 · deepspeed/inference/v2/inference_parameter.py
reshape · called by 311 · deepspeed/checkpoint/reshape_3d_utils.py

Shape

Method 6,473
Function 2,504
Class 1,473
Route 32

Languages

Python 100%

Modules by API surface

deepspeed/runtime/engine.py · 311 symbols
deepspeed/runtime/zero/stage3.py · 169 symbols
deepspeed/runtime/zero/stage_1_and_2.py · 145 symbols
deepspeed/runtime/zero/partition_parameters.py · 119 symbols
deepspeed/module_inject/layers.py · 114 symbols
tests/unit/v1/zero/test_zero.py · 113 symbols
deepspeed/runtime/utils.py · 83 symbols
deepspeed/runtime/data_pipeline/data_sampling/indexed_dataset.py · 77 symbols
deepspeed/runtime/config.py · 76 symbols
deepspeed/profiling/flops_profiler/profiler.py · 75 symbols
accelerator/cuda_accelerator.py · 74 symbols
accelerator/hpu_accelerator.py · 72 symbols

Dependencies from manifests, versioned

autodoc_pydantic 2.0.0 · 1×
clang-format 18.1.3 · 1×
comet_ml 3.41.0 · 1×
diffusers 0.25.0 · 1×
importlib-metadata 4 · 1×
lm-eval 0.3.0 · 1×
neural-compressor 2.1.0 · 1×
packaging 20.0 · 1×
pre-commit 3.2.0 · 1×
pydantic 2.0.0 · 1×
pytest 7.2.0 · 1×
qtorch 0.3.0 · 1×

For agents

$ claude mcp add DeepSpeed \
  -- python -m otcore.mcp_server <graph>
