hub / github.com/deepspeedai/DeepSpeed / __init__

Method __init__

deepspeed/runtime/zero/mics.py:65–174

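A context manager that partitions model parameters during model construction using the MiCS partition strategy. Model states are partitioned across the number of devices given by the ``mics_shard_size`` field of the DeepSpeed config JSON file. The context manager also introduces a hierarchical communication method that reduces the cost of inter-node communication; it is enabled with the ``mics_hierarchical_params_gather`` field of the DeepSpeed config.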
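The two config fields named above could appear in a DeepSpeed config along these lines. This is a minimal sketch: the field names come from the description, while the concrete values, the `train_micro_batch_size_per_gpu` entry, and the assumption that the fields sit under `zero_optimization` with ZeRO stage 3 are illustrative rather than taken from this page.

```python
import json

# Illustrative values only; choose a shard size that suits your cluster.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        # Assumption: MiCS is configured alongside ZeRO stage 3.
        "stage": 3,
        # Number of devices each copy of the model states is partitioned across.
        "mics_shard_size": 8,
        # Opt in to the hierarchical communication method.
        "mics_hierarchical_params_gather": True,
    },
}

# Serialize to the JSON form that a config file would contain.
print(json.dumps(ds_config, indent=2))
```

The same dictionary can be passed directly as `config_dict_or_path`, or written to a JSON file and passed by path.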

(self,
                 module=None,
                 data_parallel_group=None,
                 sequence_data_parallel_group=None,
                 mem_efficient_linear=True,
                 remote_device=None,
                 pin_memory=False,
                 config_dict_or_path=None,
                 config=None,
                 enabled=True,
                 dtype=None,
                 mpu=None)

Source from the content-addressed store, hash-verified

class MiCS_Init(Init):

    def __init__(self,
                 module=None,
                 data_parallel_group=None,
                 sequence_data_parallel_group=None,
                 mem_efficient_linear=True,
                 remote_device=None,
                 pin_memory=False,
                 config_dict_or_path=None,
                 config=None,
                 enabled=True,
                 dtype=None,
                 mpu=None):
        """A context manager to partition the model parameters during the model
        construction with MiCS partition strategy. Model states are partitioned
        to the number of devices specified via ``mics_shard_size`` field in the
        deepspeed config json file. The context manager also introduces
        hierarchical communication method to reduce the cost of inter-node
        communications, which can be enabled with
        ``mics_hierarchical_params_gather`` field in deepspeed config.

        Args:
            module (``torch.nn.Module``, optional): If provided, partition the model as
                if it was constructed in the context.
            data_parallel_group (``deepspeed.comm`` process group, optional):
                The group of processes to partition among. Defaults to all processes.
                Synonymous with sequence data parallel group for param partitioning
                across both sequence and data parallel groups.
            mem_efficient_linear (bool, optional): Replace
                torch.nn.functional.linear with an implementation that allows
                DeepSpeed to partition parameters. Defaults to ``True``.
            remote_device (string, optional): The initial device to store model
                weights e.g., ``cpu``, ``nvme``. Passing ``"cpu"`` will create the model in CPU
                memory. The model may still be moved to GPU based on the
                offload settings for training. Defaults to param offload device if a config is
                defined, otherwise GPU.
            pin_memory (bool, optional): Potentially increase performance by
                using pinned memory for model weights. ``remote_device`` must be
                ``"cpu"``. Defaults to pin_memory value in config, otherwise ``False``.
            config_dict_or_path (dict or ``json file``, optional): If provided, provides configuration
                for swapping fp16 params to NVMe.
            config (dict or ``json file``, optional): Deprecated, use config_dict_or_path instead.
            enabled (bool, optional): If ``False``, this context has no
                effect. Defaults to ``True``.
            dtype (``dtype``, optional): Can be used to change the data type of the parameters.
                Supported options are ``torch.half`` and ``torch.float``. Defaults to ``None``
            mpu (``object``, optional): A model parallelism unit object that implements get_{model,data}_parallel_{rank,group,world_size}.

        This context follows the same logic as ``deepspeed.zero.Init()``, but
        with the modification for partition size of each parameter.

        Examples
        --------

        #. Allocate a model and partition it among all processes:

            .. code-block:: python
                # the config_dict_or_path is required to let the context manager know
                # how partition the parameters.
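The listing above is cut off before the docstring's own example. As a hedged sketch of how the context manager might be used, the helpers below check that the config carries `mics_shard_size` and then construct a model inside the context. `require_shard_size` and `build_under_mics` are hypothetical names introduced here for illustration, and the sketch assumes the class is reachable as `deepspeed.zero.MiCS_Init` and that the script runs under an initialized distributed launch.

```python
def require_shard_size(ds_config):
    """Return mics_shard_size from the config, or raise if it is missing.

    Hypothetical helper: the docstring says the config is needed so the
    context manager knows how to partition the parameters.
    """
    zero_cfg = ds_config.get("zero_optimization", {})
    if "mics_shard_size" not in zero_cfg:
        raise ValueError("ds_config['zero_optimization'] must set 'mics_shard_size'")
    return zero_cfg["mics_shard_size"]


def build_under_mics(model_factory, ds_config):
    """Construct a model inside the MiCS context (hypothetical helper).

    model_factory is any zero-argument callable returning a torch.nn.Module.
    """
    require_shard_size(ds_config)

    # Imported lazily so this module can be loaded without DeepSpeed installed.
    import deepspeed

    # Assumption: MiCS_Init is exposed under deepspeed.zero.
    with deepspeed.zero.MiCS_Init(config_dict_or_path=ds_config):
        return model_factory()
```

A call might look like `model = build_under_mics(lambda: MyLargeModel(), ds_config)`, where `MyLargeModel` stands in for your own module class.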

Callers 2

__init__ · Method · 0.45
__init__ · Method · 0.45

Calls 4

create_mics_comm_groups · Function · 0.90
get_world_group · Method · 0.80
warning · Method · 0.80
is_initialized · Method · 0.45
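Of the calls listed, `create_mics_comm_groups` is the one that sets up the MiCS communication groups. Its body is not shown on this page, so the following is only a standalone sketch of one plausible rank layout, under the assumption that contiguous ranks form a shard group and that ranks holding the same shard position across groups form a replicate group. `sketch_mics_rank_layout` is a hypothetical name and does not call DeepSpeed.

```python
def sketch_mics_rank_layout(world_size, shard_size):
    """Illustrative rank layout; an assumption, not DeepSpeed's actual code."""
    if world_size % shard_size != 0:
        raise ValueError("world_size must be divisible by shard_size")

    # Each shard group holds one full copy of the model states,
    # split across shard_size consecutive ranks.
    shard_groups = [
        list(range(start, start + shard_size))
        for start in range(0, world_size, shard_size)
    ]

    # Ranks at the same position in every shard group hold the same shard,
    # so they form a replicate group.
    replicate_groups = [
        [group[i] for group in shard_groups] for i in range(shard_size)
    ]
    return shard_groups, replicate_groups


shard_groups, replicate_groups = sketch_mics_rank_layout(world_size=8, shard_size=4)
print(shard_groups)      # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(replicate_groups)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

With 8 ranks and a shard size of 4, this layout gives two full copies of the model states, each split four ways.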

Tested by

no test coverage detected