hub / github.com/deepspeedai/DeepSpeed / __init__

Method __init__

deepspeed/runtime/zero/mics.py:65–174

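A context manager that partitions model parameters during model construction using the MiCS partition strategy. Model states are partitioned across the number of devices specified by the ``mics_shard_size`` field in the DeepSpeed config JSON file. The context manager also introduces a hierarchical communication method to reduce the cost of inter-node communication, which can be enabled with the ``mics_hierarchical_params_gather`` field in the DeepSpeed config.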

__init__(self,
         module=None,
         data_parallel_group=None,
         sequence_data_parallel_group=None,
         mem_efficient_linear=True,
         remote_device=None,
         pin_memory=False,
         config_dict_or_path=None,
         config=None,
         enabled=True,
         dtype=None,
         mpu=None)
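
A minimal usage sketch, assuming DeepSpeed is installed and the script is started by a distributed launcher. Placing the two MiCS fields under `zero_optimization` with stage 3 is an assumption about the config layout, not something this page states. The helper names `make_mics_config` and `build_partitioned_model` and the batch-size value are hypothetical.

```python
def make_mics_config(shard_size, hierarchical_gather=False):
    """Build a DeepSpeed config dict carrying the two MiCS fields from the docstring."""
    return {
        # Placeholder value for illustration, not a recommendation.
        "train_micro_batch_size_per_gpu": 1,
        "zero_optimization": {
            # Assumption: MiCS builds on ZeRO stage 3 and reads its fields here.
            "stage": 3,
            "mics_shard_size": shard_size,
            "mics_hierarchical_params_gather": hierarchical_gather,
        },
    }


def build_partitioned_model(model_factory, ds_config):
    """Construct a model inside the MiCS context so parameters are partitioned
    as they are created. Needs DeepSpeed and an initialized distributed run."""
    import deepspeed  # deferred so the config helper works without DeepSpeed

    with deepspeed.zero.MiCS_Init(config_dict_or_path=ds_config):
        return model_factory()


# Shard model states across 8 devices and enable hierarchical gathering.
ds_config = make_mics_config(shard_size=8, hierarchical_gather=True)
```

`build_partitioned_model` takes a zero-argument callable that returns a `torch.nn.Module`. Passing the config is what tells the context manager how to partition the parameters.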

Source from the content-addressed store, hash-verified

class MiCS_Init(Init):

    def __init__(self,
                 module=None,
                 data_parallel_group=None,
                 sequence_data_parallel_group=None,
                 mem_efficient_linear=True,
                 remote_device=None,
                 pin_memory=False,
                 config_dict_or_path=None,
                 config=None,
                 enabled=True,
                 dtype=None,
                 mpu=None):
        """A context manager to partition the model parameters during the model
        construction with MiCS partition strategy. Model states are partitioned
        to the number of devices specified via ``mics_shard_size`` field in the
        deepspeed config json file. The context manager also introduces
        hierarchical communication method to reduce the cost of inter-node
        communications, which can be enabled with
        ``mics_hierarchical_params_gather`` field in deepspeed config.

        Args:
            module (``torch.nn.Module``, optional): If provided, partition the model as
                if it was constructed in the context.
            data_parallel_group (``deepspeed.comm`` process group, optional):
                The group of processes to partition among. Defaults to all processes.
                Synonymous with sequence data parallel group for param partitioning
                across both sequence and data parallel groups.
            mem_efficient_linear (bool, optional): Replace
                torch.nn.functional.linear with an implementation that allows
                DeepSpeed to partition parameters. Defaults to ``True``.
            remote_device (string, optional): The initial device to store model
                weights e.g., ``cpu``, ``nvme``. Passing ``"cpu"`` will create the model in CPU
                memory. The model may still be moved to GPU based on the
                offload settings for training. Defaults to param offload device if a config is
                defined, otherwise GPU.
            pin_memory (bool, optional): Potentially increase performance by
                using pinned memory for model weights. ``remote_device`` must be
                ``"cpu"``. Defaults to pin_memory value in config, otherwise ``False``.
            config_dict_or_path (dict or ``json file``, optional): If provided, provides configuration
                for swapping fp16 params to NVMe.
            config (dict or ``json file``, optional): Deprecated, use config_dict_or_path instead.
            enabled (bool, optional): If ``False``, this context has no
                effect. Defaults to ``True``.
            dtype (``dtype``, optional): Can be used to change the data type of the parameters.
                Supported options are ``torch.half`` and ``torch.float``. Defaults to ``None``
            mpu (``object``, optional): A model parallelism unit object that implements get_{model,data}_parallel_{rank,group,world_size}.

        This context follows the same logic as ``deepspeed.zero.Init()``, but
        with the modification for partition size of each parameter.

        Examples
        --------

        #. Allocate a model and partition it among all processes:

            .. code-block:: python
                # the config_dict_or_path is required to let the context manager know
                # how partition the parameters.

Callers 2

__init__ · Method · 0.45

Calls 4

create_mics_comm_groups · Function · 0.90

get_world_group · Method · 0.80

warning · Method · 0.80

is_initialized · Method · 0.45
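
The call list above includes `create_mics_comm_groups`, which sets up the communication groups used for partitioning. The sketch below only illustrates the idea of splitting ranks by shard size. It assumes consecutive ranks form one shard group and that ranks at the same position across shard groups hold the same partition. The function `mics_rank_groups` is hypothetical and is not DeepSpeed's implementation.

```python
def mics_rank_groups(world_size, shard_size):
    """Split ranks into shard groups and replication groups.

    Assumption for illustration: each run of `shard_size` consecutive ranks
    holds one full copy of the model states, split across its members.
    """
    if world_size <= 0 or shard_size <= 0 or world_size % shard_size != 0:
        raise ValueError("world_size must be a positive multiple of shard_size")

    # Each shard group partitions the model states among its own ranks.
    shard_groups = [
        list(range(start, start + shard_size))
        for start in range(0, world_size, shard_size)
    ]
    # Ranks at the same position in every shard group hold the same partition.
    replicate_groups = [list(ranks) for ranks in zip(*shard_groups)]
    return shard_groups, replicate_groups


shard_groups, replicate_groups = mics_rank_groups(world_size=8, shard_size=4)
print(shard_groups)      # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(replicate_groups)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

With 8 ranks and a shard size of 4, each parameter is split four ways instead of eight. Under this assumed layout, gathering a parameter stays inside a group of four ranks.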

Tested by

no test coverage detected
