hub / github.com/deepspeedai/DeepSpeed / tp_model_init

Function tp_model_init

deepspeed/__init__.py:391–454

Record tensor-parallel initialization arguments for training.

(model, tp_size, dtype, config=None, **kwargs)


def tp_model_init(model, tp_size, dtype, config=None, **kwargs):
    """
    Record tensor-parallel initialization arguments for training.

    Note (compatibility and initialization behavior):
    AutoTP sharding is applied during ``deepspeed.initialize(...)``. This
    function exists for backward compatibility and only records TP arguments so
    they can be validated and merged with the DeepSpeed config at initialization.
    When you use both (i.e., calling ``set_autotp_mode(training=True)`` and
    ``deepspeed.tp_model_init`` while also passing the config to
    ``deepspeed.initialize``), DeepSpeed merges the settings at initialization.
    Conflicting settings raise an error. The table below summarizes the behavior
    across input combinations.

    Inputs:
    - TPI: tp_model_init was called? (Y/N)
    - TPG: tp_model_init provided tp_group? (Y/N)
    - CFG: tensor_parallel in DeepSpeed config? (Y/N)
    - MPU: mpu passed to deepspeed.initialize()? (Y/N)

    | TPI | TPG | CFG | MPU | Outcome                        | Notes |
    |-----|-----|-----|-----|--------------------------------|-------|
    | N   | N   | N   | N   | Error                          | No TP intent; nothing to initialize |
    | N   | N   | N   | Y   | No AutoTP                      | mpu may be used for other MP, but TP not enabled |
    | N   | N   | Y   | N   | Init AutoTP from config        | Use config; need TP group via config-driven init |
    | N   | N   | Y   | Y   | Init AutoTP from config        | mpu used to build TP group |
    | Y   | N   | N   | N   | Error                          | No TP group source |
    | Y   | N   | N   | Y   | Init AutoTP from tp_model_init | Use recorded args + mpu for TP group |
    | Y   | N   | Y   | N   | Init AutoTP from config        | Fill missing from TPI; error on mismatches; need TP group source |
    | Y   | N   | Y   | Y   | Init AutoTP from config        | Fill missing from TPI; error on mismatches |
    | Y   | Y   | N   | N   | Init AutoTP from tp_model_init | Use recorded tp_group; config absent |
    | Y   | Y   | N   | Y   | Error                          | tp_group + mpu conflict |
    | Y   | Y   | Y   | N   | Init AutoTP from config        | Error on mismatches; use tp_group from TPI; reject mpu |
    | Y   | Y   | Y   | Y   | Error                          | tp_group + mpu conflict |

    Field-level merge rules when both tp_model_init and config exist:
    - Canonical source: config
    - Allowed: fill missing config fields from tp_model_init
    - Error on mismatch: autotp_size, dtype, tp_group size or identity

    Extra checks:
    - If tp_group is provided, reject mpu.
    - If tp_group is not provided, require mpu (or another TP group source).
    - If tensor_parallel is absent and only tp_model_init was called, require
      a TP group source (direct tp_group or mpu).

    Args:
        model (torch.nn.Module): The model to be initialized.
        tp_size (int): The tensor parallelism size.
        dtype (torch.dtype): The data type to be used for the model.

    Returns:
        torch.nn.Module: The original model (no sharding applied here).
    """
    if hasattr(model, 'ds_autotp_parsed'):
        logger.warning("'ds_autotp_parsed' attribute already exists in the model; tp_model_init is now record-only.")

    tp_group = kwargs.get("tp_group")
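
The decision table and field-level merge rules in the docstring can be restated as a small pure-Python sketch. This is an illustration only, not DeepSpeed's implementation: the names `autotp_outcome` and `merge_tp_fields`, the outcome constants, and the dict-based field representation are all hypothetical and chosen for this example. It does not import or call DeepSpeed.

```python
from typing import Any, Dict

# Outcome labels copied from the docstring table.
ERROR = "Error"
NO_AUTOTP = "No AutoTP"
FROM_CONFIG = "Init AutoTP from config"
FROM_TPI = "Init AutoTP from tp_model_init"


def autotp_outcome(tpi: bool, tpg: bool, cfg: bool, mpu: bool) -> str:
    """Return the Outcome column for one row of the docstring table.

    Hypothetical helper: tpi/tpg/cfg/mpu mirror the TPI/TPG/CFG/MPU inputs.
    """
    if tpg and not tpi:
        # The table has no such row: tp_group is only passed via tp_model_init.
        raise ValueError("tp_group requires a tp_model_init call")
    if tpg and mpu:
        return ERROR  # tp_group + mpu conflict
    if cfg:
        return FROM_CONFIG  # config is the canonical source
    if not tpi:
        # Neither config nor tp_model_init expressed TP intent.
        return NO_AUTOTP if mpu else ERROR
    # Only tp_model_init was called: a TP group source is required.
    return FROM_TPI if (tpg or mpu) else ERROR


def merge_tp_fields(config: Dict[str, Any], recorded: Dict[str, Any]) -> Dict[str, Any]:
    """Merge recorded tp_model_init fields into config fields.

    Config wins; missing config fields are filled from the recorded args;
    a differing value for the same field raises an error.
    """
    merged = dict(config)
    for key, value in recorded.items():
        if value is None:
            continue  # nothing was recorded for this field
        if merged.get(key) is None:
            merged[key] = value  # fill a missing config field
        elif merged[key] != value:
            raise ValueError(
                f"{key} mismatch: config={merged[key]!r}, tp_model_init={value!r}"
            )
    return merged
```

For example, `autotp_outcome(True, False, False, True)` corresponds to the row where `tp_model_init` was called without a `tp_group`, no `tensor_parallel` section exists, and an `mpu` is passed. Note that the table's Notes column adds conditions the sketch does not model, such as rows that still need a TP group source at initialization.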

Callers

nothing calls this directly

Calls 4

record_tp_model_init_args · Function · 0.85

set_autotp_mode · Function · 0.85

warning · Method · 0.80

get · Method · 0.45

Tested by

no test coverage detected
