hub / github.com/deepspeedai/DeepSpeed / tp_model_init

Function tp_model_init

deepspeed/__init__.py:391–454

Record tensor-parallel initialization arguments for training. AutoTP sharding is applied during ``deepspeed.initialize(...)``; this function exists for backward compatibility and only records TP arguments so they can be validated and merged with the DeepSpeed config at initialization.

(model, tp_size, dtype, config=None, **kwargs)

Source

def tp_model_init(model, tp_size, dtype, config=None, **kwargs):
    """
    Record tensor-parallel initialization arguments for training.

    Note (compatibility and initialization behavior):
    AutoTP sharding is applied during ``deepspeed.initialize(...)``. This
    function exists for backward compatibility and only records TP arguments so
    they can be validated and merged with the DeepSpeed config at initialization.
    When you use both (i.e., calling ``set_autotp_mode(training=True)`` and
    ``deepspeed.tp_model_init`` while also passing the config to
    ``deepspeed.initialize``), DeepSpeed merges the settings at initialization.
    Conflicting settings raise an error. The table below summarizes the behavior
    across input combinations.

    Inputs:
    - TPI: tp_model_init was called? (Y/N)
    - TPG: tp_model_init provided tp_group? (Y/N)
    - CFG: tensor_parallel in DeepSpeed config? (Y/N)
    - MPU: mpu passed to deepspeed.initialize()? (Y/N)

    | TPI | TPG | CFG | MPU | Outcome                        | Notes |
    |-----|-----|-----|-----|--------------------------------|-------|
    | N   | N   | N   | N   | Error                          | No TP intent; nothing to initialize |
    | N   | N   | N   | Y   | No AutoTP                      | mpu may be used for other MP, but TP not enabled |
    | N   | N   | Y   | N   | Init AutoTP from config        | Use config; need TP group via config-driven init |
    | N   | N   | Y   | Y   | Init AutoTP from config        | mpu used to build TP group |
    | Y   | N   | N   | N   | Error                          | No TP group source |
    | Y   | N   | N   | Y   | Init AutoTP from tp_model_init | Use recorded args + mpu for TP group |
    | Y   | N   | Y   | N   | Init AutoTP from config        | Fill missing from TPI; error on mismatches; need TP group source |
    | Y   | N   | Y   | Y   | Init AutoTP from config        | Fill missing from TPI; error on mismatches |
    | Y   | Y   | N   | N   | Init AutoTP from tp_model_init | Use recorded tp_group; config absent |
    | Y   | Y   | N   | Y   | Error                          | tp_group + mpu conflict |
    | Y   | Y   | Y   | N   | Init AutoTP from config        | Error on mismatches; use tp_group from TPI; reject mpu |
    | Y   | Y   | Y   | Y   | Error                          | tp_group + mpu conflict |

    Field-level merge rules when both tp_model_init and config exist:
    - Canonical source: config
    - Allowed: fill missing config fields from tp_model_init
    - Error on mismatch: autotp_size, dtype, tp_group size or identity

    Extra checks:
    - If tp_group is provided, reject mpu.
    - If tp_group is not provided, require mpu (or another TP group source).
    - If tensor_parallel is absent and only tp_model_init was called, require
      a TP group source (direct tp_group or mpu).

    Args:
        model (torch.nn.Module): The model to be initialized.
        tp_size (int): The tensor parallelism size.
        dtype (torch.dtype): The data type to be used for the model.

    Returns:
        torch.nn.Module: The original model (no sharding applied here).
    """
    if hasattr(model, 'ds_autotp_parsed'):
        logger.warning("ds_autotp_parsed' attribute already exists in the model; tp_model_init is now record-only.")

    tp_group = kwargs.get("tp_group")
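
The outcome table and the field-level merge rules in the docstring can be restated as a small standalone sketch. This is not DeepSpeed's implementation: the names `OUTCOMES`, `autotp_outcome`, and `merge_tp_settings` are hypothetical, the settings are modeled as plain dictionaries, and dtypes are represented as strings so the example runs without torch. Only the outcome strings and the field names `autotp_size` and `dtype` are taken from the docstring.

```python
# Illustrative only: a plain-Python restatement of the docstring's table and
# merge rules. None of these names exist in DeepSpeed.

# (TPI, TPG, CFG, MPU) -> outcome, transcribed row by row from the table.
OUTCOMES = {
    (False, False, False, False): "Error",
    (False, False, False, True): "No AutoTP",
    (False, False, True, False): "Init AutoTP from config",
    (False, False, True, True): "Init AutoTP from config",
    (True, False, False, False): "Error",
    (True, False, False, True): "Init AutoTP from tp_model_init",
    (True, False, True, False): "Init AutoTP from config",
    (True, False, True, True): "Init AutoTP from config",
    (True, True, False, False): "Init AutoTP from tp_model_init",
    (True, True, False, True): "Error",
    (True, True, True, False): "Init AutoTP from config",
    (True, True, True, True): "Error",
}


def autotp_outcome(tpi: bool, tpg: bool, cfg: bool, mpu: bool) -> str:
    """Look up the documented outcome for one combination of inputs."""
    if tpg and not tpi:
        # The table has no such rows: tp_group is only supplied via tp_model_init.
        raise ValueError("tp_group requires that tp_model_init was called")
    return OUTCOMES[(tpi, tpg, cfg, mpu)]


def merge_tp_settings(config_tp: dict, recorded: dict) -> dict:
    """Merge recorded tp_model_init arguments into the config section.

    The config is the canonical source. Recorded values only fill fields the
    config leaves unset, and any disagreement raises an error.
    """
    merged = dict(config_tp)
    for key, value in recorded.items():
        if value is None:
            continue  # nothing was recorded for this field
        if merged.get(key) is None:
            merged[key] = value  # fill a missing config field
        elif merged[key] != value:
            raise ValueError(
                f"conflicting {key}: config has {merged[key]!r}, "
                f"tp_model_init recorded {value!r}"
            )
    return merged
```

For example, `autotp_outcome(True, True, False, True)` returns `"Error"` because a recorded `tp_group` and an `mpu` conflict, and `merge_tp_settings({"autotp_size": 4}, {"autotp_size": 2})` raises `ValueError` because `autotp_size` is an error-on-mismatch field.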

Callers

nothing calls this directly

Calls 4

set_autotp_mode · Function · 0.85
warning · Method · 0.80
get · Method · 0.45

Tested by

no test coverage detected
