```python
def tp_model_init(model, tp_size, dtype, config=None, **kwargs):
    """
    Record tensor-parallel initialization arguments for training.

    Note (compatibility and initialization behavior):
    AutoTP sharding is applied during ``deepspeed.initialize(...)``. This
    function exists for backward compatibility and only records TP arguments so
    they can be validated and merged with the DeepSpeed config at initialization.
    When both sources are used (i.e., you call ``set_autotp_mode(training=True)``
    and ``deepspeed.tp_model_init`` and also pass a config containing
    ``tensor_parallel`` to ``deepspeed.initialize``), DeepSpeed merges the
    settings at initialization, and conflicting settings raise an error. The
    table below summarizes the behavior across input combinations.

    Inputs:
    - TPI: Was tp_model_init called? (Y/N)
    - TPG: Did tp_model_init receive a tp_group? (Y/N)
    - CFG: Is tensor_parallel present in the DeepSpeed config? (Y/N)
    - MPU: Was an mpu passed to deepspeed.initialize()? (Y/N)

    | TPI | TPG | CFG | MPU | Outcome                        | Notes |
    |-----|-----|-----|-----|--------------------------------|-------|
    | N   | N   | N   | N   | Error                          | No TP intent; nothing to initialize |
    | N   | N   | N   | Y   | No AutoTP                      | mpu may be used for other model parallelism, but TP is not enabled |
    | N   | N   | Y   | N   | Init AutoTP from config        | Use config; need TP group via config-driven init |
    | N   | N   | Y   | Y   | Init AutoTP from config        | mpu used to build TP group |
    | Y   | N   | N   | N   | Error                          | No TP group source |
    | Y   | N   | N   | Y   | Init AutoTP from tp_model_init | Use recorded args + mpu for TP group |
    | Y   | N   | Y   | N   | Init AutoTP from config        | Fill missing from TPI; error on mismatches; need TP group source |
    | Y   | N   | Y   | Y   | Init AutoTP from config        | Fill missing from TPI; error on mismatches |
    | Y   | Y   | N   | N   | Init AutoTP from tp_model_init | Use recorded tp_group; config absent |
    | Y   | Y   | N   | Y   | Error                          | tp_group + mpu conflict |
    | Y   | Y   | Y   | N   | Init AutoTP from config        | Error on mismatches; use tp_group from TPI; reject mpu |
    | Y   | Y   | Y   | Y   | Error                          | tp_group + mpu conflict |

    Field-level merge rules when both tp_model_init and config exist:
    - Canonical source: config
    - Allowed: fill missing config fields from tp_model_init
    - Error on mismatch: autotp_size, dtype, tp_group size or identity

    Extra checks:
    - If tp_group is provided, reject mpu.
    - If tp_group is not provided, require mpu (or another TP group source).
    - If tensor_parallel is absent and only tp_model_init was called, require
      a TP group source (direct tp_group or mpu).

    Args:
        model (torch.nn.Module): The model to be initialized.
        tp_size (int): The tensor parallelism size.
        dtype (torch.dtype): The data type to be used for the model.
        **kwargs: Additional options. ``tp_group``, if provided, is read from
            here and recorded as the tensor-parallel process group.

    Returns:
        torch.nn.Module: The original model (no sharding applied here).
    """
    if hasattr(model, 'ds_autotp_parsed'):
        logger.warning("'ds_autotp_parsed' attribute already exists in the model; tp_model_init is now record-only.")

    tp_group = kwargs.get("tp_group")
```
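The decision table in the docstring can be read as a small pure function. The following is a minimal sketch, not DeepSpeed's implementation: `resolve_autotp` and `Outcome` are hypothetical names, the four booleans mirror the TPI/TPG/CFG/MPU columns, and it assumes TPG implies TPI (a `tp_group` can only arrive through `tp_model_init`), so that combination is rejected outright. The "Extra checks" list is folded into the same logic.

```python
from enum import Enum


class Outcome(Enum):
    ERROR = "Error"
    NO_AUTOTP = "No AutoTP"
    FROM_CONFIG = "Init AutoTP from config"
    FROM_TP_MODEL_INIT = "Init AutoTP from tp_model_init"


def resolve_autotp(tpi: bool, tpg: bool, cfg: bool, mpu: bool) -> Outcome:
    """Hypothetical encoding of the tp_model_init decision table (not DeepSpeed code)."""
    if tpg and not tpi:
        # Assumption: TPG means "tp_model_init provided tp_group", so it implies TPI.
        raise ValueError("tp_group can only be supplied through tp_model_init")
    # Extra check: a tp_group from tp_model_init and an mpu are mutually exclusive.
    if tpg and mpu:
        return Outcome.ERROR
    if cfg:
        # tensor_parallel in the config makes the config the canonical source.
        return Outcome.FROM_CONFIG
    if tpi:
        # Only tp_model_init was called: it needs a TP group source (tp_group or mpu).
        return Outcome.FROM_TP_MODEL_INIT if (tpg or mpu) else Outcome.ERROR
    # Neither tp_model_init nor a tensor_parallel config: an mpu alone does not enable TP.
    return Outcome.NO_AUTOTP if mpu else Outcome.ERROR
```

Encoding the table this way turns each row into a one-line check; for example, `resolve_autotp(True, True, False, True)` returns `Outcome.ERROR` because a `tp_group` and an `mpu` conflict.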
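The docstring's field-level merge rules (config is canonical, missing fields are filled from `tp_model_init`, and `autotp_size`, `dtype`, and `tp_group` must agree) can be sketched in the same spirit. This is a hypothetical helper, not part of DeepSpeed: `merge_tp_settings` and `STRICT_FIELDS` are illustrative names, and it assumes both sources were already normalized into plain dicts keyed by config field names (so the `tp_size` argument appears as `autotp_size`). It treats a field set to `None` as missing and compares `tp_group` by identity only.

```python
from typing import Any, Dict

# Fields that must agree when both sources set them (from the docstring's merge rules).
STRICT_FIELDS = ("autotp_size", "dtype", "tp_group")


def _conflicts(key: str, config_value: Any, recorded_value: Any) -> bool:
    if key == "tp_group":
        # Identity check only; a fuller check would also compare group sizes.
        return config_value is not recorded_value
    return config_value != recorded_value


def merge_tp_settings(config: Dict[str, Any], recorded: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical merge of recorded tp_model_init args into a tensor_parallel config."""
    merged = dict(config)  # config is canonical; copy so the caller's dict is untouched
    for key, value in recorded.items():
        if merged.get(key) is None:
            # Missing (or None) in the config: fill it from the recorded args.
            merged[key] = value
        elif key in STRICT_FIELDS and _conflicts(key, merged[key], value):
            raise ValueError(
                f"tensor_parallel.{key} mismatch: config={merged[key]!r}, "
                f"tp_model_init={value!r}"
            )
        # Otherwise the config value is kept as-is.
    return merged
```

For example, `merge_tp_settings({"autotp_size": 4}, {"autotp_size": 4, "dtype": "bf16"})` returns `{"autotp_size": 4, "dtype": "bf16"}`, while a differing `dtype` in both sources raises `ValueError`. Non-strict fields that differ silently keep the config value, which is what "canonical source: config" implies.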