Initialize the default process group for distributed training, then set the device and the random seed for the current process.
launch(
    rank: int,
    world_size: int,
    host: str,
    port: int,
    backend: str = "nccl",
    local_rank: int = None,
    seed: int = 1024,
    verbose: bool = True,
)
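The parameters in the signature above line up with the environment variables that `torchrun` exports for each worker (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`). The sketch below shows one way to gather them into keyword arguments. `launch_kwargs_from_env` is a hypothetical helper written for illustration; it is not part of the source shown here, and it only builds the dictionary without starting a process group.

```python
import os
from typing import Mapping, Optional


def launch_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect launch() keyword arguments from torchrun-style environment variables.

    Hypothetical helper for illustration only; not part of the original module.
    """
    env = os.environ if env is None else env
    try:
        return {
            "rank": int(env["RANK"]),              # global rank of this process
            "local_rank": int(env["LOCAL_RANK"]),  # rank of this process on its node
            "world_size": int(env["WORLD_SIZE"]),  # total number of processes
            "host": env["MASTER_ADDR"],            # rendezvous address, kept as a string
            "port": int(env["MASTER_PORT"]),       # rendezvous port
        }
    except KeyError as missing:
        raise RuntimeError(
            f"missing environment variable {missing}; was the script started with torchrun?"
        ) from missing


if __name__ == "__main__":
    example_env = {
        "RANK": "0",
        "LOCAL_RANK": "0",
        "WORLD_SIZE": "1",
        "MASTER_ADDR": "127.0.0.1",
        "MASTER_PORT": "29500",
    }
    print(launch_kwargs_from_env(example_env))
```

With a dictionary like this in hand, a call such as `launch(**launch_kwargs_from_env())` would supply every required argument, leaving `backend`, `seed`, and `verbose` at their defaults.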
def launch(
    rank: int,
    world_size: int,
    host: str,
    port: int,
    backend: str = "nccl",
    local_rank: int = None,
    seed: int = 1024,
    verbose: bool = True,
):
    """Initialize the default process group for distributed training, then set the device and the random seed
    for the current process.

    Args:
        rank (int): Rank for the default process group
        world_size (int): World size of the default process group
        host (str): The master address for distributed training
        port (int): The master port for distributed training
        backend (str, optional): Backend for ``torch.distributed``, defaults to ``nccl``
        local_rank (int, optional):
            Rank of the process on the node, used to set the default CUDA device,
            defaults to None. If local_rank = None, the default device ordinal will be calculated automatically.
        seed (int, optional): Specified random seed for every process. Defaults to 1024.
        verbose (bool, optional): Whether to print logs. Defaults to True.
    """

    cur_accelerator = get_accelerator()

    # NOTE: the accelerator's communication backend replaces the ``backend`` argument
    backend = cur_accelerator.communication_backend

    # init default process group
    if ":" in host:  # IPv6
        init_method = f"tcp://[{host}]:{port}"
    else:  # IPv4
        init_method = f"tcp://{host}:{port}"
    dist.init_process_group(rank=rank, world_size=world_size, backend=backend, init_method=init_method)

    # set cuda device
    # if local rank is not given, calculate automatically
    if cur_accelerator.support_set_device:
        cur_accelerator.set_device(local_rank)

    set_seed(seed)

    # enable TorchDynamo's DDP optimization only for multi-process runs;
    # skip silently if this torch build does not expose torch._dynamo
    try:
        torch._dynamo.config.optimize_ddp = world_size > 1
    except AttributeError:
        pass

    if verbose:
        logger = get_dist_logger()
        logger.info(f"Distributed environment is initialized, world size: {dist.get_world_size()}", ranks=[0])
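The `init_method` branch in the listing can be read in isolation: an IPv6 literal contains colons, so it has to be wrapped in square brackets to keep the address separate from the port in the `tcp://` URL. A minimal sketch of that step follows; `build_init_method` is a name chosen here for illustration and does not appear in the source.

```python
def build_init_method(host: str, port: int) -> str:
    """Return a tcp:// rendezvous URL, bracketing IPv6 literals.

    Illustrative helper; it mirrors the host check used in launch().
    """
    if ":" in host:  # IPv6 literal such as "::1"
        return f"tcp://[{host}]:{port}"
    # IPv4 address or plain hostname
    return f"tcp://{host}:{port}"


if __name__ == "__main__":
    print(build_init_method("127.0.0.1", 29500))
    print(build_init_method("::1", 29500))
```

The check is deliberately simple: any colon in `host` is treated as a sign of an IPv6 literal, which also means a host string that already carries brackets or a port would be bracketed again.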