hub / github.com/hpcaitech/ColossalAI / launch

Function launch

colossalai/initialize.py:20–75

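Initializes the default distributed process group. The function takes the communication backend from the current accelerator (this overwrites the `backend` argument), builds a `tcp://` init method from `host` and `port` (wrapping IPv6 addresses in brackets), and calls `dist.init_process_group`. It then sets the device from `local_rank` when the accelerator supports it, seeds every process with `seed`, tries to set `torch._dynamo.config.optimize_ddp` to `world_size > 1`, and, if `verbose` is true, logs the world size on rank 0. Note that the docstring in the source still describes a `config` argument and `parse_args()`, neither of which appears in this signature.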

(
    rank: int,
    world_size: int,
    host: str,
    port: int,
    backend: str = "nccl",
    local_rank: int = None,
    seed: int = 1024,
    verbose: bool = True,
)
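
A minimal calling sketch, assuming the signature shown above and a launcher that exports `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`, and optionally `LOCAL_RANK` (as `torchrun` does). `launch_kwargs_from_env` and `main` are hypothetical helpers written for illustration; they are not part of ColossalAI. The `colossalai` import is deferred into `main` so the argument mapping can be checked without a distributed environment.

```python
import os
from typing import Any, Dict, Mapping, Optional


def launch_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: map torchrun-style variables onto launch() arguments."""
    env = os.environ if env is None else env
    local_rank = env.get("LOCAL_RANK")
    return {
        "rank": int(env["RANK"]),
        "world_size": int(env["WORLD_SIZE"]),
        "host": env["MASTER_ADDR"],
        "port": int(env["MASTER_PORT"]),
        # None is passed through unchanged when LOCAL_RANK is absent.
        "local_rank": None if local_rank is None else int(local_rank),
    }


def main() -> None:
    # Imported lazily so the sketch can be read and tested without ColossalAI installed.
    import colossalai

    colossalai.launch(**launch_kwargs_from_env(), seed=1024, verbose=True)


# Demonstrate the mapping with an explicit, made-up environment.
example = launch_kwargs_from_env(
    {"RANK": "0", "WORLD_SIZE": "2", "MASTER_ADDR": "127.0.0.1", "MASTER_PORT": "29500"}
)
print(example)
```

`backend` is left out of the mapping on purpose: in the source for this function the argument is overwritten by the accelerator's `communication_backend`, so passing it has no effect.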

Source from the content-addressed store, hash-verified

def launch(
    rank: int,
    world_size: int,
    host: str,
    port: int,
    backend: str = "nccl",
    local_rank: int = None,
    seed: int = 1024,
    verbose: bool = True,
):
    """This function first parses the configuration arguments, using :func:`parse_args()` in case one of the input
    arguments are not given. Then initialize and set distributed environment by calling global_context's functions.

    Args:
        config (Union[str, dict, Config]): Config file or config file path are both acceptable
        rank (int): Rank for the default process group
        world_size (int): World size of the default process group
        host (str): The master address for distributed training
        port (str): The master port for distributed training
        backend (str, optional): Backend for ``torch.distributed``, defaults to ``nccl``
        local_rank (int, optional):
            Rank for the process on the node and is used to set the default CUDA device,
            defaults to None. If local_rank = None, the default device ordinal will be calculated automatically.
        seed (int, optional): Specified random seed for every process. Defaults to 1024.
        verbose (bool, optional): Whether to print logs. Defaults to True.

    Raises:
        Exception: Raise exception when config type is wrong
    """

    cur_accelerator = get_accelerator()

    backend = cur_accelerator.communication_backend

    # init default process group
    if ":" in host:  # IPv6
        init_method = f"tcp://[{host}]:{port}"
    else:  # IPv4
        init_method = f"tcp://{host}:{port}"
    dist.init_process_group(rank=rank, world_size=world_size, backend=backend, init_method=init_method)

    # set cuda device
    # if local rank is not given, calculate automatically
    if cur_accelerator.support_set_device:
        cur_accelerator.set_device(local_rank)

    set_seed(seed)

    try:
        torch._dynamo.config.optimize_ddp = world_size > 1
    except AttributeError:
        pass

    if verbose:
        logger = get_dist_logger()
        logger.info(f"Distributed environment is initialized, world size: {dist.get_world_size()}", ranks=[0])
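
The address handling in the source can be isolated as a small sketch. `build_init_method` is a hypothetical name introduced here for illustration; it mirrors the `":" in host` check above and is not a ColossalAI function.

```python
def build_init_method(host: str, port: int) -> str:
    """Hypothetical helper mirroring the IPv4/IPv6 branch in launch()."""
    if ":" in host:
        # An IPv6 literal contains colons, so it is wrapped in brackets
        # to keep the trailing ":port" unambiguous.
        return f"tcp://[{host}]:{port}"
    # IPv4 addresses and hostnames are used as-is.
    return f"tcp://{host}:{port}"


ipv4_url = build_init_method("127.0.0.1", 29500)
ipv6_url = build_init_method("::1", 29500)
print(ipv4_url)  # tcp://127.0.0.1:29500
print(ipv6_url)  # tcp://[::1]:29500
```

Because the check is a plain substring test, a host that is already bracketed (for example `[::1]`) would be bracketed a second time, so under this logic IPv6 hosts should be passed without brackets.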

Callers 15

check_extract_alpha_beta · Function · 0.90

check_alpha_beta · Function · 0.90

check_layer · Function · 0.90

check_comm · Function · 0.90

check_apply · Function · 0.90

check_padded_tensor · Function · 0.90

check_comm · Function · 0.90

check_one_step_transform · Function · 0.90

check_layout_converting · Function · 0.90

Calls 6

get_accelerator · Function · 0.90

set_seed · Function · 0.90

get_dist_logger · Function · 0.90

set_device · Method · 0.45

info · Method · 0.45

get_world_size · Method · 0.45

Tested by 15

check_extract_alpha_beta · Function · 0.72

check_alpha_beta · Function · 0.72

check_layer · Function · 0.72

check_comm · Function · 0.72

check_apply · Function · 0.72

check_padded_tensor · Function · 0.72

check_comm · Function · 0.72

check_one_step_transform · Function · 0.72

check_layout_converting · Function · 0.72
