hub / github.com/dmlc/dgl / sparse_all_to_all_pull

Function sparse_all_to_all_pull

python/dgl/cuda/nccl.py:98–189

Perform an all-to-all-v operation, whereby all processors request the values corresponding to their set of indices. Note: This method requires 'torch.distributed.get_backend() == "nccl"'.

(req_idx, value, partition)
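
As a rough mental model of the call, the sketch below simulates the pull on plain Python lists, with no GPUs or NCCL involved. It assumes that the 'remainder' partition mode assigns global index i to rank i % world_size at local offset i // world_size. That mapping and the helper names (owner_of, local_offset, simulate_pull) are illustrative assumptions, not part of DGL.

```python
def owner_of(global_idx, world_size):
    # Assumed 'remainder' striping: rank that owns this global index.
    return global_idx % world_size


def local_offset(global_idx, world_size):
    # Assumed position of the index inside its owner's shard.
    return global_idx // world_size


def simulate_pull(requests, shards):
    """Simulate the pull for every rank at once.

    requests[r] is the list of global indices rank r asks for.
    shards[r] is the list of values owned by rank r.
    Returns one list per rank, ordered like that rank's request.
    """
    world_size = len(shards)
    results = []
    for req in requests:
        results.append(
            [
                shards[owner_of(g, world_size)][local_offset(g, world_size)]
                for g in req
            ]
        )
    return results


# Six nodes striped over two ranks; the feature of node g is g * 10.
shards = [[0, 20, 40], [10, 30, 50]]   # rank 0 owns 0, 2, 4; rank 1 owns 1, 3, 5
requests = [[5, 0, 2], [1, 4]]         # rank 0 and rank 1 requests
pulled = simulate_pull(requests, shards)
```

Under these assumptions, pulled[r][i] is the value of global index requests[r][i], which mirrors the docstring's statement that the result corresponds to `req_idx`.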


def sparse_all_to_all_pull(req_idx, value, partition):
    """Perform an all-to-all-v operation, whereby all processors request
    the values corresponding to their set of indices.

    Note: This method requires 'torch.distributed.get_backend() == "nccl"'.

    Parameters
    ----------
    req_idx : torch.Tensor
        The set of indices this processor is requesting.
    value : torch.Tensor
        The multi-dimension set of values that can be requested from
        this processor.
    partition : NDArrayPartition
        The object containing information for assigning indices to
        processors.

    Returns
    -------
    torch.Tensor
        The set of received values, corresponding to `req_idx`.

    Examples
    --------

    To perform a sparse_all_to_all_pull(), a partition object must be
    provided. A partition of a homogeneous graph, where the vertices are
    striped across processes, can be generated via:

    >>> from dgl.partition import NDArrayPartition
    >>> part = NDArrayPartition(g.num_nodes(), world_size, mode='remainder')

    With this partition, each processor can request values/features
    associated with vertices in the graph. So in the case where we have
    a set of neighbors 'nbr_idxs' we need features for, and each process
    has a tensor 'node_feat' storing the features of nodes it owns in
    the partition, the features can be requested via:

    >>> nbr_values = nccl.sparse_all_to_all_pull(nbr_idxs, node_feat, part)

    Then the two arrays 'nbr_idxs' and 'nbr_values' form the sparse
    set of features, where 'nbr_idxs[i]' is the global node id, and
    'nbr_values[i]' is the feature vector for that node. This
    communication pattern is useful for node features or node
    embeddings.
    """
    if not dist.is_initialized() or dist.get_world_size() == 1:
        return value[req_idx.long()]
    assert (
        dist.get_backend() == "nccl"
    ), "requires NCCL backend to communicate CUDA tensors."

    perm, req_splits = partition.generate_permutation(req_idx)
    perm = perm.long()

    # Get response splits.
    resp_splits = torch.empty_like(req_splits)
    dist.all_to_all_single(resp_splits, req_splits)
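
The split exchange at the end of the listing can be pictured without a process group. The sketch below is a pure-Python stand-in, not DGL or torch.distributed code. It assumes 'remainder' striping (owner rank = index % world_size) and hypothetical helper names. Each rank counts how many indices it requests from every owner, and an all-to-all of one count per peer then amounts to transposing the count matrix, so every rank learns how many requests it must answer for each peer.

```python
def count_requests_per_owner(req_idx, world_size):
    # Plays the role of req_splits: entry p is the number of indices
    # this rank requests from rank p (assumed remainder striping).
    splits = [0] * world_size
    for g in req_idx:
        splits[g % world_size] += 1
    return splits


def exchange_splits(all_req_splits):
    # Stand-in for an all-to-all with one element per peer:
    # rank r receives element r of every rank's req_splits.
    world_size = len(all_req_splits)
    return [
        [all_req_splits[p][r] for p in range(world_size)]
        for r in range(world_size)
    ]


# Three ranks, each with its own request list of global indices.
requests = [[0, 3, 4], [2], [1, 1, 5]]
req_splits = [count_requests_per_owner(r, 3) for r in requests]
resp_splits = exchange_splits(req_splits)
```

Here resp_splits[r][p] is the number of indices rank p asks rank r to serve. A receiver needs these sizes before a variable-length exchange of the indices themselves can take place.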

Callers

nothing calls this directly

Calls 5

generate_permutation Method · 0.80

map_to_local Method · 0.80

long Method · 0.45

to Method · 0.45

size Method · 0.45

Tested by

no test coverage detected