hub / github.com/dmlc/dgl / sparse_all_to_all_pull

Function sparse_all_to_all_pull

python/dgl/cuda/nccl.py:98–189

Perform an all-to-all-v operation, whereby all processors request the values corresponding to their set of indices. Note: This method requires 'torch.distributed.get_backend() == "nccl"'.

(req_idx, value, partition)


def sparse_all_to_all_pull(req_idx, value, partition):
    """Perform an all-to-all-v operation, whereby all processors request
    the values corresponding to their set of indices.

    Note: This method requires 'torch.distributed.get_backend() == "nccl"'.

    Parameters
    ----------
    req_idx : torch.Tensor
        The set of indices this processor is requesting.
    value : torch.Tensor
        The multi-dimension set of values that can be requested from
        this processor.
    partition : NDArrayPartition
        The object containing information for assigning indices to
        processors.

    Returns
    -------
    torch.Tensor
        The set of received values, corresponding to `req_idx`.

    Examples
    --------

    To perform a sparse_all_to_all_pull(), a partition object must be
    provided. A partition of a homogeneous graph, where the vertices are
    striped across processes, can be generated via:

    >>> from dgl.partition import NDArrayPartition
    >>> part = NDArrayPartition(g.num_nodes(), world_size, mode='remainder')

    With this partition, each processor can request values/features
    associated with vertices in the graph. So in the case where we have
    a set of neighbors 'nbr_idxs' we need features for, and each process
    has a tensor 'node_feat' storing the features of nodes it owns in
    the partition, the features can be requested via:

    >>> nbr_values = nccl.sparse_all_to_all_pull(nbr_idxs, node_feat, part)

    Then the two arrays 'nbr_idxs' and 'nbr_values' form the sparse
    set of features, where 'nbr_idxs[i]' is the global node id, and
    'nbr_values[i]' is the feature vector for that node. This
    communication pattern is useful for node features or node
    embeddings.
    """
    if not dist.is_initialized() or dist.get_world_size() == 1:
        return value[req_idx.long()]
    assert (
        dist.get_backend() == "nccl"
    ), "requires NCCL backend to communicate CUDA tensors."

    perm, req_splits = partition.generate_permutation(req_idx)
    perm = perm.long()

    # Get response splits.
    resp_splits = torch.empty_like(req_splits)
    dist.all_to_all_single(resp_splits, req_splits)
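
The listing above ends after the split counts are exchanged, so the remaining steps are not visible here. As a rough illustration of the pull pattern the docstring describes, the following single-process sketch simulates it with plain Python lists instead of NCCL. Everything in it is an assumption made for illustration: the helper names `generate_permutation` and `simulate_pull` are hypothetical stand-ins, and the 'remainder' mapping is assumed to be owner = idx % world_size with local index = idx // world_size. It is not DGL's implementation.

```python
def generate_permutation(req_idx, world_size):
    """Group request positions by owning rank (hypothetical stand-in).

    Returns (perm, splits): perm lists positions of req_idx ordered by
    owner rank (stable), and splits[r] counts requests destined for rank r.
    Assumes a 'remainder' partition: owner = idx % world_size.
    """
    perm = sorted(range(len(req_idx)), key=lambda i: req_idx[i] % world_size)
    splits = [0] * world_size
    for idx in req_idx:
        splits[idx % world_size] += 1
    return perm, splits


def simulate_pull(requests, values, world_size):
    """Simulate the pull for all ranks in one process.

    requests[r] holds the global ids rank r asks for; values[r] holds the
    rows rank r owns, stored at local index idx // world_size (assumed).
    """
    perms, req_splits = zip(
        *(generate_permutation(r, world_size) for r in requests)
    )
    # Exchanging split counts between ranks amounts to a transpose:
    # rank r learns how many requests each sender s directs at it.
    resp_splits = [
        [req_splits[s][r] for s in range(world_size)]
        for r in range(world_size)
    ]
    results = []
    for r in range(world_size):
        # Requests in the order they would be sent (grouped by owner).
        sorted_req = [requests[r][i] for i in perms[r]]
        # Each owner answers with the row at the request's local index.
        gathered = [values[g % world_size][g // world_size] for g in sorted_req]
        # Undo the permutation so results line up with the original request.
        out = [None] * len(sorted_req)
        for pos, i in enumerate(perms[r]):
            out[i] = gathered[pos]
        results.append(out)
    return results, resp_splits


if __name__ == "__main__":
    # Six nodes striped over two ranks; node g carries the value g * 10.
    values = [[0, 20, 40], [10, 30, 50]]
    requests = [[5, 0, 3], [2, 2, 1]]
    print(simulate_pull(requests, values, world_size=2))
```

In this sketch, each rank's result ends up in the same order as its original request list, which matches the docstring's statement that 'nbr_values[i]' corresponds to 'nbr_idxs[i]'.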

Callers

nothing calls this directly

Calls 5

generate_permutation · Method · 0.80
map_to_local · Method · 0.80
long · Method · 0.45
to · Method · 0.45
size · Method · 0.45

Tested by

no test coverage detected