def sparse_all_to_all_pull(req_idx, value, partition):
    """Perform an all-to-all-v operation, whereby all processors request
    the values corresponding to their set of indices.

    Note: This method requires 'torch.distributed.get_backend() == "nccl"'.

    Parameters
    ----------
    req_idx : torch.Tensor
        The set of indices this processor is requesting.
    value : torch.Tensor
        The multi-dimensional set of values that can be requested from
        this processor.
    partition : NDArrayPartition
        The object containing information for assigning indices to
        processors.

    Returns
    -------
    torch.Tensor
        The set of received values, corresponding to `req_idx`.

    Examples
    --------

    To perform a sparse_all_to_all_pull(), a partition object must be
    provided. A partition of a homogeneous graph, where the vertices are
    striped across processes, can be generated via:

    >>> from dgl.partition import NDArrayPartition
    >>> part = NDArrayPartition(g.num_nodes(), world_size, mode='remainder')

    With this partition, each processor can request values/features
    associated with vertices in the graph. So in the case where we have
    a set of neighbors 'nbr_idxs' we need features for, and each process
    has a tensor 'node_feat' storing the features of nodes it owns in
    the partition, the features can be requested via:

    >>> nbr_values = nccl.sparse_all_to_all_pull(nbr_idxs, node_feat, part)

    The two arrays 'nbr_idxs' and 'nbr_values' then form the sparse
    set of features, where 'nbr_idxs[i]' is the global node id, and
    'nbr_values[i]' is the feature vector for that node. This
    communication pattern is useful for node features or node
    embeddings.
    """
    # Without an initialized process group, or with a single process,
    # every index is local, so index `value` directly.
    if not dist.is_initialized() or dist.get_world_size() == 1:
        return value[req_idx.long()]
    assert (
        dist.get_backend() == "nccl"
    ), "requires NCCL backend to communicate CUDA tensors."

    # Group the requested indices by owning rank: `perm` reorders `req_idx`
    # so that indices owned by the same rank are contiguous, and
    # `req_splits` holds the number of indices requested from each rank.
    perm, req_splits = partition.generate_permutation(req_idx)
    perm = perm.long()

    # Get response splits: exchange the split sizes so that each rank
    # learns how many indices every peer is requesting from it.
    resp_splits = torch.empty_like(req_splits)
    dist.all_to_all_single(resp_splits, req_splits)
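The pull pattern described in the docstring can be traced without GPUs or NCCL. The following is a minimal single-process sketch in plain Python, not DGL's implementation. It assumes that the 'remainder' mode assigns global id `g` to rank `g % world_size` at local row `g // world_size`. The helpers `generate_permutation`, `exchange_splits`, and `simulate_pull` are hypothetical stand-ins written for illustration; they only mimic the roles of `partition.generate_permutation` and `dist.all_to_all_single` in the function above.

```python
# Single-process simulation of a sparse all-to-all pull.
# Assumption: 'remainder' partitioning, i.e. owner = g % world_size and
# local row = g // world_size. All helper names here are illustrative.


def generate_permutation(req_idx, world_size):
    """Return (perm, splits): perm orders request positions by owning rank
    (stable), splits[p] counts the indices requested from rank p."""
    perm = sorted(range(len(req_idx)), key=lambda i: req_idx[i] % world_size)
    splits = [0] * world_size
    for g in req_idx:
        splits[g % world_size] += 1
    return perm, splits


def exchange_splits(req_splits):
    """Mimic an all-to-all of the split sizes: rank r receives from each
    peer p the number of indices that p requests from r."""
    world_size = len(req_splits)
    return [[req_splits[p][r] for p in range(world_size)]
            for r in range(world_size)]


def simulate_pull(requests, values, world_size):
    """requests[r] holds the global ids rank r asks for; values[r] holds
    the rows owned by rank r. Returns the values each rank receives, in
    the same order as its requests."""
    results = []
    for r in range(world_size):
        req = requests[r]
        perm, _ = generate_permutation(req, world_size)
        # Requests grouped by owner, as they would be sent over the wire.
        grouped = [req[i] for i in perm]
        # Each owner answers with the row stored at its local index.
        answers = [values[g % world_size][g // world_size] for g in grouped]
        # Undo the permutation so answers line up with the original order.
        out = [None] * len(req)
        for pos, i in enumerate(perm):
            out[i] = answers[pos]
        results.append(out)
    return results


# Two ranks, six nodes; the "feature" of node g is simply 10 * g.
values = [[0, 20, 40], [10, 30, 50]]    # rank 0 owns 0, 2, 4; rank 1 owns 1, 3, 5
requests = [[5, 0, 3], [2, 4, 1, 0]]

req_splits = [generate_permutation(r, 2)[1] for r in requests]
print(req_splits)                        # [[1, 2], [3, 1]]
print(exchange_splits(req_splits))       # [[1, 3], [2, 1]]
print(simulate_pull(requests, values, 2))  # [[50, 0, 30], [20, 40, 10, 0]]
```

The split exchange is what lets each rank size its receive buffers before any indices or values are sent. This is why the excerpt above calls `dist.all_to_all_single` on `req_splits` first.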