Split nodes and return a subset for the local rank. This function splits the input nodes based on the partition book and returns a subset of nodes for the local rank. This method is used for dividing workloads for distributed training. The input nodes are stored as a vector of mask
(
nodes,
partition_book=None,
ntype="_N",
rank=None,
force_even=True,
node_trainer_ids=None,
)
| 1863 | |
| 1864 | |
| 1865 | def node_split( |
| 1866 | nodes, |
| 1867 | partition_book=None, |
| 1868 | ntype="_N", |
| 1869 | rank=None, |
| 1870 | force_even=True, |
| 1871 | node_trainer_ids=None, |
| 1872 | ): |
| 1873 | """Split nodes and return a subset for the local rank. |
| 1874 | |
| 1875 | This function splits the input nodes based on the partition book and |
| 1876 | returns a subset of nodes for the local rank. This method is used for |
| 1877 | dividing workloads for distributed training. |
| 1878 | |
| 1879 | The input nodes are stored as a vector of masks. The length of the vector is |
| 1880 | the same as the number of nodes in a graph; 1 indicates that the vertex in |
| 1881 | the corresponding location exists. |
| 1882 | |
| 1883 | There are two strategies to split the nodes. By default, it splits the nodes |
| 1884 | in a way to maximize data locality. That is, all nodes that belong to a process |
| 1885 | are returned. If ``force_even`` is set to true, the nodes are split evenly so |
| 1886 | that each process gets almost the same number of nodes. |
| 1887 | |
| 1888 | When ``force_even`` is True, the data locality is still preserved if a graph is partitioned |
| 1889 | with Metis and the node/edge IDs are shuffled. |
| 1890 | In this case, majority of the nodes returned for a process are the ones that |
| 1891 | belong to the process. If node/edge IDs are not shuffled, data locality is not guaranteed. |
| 1892 | |
| 1893 | Parameters |
| 1894 | ---------- |
| 1895 | nodes : 1D tensor or DistTensor |
| 1896 | A boolean mask vector that indicates input nodes. |
| 1897 | partition_book : GraphPartitionBook, optional |
| 1898 | The graph partition book |
| 1899 | ntype : str, optional |
| 1900 | The node type of the input nodes. |
| 1901 | rank : int, optional |
| 1902 | The rank of a process. If not given, the rank of the current process is used. |
| 1903 | force_even : bool, optional |
| 1904 | Force the nodes are split evenly. |
| 1905 | node_trainer_ids : 1D tensor or DistTensor, optional |
| 1906 | If not None, split the nodes to the trainers on the same machine according to |
| 1907 | trainer IDs assigned to each node. Otherwise, split randomly. |
| 1908 | |
| 1909 | Returns |
| 1910 | ------- |
| 1911 | 1D-tensor |
| 1912 | The vector of node IDs that belong to the rank. |
| 1913 | """ |
| 1914 | if not isinstance(nodes, DistTensor): |
| 1915 | assert ( |
| 1916 | partition_book is not None |
| 1917 | ), "Regular tensor requires a partition book." |
| 1918 | elif partition_book is None: |
| 1919 | partition_book = nodes.part_policy.partition_book |
| 1920 | |
| 1921 | assert len(nodes) == partition_book._num_nodes( |
| 1922 | ntype |