hub / github.com/dmlc/dgl / edge_split

Function edge_split

python/dgl/distributed/dist_graph.py:1964–2059

Split edges and return a subset for the local rank. This function splits the input edges based on the partition book and returns a subset of edges for the local rank. This method is used for dividing workloads for distributed training. The input edges can be stored as a vector of masks.

(
    edges,
    partition_book=None,
    etype="_E",
    rank=None,
    force_even=True,
    edge_trainer_ids=None,
)
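To make the mask convention and the partition-based split concrete, here is a minimal pure-Python sketch. It does not use DGL: `split_local_sketch` and the contiguous `part_ranges` layout are hypothetical stand-ins for the partition book, assumed only for illustration.

```python
from typing import List, Sequence, Tuple


def split_local_sketch(
    mask: Sequence[bool],
    part_ranges: Sequence[Tuple[int, int]],
    rank: int,
) -> List[int]:
    """Return the IDs of masked edges that fall inside the rank's partition.

    Assumption (not taken from DGL): partition p owns the contiguous
    edge-ID range part_ranges[p] = (start, end), with end exclusive.
    """
    # The mask must have one entry per edge in the whole graph.
    if len(mask) != part_ranges[-1][1]:
        raise ValueError("mask length must equal the number of edges")
    start, end = part_ranges[rank]
    # Keep only the edges that are both selected and locally owned.
    return [eid for eid in range(start, end) if mask[eid]]


# Eight edges, two partitions of four edges each.
mask = [True, False, True, True, False, True, True, False]
part_ranges = [(0, 4), (4, 8)]

local_rank0 = split_local_sketch(mask, part_ranges, rank=0)
local_rank1 = split_local_sketch(mask, part_ranges, rank=1)
print(local_rank0)  # [0, 2, 3]
print(local_rank1)  # [5, 6]
```

In this sketch the ranks can receive different numbers of edges (three versus two here), which is the trade-off that an even split is meant to remove.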

Source from the content-addressed store, hash-verified

def edge_split(
    edges,
    partition_book=None,
    etype="_E",
    rank=None,
    force_even=True,
    edge_trainer_ids=None,
):
    """Split edges and return a subset for the local rank.

    This function splits the input edges based on the partition book and
    returns a subset of edges for the local rank. This method is used for
    dividing workloads for distributed training.

    The input edges can be stored as a vector of masks. The length of the vector is
    the same as the number of edges in a graph; 1 indicates that the edge in
    the corresponding location exists.

    There are two strategies to split the edges. By default, it splits the edges
    in a way to maximize data locality. That is, all edges that belong to a process
    are returned. If ``force_even`` is set to true, the edges are split evenly so
    that each process gets almost the same number of edges.

    When ``force_even`` is True, the data locality is still preserved if a graph is partitioned
    with Metis and the node/edge IDs are shuffled.
    In this case, majority of the nodes returned for a process are the ones that
    belong to the process. If node/edge IDs are not shuffled, data locality is not guaranteed.

    Parameters
    ----------
    edges : 1D tensor or DistTensor
        A boolean mask vector that indicates input edges.
    partition_book : GraphPartitionBook, optional
        The graph partition book
    etype : str or (str, str, str), optional
        The edge type of the input edges.
    rank : int, optional
        The rank of a process. If not given, the rank of the current process is used.
    force_even : bool, optional
        Force the edges are split evenly.
    edge_trainer_ids : 1D tensor or DistTensor, optional
        If not None, split the edges to the trainers on the same machine according to
        trainer IDs assigned to each edge. Otherwise, split randomly.

    Returns
    -------
    1D-tensor
        The vector of edge IDs that belong to the rank.
    """
    if not isinstance(edges, DistTensor):
        assert (
            partition_book is not None
        ), "Regular tensor requires a partition book."
    elif partition_book is None:
        partition_book = edges.part_policy.partition_book
    assert len(edges) == partition_book._num_edges(
        etype
    ), "The length of boolean mask vector should be the number of edges in the graph."
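The docstring describes a second strategy, enabled by `force_even`, in which each process gets almost the same number of edges. The listing here ends before that logic, so the following is only a sketch of one way to split evenly: collect the selected edge IDs and cut them into contiguous chunks whose sizes differ by at most one. `split_even_sketch` is a hypothetical helper written for this illustration and should not be read as DGL's `_split_even_to_part`.

```python
from typing import List, Sequence


def split_even_sketch(mask: Sequence[bool], num_ranks: int, rank: int) -> List[int]:
    """Return the rank's near-equal share of the selected edge IDs."""
    # Turn the boolean mask into a sorted list of selected edge IDs.
    selected = [eid for eid, keep in enumerate(mask) if keep]
    base, extra = divmod(len(selected), num_ranks)
    # The first `extra` ranks each take one additional edge.
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return selected[start:start + size]


# Ten edges, seven of them selected, shared among three ranks.
mask = [True, False, True, True, False, True, True, False, True, True]
chunks = [split_even_sketch(mask, num_ranks=3, rank=r) for r in range(3)]
print(chunks)  # [[0, 2, 3], [5, 6], [8, 9]]
```

Because each chunk is a contiguous run of sorted IDs, a layout in which every partition owns a contiguous ID range would place most of a chunk in a single partition. That is consistent with the docstring's remark that locality is largely preserved when IDs are shuffled during partitioning, though the sketch makes no claim about how DGL achieves it.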

Callers 3

run_client_hierarchy · Function · 0.90
test_split · Function · 0.90
test_split_even · Function · 0.90

Calls 7

_split_even_to_part · Function · 0.85
_split_by_trainer_id · Function · 0.85
_split_local · Function · 0.85
_num_edges · Method · 0.80
num_partitions · Method · 0.45
partid2eids · Method · 0.45

Tested by 3

run_client_hierarchy · Function · 0.72
test_split · Function · 0.72
test_split_even · Function · 0.72