hub / github.com/dmlc/dgl / gen_node_data

Function gen_node_data

tools/distpartitioning/data_shuffle.py:46–175

For this data processing pipeline, reading node files is not needed: all the needed information about the nodes can be found in the metadata JSON file. This function generates the nodes owned by a given process, using METIS partitions.

(
    rank, world_size, num_parts, id_lookup, ntid_ntype_map, schema_map
)
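The ownership rule in the summary can be illustrated with a minimal sketch. It assumes a partition id is already known for every global node id (in the real pipeline this would come from `id_lookup`, whose API is not shown here) and that partition `p` is handled by rank `p % world_size`. Both the helper name `owned_nodes_sketch` and the round-robin rule are assumptions for illustration, not the library's implementation.

```python
def owned_nodes_sketch(rank, world_size, partition_ids):
    """Return {partition_id: [global node ids]} for partitions assigned to `rank`.

    `partition_ids[i]` is the partition id of global node id `i`.
    Assumption: partition p is handled by rank p % world_size.
    """
    owned = {}
    for global_nid, part_id in enumerate(partition_ids):
        if part_id % world_size == rank:
            owned.setdefault(part_id, []).append(global_nid)
    return owned


# Six nodes, four partitions, two processes (hypothetical values).
partition_ids = [0, 1, 2, 3, 0, 1]
rank0 = owned_nodes_sketch(0, 2, partition_ids)
rank1 = owned_nodes_sketch(1, 2, partition_ids)
```

Under these assumptions each process may own several partitions when `num_parts` exceeds `world_size`, which is why the result is keyed by partition id.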

Source

def gen_node_data(
    rank, world_size, num_parts, id_lookup, ntid_ntype_map, schema_map
):
    """
    For this data processing pipeline, reading node files is not needed. All the needed information about
    the nodes can be found in the metadata json file. This function generates the nodes owned by a given
    process, using METIS partitions.

    Parameters:
    -----------
    rank : int
        rank of the process
    world_size : int
        total no. of processes
    num_parts : int
        total no. of partitions
    id_lookup : instance of class DistLookupService
        Distributed lookup service used to map global-nids to respective partition-ids and
        shuffle-global-nids
    ntid_ntype_map :
        a dictionary where keys are node_type ids (integers) and values are node_type names (strings).
    schema_map:
        dictionary formed by reading the input metadata json file for the input dataset.

        Please note that for the input graph files it is assumed that the nodes of a particular node-type are
        split into `p` files (because of `p` partitions to be generated). On a similar note, edges of a particular
        edge-type are split into `p` files as well.

        # assuming m nodetypes present in the input graph
        "num_nodes_per_chunk" : [
            [a0, a1, a2, ... a<p-1>],
            [b0, b1, b2, ... b<p-1>],
            ...
            [m0, m1, m2, ... m<p-1>]
        ]
        Here, each sub-list corresponds to a nodetype in the input graph and has `p` elements, for instance
        [a0, a1, ... a<p-1>], where each element is the number of nodes to be processed by one process
        during distributed partitioning.

        In addition to the above key-value pair for the nodes in the graph, the node-features are captured in the
        "node_data" key-value pair. In this dictionary the keys are nodetype names and each value is a dictionary
        that captures all the features present for that particular node-type. This is shown in the following example:

        "node_data" : {
            "paper": { # node type
                "feat": { # feature key
                    "format": {"name": "numpy"},
                    "data": ["node_data/paper-feat-part1.npy", "node_data/paper-feat-part2.npy"]
                },
                "label": { # feature key
                    "format": {"name": "numpy"},
                    "data": ["node_data/paper-label-part1.npy", "node_data/paper-label-part2.npy"]
                },
                "year": { # feature key
                    "format": {"name": "numpy"},
                    "data": ["node_data/paper-year-part1.npy", "node_data/paper-year-part2.npy"]
                }
            }
        }
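To make the metadata layout concrete, here is a minimal sketch that derives contiguous global-id ranges per node type and lists the feature files from a schema dictionary of the shape described in the docstring. The `node_type` key, the sample counts, and the helper name `global_id_ranges_sketch` are assumptions for illustration. The actual function relies on `get_ntype_counts_map` and `get_idranges`, whose signatures are not shown here.

```python
def global_id_ranges_sketch(schema_map):
    """Map each node type to a half-open [start, end) range of global node ids.

    Assumption: node types are numbered in the order listed under "node_type",
    and global ids are assigned contiguously in that order.
    """
    ranges = {}
    start = 0
    for ntype, chunk_counts in zip(
        schema_map["node_type"], schema_map["num_nodes_per_chunk"]
    ):
        total = sum(chunk_counts)  # nodes of this type across all p chunks
        ranges[ntype] = (start, start + total)
        start += total
    return ranges


# Hypothetical metadata: two node types, each split into p = 2 chunks.
schema_map = {
    "node_type": ["paper", "author"],
    "num_nodes_per_chunk": [[3, 2], [4, 1]],
    "node_data": {
        "paper": {
            "feat": {
                "format": {"name": "numpy"},
                "data": [
                    "node_data/paper-feat-part1.npy",
                    "node_data/paper-feat-part2.npy",
                ],
            }
        }
    },
}

ranges = global_id_ranges_sketch(schema_map)

# Collect the feature file list per node type and feature key.
feature_files = {
    ntype: {name: spec["data"] for name, spec in feats.items()}
    for ntype, feats in schema_map["node_data"].items()
}
```

With one file per chunk, the length of each `data` list is expected to match the number of entries in the corresponding `num_nodes_per_chunk` sub-list.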

Callers 1

exchange_graph_data · Function · 0.85

Calls 7

get_idranges · Function · 0.90
get_ntype_counts_map · Function · 0.90
get_partition_ids · Method · 0.80
append · Method · 0.80
items · Method · 0.45
barrier · Method · 0.45
keys · Method · 0.45

Tested by

no test coverage detected