hub / github.com/dmlc/dgl / gen_node_data

Function gen_node_data

tools/distpartitioning/data_shuffle.py:46–175

For this data processing pipeline, reading node files is not needed: all the required information about the nodes is available in the metadata JSON file. This function generates the nodes owned by a given process, using METIS partitions.

(
    rank, world_size, num_parts, id_lookup, ntid_ntype_map, schema_map
)
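
The ownership step named in the summary can be illustrated with a minimal sketch. It is not the body of gen_node_data. It assumes the partition id of every global node id is already available as a plain list (a stand-in for whatever `id_lookup` would return), and that partition `p` is handled by rank `p % world_size`; both the helper name `owned_global_nids` and that mapping are assumptions made for illustration.

```python
def owned_global_nids(rank, world_size, partition_ids):
    """Return the global node ids whose partition is handled by `rank`.

    Assumption (not confirmed by the source): partition p is assigned
    to the process with rank p % world_size.
    """
    return [
        nid
        for nid, part_id in enumerate(partition_ids)
        if part_id % world_size == rank
    ]


# Hypothetical METIS assignment: 8 nodes spread over 4 partitions.
part_ids = [0, 1, 2, 3, 0, 1, 2, 3]

# With 2 processes, rank 1 would handle partitions 1 and 3.
owned = owned_global_nids(rank=1, world_size=2, partition_ids=part_ids)
print(owned)
```

When `num_parts` equals `world_size`, the modulo mapping reduces to one partition per process.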

Source

def gen_node_data(
    rank, world_size, num_parts, id_lookup, ntid_ntype_map, schema_map
):
    """
    For this data processing pipeline, reading node files is not needed. All the needed information about
    the nodes can be found in the metadata json file. This function generates the nodes owned by a given
    process, using metis partitions.

    Parameters:
    -----------
    rank : int
        rank of the process
    world_size : int
        total no. of processes
    num_parts : int
        total no. of partitions
    id_lookup : instance of class DistLookupService
        Distributed lookup service used to map global-nids to respective partition-ids and
        shuffle-global-nids
    ntid_ntype_map :
        a dictionary where keys are node_type ids (integers) and values are node_type names (strings).
    schema_map:
        dictionary formed by reading the input metadata json file for the input dataset.

        Please note that it is assumed that, for the input graph files, the nodes of a particular node-type are
        split into `p` files (because of `p` partitions to be generated). On a similar note, edges of a particular
        edge-type are split into `p` files as well.

        #assuming m nodetypes present in the input graph
        "num_nodes_per_chunk" : [
            [a0, a1, a2, ... a<p-1>],
            [b0, b1, b2, ... b<p-1>],
            ...
            [m0, m1, m2, ... m<p-1>]
        ]
        Here, each sub-list, corresponding to a node type in the input graph, has `p` elements, for instance
        [a0, a1, ... a<p-1>], where each element represents the number of nodes to be processed by one process
        during distributed partitioning.

        In addition to the above key-value pair for the nodes in the graph, the node-features are captured in the
        "node_data" key-value pair. In this dictionary the keys will be nodetype names and value will be a dictionary which
        is used to capture all the features present for that particular node-type. This is shown in the following example:

        "node_data" : {
            "paper": { # node type
                "feat": { # feature key
                    "format": {"name": "numpy"},
                    "data": ["node_data/paper-feat-part1.npy", "node_data/paper-feat-part2.npy"]
                },
                "label": { # feature key
                    "format": {"name": "numpy"},
                    "data": ["node_data/paper-label-part1.npy", "node_data/paper-label-part2.npy"]
                },
                "year": { # feature key
                    "format": {"name": "numpy"},
                    "data": ["node_data/paper-year-part1.npy", "node_data/paper-year-part2.npy"]
                }
            }
        }
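
To make the two metadata keys described in the docstring concrete, the following sketch reads a small hand-written `schema_map`. It is an illustration under stated assumptions, not code from data_shuffle.py: the `"node_type"` key, the helper names `global_id_ranges` and `feature_files`, the sample counts, and the rule that global ids are assigned contiguously in node-type order are all assumptions.

```python
from itertools import accumulate

# Hypothetical metadata fragment: two node types, p = 2 chunks each.
schema_map = {
    "node_type": ["author", "paper"],  # assumed key listing node type names
    "num_nodes_per_chunk": [[3, 2], [4, 4]],
    "node_data": {
        "paper": {
            "feat": {
                "format": {"name": "numpy"},
                "data": [
                    "node_data/paper-feat-part1.npy",
                    "node_data/paper-feat-part2.npy",
                ],
            },
        },
    },
}


def global_id_ranges(schema):
    """Map each node type to a half-open [start, end) range of global ids.

    Assumption: ids are laid out contiguously, one node type after another,
    in the order the types are listed.
    """
    totals = [sum(chunks) for chunks in schema["num_nodes_per_chunk"]]
    ends = list(accumulate(totals))
    starts = [0] + ends[:-1]
    return dict(zip(schema["node_type"], zip(starts, ends)))


def feature_files(schema):
    """List (node type, feature key, file path) triples from "node_data"."""
    return [
        (ntype, feat_name, path)
        for ntype, feats in schema.get("node_data", {}).items()
        for feat_name, spec in feats.items()
        for path in spec["data"]
    ]


ranges = global_id_ranges(schema_map)
files = feature_files(schema_map)
print(ranges)
print(files)
```

Each inner list of `"num_nodes_per_chunk"` sums to the node count of one type, so the per-type totals here are 5 and 8; node types without features, such as "author" in this sample, simply do not appear under `"node_data"`.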

Callers 1

exchange_graph_data · Function · 0.85

Calls 7

get_idranges · Function · 0.90

get_ntype_counts_map · Function · 0.90

get_partition_ids · Method · 0.80

append · Method · 0.80

items · Method · 0.45

barrier · Method · 0.45

keys · Method · 0.45

Tested by

no test coverage detected