Split the graph into multiple chunks. A directory will be created at `output_path` with the metadata and chunked edge list as well as the node/edge data.
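The function delegates the actual splitting to `_chunk_graph`, whose body is not part of this listing. The sketch below is an illustration under an assumption, not a description of `_chunk_graph` itself. It divides `num_rows` rows (for example, the edges of one edge type) into `num_chunks` contiguous ranges of near-equal size, using the same rule as `numpy.array_split`. `split_ranges` is a hypothetical helper; in a chunked layout, each range would map to one chunk file.

```python
from typing import List, Tuple


def split_ranges(num_rows: int, num_chunks: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split rows 0..num_rows-1 into num_chunks contiguous
    [start, end) ranges whose sizes differ by at most one.

    The first (num_rows % num_chunks) ranges receive one extra row, the same
    rule numpy.array_split uses.
    """
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    base, extra = divmod(num_rows, num_chunks)
    ranges = []
    start = 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# For example, an edge list with 10 edges split into num_chunks=3 chunks:
print(split_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Keeping the ranges contiguous means each chunk can be read or written as a single slice of the original array.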
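The `ndata_paths` and `edata_paths` parameters of `chunk_graph` map each node or edge type to a dictionary of data keys and file paths. The body of `chunk_graph` calls `.keys()` on every value of these dictionaries, so as written it expects this nested form, and it rewrites the caller's dictionaries in place. The sketch below shows the nested layout with hypothetical type names, data keys, and file names. It also defines a hypothetical `absolutize_paths` helper that performs the same absolute-path conversion without mutating its input.

```python
import os


def absolutize_paths(paths):
    """Return a copy of a nested {type: {key: path}} dict with every path made
    absolute. Unlike the loop in chunk_graph, the input dict is not modified."""
    return {
        typ: {key: os.path.abspath(path) for key, path in per_type.items()}
        for typ, per_type in paths.items()
    }


# Hypothetical node/edge types, data keys and file names, for illustration only.
ndata_paths = {
    "paper": {"feat": "data/paper_feat.npy", "label": "data/paper_label.npy"},
}
edata_paths = {
    # A canonical edge type (src_type, edge_type, dst_type) used as the key.
    ("paper", "cites", "paper"): {"count": "data/cites_count.npy"},
}

abs_ndata = absolutize_paths(ndata_paths)
abs_edata = absolutize_paths(edata_paths)
print(os.path.isabs(abs_ndata["paper"]["feat"]))  # True
print(ndata_paths["paper"]["feat"])  # data/paper_feat.npy (input unchanged)
```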

```python
def chunk_graph(
    g,
    name,
    ndata_paths,
    edata_paths,
    num_chunks,
    output_path,
    data_fmt="numpy",
    edges_fmt="csv",
    vector_rows=False,
    **kwargs,
):
    """
    Split the graph into multiple chunks.

    A directory will be created at :attr:`output_path` with the metadata and
    chunked edge list as well as the node/edge data.

    Parameters
    ----------
    g : DGLGraph
        The graph to split.
    name : str
        The name of the graph, to be used later in DistDGL training.
    ndata_paths : dict[str, pathlike] or dict[ntype, dict[str, pathlike]]
        The dictionary of paths pointing to the corresponding numpy array file
        for each node data key.
    edata_paths : dict[etype, pathlike] or dict[etype, dict[str, pathlike]]
        The dictionary of paths pointing to the corresponding numpy array file
        for each edge data key. ``etype`` could be canonical or non-canonical.
    num_chunks : int
        The number of chunks.
    output_path : pathlike
        The output directory in which the chunked graph is saved.
    data_fmt : str
        Format of the node/edge data files: 'numpy' or 'parquet'.
        Default: 'numpy'.
    edges_fmt : str
        Format of the edge files: 'csv' or 'parquet'. Default: 'csv'.
    vector_rows : bool
        If True, write parquet files as single-column vector-row files.
        Default: False.
    kwargs : dict
        Keyword arguments to control chunk details.
    """
    # Make every data path absolute (in place, mutating the caller's dicts)
    # so that relative paths keep pointing at the same files once setdir(),
    # defined elsewhere in this module, changes into output_path.
    for ntype, ndata in ndata_paths.items():
        for key in ndata.keys():
            ndata[key] = os.path.abspath(ndata[key])
    for etype, edata in edata_paths.items():
        for key in edata.keys():
            edata[key] = os.path.abspath(edata[key])
    with setdir(output_path):
        _chunk_graph(
            g,
            name,
            ndata_paths,
            edata_paths,
            num_chunks,
            data_fmt,
            edges_fmt,
```
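The body of `chunk_graph` converts every data path to an absolute path before entering `setdir(output_path)`. `setdir` is defined elsewhere in the module and is not shown here. The sketch below assumes it is a context manager that creates the directory, changes the working directory into it, and restores the previous directory on exit. The `setdir` in the sketch is a hypothetical re-implementation of that assumed behavior. Under that assumption, a relative path looked up inside the context resolves against `output_path`, not the original directory. This is why the absolute paths must be computed first.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def setdir(path):
    """Hypothetical re-implementation of the assumed setdir behavior: create
    `path` if needed, chdir into it, and restore the old cwd afterwards."""
    os.makedirs(path, exist_ok=True)
    old_cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_cwd)


with tempfile.TemporaryDirectory() as root:
    with setdir(root):  # work from inside root so relative paths are predictable
        open("feat.npy", "wb").close()  # empty placeholder data file
        rel_path = "feat.npy"
        abs_path = os.path.abspath(rel_path)  # resolved before switching dirs

        with setdir("out"):  # the cwd is now root/out
            abs_ok = os.path.exists(abs_path)  # still finds root/feat.npy
            rel_ok = os.path.exists(rel_path)  # looks for root/out/feat.npy

print(abs_ok, rel_ok)  # True False
```

Restoring the working directory in a `finally` clause keeps the caller's working directory intact even if chunking raises an exception.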