Repository: github.com/dmlc/dgl

Function create_chunked_dataset

tests/tools/pytest_utils.py:342–593

This function creates a small synthetic dataset modeled on the MAG240M dataset and stores it in chunked form.

Parameters:
root_dir (string): directory in which all the files for the chunked dataset will be stored.

create_chunked_dataset(
    root_dir,
    num_chunks,
    data_fmt="numpy",
    edges_fmt="csv",
    vector_rows=False,
    **kwargs,
)
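The num_chunks argument sets how many pieces the generated data is split into. The split itself is done by chunk_graph (listed under Calls below), whose body is not part of this excerpt. The sketch below therefore only illustrates one plausible policy: contiguous chunks whose sizes differ by at most one row, the same rule numpy.array_split uses. The helper name chunk_bounds is hypothetical and is not a DGL API.

```python
from typing import List, Tuple


def chunk_bounds(num_rows: int, num_chunks: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split range(num_rows) into num_chunks contiguous
    [start, end) pieces whose sizes differ by at most one (array_split style).
    This is an assumption, not necessarily what chunk_graph does."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    base, extra = divmod(num_rows, num_chunks)
    bounds = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks each take one additional row.
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# 1200 papers split into 3 chunks gives 400 rows per chunk.
print(chunk_bounds(1200, 3))  # [(0, 400), (400, 800), (800, 1200)]
print(chunk_bounds(10, 3))    # [(0, 4), (4, 7), (7, 10)]
```

Contiguous ranges keep each chunk a simple slice of the original array. This makes chunks cheap to write and easy to map back to global row IDs.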

Source excerpt (lines 342–399):

def create_chunked_dataset(
    root_dir,
    num_chunks,
    data_fmt="numpy",
    edges_fmt="csv",
    vector_rows=False,
    **kwargs,
):
    """
    This function creates a sample dataset, based on MAG240 dataset.

    Parameters:
    -----------
    root_dir : string
        directory in which all the files for the chunked dataset will be stored.
    """
    # Step0: prepare chunked graph data format.
    # A synthetic mini MAG240.
    num_institutions = 1200
    num_authors = 1200
    num_papers = 1200

    def rand_edges(num_src, num_dst, num_edges):
        eids = np.random.choice(num_src * num_dst, num_edges, replace=False)
        src = torch.from_numpy(eids // num_dst)
        dst = torch.from_numpy(eids % num_dst)

        return src, dst

    num_cite_edges = 24 * 1000
    num_write_edges = 12 * 1000
    num_affiliate_edges = 2400

    # Structure.
    data_dict = {
        ("paper", "cites", "paper"): rand_edges(
            num_papers, num_papers, num_cite_edges
        ),
        ("author", "writes", "paper"): rand_edges(
            num_authors, num_papers, num_write_edges
        ),
        ("author", "affiliated_with", "institution"): rand_edges(
            num_authors, num_institutions, num_affiliate_edges
        ),
        ("institution", "writes", "paper"): rand_edges(
            num_institutions, num_papers, num_write_edges
        ),
    }
    src, dst = data_dict[("author", "writes", "paper")]
    data_dict[("paper", "rev_writes", "author")] = (dst, src)
    g = dgl.heterograph(data_dict)

    # paper feat, label, year
    num_paper_feats = 3
    paper_feat = np.random.randn(num_papers, num_paper_feats)
    num_classes = 4
    paper_label = np.random.choice(num_classes, num_papers)
    paper_year = np.random.choice(2022, num_papers)
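The nested rand_edges helper draws num_edges distinct flat indices, without replacement, from the num_src * num_dst grid of possible (src, dst) pairs. It then decodes each index with integer division (source) and modulo (destination). Because that decoding is a one-to-one mapping, no edge is sampled twice. The reverse relation ("paper", "rev_writes", "author") is built by swapping the source and destination tensors of ("author", "writes", "paper"). The sketch below reproduces both steps with the standard library only. It assumes plain Python lists in place of the NumPy arrays and torch tensors used in the source, and the names rand_edges_py and reverse are hypothetical.

```python
import random
from typing import List, Tuple

Edges = Tuple[List[int], List[int]]


def rand_edges_py(num_src: int, num_dst: int, num_edges: int, seed: int = 0) -> Edges:
    """Hypothetical stdlib analogue of rand_edges: sample distinct edges."""
    rng = random.Random(seed)
    # Each flat index e in [0, num_src * num_dst) names exactly one (src, dst) cell.
    eids = rng.sample(range(num_src * num_dst), num_edges)  # without replacement
    src = [e // num_dst for e in eids]  # row of the cell -> source node
    dst = [e % num_dst for e in eids]   # column of the cell -> destination node
    return src, dst


def reverse(edges: Edges) -> Edges:
    """Swap endpoints, as done to build ("paper", "rev_writes", "author")."""
    src, dst = edges
    return dst, src


writes = rand_edges_py(num_src=5, num_dst=4, num_edges=10)
rev_writes = reverse(writes)

pairs = set(zip(*writes))
assert len(pairs) == 10  # sampling without replacement -> no duplicate edges
assert all(0 <= s < 5 and 0 <= d < 4 for s, d in pairs)
assert set(zip(*rev_writes)) == {(d, s) for s, d in pairs}
```

Sampling flat indices gives duplicate-free edges without a rejection loop. Note also that np.random.choice(2022, num_papers) in the source draws integers uniformly from 0 to 2021, so paper_year is a synthetic value rather than a realistic calendar year.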

Callers (6; 4 listed)

test_lookup_service (function) · 0.90
test_parmetis_wrapper (function) · 0.90
_test_chunk_graph (function) · 0.90
_test_pipeline (function) · 0.90

Calls (5)

rand_edges (function) · 0.85
debug (method) · 0.80
chunk_graph (function) · 0.70
join (method) · 0.45
save (method) · 0.45

Tested by

no test coverage detected