
Class HighLevelGraph

github.com/dask/dask · dask/highlevelgraph.py:364–864

Task graph composed of layers of dependent subgraphs. This object encodes a Dask task graph that is composed of layers of dependent subgraphs, such as commonly occurs when building task graphs using high level collections like Dask array, bag, or dataframe.
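As a rough illustration of the layered structure described above, the following pure-Python sketch (no Dask involved) keeps each operation's tasks in its own layer, records which layers depend on which, and then flattens everything into one ordinary task dictionary. The layer names and the `evaluate` helper are made up for this sketch; it is not how Dask schedules work internally.

```python
from graphlib import TopologicalSorter
from operator import add

# Hypothetical layers: each maps task keys to data or (func, *args) tuples.
layers = {
    "numbers": {("numbers", i): i for i in range(4)},
    "add": {("add", i): (add, ("numbers", i), 100) for i in range(4)},
}
# For each layer, the set of layers it depends on.
dependencies = {"numbers": set(), "add": {"numbers"}}

# The dependency mapping is enough to derive a valid layer ordering.
layer_order = list(TopologicalSorter(dependencies).static_order())

# Flattening the layers gives one ordinary task dictionary.
flat = {}
for name in layer_order:
    flat.update(layers[name])


def evaluate(key, graph, cache):
    """Recursively evaluate one key of a flat task dictionary."""
    if key not in cache:
        task = graph[key]
        if isinstance(task, tuple) and task and callable(task[0]):
            func, *args = task
            resolved = [
                evaluate(arg, graph, cache)
                if isinstance(arg, tuple) and arg in graph
                else arg
                for arg in args
            ]
            cache[key] = func(*resolved)
        else:
            cache[key] = task  # plain data, stored as-is
    return cache[key]


cache = {}
results = [evaluate(("add", i), flat, cache) for i in range(4)]
print(layer_order, results)
```

Keeping the layers separate until the last moment is what lets a tool reason about whole operations (for example, dropping or fusing a layer) before paying the cost of materializing every task.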

Source from the content-addressed store, hash-verified

class HighLevelGraph(Graph):
    """Task graph composed of layers of dependent subgraphs

    This object encodes a Dask task graph that is composed of layers of
    dependent subgraphs, such as commonly occurs when building task graphs
    using high level collections like Dask array, bag, or dataframe.

    Typically each high level array, bag, or dataframe operation takes the task
    graphs of the input collections, merges them, and then adds one or more new
    layers of tasks for the new operation. These layers typically have at
    least as many tasks as there are partitions or chunks in the collection.
    The HighLevelGraph object stores the subgraphs for each operation
    separately in sub-graphs, and also stores the dependency structure between
    them.

    Parameters
    ----------
    layers : Mapping[str, Mapping]
        The subgraph layers, keyed by a unique name
    dependencies : Mapping[str, set[str]]
        The set of layers on which each layer depends
    key_dependencies : dict[Key, set], optional
        Mapping (some) keys in the high level graph to their dependencies. If
        a key is missing, its dependencies will be calculated on-the-fly.

    Examples
    --------
    Here is an idealized example that shows the internal state of a
    HighLevelGraph

    >>> import dask.dataframe as dd

    >>> df = dd.read_csv('myfile.*.csv')  # doctest: +SKIP
    >>> df = df + 100  # doctest: +SKIP
    >>> df = df[df.name == 'Alice']  # doctest: +SKIP

    >>> graph = df.__dask_graph__()  # doctest: +SKIP
    >>> graph.layers  # doctest: +SKIP
    {
     'read-csv': {('read-csv', 0): (pandas.read_csv, 'myfile.0.csv'),
                  ('read-csv', 1): (pandas.read_csv, 'myfile.1.csv'),
                  ('read-csv', 2): (pandas.read_csv, 'myfile.2.csv'),
                  ('read-csv', 3): (pandas.read_csv, 'myfile.3.csv')},
     'add': {('add', 0): (operator.add, ('read-csv', 0), 100),
             ('add', 1): (operator.add, ('read-csv', 1), 100),
             ('add', 2): (operator.add, ('read-csv', 2), 100),
             ('add', 3): (operator.add, ('read-csv', 3), 100)},
     'filter': {('filter', 0): (lambda part: part[part.name == 'Alice'], ('add', 0)),
                ('filter', 1): (lambda part: part[part.name == 'Alice'], ('add', 1)),
                ('filter', 2): (lambda part: part[part.name == 'Alice'], ('add', 2)),
                ('filter', 3): (lambda part: part[part.name == 'Alice'], ('add', 3))}
    }

    >>> graph.dependencies  # doctest: +SKIP
    {
     'read-csv': set(),
     'add': {'read-csv'},
     'filter': {'add'}
    }
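The `layers` and `dependencies` parameters documented in the excerpt can also be supplied by hand. The sketch below assumes `dask` is installed; the layer names `load` and `double` and the literal values are invented for illustration. It builds a two-layer graph from plain dictionaries, flattens it with `to_dict()`, and runs it on the synchronous scheduler `dask.get`.

```python
from operator import mul

import dask
from dask.highlevelgraph import HighLevelGraph

# Two hand-written layers; the names "load" and "double" are made up.
layers = {
    "load": {("load", 0): 1, ("load", 1): 2},
    "double": {
        ("double", 0): (mul, ("load", 0), 2),
        ("double", 1): (mul, ("load", 1), 2),
    },
}
# The set of layers on which each layer depends.
dependencies = {"load": set(), "double": {"load"}}

graph = HighLevelGraph(layers, dependencies)

# Flatten the layered graph into an ordinary task dictionary.
flat = graph.to_dict()

# Run the flattened graph on the single-threaded scheduler.
results = dask.get(flat, [("double", 0), ("double", 1)])
print(sorted(graph.layers), list(results))
```

In everyday use these graphs are produced by the collections themselves (array, bag, dataframe); constructing one manually is mainly useful for experiments and tests.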

Callers 15

_checkpoint_one · Function · 0.90
_bind_one · Function · 0.90
hlg · Method · 0.90
_optimize_blockwise · Function · 0.90
fuse_roots · Function · 0.90

Calls

no outgoing calls

Tested by 15

test_basic · Function · 0.72
test_getitem · Function · 0.72
test_copy · Function · 0.72
test_cull · Function · 0.72
test_cull_layers · Function · 0.72
