Class HighLevelGraph

dask/highlevelgraph.py:364–864 (github.com/dask/dask)

Task graph composed of layers of dependent subgraphs. This object encodes a Dask task graph that is composed of layers of dependent subgraphs, such as those that commonly arise when building task graphs with high-level collections like Dask array, bag, or dataframe.
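The layered structure described above can be modelled with plain dictionaries. The sketch below is a hypothetical toy, not Dask's implementation: `add_layer`, `flatten`, `layer_order`, and the `x`/`y`/`total` names are invented for illustration. It merges the layered graphs of two input "collections", adds one new layer on top, flattens everything into a single key-to-task mapping, and orders the layers so that each one comes after the layers it depends on.

```python
from graphlib import TopologicalSorter
from operator import add

# Hypothetical toy model (not Dask's implementation): a layered graph is a pair
# of plain dicts, layer name -> {key: task} and layer name -> set of layer names.


def add_layer(graphs, name, layer, depends_on):
    """Merge the input layered graphs, then add one new layer on top of them."""
    layers, deps = {}, {}
    for g_layers, g_deps in graphs:
        layers.update(g_layers)
        deps.update(g_deps)
    layers[name] = layer
    deps[name] = set(depends_on)
    return layers, deps


def flatten(layers):
    """Merge every layer into a single low-level key -> task mapping."""
    return {key: task for layer in layers.values() for key, task in layer.items()}


def layer_order(deps):
    """Order layer names so each layer follows the layers it depends on."""
    return list(TopologicalSorter(deps).static_order())


# Two hypothetical input "collections", each a single layer of two chunks.
x = ({"x": {("x", 0): 1, ("x", 1): 2}}, {"x": set()})
y = ({"y": {("y", 0): 10, ("y", 1): 20}}, {"y": set()})

# An elementwise operation: merge the inputs and add a "total" layer.
layers, deps = add_layer(
    [x, y],
    "total",
    {("total", i): (add, ("x", i), ("y", i)) for i in range(2)},
    depends_on=["x", "y"],
)
low_level = flatten(layers)
order = layer_order(deps)
```

Here `deps` is `{'x': set(), 'y': set(), 'total': {'x', 'y'}}`, `low_level` holds six keys, and `'total'` comes last in `order`. Keeping the layers separate, rather than storing only `low_level`, preserves which operation produced each key.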


class HighLevelGraph(Graph):
    """Task graph composed of layers of dependent subgraphs

    This object encodes a Dask task graph that is composed of layers of
    dependent subgraphs, such as commonly occurs when building task graphs
    using high level collections like Dask array, bag, or dataframe.

    Typically each high level array, bag, or dataframe operation takes the task
    graphs of the input collections, merges them, and then adds one or more new
    layers of tasks for the new operation. These layers typically have at
    least as many tasks as there are partitions or chunks in the collection.
    The HighLevelGraph object stores the subgraphs for each operation
    separately in sub-graphs, and also stores the dependency structure between
    them.

    Parameters
    ----------
    layers : Mapping[str, Mapping]
        The subgraph layers, keyed by a unique name
    dependencies : Mapping[str, set[str]]
        The set of layers on which each layer depends
    key_dependencies : dict[Key, set], optional
        Mapping (some) keys in the high level graph to their dependencies. If
        a key is missing, its dependencies will be calculated on-the-fly.

    Examples
    --------
    Here is an idealized example that shows the internal state of a
    HighLevelGraph

    >>> import dask.dataframe as dd

    >>> df = dd.read_csv('myfile.*.csv')  # doctest: +SKIP
    >>> df = df + 100  # doctest: +SKIP
    >>> df = df[df.name == 'Alice']  # doctest: +SKIP

    >>> graph = df.__dask_graph__()  # doctest: +SKIP
    >>> graph.layers  # doctest: +SKIP
    {
     'read-csv': {('read-csv', 0): (pandas.read_csv, 'myfile.0.csv'),
                  ('read-csv', 1): (pandas.read_csv, 'myfile.1.csv'),
                  ('read-csv', 2): (pandas.read_csv, 'myfile.2.csv'),
                  ('read-csv', 3): (pandas.read_csv, 'myfile.3.csv')},
     'add': {('add', 0): (operator.add, ('read-csv', 0), 100),
             ('add', 1): (operator.add, ('read-csv', 1), 100),
             ('add', 2): (operator.add, ('read-csv', 2), 100),
             ('add', 3): (operator.add, ('read-csv', 3), 100)},
     'filter': {('filter', 0): (lambda part: part[part.name == 'Alice'], ('add', 0)),
                ('filter', 1): (lambda part: part[part.name == 'Alice'], ('add', 1)),
                ('filter', 2): (lambda part: part[part.name == 'Alice'], ('add', 2)),
                ('filter', 3): (lambda part: part[part.name == 'Alice'], ('add', 3))}
    }

    >>> graph.dependencies  # doctest: +SKIP
    {
     'read-csv': set(),
     'add': {'read-csv'},
     'filter': {'add'}
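The idealized example above can be replayed in plain Python to show how the layer-level `dependencies` follow from key-level ones. This is a hypothetical sketch, not Dask's code: `read_part`, `add_scalar`, and `keep_even` are invented stand-ins for `pandas.read_csv`, the `+ 100` step, and the name filter, and small lists stand in for DataFrame partitions. `key_dependencies` computes a task's dependencies on the fly by scanning its arguments, in the spirit of the fallback described for the `key_dependencies` parameter, and `compute` evaluates a key recursively.

```python
# Hypothetical toy model of the docstring example; plain functions and lists
# stand in for pandas.read_csv and DataFrame partitions. Not Dask's code.


def read_part(i):
    """Stand-in for pandas.read_csv: returns a tiny 'partition'."""
    return [i, i + 1]


def add_scalar(part, n):
    return [v + n for v in part]


def keep_even(part):
    return [v for v in part if v % 2 == 0]


layers = {
    "read-csv": {("read-csv", i): (read_part, i) for i in range(4)},
    "add": {("add", i): (add_scalar, ("read-csv", i), 100) for i in range(4)},
    "filter": {("filter", i): (keep_even, ("add", i)) for i in range(4)},
}
# Merge the layers into one low-level graph (key -> task).
graph = {k: v for layer in layers.values() for k, v in layer.items()}


def is_task(x):
    # Toy convention: a task is a tuple whose first element is callable.
    return isinstance(x, tuple) and bool(x) and callable(x[0])


def is_key(x):
    try:
        return x in graph
    except TypeError:  # unhashable values can never be keys
        return False


def key_dependencies(task):
    """Compute a task's key dependencies on the fly by scanning its arguments."""
    if is_task(task):
        return set().union(*(key_dependencies(arg) for arg in task[1:]))
    return {task} if is_key(task) else set()


def layer_dependencies():
    """Derive layer-level dependencies from key-level ones (key[0] is the layer)."""
    return {
        name: {dep[0] for task in layer.values() for dep in key_dependencies(task)}
        for name, layer in layers.items()
    }


def compute(key, cache=None):
    """Evaluate `key` recursively, computing each key at most once."""
    cache = {} if cache is None else cache
    if key not in cache:
        func, *args = graph[key]
        cache[key] = func(*(compute(a, cache) if is_key(a) else a for a in args))
    return cache[key]
```

`layer_dependencies()` returns `{'read-csv': set(), 'add': {'read-csv'}, 'filter': {'add'}}`, matching the `graph.dependencies` output shown in the docstring. Because the example's keys are `(layer_name, index)` tuples, `key[0]` recovers the layer name; this toy relies on that naming convention.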

Callers (15)

_checkpoint_one · function · 0.90
_bind_one · function · 0.90
hlg · method · 0.90
_finalize_args_collections · function · 0.90
test_with_HighLevelGraph · method · 0.90
_optimize_blockwise · function · 0.90
fuse_roots · function · 0.90
test_hlg_expr_sequence_finalize · function · 0.90
test_hlg_expr_sequence_nested_keys · function · 0.90
optimizer_with_annotations · function · 0.90
test_hlg_sequence_uses_annotations_of_optimized_dsk · function · 0.90
test_hlg_blockwise_fusion · function · 0.90

Calls

no outgoing calls

Tested by (15)

test_with_HighLevelGraph · method · 0.72
test_hlg_expr_sequence_finalize · function · 0.72
test_hlg_expr_sequence_nested_keys · function · 0.72
optimizer_with_annotations · function · 0.72
test_hlg_sequence_uses_annotations_of_optimized_dsk · function · 0.72
test_hlg_blockwise_fusion · function · 0.72
test_basic · function · 0.72
test_getitem · function · 0.72
test_copy · function · 0.72
test_cull · function · 0.72
test_cull_layers · function · 0.72
test_repr_html_hlg_layers · function · 0.72
