class HighLevelGraph(Graph):
    """Task graph composed of layers of dependent subgraphs

    This object encodes a Dask task graph that is composed of layers of
    dependent subgraphs, such as commonly occurs when building task graphs
    using high level collections like Dask array, bag, or dataframe.

    Typically each high level array, bag, or dataframe operation takes the task
    graphs of the input collections, merges them, and then adds one or more new
    layers of tasks for the new operation. These layers typically have at
    least as many tasks as there are partitions or chunks in the collection.
    The HighLevelGraph object stores the tasks of each operation in a
    separate subgraph (a layer), and also stores the dependency structure
    between the layers.

    Parameters
    ----------
    layers : Mapping[str, Mapping]
        The subgraph layers, keyed by a unique name
    dependencies : Mapping[str, set[str]]
        The set of layers on which each layer depends
    key_dependencies : dict[Key, set], optional
        Mapping from (some) keys in the high level graph to their dependencies.
        If a key is missing, its dependencies will be calculated on-the-fly.

    Examples
    --------
    Here is an idealized example that shows the internal state of a
    HighLevelGraph:

    >>> import dask.dataframe as dd

    >>> df = dd.read_csv('myfile.*.csv')  # doctest: +SKIP
    >>> df = df + 100  # doctest: +SKIP
    >>> df = df[df.name == 'Alice']  # doctest: +SKIP

    >>> graph = df.__dask_graph__()  # doctest: +SKIP
    >>> graph.layers  # doctest: +SKIP
    {
     'read-csv': {('read-csv', 0): (pandas.read_csv, 'myfile.0.csv'),
                  ('read-csv', 1): (pandas.read_csv, 'myfile.1.csv'),
                  ('read-csv', 2): (pandas.read_csv, 'myfile.2.csv'),
                  ('read-csv', 3): (pandas.read_csv, 'myfile.3.csv')},
     'add': {('add', 0): (operator.add, ('read-csv', 0), 100),
             ('add', 1): (operator.add, ('read-csv', 1), 100),
             ('add', 2): (operator.add, ('read-csv', 2), 100),
             ('add', 3): (operator.add, ('read-csv', 3), 100)},
     'filter': {('filter', 0): (lambda part: part[part.name == 'Alice'], ('add', 0)),
                ('filter', 1): (lambda part: part[part.name == 'Alice'], ('add', 1)),
                ('filter', 2): (lambda part: part[part.name == 'Alice'], ('add', 2)),
                ('filter', 3): (lambda part: part[part.name == 'Alice'], ('add', 3))}
    }

    >>> graph.dependencies  # doctest: +SKIP
    {
     'read-csv': set(),
     'add': {'read-csv'},
     'filter': {'add'}
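The layered structure described in the docstring above can be imitated with plain dictionaries. The following is a minimal sketch in standard-library Python, not Dask's implementation: the helper names `flatten`, `layer_order`, and `evaluate` are hypothetical and exist only for illustration, and the layers hold literal integers instead of dataframe partitions. It shows how separate layers plus a layer-level dependency mapping still describe one ordinary low-level task graph.

```python
from operator import add, mul

# Each layer maps keys to tasks. A task is either literal data or a
# (callable, *args) tuple whose args may be keys from other layers.
layers = {
    "load": {("load", 0): 1, ("load", 1): 2},
    "add": {("add", i): (add, ("load", i), 100) for i in range(2)},
    "double": {("double", i): (mul, ("add", i), 2) for i in range(2)},
}

# Layer-level dependencies: which layers each layer reads from.
dependencies = {"load": set(), "add": {"load"}, "double": {"add"}}


def flatten(layers):
    """Merge every layer into one low-level {key: task} dict (hypothetical helper)."""
    merged = {}
    for layer in layers.values():
        merged.update(layer)
    return merged


def layer_order(dependencies):
    """Return layer names so that each layer comes after the layers it depends on."""
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in sorted(dependencies[name]):
            visit(dep)
        ordered.append(name)

    for name in dependencies:
        visit(name)
    return ordered


def evaluate(graph, key):
    """Naive recursive evaluator for the flat graph (illustration only)."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        resolved = [
            evaluate(graph, arg) if isinstance(arg, tuple) and arg in graph else arg
            for arg in args
        ]
        return func(*resolved)
    return task


flat = flatten(layers)
print(layer_order(dependencies))
print(evaluate(flat, ("double", 1)))
```

Keeping the layers separate means the dependency structure can be inspected at the level of whole operations, while flattening recovers the per-key graph that a scheduler would execute.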