    def fuse(
        dsk,
        keys=None,
        dependencies=None,
        ave_width=_default,
        max_width=_default,
        max_height=_default,
        max_depth_new_edges=_default,
        rename_keys=_default,
    ):
        """Fuse tasks that form reductions; more advanced than ``fuse_linear``

        This trades parallelism opportunities for faster scheduling by making tasks
        less granular. It can replace ``fuse_linear`` in optimization passes.

        This optimization applies to all reductions--tasks that have at most one
        dependent--so it may be viewed as fusing "multiple input, single output"
        groups of tasks into a single task. There are many parameters to fine
        tune the behavior, which are described below. ``ave_width`` is the
        natural parameter with which to compare parallelism to granularity, so
        it should always be specified. Reasonable values for other parameters
        will be determined using ``ave_width`` if necessary.

        Parameters
        ----------
        dsk: dict
            dask graph
        keys: list or set, optional
            Keys that must remain in the returned dask graph
        dependencies: dict, optional
            {key: [list-of-keys]}. Must be a list to provide count of each key
            This optional input often comes from ``cull``
        ave_width: float (default 1)
            Upper limit for ``width = num_nodes / height``, a good measure of
            parallelizability.
            dask.config key: ``optimization.fuse.ave-width``
        max_width: int (default infinite)
            Don't fuse if total width is greater than this.
            dask.config key: ``optimization.fuse.max-width``
        max_height: int or None (default None)
            Don't fuse more than this many levels. Set to None to dynamically
            adjust to ``1.5 + ave_width * log(ave_width + 1)``.
            dask.config key: ``optimization.fuse.max-height``
        max_depth_new_edges: int or None (default None)
            Don't fuse if new dependencies are added after this many levels.
            Set to None to dynamically adjust to ave_width * 1.5.
            dask.config key: ``optimization.fuse.max-depth-new-edges``
        rename_keys: bool or func, optional (default True)
            Whether to rename the fused keys with ``default_fused_keys_renamer``
            or not. Renaming fused keys can keep the graph more understandable
            and comprehensive, but it comes at the cost of additional processing.
            If False, then the top-most key will be used. For advanced usage, a
            function to create the new name is also accepted.
            dask.config key: ``optimization.fuse.rename-keys``

        Returns
        -------
        dsk
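The docstring defines ``ave_width`` as an upper limit on ``width = num_nodes / height``. The sketch below illustrates that ratio on two small dependency dicts. It is not dask's implementation: the helpers ``subtree_shape`` and ``average_width`` are hypothetical, and the counting convention (every task counted, including the final one, with height measured in levels) is an assumption chosen so that a linear chain has width 1, which matches the default ``ave_width``.

```python
def subtree_shape(dependencies, root):
    """Return (num_nodes, height) for the tree of tasks feeding ``root``.

    Assumes the subgraph is a tree, i.e. every task has at most one
    dependent, which is the "reduction" shape the docstring describes.
    This is an illustrative helper, not part of dask.
    """
    num_nodes = 1  # count the root task itself
    tallest_child = 0
    for dep in dependencies.get(root, []):
        child_nodes, child_height = subtree_shape(dependencies, dep)
        num_nodes += child_nodes
        tallest_child = max(tallest_child, child_height)
    return num_nodes, tallest_child + 1


def average_width(dependencies, root):
    """Compute width = num_nodes / height for the subtree under ``root``."""
    num_nodes, height = subtree_shape(dependencies, root)
    return num_nodes / height


# A linear chain: c depends on b, which depends on a.
chain = {"c": ["b"], "b": ["a"], "a": []}

# A two-level reduction: one task combines four independent inputs.
reduction = {"total": ["w", "x", "y", "z"], "w": [], "x": [], "y": [], "z": []}

chain_width = average_width(chain, "c")              # 3 nodes over 3 levels
reduction_width = average_width(reduction, "total")  # 5 nodes over 2 levels
```

Under this convention the chain has width 1.0 and the reduction has width 2.5, so a limit of ``ave_width=1`` would admit the chain but not the four-input reduction, while a larger limit would admit both at the cost of parallelism.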
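The parameter list states that ``max_height`` and ``max_depth_new_edges`` fall back to values derived from ``ave_width`` when they are ``None``. The following sketch evaluates those two formulas as written in the docstring. It assumes ``log`` means the natural logarithm and that no rounding is applied; the helper name ``dynamic_fuse_defaults`` is hypothetical and not part of dask.

```python
import math


def dynamic_fuse_defaults(ave_width):
    """Evaluate the docstring's fallback formulas for a given ave_width.

    Assumption: ``log`` in the docstring is the natural logarithm.
    """
    max_height = 1.5 + ave_width * math.log(ave_width + 1)
    max_depth_new_edges = ave_width * 1.5
    return max_height, max_depth_new_edges


# Show how the limits grow as ave_width increases.
for ave_width in (1, 2, 4):
    max_height, max_depth_new_edges = dynamic_fuse_defaults(ave_width)
    print(
        f"ave_width={ave_width}: "
        f"max_height={max_height:.2f}, "
        f"max_depth_new_edges={max_depth_new_edges:.2f}"
    )
```

Both limits increase with ``ave_width``, which is consistent with the docstring's advice to specify ``ave_width`` and let the other parameters follow from it.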