hub / github.com/ray-project/ray / ScalingConfig

Class ScalingConfig

python/ray/train/v2/api/config.py:31–279

Configuration for scaling training.

Source from the content-addressed store, hash-verified

@dataclass
class ScalingConfig(ScalingConfigV1):
    """Configuration for scaling training.

    Args:
        num_workers: The number of workers (Ray actors) to launch.
            Each worker will reserve 1 CPU by default. The number of CPUs
            reserved by each worker can be overridden with the
            ``resources_per_worker`` argument. If the number of workers is 0,
            the training function will run in local mode, meaning the training
            function runs in the same process. To enable elasticity, provide a
            ``(min_workers, max_workers)`` tuple of ints.
        elastic_resize_monitor_interval_s: While the worker group is healthy,
            consider resizing the worker group every
            ``elastic_resize_monitor_interval_s`` seconds.
        use_gpu: If True, training will be done on GPUs (1 per worker).
            Defaults to False. The number of GPUs reserved by each
            worker can be overridden with the ``resources_per_worker``
            argument.
        resources_per_worker: If specified, the resources
            defined in this Dict is reserved for each worker.
            Define the ``"CPU"`` and ``"GPU"`` keys (case-sensitive) to
            override the number of CPU or GPUs used by each worker.
        placement_strategy: The placement strategy to use for the
            placement group of the Ray actors. See :ref:`Placement Group
            Strategies <pgroup-strategy>` for the possible options.
        label_selector: A list of label selectors for Ray Train worker placement.
            If a single label selector is provided, it will be applied to all Ray Train workers.
            If a list is provided, it must be the same length as the max number of Ray Train workers.
        accelerator_type: [Experimental] If specified, Ray Train will launch the
            training coordinator and workers on the nodes with the specified type
            of accelerators.
            See :ref:`the available accelerator types <accelerator_types>`.
            Ensure that your cluster has instances with the specified accelerator type
            or is able to autoscale to fulfill the request. This field is required
            when `use_tpu` is True and `num_workers` is greater than 1.
        use_tpu: [Experimental] If True, training will be done on TPUs (1 TPU VM
            per worker). Defaults to False. The number of TPUs reserved by each
            worker can be overridden with the ``resources_per_worker``
            argument. This arg enables SPMD execution of the training workload.
        topology: [Experimental] If specified, Ray Train will launch the training
            coordinator and workers on nodes with the specified topology. Topology is
            auto-detected for TPUs and added as Ray node labels. This arg enables
            SPMD execution of the training workload. This field is required
            when `use_tpu` is True and `num_workers` is greater than 1.
    """

    num_workers: Union[int, Tuple[int, int]] = 1
    trainer_resources: Optional[dict] = None
    label_selector: Optional[Union[Dict[str, str], List[Dict[str, str]]]] = None

    # Accelerator specific fields.
    use_tpu: Union[bool] = False
    topology: Optional[str] = None

    # Elasticity specific fields.
    elastic_resize_monitor_interval_s: float = 60.0

    def __post_init__(self):
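The docstring above describes two input shapes that are easy to misread: `num_workers` may be either an int or a `(min_workers, max_workers)` tuple, and `label_selector` may be either one dict or a list with one dict per worker. The following is a minimal, standalone sketch of how those rules could be interpreted. It does not import Ray and is not Ray's implementation (the body of `__post_init__` is not shown in this excerpt); the helper names `worker_bounds` and `expand_label_selector`, and the choice to raise `ValueError`, are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional, Tuple, Union

LabelSelector = Dict[str, str]


def worker_bounds(num_workers: Union[int, Tuple[int, int]]) -> Tuple[int, int]:
    """Hypothetical helper: normalize num_workers to (min_workers, max_workers).

    A plain int means a fixed-size worker group; a tuple means an elastic
    range, as described in the docstring.
    """
    if isinstance(num_workers, tuple):
        min_workers, max_workers = num_workers
    else:
        min_workers = max_workers = num_workers
    # Assumption: a negative or inverted range is treated as invalid here.
    if min_workers < 0 or max_workers < min_workers:
        raise ValueError(f"invalid worker range: {num_workers!r}")
    return min_workers, max_workers


def expand_label_selector(
    label_selector: Optional[Union[LabelSelector, List[LabelSelector]]],
    max_workers: int,
) -> List[LabelSelector]:
    """Hypothetical helper: produce one label selector per worker.

    A single dict is applied to every worker; a list must have exactly
    max_workers entries, matching the rule stated in the docstring.
    """
    if label_selector is None:
        return [{} for _ in range(max_workers)]
    if isinstance(label_selector, dict):
        return [dict(label_selector) for _ in range(max_workers)]
    if len(label_selector) != max_workers:
        raise ValueError(
            f"expected {max_workers} label selectors, got {len(label_selector)}"
        )
    return [dict(selector) for selector in label_selector]


# Elastic range of 2 to 4 workers, with one selector shared by all of them.
min_w, max_w = worker_bounds((2, 4))
selectors = expand_label_selector({"zone": "a"}, max_w)
```

In this sketch the list form is checked against the upper bound of the elastic range, because the docstring ties the list length to the max number of workers rather than the current count.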

Callers 15

test_torch_trainer_crash · Function · 0.90
test_training · Method · 0.90
__init__ · Method · 0.90
__init__ · Method · 0.90
__init__ · Method · 0.90
test_keras_callback_e2e · Function · 0.90
__init__ · Method · 0.90
__repr__ · Method · 0.90
setup · Method · 0.90

Calls

no outgoing calls
