Repository: github.com/pytorch/pytorch

Class CheckpointManager

Source: caffe2/python/checkpoint.py, lines 149–429


```python
from caffe2.python import core  # required by core.Net below


class CheckpointManager:
    """
    Controls saving and loading of workspaces on every epoch boundary of a job.
    If a CheckpointManager instance is passed to JobRunner, then JobRunner will
    call `init`, `read` and `save` at different moments in between epoch runs.

    Args:
        db_prefix: The prefix used to construct full db name. Since `absolute_path`
            is set to True, this will be used as db_name in SaveOp.
        node_name: Name of the node where this checkpoint_manager is used.
        db_type: Type of database to use for storing checkpoint.
        metadata_handler: An optional object capable of reading/writing
            checkpoint info in storage of choice.
    """

    BLOB_NAMES = "blob_names"

    def __init__(self, db_prefix, node_name, db_type, metadata_handler=None):
        self._db_prefix = db_prefix
        self._node_name = node_name
        self._db_type = db_type
        self._metadata_handler = metadata_handler
        # make sure these blobs are the first in the checkpoint file.
        self._net = core.Net('!!checkpoint_mngr')
        self._blob_names = self._net.AddExternalInput(self.BLOB_NAMES)
        self._names_output = None
        self._path_prefix = None
        self._path_type = None
        self._current_db_name = None
        self._current_checkpoint_duration = None

    """
    Initialize the checkpoint manager. Determines all blobs that need to be saved
    or loads from a checkpoint.

    Args:
        nodes: An array of nodes where this checkpoint manager is running. Should
            only contain a single node.
        retrieve_from_epoch: Set to a number to load blobs from this epoch.
        path_prefix: Used to construct db name or path where checkpoint files are
            stored.
        path_type: Indicate the type of path where checkpoint files are stored.
    """
    def init(
        self,
        nodes=None,
        retrieve_from_epoch=None,
        path_prefix=None,
        path_type=None
    ):
        """
        Build a Task that will be run once after the job's `init_group` is run.
        This task will determine which blobs need to be checkpointed.
        If retrieve_from_epoch is not None, then the checkpoint metadata is
        retrieved from a previously saved checkpoint.
        """
        assert nodes is None or len(nodes) == 1, (
            'CheckpointManager only supports single node.')
        # ... (excerpt ends here; the class continues through line 429 of checkpoint.py)
```
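The class docstring describes a call protocol rather than an algorithm: JobRunner calls `init` once, `read` when resuming, and `save` at epoch boundaries. The sketch below exercises that call order with a hypothetical in-memory stand-in so it runs without caffe2. `InMemoryCheckpointManager`, `run_job`, the shared `store` dict, the hard-coded blob `"w"`, and the `<db_prefix>/<node_name>.<epoch>` naming scheme are all illustrative assumptions, not caffe2 APIs. The real `init` builds a Task and the real manager writes `db_type` databases, so the signatures here deliberately differ.

```python
from typing import Dict, List, Optional

Workspace = Dict[str, float]


class InMemoryCheckpointManager:
    """Hypothetical stand-in for CheckpointManager that keeps checkpoints in a dict."""

    def __init__(self, db_prefix: str, node_name: str,
                 store: Optional[Dict[str, Workspace]] = None):
        self._db_prefix = db_prefix
        self._node_name = node_name
        self._blob_names: List[str] = []
        # Maps db name -> snapshot of the checkpointed blobs.
        self.store: Dict[str, Workspace] = store if store is not None else {}
        # Records the order in which the driver calls the manager.
        self.calls: List[str] = []

    def db_name(self, epoch: int) -> str:
        # Hypothetical naming scheme: "<db_prefix>/<node_name>.<epoch>".
        return f"{self._db_prefix}/{self._node_name}.{epoch}"

    def init(self, workspace: Workspace, nodes=None,
             retrieve_from_epoch: Optional[int] = None) -> None:
        # Same single-node restriction as CheckpointManager.init.
        assert nodes is None or len(nodes) == 1, (
            'CheckpointManager only supports single node.')
        self.calls.append("init")
        if retrieve_from_epoch is None:
            # Fresh run: checkpoint every blob present after initialization.
            self._blob_names = sorted(workspace)
        else:
            # Resuming: recover the blob list from the saved checkpoint.
            self._blob_names = sorted(self.store[self.db_name(retrieve_from_epoch)])

    def read(self, workspace: Workspace, epoch: int) -> None:
        self.calls.append(f"read:{epoch}")
        workspace.update(self.store[self.db_name(epoch)])

    def save(self, workspace: Workspace, epoch: int) -> None:
        self.calls.append(f"save:{epoch}")
        self.store[self.db_name(epoch)] = {
            name: workspace[name] for name in self._blob_names}


def run_job(manager: InMemoryCheckpointManager, workspace: Workspace,
            num_epochs: int, resume_from: Optional[int] = None) -> Workspace:
    """Hypothetical driver that calls the manager in the order the docstring describes."""
    manager.init(workspace, retrieve_from_epoch=resume_from)
    if resume_from is not None:
        manager.read(workspace, resume_from)
    first_epoch = 1 if resume_from is None else resume_from + 1
    for epoch in range(first_epoch, num_epochs + 1):
        workspace["w"] += 1.0  # stand-in for one epoch of training on blob "w"
        manager.save(workspace, epoch)  # checkpoint at the epoch boundary
    return workspace
```

For example, running three epochs with one manager and then starting a second manager that shares the same `store` with `resume_from=3` makes the second run read epoch 3 and save only epochs 4 and 5.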

Callers (3):
  builder (method) · 0.90
  init (method) · 0.85
  load_blobs_locally (method) · 0.85

Calls: no outgoing calls

Tested by (1):
  builder (method) · 0.72
