Method train_and_evaluate

github.com/tensorflow/models · orbit/controller.py:353–398

Runs interleaved training and evaluation. This method interleaves calls to `self.train()` and `self.evaluate()`, training the model until the global step count equals `train_steps`, and running an evaluation for `eval_steps` every `eval_interval` training steps. In addition, this method will run a final evaluation at the end of the training sequence.

train_and_evaluate(
    self,
    train_steps: int,
    eval_steps: int = -1,
    eval_interval: Optional[int] = None,
) -> Optional[runner.Output]

Source

def train_and_evaluate(
    self,
    train_steps: int,
    eval_steps: int = -1,
    eval_interval: Optional[int] = None,
) -> Optional[runner.Output]:
  """Runs interleaved training and evaluation.

  This method interleaves calls to `self.train()` and `self.evaluate()`,
  training the model until the global step count equals `train_steps`, and
  running an evaluation for `eval_steps` every `eval_interval` training steps.
  In addition, this method will run a final evaluation at the end of the
  training sequence.

  When async checkpointing is enabled, a sync is triggered at the end of this
  method to make sure any ongoing async checkpoint saving is finished before
  returning.

  Args:
    train_steps: The global step count to train up to.
    eval_steps: The number of steps to run during an evaluation. If -1, this
      method will evaluate over the entire evaluation dataset.
    eval_interval: The number of training steps to run between evaluations. If
      set, training will always stop every `eval_interval` steps, even if this
      results in a shorter inner loop than specified by `steps_per_loop`
      setting. If None, evaluation will only be performed after training is
      complete.

  Returns:
    The evaluation results as a dictionary mapping names to NumPy values.
  """
  self._require("trainer", for_method="train_and_evaluate")
  self._require("evaluator", for_method="train_and_evaluate")

  output = None
  current_step = self.global_step.numpy()  # Cache, since this is expensive.
  eval_interval = eval_interval or (train_steps - current_step)
  while current_step < train_steps:
    interval = min(train_steps - current_step, eval_interval)
    num_steps = current_step + interval
    self.train(steps=num_steps, checkpoint_at_completion=False)
    output = self.evaluate(steps=eval_steps)
    current_step = self.global_step.numpy()
  self._maybe_save_checkpoint(check_interval=False)
  self._sync_on_async_checkpointing()
  return output
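
The loop's stopping points can be reproduced without TensorFlow. The sketch below is a minimal illustration, not Orbit code: `eval_schedule` is a hypothetical helper that copies the interval arithmetic from the method body and assumes each `train()` call reaches its target step exactly. It returns the global steps at which an evaluation would run.

```python
from typing import List, Optional


def eval_schedule(
    current_step: int,
    train_steps: int,
    eval_interval: Optional[int] = None,
) -> List[int]:
  """Returns the global steps at which an evaluation would run (hypothetical helper)."""
  # Mirrors `eval_interval or (...)`: both None and 0 fall back to a single
  # evaluation after all remaining training steps.
  eval_interval = eval_interval or (train_steps - current_step)
  targets = []
  while current_step < train_steps:
    interval = min(train_steps - current_step, eval_interval)
    current_step += interval  # Assumption: train() reaches its target exactly.
    targets.append(current_step)
  return targets


print(eval_schedule(0, 1000, 300))     # [300, 600, 900, 1000]
print(eval_schedule(0, 1000))          # [1000]
print(eval_schedule(250, 1000, 300))   # [550, 850, 1000]
print(eval_schedule(1000, 1000, 300))  # []
```

Two consequences follow from the method body as written. Because the fallback uses `or`, `eval_interval=0` behaves like `None`. And when the global step is already at or past `train_steps`, the loop body never runs, so `output` stays `None`, which is why the return type is `Optional`.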

Callers 15

run_experiment · Function · 0.95
run_experiment_with_multitask_eval · Function · 0.95
run_experiment · Function · 0.95
run · Function · 0.95
test_no_checkpoint · Method · 0.95
test_no_checkpoint_and_summaries · Method · 0.95
test_has_checkpoint_no_summaries · Method · 0.95
test_has_checkpoint_eval_summary_only · Method · 0.95
test_train_and_evaluate · Method · 0.95
test_train_and_evaluate_with_same_summary_dir · Method · 0.95
test_train_and_evaluate_reset_datasets · Method · 0.95
test_eval_and_checkpoint_interval · Method · 0.95

Calls 5

_require · Method · 0.95
train · Method · 0.95
evaluate · Method · 0.95
_maybe_save_checkpoint · Method · 0.95
_sync_on_async_checkpointing · Method · 0.95
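
To see the order in which these five callees fire, the loop can be replayed against a recording stub. `FakeController` below is hypothetical: it only logs calls, tracks the step in a plain integer instead of `self.global_step`, and its `train` assumes the target step is reached exactly. The method names and keyword arguments come from the source listing; the default values and everything else are illustrative assumptions.

```python
from typing import List


class FakeController:
  """Hypothetical stand-in that records calls instead of running a model."""

  def __init__(self, start_step: int = 0):
    self.step = start_step
    self.calls: List[str] = []

  def _require(self, attribute: str, for_method: str) -> None:
    self.calls.append(f"_require({attribute})")

  def train(self, steps: int, checkpoint_at_completion: bool = True) -> None:
    self.step = steps  # Assumption: training reaches the target step exactly.
    self.calls.append(f"train(to={steps})")

  def evaluate(self, steps: int = -1) -> dict:
    self.calls.append(f"evaluate(at={self.step})")
    return {"evaluated_at_step": self.step}

  def _maybe_save_checkpoint(self, check_interval: bool = True) -> None:
    self.calls.append("_maybe_save_checkpoint")

  def _sync_on_async_checkpointing(self) -> None:
    self.calls.append("_sync_on_async_checkpointing")

  def train_and_evaluate(self, train_steps, eval_steps=-1, eval_interval=None):
    # Same control flow as the source listing, with `self.step` standing in
    # for `self.global_step.numpy()`.
    self._require("trainer", for_method="train_and_evaluate")
    self._require("evaluator", for_method="train_and_evaluate")
    output = None
    current_step = self.step
    eval_interval = eval_interval or (train_steps - current_step)
    while current_step < train_steps:
      interval = min(train_steps - current_step, eval_interval)
      self.train(steps=current_step + interval, checkpoint_at_completion=False)
      output = self.evaluate(steps=eval_steps)
      current_step = self.step
    self._maybe_save_checkpoint(check_interval=False)
    self._sync_on_async_checkpointing()
    return output


controller = FakeController()
result = controller.train_and_evaluate(train_steps=5, eval_interval=2)
print("\n".join(controller.calls))
print(result)
```

In this trace each `train` call is followed immediately by an `evaluate`, while `_maybe_save_checkpoint` and `_sync_on_async_checkpointing` run exactly once, after the loop. They also run when the loop is skipped entirely, for example when the stub starts at `train_steps`.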

Tested by 9

test_no_checkpoint · Method · 0.76
test_no_checkpoint_and_summaries · Method · 0.76
test_has_checkpoint_no_summaries · Method · 0.76
test_has_checkpoint_eval_summary_only · Method · 0.76
test_train_and_evaluate · Method · 0.76
test_train_and_evaluate_with_same_summary_dir · Method · 0.76
test_train_and_evaluate_reset_datasets · Method · 0.76
test_eval_and_checkpoint_interval · Method · 0.76
test_actions · Method · 0.76