hub / github.com/meta-pytorch/opacus / zero_grad

Method zero_grad

opacus/optimizers/optimizer.py:506–537

Clear gradients. Clears ``p.grad``, ``p.grad_sample`` and ``p.summed_grad`` for all of its parameters. Note: the ``set_to_none`` argument only affects ``p.grad``. ``p.grad_sample`` and ``p.summed_grad`` are never zeroed out; they are set to None instead (``p.summed_grad`` is left untouched when the last step was skipped).

(self, set_to_none: bool = False)

Source from the content-addressed store, hash-verified

                p.grad /= self.expected_batch_size * self.accumulated_iterations

    def zero_grad(self, set_to_none: bool = False):
        """
        Clear gradients.

        Clears ``p.grad``, ``p.grad_sample`` and ``p.summed_grad`` for all of it's parameters

        Notes:
            ``set_to_none`` argument only affects ``p.grad``. ``p.grad_sample`` and
            ``p.summed_grad`` is never zeroed out and always set to None.
            Normal grads can do this, because their shape is always the same.
            Grad samples do not behave like this, as we accumulate gradients from different
            batches in a list

        Args:
            set_to_none: instead of setting to zero, set the grads to None. (only
            affects regular gradients. Per sample gradients are always set to None)
        """

        if set_to_none is False:
            logger.debug(
                "Despite set_to_none is set to False, "
                "opacus will set p.grad_sample and p.summed_grad to None due to "
                "non-trivial gradient accumulation behaviour"
            )

        for p in self.params:
            p.grad_sample = None

            if not self._is_last_step_skipped:
                p.summed_grad = None

        self.original_optimizer.zero_grad(set_to_none)

    def pre_step(
        self, closure: Optional[Callable[[], float]] = None
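
The clearing rules in the listing above can be tried out without PyTorch or Opacus. The following is a minimal sketch, not the library's code: ToyParam and toy_zero_grad are hypothetical names introduced here, plain Python lists stand in for tensors, and the final branch only approximates what the wrapped optimizer's zero_grad is assumed to do with ``p.grad``.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyParam:
    """Hypothetical stand-in for a parameter; lists replace tensors."""
    grad: Optional[List[float]] = None
    grad_sample: Optional[List[List[float]]] = None
    summed_grad: Optional[List[float]] = None


def toy_zero_grad(params, set_to_none=False, is_last_step_skipped=False):
    """Mimic the clearing rules of zero_grad on ToyParam objects."""
    for p in params:
        # Per-sample gradients are always dropped, whatever set_to_none says.
        p.grad_sample = None
        # The summed gradient is kept only when the last step was skipped.
        if not is_last_step_skipped:
            p.summed_grad = None
        # Assumed behaviour of the wrapped optimizer: zero or drop p.grad.
        if p.grad is not None:
            p.grad = None if set_to_none else [0.0 for _ in p.grad]


def make_param():
    # One sample in the batch, two gradient entries per sample.
    return ToyParam(grad=[1.0, 2.0], grad_sample=[[1.0, 2.0]], summed_grad=[1.0, 2.0])


zeroed = make_param()
toy_zero_grad([zeroed], set_to_none=False)

dropped = make_param()
toy_zero_grad([dropped], set_to_none=True)

skipped = make_param()
toy_zero_grad([skipped], is_last_step_skipped=True)
```

In this sketch only ``grad`` reacts to set_to_none. ``grad_sample`` is None in all three cases, and ``summed_grad`` survives only in the skipped-step case.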

Callers 15

test_norm_calculation_fast_gradient_clipping · Method · 0.95

test_weight_update_fast_gradient_clipping · Method · 0.95

test_norm_calculation · Method · 0.95

test_gradient_calculation · Method · 0.95

compute_microbatch_grad_sample · Function · 0.45

compute_opacus_grad_sample · Function · 0.45

backward · Method · 0.45

test_dpoptimizer_multidevice_clip_and_accumulate · Method · 0.45

test_adaclip_optimizer_multidevice_clip_and_accumulate · Method · 0.45

test_dpoptimizer_multidevice_full_step · Method · 0.45

test_adaclip_optimizer_multidevice_full_step · Method · 0.45

Calls

no outgoing calls

Tested by 15

test_norm_calculation_fast_gradient_clipping · Method · 0.76

test_weight_update_fast_gradient_clipping · Method · 0.76

test_norm_calculation · Method · 0.76

test_gradient_calculation · Method · 0.76

test_dpoptimizer_multidevice_clip_and_accumulate · Method · 0.36

test_adaclip_optimizer_multidevice_clip_and_accumulate · Method · 0.36

test_dpoptimizer_multidevice_full_step · Method · 0.36

test_adaclip_optimizer_multidevice_full_step · Method · 0.36

test_perlayer_optimizer_multidevice_clip_and_accumulate · Method · 0.36

test_perlayer_optimizer_multidevice_full_step · Method · 0.36

_train_steps · Method · 0.36

closure · Method · 0.36