hub / github.com/meta-pytorch/opacus / zero_grad

Method zero_grad

opacus/optimizers/optimizer.py:506–537

Clear gradients. Clears ``p.grad``, ``p.grad_sample`` and ``p.summed_grad`` for all of its parameters. Notes: the ``set_to_none`` argument only affects ``p.grad``. ``p.grad_sample`` and ``p.summed_grad`` are never zeroed out; they are set to None instead, except that ``p.summed_grad`` is left untouched when ``_is_last_step_skipped`` is true.

(self, set_to_none: bool = False)
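
The behaviour described above can be sketched without Opacus or PyTorch installed. The classes below are hypothetical stand-ins (`ToyInnerOptimizer`, `ToyDPOptimizer`, `make_param` are not part of any library); they only mirror the control flow of the listed source, with plain Python lists in place of tensors.

```python
from types import SimpleNamespace


class ToyInnerOptimizer:
    """Hypothetical stand-in for the wrapped optimizer (not a torch class)."""

    def __init__(self, params):
        self.params = params

    def zero_grad(self, set_to_none=False):
        for p in self.params:
            if p.grad is None:
                continue
            # Either drop the gradient or overwrite it with zeros of the same length.
            p.grad = None if set_to_none else [0.0] * len(p.grad)


class ToyDPOptimizer:
    """Hypothetical mirror of the listed control flow; not Opacus itself."""

    def __init__(self, params):
        self.params = params
        self.original_optimizer = ToyInnerOptimizer(params)
        self._is_last_step_skipped = False

    def zero_grad(self, set_to_none=False):
        for p in self.params:
            # Per-sample gradients are always dropped, never zeroed.
            p.grad_sample = None
            # The summed gradient survives only if the last step was skipped.
            if not self._is_last_step_skipped:
                p.summed_grad = None
        # Only the regular gradient honours set_to_none.
        self.original_optimizer.zero_grad(set_to_none)


def make_param():
    """Build a toy parameter carrying the three gradient attributes."""
    return SimpleNamespace(
        grad=[0.5, -1.0],
        grad_sample=[[0.2, -0.4], [0.3, -0.6]],
        summed_grad=[0.5, -1.0],
    )


p = make_param()
ToyDPOptimizer([p]).zero_grad(set_to_none=False)
print(p.grad, p.grad_sample, p.summed_grad)  # [0.0, 0.0] None None

q = make_param()
skipped = ToyDPOptimizer([q])
skipped._is_last_step_skipped = True
skipped.zero_grad(set_to_none=True)
print(q.grad, q.grad_sample, q.summed_grad)  # None None [0.5, -1.0]
```

In the first call `set_to_none=False` zeroes only `grad`; in the second, the skipped-step flag keeps `summed_grad` in place while `grad_sample` is still dropped.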

Source from the content-addressed store, hash-verified

504                  p.grad /= self.expected_batch_size * self.accumulated_iterations
505
506      def zero_grad(self, set_to_none: bool = False):
507          """
508          Clear gradients.
509
510          Clears ``p.grad``, ``p.grad_sample`` and ``p.summed_grad`` for all of it's parameters
511
512          Notes:
513              ``set_to_none`` argument only affects ``p.grad``. ``p.grad_sample`` and
514              ``p.summed_grad`` is never zeroed out and always set to None.
515              Normal grads can do this, because their shape is always the same.
516              Grad samples do not behave like this, as we accumulate gradients from different
517              batches in a list
518
519          Args:
520              set_to_none: instead of setting to zero, set the grads to None. (only
521              affects regular gradients. Per sample gradients are always set to None)
522          """
523
524          if set_to_none is False:
525              logger.debug(
526                  "Despite set_to_none is set to False, "
527                  "opacus will set p.grad_sample and p.summed_grad to None due to "
528                  "non-trivial gradient accumulation behaviour"
529              )
530
531          for p in self.params:
532              p.grad_sample = None
533
534              if not self._is_last_step_skipped:
535                  p.summed_grad = None
536
537          self.original_optimizer.zero_grad(set_to_none)
538
539      def pre_step(
540          self, closure: Optional[Callable[[], float]] = None
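
The docstring's note that grad samples are accumulated "from different batches in a list" can be read as follows: a regular gradient always has the parameter's shape, so it can be zeroed in place, whereas the per-sample store grows by one entry per batch and each entry has its own number of rows. The sketch below is an illustration of that reading only, using plain Python lists; `accumulate`, `num_rows` and `zeros_like` are hypothetical helpers, not Opacus functions.

```python
from typing import List, Optional

# One batch of per-sample gradients: one row (list of floats) per sample.
Batch = List[List[float]]


def accumulate(store: Optional[List[Batch]], batch: Batch) -> List[Batch]:
    """Hypothetical helper: append a batch, creating the list on first use."""
    return [batch] if store is None else store + [batch]


def num_rows(store: Optional[List[Batch]]) -> int:
    """Total number of per-sample rows currently held."""
    return 0 if store is None else sum(len(b) for b in store)


def zeros_like(store: List[Batch]) -> List[Batch]:
    """What zeroing in place would mean here: same shapes, all values 0.0."""
    return [[[0.0] * len(row) for row in b] for b in store]


# Two backward passes with different batch sizes (3 and 5 samples).
store = accumulate(None, [[1.0, 2.0]] * 3)
store = accumulate(store, [[0.5, 0.5]] * 5)

# Option A: zero the old entries, then accumulate a new batch of 4 samples.
zeroed = accumulate(zeros_like(store), [[1.0, 1.0]] * 4)

# Option B: reset to None, then accumulate the same batch.
reset = accumulate(None, [[1.0, 1.0]] * 4)

print(num_rows(zeroed))  # 12: 8 stale zero rows plus 4 real ones
print(num_rows(reset))   # 4
```

Under this reading, zeroing would leave stale rows from earlier batches in the store, while resetting to None lets the next backward pass start from an empty one.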

Calls

no outgoing calls indexed (the listed source does call ``logger.debug`` and ``self.original_optimizer.zero_grad``)