Clear gradients. Clears ``p.grad``, ``p.grad_sample`` and ``p.summed_grad`` for all of its parameters. Notes: the ``set_to_none`` argument only affects ``p.grad``. ``p.grad_sample`` and ``p.summed_grad`` are never zeroed out and are always set to None.
(self, set_to_none: bool = False)
                p.grad /= self.expected_batch_size * self.accumulated_iterations

    def zero_grad(self, set_to_none: bool = False):
        """
        Clear gradients.

        Clears ``p.grad``, ``p.grad_sample`` and ``p.summed_grad`` for all of its parameters.

        Notes:
            The ``set_to_none`` argument only affects ``p.grad``. ``p.grad_sample`` and
            ``p.summed_grad`` are never zeroed out and are always set to None.
            Regular grads can be zeroed in place because their shape is always the same.
            Grad samples cannot, as we accumulate gradients from different
            batches in a list.

        Args:
            set_to_none: if True, set the grads to None instead of zeroing them (only
                affects regular gradients; per-sample gradients are always set to None).
        """

        if set_to_none is False:
            logger.debug(
                "Despite set_to_none is set to False, "
                "opacus will set p.grad_sample and p.summed_grad to None due to "
                "non-trivial gradient accumulation behaviour"
            )

        for p in self.params:
            p.grad_sample = None

            if not self._is_last_step_skipped:
                p.summed_grad = None

        self.original_optimizer.zero_grad(set_to_none)

    def pre_step(
        self, closure: Optional[Callable[[], float]] = None
no outgoing calls
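The clearing rules of ``zero_grad`` can be illustrated with a small standalone sketch. It does not use Opacus or PyTorch: ``ToyParam``, ``zero_grad_sketch`` and ``make_param`` are hypothetical names introduced only for this illustration, and plain Python lists stand in for tensors. The branch on ``p.grad`` is an assumed stand-in for what the wrapped optimizer's ``zero_grad(set_to_none)`` does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyParam:
    """Hypothetical stand-in for a parameter; not an Opacus or PyTorch type."""

    grad: Optional[List[float]] = None
    # One entry per accumulated batch; batches may hold different numbers of samples.
    grad_sample: Optional[List[List[List[float]]]] = None
    summed_grad: Optional[List[float]] = None


def zero_grad_sketch(params, set_to_none=False, last_step_skipped=False):
    """Mimic the clearing rules described in the zero_grad docstring."""
    for p in params:
        # Always dropped: a list of differently sized batches is not zeroed in place.
        p.grad_sample = None
        # Cleared unless the last step was skipped, mirroring the
        # `_is_last_step_skipped` check in the listing.
        if not last_step_skipped:
            p.summed_grad = None
        # Assumed stand-in for original_optimizer.zero_grad(set_to_none).
        if p.grad is not None:
            p.grad = None if set_to_none else [0.0] * len(p.grad)


def make_param():
    return ToyParam(
        grad=[0.5, -1.0],
        # Two accumulated batches: the first with 2 samples, the second with 1.
        grad_sample=[[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6]]],
        summed_grad=[0.9, 1.2],
    )


p_default = make_param()
zero_grad_sketch([p_default])

p_none = make_param()
zero_grad_sketch([p_none], set_to_none=True)

p_skipped = make_param()
zero_grad_sketch([p_skipped], last_step_skipped=True)

print(p_default.grad, p_default.grad_sample, p_default.summed_grad)
print(p_none.grad, p_none.grad_sample, p_none.summed_grad)
print(p_skipped.grad, p_skipped.grad_sample, p_skipped.summed_grad)
```

In all three cases ``grad_sample`` ends up as None; only ``grad`` responds to ``set_to_none``, and ``summed_grad`` survives only when the last step was flagged as skipped.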