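As a rough illustration of how the `backward` method in this listing accumulates parameter gradients, here is a minimal sketch. `TinyLinear` is a hypothetical stand-in, not the library's class. Its `_bwd` assumes a plain affine map `Y = X @ W + b` with no activation, because the body of the real `_bwd` is not shown here. Only the control flow of `backward` mirrors the listing.

```python
import numpy as np


class TinyLinear:
    """Hypothetical stand-in layer; identity activation is an assumption."""

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.trainable = True
        self.parameters = {
            "W": rng.normal(size=(n_in, n_out)),
            "b": np.zeros((1, n_out)),
        }
        self.gradients = {
            "W": np.zeros((n_in, n_out)),
            "b": np.zeros((1, n_out)),
        }
        self.X = []  # inputs cached by forward, one entry per call

    def forward(self, X):
        self.X.append(X)
        return X @ self.parameters["W"] + self.parameters["b"]

    def _bwd(self, dLdy, X):
        # Assumed gradients for Y = X @ W + b (no activation function).
        dX = dLdy @ self.parameters["W"].T
        dW = X.T @ dLdy
        db = dLdy.sum(axis=0, keepdims=True)
        return dX, dW, db

    def backward(self, dLdy, retain_grads=True):
        # Same control flow as the listing: wrap a single array in a list,
        # pair each upstream gradient with its cached input, accumulate.
        assert self.trainable, "Layer is frozen"
        if not isinstance(dLdy, list):
            dLdy = [dLdy]
        dX = []
        for dy, x in zip(dLdy, self.X):
            dx, dw, db = self._bwd(dy, x)
            dX.append(dx)
            if retain_grads:
                self.gradients["W"] += dw
                self.gradients["b"] += db
        return dX[0] if len(self.X) == 1 else dX


X = np.ones((4, 3))
dLdy = np.ones((4, 2))

# One cached input: backward returns a single array.
layer = TinyLinear(n_in=3, n_out=2)
layer.forward(X)
dLdX = layer.backward(dLdy)

# Two cached inputs: backward takes and returns a list, and the
# parameter gradients are summed over both passes.
layer2 = TinyLinear(n_in=3, n_out=2)
layer2.forward(X)
layer2.forward(X)
dLdX_list = layer2.backward([dLdy, dLdy])
```

With one cached input the result is a single array of shape `(n_ex, n_in)`; with several, it is a list with one gradient per cached input. Passing `retain_grads=False` still returns the input gradient but leaves `gradients["W"]` and `gradients["b"]` untouched.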
        return Y, Z

    def backward(self, dLdy, retain_grads=True):
        """
        Backprop from layer outputs to inputs

        Parameters
        ----------
        dLdy : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_out)` or list of arrays
            The gradient(s) of the loss wrt. the layer output(s).
        retain_grads : bool
            Whether to include the intermediate parameter gradients computed
            during the backward pass in the final parameter update. Default is
            True.

        Returns
        -------
        dLdX : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_in)`
            The gradient of the loss wrt. the layer input `X`.
        """  # noqa: E501
        assert self.trainable, "Layer is frozen"
        # Accept a single array by wrapping it in a one-element list
        if not isinstance(dLdy, list):
            dLdy = [dLdy]

        dX = []
        X = self.X
        # Pair each upstream gradient with the corresponding cached input
        for dy, x in zip(dLdy, X):
            dx, dw, db = self._bwd(dy, x)
            dX.append(dx)

            # Accumulate parameter gradients across all cached inputs
            if retain_grads:
                self.gradients["W"] += dw
                self.gradients["b"] += db

        return dX[0] if len(X) == 1 else dX

    def _bwd(self, dLdy, X):
        """Actual computation of gradient of the loss wrt. X, W, and b"""