        return At, Ct

    def backward(self, dLdAt):
        """
        Backprop for a single timestep.

        Parameters
        ----------
        dLdAt : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_out)`
            The gradient of the loss with respect to the layer outputs
            (i.e., hidden states) at timestep `t`.

        Returns
        -------
        dLdXt : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_in)`
            The gradient of the loss with respect to the layer inputs at
            timestep `t`.
        """
        assert self.trainable, "Layer is frozen"

        Wf, Wu, Wc, Wo, bf, bu, bc, bo = self._get_params()

        # step the timestep counter back by one before indexing the cache
        self.derived_variables["current_step"] -= 1
        t = self.derived_variables["current_step"]

        # cached values for timestep t: output (Go), forget (Gf), and update
        # (Gu) gates, plus the candidate cell state (Cc)
        Got = self.derived_variables["Go"][t]
        Gft = self.derived_variables["Gf"][t]
        Gut = self.derived_variables["Gu"][t]
        Cct = self.derived_variables["Cc"][t]

        # A and C are indexed with an offset of one: entry t + 1 is the state
        # at timestep t, and entry t is the state that preceded it
        At = self.derived_variables["A"][t + 1]
        Ct = self.derived_variables["C"][t + 1]
        C_prev = self.derived_variables["C"][t]
        A_prev = self.derived_variables["A"][t]

        # concatenate the previous hidden state and the current input along
        # the feature axis
        Xt = self.X[t]
        Zt = np.hstack([A_prev, Xt])

        dA_acc = self.derived_variables["dLdA_accumulator"]
        dC_acc = self.derived_variables["dLdC_accumulator"]

        # initialize accumulators
        if dA_acc is None:
            dA_acc = np.zeros_like(At)

        if dC_acc is None:
            dC_acc = np.zeros_like(Ct)

        # Gradient calculations
        # ---------------------

        # total gradient reaching At: the incoming dLdAt plus the accumulated term
        dA = dLdAt + dA_acc
        # cell-state gradient: the accumulated term plus the contribution that
        # reaches Ct through the output gate and the activation
        dC = dC_acc + dA * Got * self.act_fn.grad(Ct)

        # compute the input to the gate functions at timestep t
        _Go = Zt @ Wo + bo
        _Gf = Zt @ Wf + bf
        _Gu = Zt @ Wu + bu
        _Gc = Zt @ Wc + bc

        # compute gradients wrt the *input* to each gate
        dGot = dA * self.act_fn(Ct) * self.gate_fn.grad(_Go)
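        # A sketch of where the expression for `dGot` above comes from. It
        # assumes the forward pass (not shown in this excerpt) computes
        #     Got = gate_fn(_Go)
        #     At  = Got * act_fn(Ct)
        # which is the form the backward expression implies. By the chain rule:
        #     dL/dGot = dA * act_fn(Ct)                (from At = Got * act_fn(Ct))
        #     dL/d_Go = dL/dGot * gate_fn.grad(_Go)    (from Got = gate_fn(_Go))
        # so `dGot` is the gradient with respect to the gate's pre-activation
        # `_Go`, not with respect to the gate output `Got` itself.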