Repository: github.com/ddbourgin/numpy-ml

Method backward

numpy_ml/neural_nets/layers/layers.py:3994–4070

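Backprop for a single timestep. Takes `dLdAt`, an `ndarray` of shape `(n_ex, n_out)` holding the gradient of the loss with respect to the layer outputs (i.e., hidden states) at timestep `t`, and returns `dLdXt`, an `ndarray` of shape `(n_ex, n_in)` holding the gradient of the loss with respect to the layer inputs at timestep `t`.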

Signature: backward(self, dLdAt)
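Read as equations, the first gradient lines of `backward` follow from the chain rule. This is a sketch under assumptions: σ stands for `self.gate_fn`, φ for `self.act_fn`, Z_t = [A_{t-1}, X_t] as built by `np.hstack([A_prev, Xt])`, and the forward step is assumed to set A_t = G_o ⊙ φ(C_t) with G_o = σ(Z_t W_o + b_o), which is the relationship the `dC` and `dGot` lines imply. The accumulators `dA_acc` and `dC_acc` hold gradients flowing back from timestep t + 1.

```latex
\begin{aligned}
\tilde{G}_o^{(t)} &= Z_t W_o + b_o, \qquad
G_o^{(t)} = \sigma\big(\tilde{G}_o^{(t)}\big), \qquad
A_t = G_o^{(t)} \odot \phi(C_t) \\[4pt]
\frac{\partial \mathcal{L}}{\partial A_t}
  &= \texttt{dLdAt} + \texttt{dA\_acc} \\
\frac{\partial \mathcal{L}}{\partial C_t}
  &= \texttt{dC\_acc} + \frac{\partial \mathcal{L}}{\partial A_t} \odot G_o^{(t)} \odot \phi'(C_t) \\
\frac{\partial \mathcal{L}}{\partial \tilde{G}_o^{(t)}}
  &= \frac{\partial \mathcal{L}}{\partial A_t} \odot \phi(C_t) \odot \sigma'\big(\tilde{G}_o^{(t)}\big)
\end{aligned}
```

Because `gate_fn.grad` and `act_fn.grad` are evaluated at a function's input rather than its output, `backward` recomputes the pre-activations `_Go`, `_Gf`, `_Gu`, and `_Gc` from `Zt` instead of reusing stored gate outputs such as `Got`.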


        return At, Ct

    def backward(self, dLdAt):
        """
        Backprop for a single timestep.

        Parameters
        ----------
        dLdAt : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_out)`
            The gradient of the loss wrt. the layer outputs (ie., hidden
            states) at timestep `t`.

        Returns
        -------
        dLdXt : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_in)`
            The gradient of the loss wrt. the layer inputs at timestep `t`.
        """
        assert self.trainable, "Layer is frozen"

        Wf, Wu, Wc, Wo, bf, bu, bc, bo = self._get_params()

        self.derived_variables["current_step"] -= 1
        t = self.derived_variables["current_step"]

        Got = self.derived_variables["Go"][t]
        Gft = self.derived_variables["Gf"][t]
        Gut = self.derived_variables["Gu"][t]
        Cct = self.derived_variables["Cc"][t]
        At = self.derived_variables["A"][t + 1]
        Ct = self.derived_variables["C"][t + 1]
        C_prev = self.derived_variables["C"][t]
        A_prev = self.derived_variables["A"][t]

        Xt = self.X[t]
        Zt = np.hstack([A_prev, Xt])

        dA_acc = self.derived_variables["dLdA_accumulator"]
        dC_acc = self.derived_variables["dLdC_accumulator"]

        # initialize accumulators
        if dA_acc is None:
            dA_acc = np.zeros_like(At)

        if dC_acc is None:
            dC_acc = np.zeros_like(Ct)

        # Gradient calculations
        # ---------------------

        dA = dLdAt + dA_acc
        dC = dC_acc + dA * Got * self.act_fn.grad(Ct)

        # compute the input to the gate functions at timestep t
        _Go = Zt @ Wo + bo
        _Gf = Zt @ Wf + bf
        _Gu = Zt @ Wu + bu
        _Gc = Zt @ Wc + bc

        # compute gradients wrt the *input* to each gate
        dGot = dA * self.act_fn(Ct) * self.gate_fn.grad(_Go)

Callers (1)

test_LSTMCellFunction · 0.95

Calls (3)

_get_params (method) · 0.95
act_fn (method) · 0.80
grad (method) · 0.45

Tested by (1)

test_LSTMCellFunction · 0.76