hub / github.com/ddbourgin/numpy-ml / backward

Method backward

numpy_ml/neural_nets/layers/layers.py:3994–4070

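Backprop for a single timestep.

Parameters
----------
dLdAt : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_out)`
    The gradient of the loss with respect to the layer outputs (i.e., hidden states) at timestep `t`.

Returns
-------
dLdXt : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_in)`
    The gradient of the loss with respect to the layer inputs at timestep `t`.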

(self, dLdAt)
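Because the method handles one timestep and decrements an internal `current_step` counter on every call, a full backward pass has to call it once per timestep, newest first. The sketch below illustrates only that bookkeeping. `ToyCell` is a hypothetical stand-in, not the library's class: it assumes `forward` increments the same counter (that code is not shown on this page), and it returns placeholder arrays instead of real gradients.

```python
import numpy as np


class ToyCell:
    """Hypothetical stand-in that mimics only the step counter, not the LSTM math."""

    def __init__(self, n_out):
        self.n_out = n_out
        self.X = []            # cached inputs, one entry per forward step
        self.current_step = 0  # assumed to be incremented by forward
        self.visited = []      # timesteps seen by backward, in call order

    def forward(self, Xt):
        self.X.append(Xt)
        self.current_step += 1
        return np.zeros((Xt.shape[0], self.n_out))  # placeholder hidden state

    def backward(self, dLdAt):
        # Same pattern as the listing: decrement first, then index the cache.
        self.current_step -= 1
        t = self.current_step
        self.visited.append(t)
        return np.zeros_like(self.X[t])  # placeholder with shape (n_ex, n_in)


T, n_ex, n_in, n_out = 4, 2, 3, 5
cell = ToyCell(n_out)

# Forward over the whole sequence, oldest timestep first.
outputs = [cell.forward(np.ones((n_ex, n_in)) * t) for t in range(T)]

# Backward over the same sequence, one call per timestep.
grads = [cell.backward(np.ones((n_ex, n_out))) for _ in range(T)]
```

After the loop, `cell.visited` lists the timesteps in reverse order and the counter is back at zero, which is the state a fresh forward pass would presumably start from.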

Source from the content-addressed store, hash-verified

3992	        return At, Ct
3993
3994	    def backward(self, dLdAt):
3995	        """
3996	        Backprop for a single timestep.
3997
3998	        Parameters
3999	        ----------
4000	        dLdAt : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_out)`
4001	            The gradient of the loss wrt. the layer outputs (ie., hidden
4002	            states) at timestep `t`.
4003
4004	        Returns
4005	        -------
4006	        dLdXt : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_in)`
4007	            The gradient of the loss wrt. the layer inputs at timestep `t`.
4008	        """
4009	        assert self.trainable, "Layer is frozen"
4010
4011	        Wf, Wu, Wc, Wo, bf, bu, bc, bo = self._get_params()
4012
4013	        self.derived_variables["current_step"] -= 1
4014	        t = self.derived_variables["current_step"]
4015
4016	        Got = self.derived_variables["Go"][t]
4017	        Gft = self.derived_variables["Gf"][t]
4018	        Gut = self.derived_variables["Gu"][t]
4019	        Cct = self.derived_variables["Cc"][t]
4020	        At = self.derived_variables["A"][t + 1]
4021	        Ct = self.derived_variables["C"][t + 1]
4022	        C_prev = self.derived_variables["C"][t]
4023	        A_prev = self.derived_variables["A"][t]
4024
4025	        Xt = self.X[t]
4026	        Zt = np.hstack([A_prev, Xt])
4027
4028	        dA_acc = self.derived_variables["dLdA_accumulator"]
4029	        dC_acc = self.derived_variables["dLdC_accumulator"]
4030
4031	        # initialize accumulators
4032	        if dA_acc is None:
4033	            dA_acc = np.zeros_like(At)
4034
4035	        if dC_acc is None:
4036	            dC_acc = np.zeros_like(Ct)
4037
4038	        # Gradient calculations
4039	        # ---------------------
4040
4041	        dA = dLdAt + dA_acc
4042	        dC = dC_acc + dA * Got * self.act_fn.grad(Ct)
4043
4044	        # compute the input to the gate functions at timestep t
4045	        _Go = Zt @ Wo + bo
4046	        _Gf = Zt @ Wf + bf
4047	        _Gu = Zt @ Wu + bu
4048	        _Gc = Zt @ Wc + bc
4049
4050	        # compute gradients wrt the input to each gate
4051	        dGot = dA * self.act_fn(Ct) * self.gate_fn.grad(_Go)
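The gradient lines visible in the listing (`dA`, `dC`, and `dGot`) can be sanity-checked numerically. The following is a minimal NumPy sketch, not the library's code. It assumes a sigmoid gate function and a tanh activation (the listing refers to them only as `self.gate_fn` and `self.act_fn`), assumes the hidden state is formed as `At = Got * act_fn(Ct)`, and uses hypothetical helper names (`step_grads`, `numerical_grad`). It compares the analytic expressions with central finite differences.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2


def step_grads(dLdAt, Ct, go_pre, dA_acc=None, dC_acc=None):
    """Mirror the visible lines of the listing under the stated assumptions."""
    Got = sigmoid(go_pre)  # output gate value from its pre-activation
    dA_acc = np.zeros_like(dLdAt) if dA_acc is None else dA_acc
    dC_acc = np.zeros_like(Ct) if dC_acc is None else dC_acc

    dA = dLdAt + dA_acc
    dC = dC_acc + dA * Got * tanh_grad(Ct)
    dGot = dA * np.tanh(Ct) * sigmoid_grad(go_pre)
    return dA, dC, dGot


def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    x = x.copy()
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig
        g[idx] = (f_plus - f_minus) / (2.0 * eps)
    return g


rng = np.random.default_rng(0)
n_ex, n_out = 3, 4
Ct = rng.normal(size=(n_ex, n_out))      # cell state at timestep t
go_pre = rng.normal(size=(n_ex, n_out))  # output-gate pre-activation (_Go)
R = rng.normal(size=(n_ex, n_out))       # upstream gradient dLdAt


def loss(go, c):
    # Toy scalar loss whose gradient wrt. At equals R, with At = Got * tanh(Ct).
    return np.sum(R * sigmoid(go) * np.tanh(c))


dA, dC, dGot = step_grads(R, Ct, go_pre)
num_dGot = numerical_grad(lambda g: loss(g, Ct), go_pre)
num_dC = numerical_grad(lambda c: loss(go_pre, c), Ct)

print(np.max(np.abs(dGot - num_dGot)), np.max(np.abs(dC - num_dC)))
```

With both accumulators left at `None`, `dA` reduces to the upstream gradient, so the check isolates the two chain-rule products through the output gate and the cell state.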

Callers 1

test_LSTMCell (Function) · 0.95

Calls 3

_get_params (Method) · 0.95

act_fn (Method) · 0.80

grad (Method) · 0.45

Tested by 1

test_LSTMCell (Function) · 0.76