hub / github.com/ddbourgin/numpy-ml / _backward_naive

Method _backward_naive

numpy_ml/neural_nets/layers/layers.py:2833–2892

A slower (i.e., non-vectorized) but more straightforward implementation of the gradient computations for a 1D conv layer.

(self, dLdy, retain_grads=True)

Source

    def _backward_naive(self, dLdy, retain_grads=True):
        """
        A slower (i.e., non-vectorized) but more straightforward implementation
        of the gradient computations for a 1D conv layer.

        Parameters
        ----------
        dLdy : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, l_out, out_ch)` or list of arrays
            The gradient(s) of the loss with respect to the layer output(s).
        retain_grads : bool
            Whether to include the intermediate parameter gradients computed
            during the backward pass in the final parameter update. Default is
            True.

        Returns
        -------
        dX : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, l_in, in_ch)`
            The gradient of the loss with respect to the layer input volume.
        """  # noqa: E501
        assert self.trainable, "Layer is frozen"
        if not isinstance(dLdy, list):
            dLdy = [dLdy]

        W = self.parameters["W"]
        b = self.parameters["b"]
        Zs = self.derived_variables["Z"]

        Xs, d = self.X, self.dilation
        fw, s, p = self.kernel_width, self.stride, self.pad

        dXs = []
        for X, Z, dy in zip(Xs, Zs, dLdy):
            n_ex, l_out, out_ch = dy.shape
            X_pad, (pr1, pr2) = pad1D(X, p, self.kernel_width, s, d)

            dX = np.zeros_like(X_pad)
            # chain rule through the activation: dL/dZ = dL/dy * f'(Z)
            dZ = dy * self.act_fn.grad(Z)

            dW, dB = np.zeros_like(W), np.zeros_like(b)
            for m in range(n_ex):
                for i in range(l_out):
                    for c in range(out_ch):
                        # compute window boundaries w. stride and dilation
                        i0, i1 = i * s, (i * s) + fw * (d + 1) - d

                        wc = W[:, :, c]
                        kernel = dZ[m, i, c]
                        window = X_pad[m, i0 : i1 : (d + 1), :]

                        # accumulate bias, weight, and padded-input gradients
                        dB[:, :, c] += kernel
                        dW[:, :, c] += window * kernel
                        dX[m, i0 : i1 : (d + 1), :] += wc * kernel

            if retain_grads:
                self.gradients["W"] += dW
                self.gradients["b"] += dB

            # turn the right pad width into a negative slice end
            # (None when there is no right padding)
            pr2 = None if pr2 == 0 else -pr2
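
The window arithmetic in the inner loop can be checked in isolation. The sketch below assumes, as the listing does, that `d` counts the gaps inserted between kernel taps, so `d = 0` is an ordinary convolution. `window_indices` is a hypothetical helper written for illustration, not part of numpy-ml.

```python
import numpy as np


def window_indices(i, fw, s, d):
    """Padded-input positions read by output step `i` (hypothetical helper).

    fw: kernel width, s: stride, d: dilation (number of gaps between taps).
    """
    i0 = i * s
    i1 = i0 + fw * (d + 1) - d  # exclusive end of the dilated window
    return np.arange(i0, i1, d + 1)


# kernel width 3, stride 2, dilation 1: every other position, 3 taps per window
idx0 = window_indices(0, fw=3, s=2, d=1)
idx1 = window_indices(1, fw=3, s=2, d=1)
print(idx0, idx1)  # [0 2 4] [2 4 6]
```

Whatever the dilation, each window contains exactly `fw` taps, which is why `window * kernel` and `wc * kernel` in the listing match the shape of `W[:, :, c]`.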

Callers

nothing calls this directly

Calls 2

pad1D (function) · 0.85

grad (method) · 0.45

Tested by

no test coverage detected
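
One way to exercise the same triple-loop logic without the layer object is a finite-difference check on a standalone re-implementation. The sketch below is not taken from the library's test suite. `conv1d_forward`, `conv1d_backward_naive`, and `numerical_grad` are hypothetical free functions, and they assume an identity activation, an already-padded input, weights of shape `(kernel_width, in_ch, out_ch)`, and a bias of shape `(1, 1, out_ch)`.

```python
import numpy as np


def conv1d_forward(X_pad, W, b, s=1, d=0):
    """Naive 1D cross-correlation on an already-padded input (identity activation)."""
    n_ex, l_in, _ = X_pad.shape
    fw, _, out_ch = W.shape
    span = fw * (d + 1) - d  # width of one dilated window
    l_out = (l_in - span) // s + 1
    Z = np.zeros((n_ex, l_out, out_ch))
    for m in range(n_ex):
        for i in range(l_out):
            i0 = i * s
            window = X_pad[m, i0 : i0 + span : d + 1, :]  # (fw, in_ch)
            for c in range(out_ch):
                Z[m, i, c] = np.sum(window * W[:, :, c]) + b[0, 0, c]
    return Z


def conv1d_backward_naive(dZ, X_pad, W, b, s=1, d=0):
    """Same loop structure as the listing, without the layer state."""
    n_ex, l_out, out_ch = dZ.shape
    fw = W.shape[0]
    dX, dW, dB = np.zeros_like(X_pad), np.zeros_like(W), np.zeros_like(b)
    for m in range(n_ex):
        for i in range(l_out):
            for c in range(out_ch):
                i0, i1 = i * s, (i * s) + fw * (d + 1) - d
                g = dZ[m, i, c]
                dB[:, :, c] += g
                dW[:, :, c] += X_pad[m, i0 : i1 : (d + 1), :] * g
                dX[m, i0 : i1 : (d + 1), :] += W[:, :, c] * g
    return dX, dW, dB


def numerical_grad(f, A, eps=1e-6):
    """Central-difference gradient of scalar f() w.r.t. array A (perturbed in place)."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        old = A[idx]
        A[idx] = old + eps
        hi = f()
        A[idx] = old - eps
        lo = f()
        A[idx] = old  # restore the original entry
        G[idx] = (hi - lo) / (2 * eps)
    return G


rng = np.random.default_rng(0)
X_pad = rng.normal(size=(2, 9, 3))
W = rng.normal(size=(3, 3, 4))
b = rng.normal(size=(1, 1, 4))
s, d = 2, 1

# Loss L = sum(Z * R) for a fixed random R, so dL/dZ = R.
Z = conv1d_forward(X_pad, W, b, s, d)
R = rng.normal(size=Z.shape)
loss = lambda: np.sum(conv1d_forward(X_pad, W, b, s, d) * R)

dX, dW, dB = conv1d_backward_naive(R, X_pad, W, b, s, d)
max_err = max(
    np.abs(dX - numerical_grad(loss, X_pad)).max(),
    np.abs(dW - numerical_grad(loss, W)).max(),
    np.abs(dB - numerical_grad(loss, b)).max(),
)
print(f"max abs error: {max_err:.2e}")
```

Because this loss is linear in each of `X_pad`, `W`, and `b`, the central difference should agree with the analytic gradients up to floating-point rounding. A check against the real layer would also need to account for its activation function and padding, which this sketch deliberately leaves out.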