```python
        return emb

    def backward(self, dLdy, retain_grads=True):
        """
        Backprop from layer outputs to embedding weights.

        Notes
        -----
        Because the items in `X` are interpreted as indices, we cannot compute
        the gradient of the layer output wrt. `X`.

        Parameters
        ----------
        dLdy : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, n_in, n_out)` or list of arrays
            The gradient(s) of the loss wrt. the layer output(s)
        retain_grads : bool
            Whether to include the intermediate parameter gradients computed
            during the backward pass in the final parameter update. Default is
            True.
        """  # noqa: E501
        assert self.trainable, "Layer is frozen"
        if not isinstance(dLdy, list):
            dLdy = [dLdy]

        for dy, x in zip(dLdy, self.X):
            dw = self._bwd(dy, x)

            if retain_grads:
                self.gradients["W"] += dw

    def _bwd(self, dLdy, X):
        """Actual computation of gradient of the loss wrt. W"""
```
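The listing stops before the body of `_bwd`. As a hedged illustration of what "the gradient of the loss wrt. W" amounts to, here is a standalone NumPy sketch, assuming an unpooled lookup whose output has shape `(n_ex, n_in, n_out)` as in the docstring. The helper names `embedding_forward` and `embedding_backward` are hypothetical and are not part of the layer's API. Since each output row is `W[X[i, j]]`, the gradient for row `v` of `W` is the sum of `dLdy[i, j]` over every position where `X[i, j] == v`, which is a scatter-add. This is also why no gradient flows back to `X`: the indices only select rows.

```python
import numpy as np


def embedding_forward(W, X):
    """Hypothetical unpooled lookup: one row of W per index in X.

    W: (vocab_size, n_out) weights; X: (n_ex, n_in) integer indices.
    Returns an array of shape (n_ex, n_in, n_out).
    """
    return W[X]


def embedding_backward(W, X, dLdy):
    """Hypothetical gradient of the loss wrt. W for the lookup above."""
    n_out = W.shape[1]
    dW = np.zeros_like(W)
    # np.add.at is an unbuffered scatter-add: repeated indices accumulate
    # their gradients instead of overwriting each other (dW[X] += ... would not).
    np.add.at(dW, X.reshape(-1), dLdy.reshape(-1, n_out))
    return dW


rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
X = np.array([[0, 2, 2], [4, 2, 1]])  # index 2 occurs three times, 3 never
G = rng.normal(size=(2, 3, 3))        # stands in for dLdy

dW = embedding_backward(W, X, G)

# Central-difference check for L(W) = sum(embedding_forward(W, X) * G),
# whose gradient wrt. the layer output is exactly G.
eps = 1e-6
num = np.zeros_like(W)
for v in range(W.shape[0]):
    for k in range(W.shape[1]):
        Wp = W.copy()
        Wp[v, k] += eps
        Wm = W.copy()
        Wm[v, k] -= eps
        num[v, k] = (np.sum(embedding_forward(Wp, X) * G)
                     - np.sum(embedding_forward(Wm, X) * G)) / (2 * eps)

print(np.allclose(dW, num))  # True
print(dW[3])                 # row 3 is never looked up: [0. 0. 0.]
```

Rows of `W` that were never indexed receive a zero gradient. If several `(dLdy, X)` pairs are processed, summing their `dW` arrays mirrors the `+=` into `self.gradients["W"]` in `backward`.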