```python
        return dX[0] if len(X) == 1 else dX

    def _bwd(self, dLdy, X):
        """
        Actual computation of the gradient of the loss wrt. the input X.

        For a single input example x = [x_1, ..., x_n] (i.e., one row of X)
        with output y = softmax(x), the Jacobian J of the softmax is:

            J[i, j] = y_i * (1 - y_i)    if i == j
                      -y_i * y_j         if i != j

        or, in matrix form, J = diag(y) - y y^T.
        """
        dX = []
        # Outer loop: one entry along the first axis of X and dLdy
        for dy, x in zip(dLdy, X):
            dxi = []
            # np.atleast_2d promotes a 1D row to shape (1, n), so the inner
            # loop always iterates over (gradient row, input row) pairs
            for dyi, xi in zip(*np.atleast_2d(dy, x)):
                # Softmax output for this sample as a column vector, shape (n, 1)
                yi = self._fwd(xi.reshape(1, -1)).reshape(-1, 1)
                # Jacobian wrt. input sample xi: diag(y) - y y^T, shape (n, n)
                dyidxi = np.diagflat(yi) - yi @ yi.T
                # Vector-Jacobian product: dL/dxi = dL/dyi @ J
                dxi.append(dyi @ dyidxi)
            dX.append(dxi)
        return np.array(dX).reshape(*X.shape)
```
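Because J = diag(y) - y y^T, the vector-Jacobian product for one example simplifies to (dy J)_j = y_j (dy_j - sum_i dy_i y_i), so the full n x n Jacobian never has to be built. The sketch below is a standalone illustration, not the layer's own code: it assumes a 2D input of shape (n_examples, n_classes) with softmax taken over the last axis, and the function names `softmax`, `softmax_backward_jacobian` and `softmax_backward_fast` are hypothetical. It computes the gradient both ways and checks that they agree.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def softmax_backward_jacobian(dLdy, X):
    """Reference version: build J = diag(y) - y y^T per row and apply it."""
    dX = np.empty_like(X, dtype=float)
    for n, (dy, x) in enumerate(zip(dLdy, X)):
        y = softmax(x).reshape(-1, 1)   # column vector, shape (k, 1)
        J = np.diagflat(y) - y @ y.T    # Jacobian, shape (k, k)
        dX[n] = dy @ J                  # vector-Jacobian product, shape (k,)
    return dX


def softmax_backward_fast(dLdy, X):
    """Same gradient without forming J: y * (dy - sum(dy * y))."""
    Y = softmax(X)
    return Y * (dLdy - (dLdy * Y).sum(axis=-1, keepdims=True))


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
dLdy = rng.normal(size=(4, 5))
print(np.allclose(softmax_backward_jacobian(dLdy, X), softmax_backward_fast(dLdy, X)))  # prints True
```

The explicit version costs O(k^2) time and memory per example for k classes, while the simplified form is O(k), which matters when the number of classes is large.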
```python
class SparseEvolution(LayerBase):
```