Compute the layer output on a single minibatch.
(self, X, retain_derived=True)
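
The train and test equations documented for this method can be sketched as a standalone NumPy function. This is a minimal illustration rather than the layer's own code: `batchnorm2d_sketch` is a hypothetical helper, the `momentum` and `epsilon` defaults are assumed values, and the `train` flag stands in for the layer's `trainable` / `retain_derived` checks.

```python
import numpy as np


def batchnorm2d_sketch(X, scaler, intercept, running_mean, running_var,
                       momentum=0.9, epsilon=1e-5, train=True):
    """Hypothetical helper mirroring the documented equations.

    X has shape (n_ex, in_rows, in_cols, in_ch); every other array has
    shape (in_ch,). The default momentum and epsilon are assumptions.
    """
    if train:
        # One mean/variance per channel, pooled over batch, rows, and columns.
        mean = X.mean(axis=(0, 1, 2))
        var = X.var(axis=(0, 1, 2))
        # Exponential moving average of the batch statistics.
        running_mean = momentum * running_mean + (1.0 - momentum) * mean
        running_var = momentum * running_var + (1.0 - momentum) * var
    else:
        # Test-time path: normalize with the stored running statistics.
        mean, var = running_mean, running_var

    # Channel-wise statistics broadcast over the trailing axis of X.
    Y = scaler * (X - mean) / np.sqrt(var + epsilon) + intercept
    return Y, running_mean, running_var


rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 3))
in_ch = X.shape[3]
ones, zeros = np.ones(in_ch), np.zeros(in_ch)

# Training-style call: batch statistics are used and the running stats move.
Y_train, rm, rv = batchnorm2d_sketch(X, ones, zeros, zeros, ones)

# Test-style call: the supplied running statistics are used unchanged.
Y_test, rm_test, rv_test = batchnorm2d_sketch(X, ones, zeros, zeros, ones, train=False)
```

With a unit `scaler` and zero `intercept`, the training-style output has approximately zero mean and unit variance in each channel, while the test-style call leaves the running statistics untouched.
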
| 1093 | self.parameters["running_var"] = np.ones(self.in_ch) |
| 1094 | |
| 1095 | def forward(self, X, retain_derived=True): |
| 1096 | """ |
| 1097 | Compute the layer output on a single minibatch. |
| 1098 | |
| 1099 | Notes |
| 1100 | ----- |
| 1101 | Equations [train]:: |
| 1102 | |
| 1103 | Y = scaler * norm(X) + intercept |
| 1104 | norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon) |
| 1105 | |
| 1106 | Equations [test]:: |
| 1107 | |
| 1108 | Y = scaler * running_norm(X) + intercept |
| 1109 | running_norm(X) = (X - running_mean) / sqrt(running_var + epsilon) |
| 1110 | |
| 1111 | In contrast to :class:`LayerNorm2D`, the BatchNorm layer calculates a |
| 1112 | per-channel mean and variance across the *batch* (and spatial) dimensions rather than across the output features. |
| 1113 | |
| 1114 | Parameters |
| 1115 | ---------- |
| 1116 | X : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, in_rows, in_cols, in_ch)` |
| 1117 | Input volume containing the `in_rows` x `in_cols`-dimensional |
| 1118 | features for a minibatch of `n_ex` examples. |
| 1119 | retain_derived : bool |
| 1120 | Whether to use the current input to update the running mean and |
| 1121 | running variance estimates. Setting this to False is the same as |
| 1122 | freezing the layer for the current input. Default is True. |
| 1123 | |
| 1124 | Returns |
| 1125 | ------- |
| 1126 | Y : :py:class:`ndarray <numpy.ndarray>` of shape `(n_ex, in_rows, in_cols, in_ch)` |
| 1127 | Layer output for each of the `n_ex` examples. |
| 1128 | """ # noqa: E501 |
| 1129 | if not self.is_initialized: |
| 1130 | self.in_ch = self.out_ch = X.shape[3] |
| 1131 | self._init_params() |
| 1132 | |
| 1133 | ep = self.hyperparameters["epsilon"] |
| 1134 | mm = self.hyperparameters["momentum"] |
| 1135 | rm = self.parameters["running_mean"] |
| 1136 | rv = self.parameters["running_var"] |
| 1137 | |
| 1138 | scaler = self.parameters["scaler"] |
| 1139 | intercept = self.parameters["intercept"] |
| 1140 | |
| 1141 | # if the layer is frozen, use our running mean/var values rather |
| 1142 | # than the mean/var values for the new batch |
| 1143 | X_mean = self.parameters["running_mean"] |
| 1144 | X_var = self.parameters["running_var"] |
| 1145 | |
| 1146 | if self.trainable and retain_derived: |
| 1147 | X_mean, X_var = X.mean(axis=(0, 1, 2)), X.var(axis=(0, 1, 2)) # , ddof=1) |
| 1148 | self.parameters["running_mean"] = mm * rm + (1.0 - mm) * X_mean |
| 1149 | self.parameters["running_var"] = mm * rv + (1.0 - mm) * X_var |
| 1150 | |
| 1151 | if retain_derived: |
| 1152 | self.X.append(X) |