hub / github.com/ddbourgin/numpy-ml / BatchNorm1D

Class BatchNorm1D

numpy_ml/neural_nets/layers/layers.py:1218–1441


class BatchNorm1D(LayerBase):
    def __init__(self, momentum=0.9, epsilon=1e-5, optimizer=None):
        """
        A batch normalization layer for 1D inputs.

        Notes
        -----
        BatchNorm is an attempt to address the problem of internal covariate
        shift (ICS) during training by normalizing layer inputs.

        ICS refers to the change in the distribution of layer inputs during
        training as a result of the changing parameters of the previous
        layer(s). ICS can make it difficult to train models with saturating
        nonlinearities, and in general can slow training by requiring a lower
        learning rate.

        Equations [train]::

            Y = scaler * norm(X) + intercept
            norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)

        Equations [test]::

            Y = scaler * running_norm(X) + intercept
            running_norm(X) = (X - running_mean) / sqrt(running_var + epsilon)

        In contrast to :class:`LayerNorm1D`, the BatchNorm layer calculates
        the mean and var across the *batch* rather than the output features.
        This has two disadvantages:

        1. It is highly affected by batch size: smaller mini-batch sizes
           increase the variance of the estimates for the global mean and
           variance.

        2. It is difficult to apply in RNNs -- one must fit a separate
           BatchNorm layer for *each* time-step.

        Parameters
        ----------
        momentum : float
            The momentum term for the running mean/running std calculations.
            The closer this is to 1, the less weight will be given to the
            mean/std of the current batch (i.e., higher smoothing). Default is
            0.9.
        epsilon : float
            A small smoothing constant to use during computation of ``norm(X)``
            to avoid divide-by-zero errors. Default is 1e-5.
        optimizer : str, :doc:`Optimizer <numpy_ml.neural_nets.optimizers>` object, or None
            The optimization strategy to use when performing gradient updates
            within the :meth:`update` method. If None, use the :class:`SGD
            <numpy_ml.neural_nets.optimizers.SGD>` optimizer with
            default parameters. Default is None.

        Attributes
        ----------
        X : list
            Running list of inputs to the :meth:`forward <numpy_ml.neural_nets.LayerBase.forward>` method since the last call to :meth:`update <numpy_ml.neural_nets.LayerBase.update>`. Only updated if the `retain_derived` argument was set to True.
        gradients : dict
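
The train and test equations in the docstring above can be sketched in plain NumPy. This is a minimal illustration, not the class's actual `forward` method: the function name `batchnorm1d_forward` and its argument layout are hypothetical. The exponential-moving-average update for the running statistics is an assumption chosen to match the `momentum` description, in which values closer to 1 give the current batch less weight.

```python
import numpy as np


def batchnorm1d_forward(X, scaler, intercept, running_mean, running_var,
                        momentum=0.9, epsilon=1e-5, train=True):
    """Hypothetical helper (not the numpy-ml API). X has shape (n_ex, n_in)."""
    if train:
        # Statistics are computed across the batch axis, one per feature.
        mean, var = X.mean(axis=0), X.var(axis=0)
        # Assumed update rule: exponential moving average of batch statistics.
        running_mean = momentum * running_mean + (1.0 - momentum) * mean
        running_var = momentum * running_var + (1.0 - momentum) * var
    else:
        # At test time the stored running statistics replace the batch ones.
        mean, var = running_mean, running_var

    X_norm = (X - mean) / np.sqrt(var + epsilon)
    Y = scaler * X_norm + intercept
    return Y, running_mean, running_var


rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
scaler, intercept = np.ones(4), np.zeros(4)
init_mean, init_var = np.zeros(4), np.ones(4)

# Training step: normalize with batch statistics and update the running ones.
Y_train, new_mean, new_var = batchnorm1d_forward(
    X, scaler, intercept, init_mean, init_var, train=True
)

# Test step: normalize with the running statistics, which stay unchanged.
Y_test, same_mean, same_var = batchnorm1d_forward(
    X, scaler, intercept, new_mean, new_var, train=False
)
```

With `scaler` set to ones and `intercept` to zeros, the training output has per-feature mean close to 0 and variance close to 1. After a single update from a zero initial running mean, the running mean is only a tenth of the batch mean, so the test-mode output is not yet centered.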

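The Notes contrast batch-axis statistics with the feature-axis statistics of `LayerNorm1D`, and point out that small batches give noisier estimates. The sketch below illustrates both points on synthetic data. The helper `batch_mean_spread` is hypothetical and uses standard-normal samples purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))  # 8 examples, 3 features

# BatchNorm-style statistics: reduce over the batch axis, one value per feature.
batch_mean = X.mean(axis=0)
# LayerNorm-style statistics: reduce over the feature axis, one value per example.
feature_mean = X.mean(axis=1)


def batch_mean_spread(batch_size, n_trials=2000, seed=0):
    """Hypothetical helper: std of the batch mean over many random batches."""
    trial_rng = np.random.default_rng(seed)
    batches = trial_rng.normal(size=(n_trials, batch_size))
    return batches.mean(axis=1).std()


spread_small = batch_mean_spread(batch_size=4)
spread_large = batch_mean_spread(batch_size=64)
```

Here `batch_mean` has shape `(3,)` and `feature_mean` has shape `(8,)`. The spread of the batch mean shrinks roughly as one over the square root of the batch size, so `spread_small` comes out larger than `spread_large`.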
Callers 1

test_BatchNorm1D (function) · 0.90

Calls

no outgoing calls

Tested by 1

test_BatchNorm1D (function) · 0.72