hub / github.com/ddbourgin/numpy-ml / BatchNorm2D

Class BatchNorm2D

numpy_ml/neural_nets/layers/layers.py:969–1215


class BatchNorm2D(LayerBase):
    def __init__(self, momentum=0.9, epsilon=1e-5, optimizer=None):
        """
        A batch normalization layer for two-dimensional inputs with an
        additional channel dimension.

        Notes
        -----
        BatchNorm is an attempt to address the problem of internal covariate
        shift (ICS) during training by normalizing layer inputs.

        ICS refers to the change in the distribution of layer inputs during
        training as a result of the changing parameters of the previous
        layer(s). ICS can make it difficult to train models with saturating
        nonlinearities, and in general can slow training by requiring a lower
        learning rate.

        Equations [train]::

            Y = scaler * norm(X) + intercept
            norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)

        Equations [test]::

            Y = scaler * running_norm(X) + intercept
            running_norm(X) = (X - running_mean) / sqrt(running_var + epsilon)

        In contrast to :class:`LayerNorm2D`, the BatchNorm layer calculates
        the mean and var across the *batch* rather than the output features.
        This has two disadvantages:

        1. It is highly affected by batch size: smaller mini-batch sizes
           increase the variance of the estimates for the global mean and
           variance.

        2. It is difficult to apply in RNNs -- one must fit a separate
           BatchNorm layer for *each* time-step.

        Parameters
        ----------
        momentum : float
            The momentum term for the running mean/running variance
            calculations. The closer this is to 1, the less weight will be
            given to the mean/variance of the current batch (i.e., higher
            smoothing). Default is 0.9.
        epsilon : float
            A small smoothing constant to use during computation of ``norm(X)``
            to avoid divide-by-zero errors. Default is 1e-5.
        optimizer : str, :doc:`Optimizer <numpy_ml.neural_nets.optimizers>` object, or None
            The optimization strategy to use when performing gradient updates
            within the :meth:`update` method. If None, use the :class:`SGD
            <numpy_ml.neural_nets.optimizers.SGD>` optimizer with
            default parameters. Default is None.

        Attributes
        ----------
        X : list
            Running list of inputs to the :meth:`forward
            <numpy_ml.neural_nets.LayerBase.forward>` method since the last
            call to :meth:`update <numpy_ml.neural_nets.LayerBase.update>`.
            Only updated if the `retain_derived` argument was set to True.
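
The train and test equations in the docstring, together with the `momentum` description, can be illustrated with a small standalone NumPy sketch. This is not the numpy-ml implementation: the function name `batchnorm2d_forward_sketch`, the `training` flag, the assumed input layout `(n_examples, rows, cols, channels)`, the choice to compute statistics per channel, and the exponential-moving-average form of the running update are all assumptions made for illustration.

```python
import numpy as np


def batchnorm2d_forward_sketch(X, scaler, intercept, running_mean, running_var,
                               momentum=0.9, epsilon=1e-5, training=True):
    """Hypothetical helper, not part of numpy-ml.

    Assumes X has shape (n_examples, rows, cols, channels) and that the
    statistics are computed separately for each channel.
    """
    if training:
        # Per-channel statistics over the batch and both spatial axes.
        mean = X.mean(axis=(0, 1, 2))
        var = X.var(axis=(0, 1, 2))
        # Assumed running update: with momentum close to 1 the current
        # batch contributes little (higher smoothing).
        running_mean = momentum * running_mean + (1.0 - momentum) * mean
        running_var = momentum * running_var + (1.0 - momentum) * var
    else:
        # Test mode: reuse the statistics accumulated during training.
        mean, var = running_mean, running_var

    # norm(X) = (X - mean) / sqrt(var + epsilon); Y = scaler * norm(X) + intercept
    norm = (X - mean) / np.sqrt(var + epsilon)
    Y = scaler * norm + intercept
    return Y, running_mean, running_var


rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 3))
scaler, intercept = np.ones(3), np.zeros(3)
running_mean, running_var = np.zeros(3), np.ones(3)

Y_train, running_mean, running_var = batchnorm2d_forward_sketch(
    X, scaler, intercept, running_mean, running_var, training=True)
Y_test, _, _ = batchnorm2d_forward_sketch(
    X, scaler, intercept, running_mean, running_var, training=False)
```

In training mode each channel of `Y_train` has approximately zero mean and unit variance, because the batch's own statistics are used. In test mode the same input is normalized with the running statistics instead, which after a single update with `momentum=0.9` are still close to their initial values, so `Y_test` differs noticeably from `Y_train`.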

Callers 3

test_BatchNorm2D (Function) · 0.90
_init_params (Method) · 0.85
_init_params (Method) · 0.85

Calls

no outgoing calls

Tested by 1

test_BatchNorm2D (Function) · 0.72