
Class BatchNorm2D

numpy_ml/neural_nets/layers/layers.py:969–1215


class BatchNorm2D(LayerBase):
    def __init__(self, momentum=0.9, epsilon=1e-5, optimizer=None):
        """
        A batch normalization layer for two-dimensional inputs with an
        additional channel dimension.

        Notes
        -----
        BatchNorm is an attempt to address the problem of internal covariate
        shift (ICS) during training by normalizing layer inputs.

        ICS refers to the change in the distribution of layer inputs during
        training as a result of the changing parameters of the previous
        layer(s). ICS can make it difficult to train models with saturating
        nonlinearities, and in general can slow training by requiring a lower
        learning rate.

        Equations [train]::

            Y = scaler * norm(X) + intercept
            norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)

        Equations [test]::

            Y = scaler * running_norm(X) + intercept
            running_norm(X) = (X - running_mean) / sqrt(running_var + epsilon)

        In contrast to :class:`LayerNorm2D`, the BatchNorm layer calculates
        the mean and variance across the batch rather than the output
        features. This has two disadvantages:

        1. It is highly affected by batch size: smaller mini-batch sizes
           increase the variance of the estimates for the global mean and
           variance.

        2. It is difficult to apply in RNNs -- one must fit a separate
           BatchNorm layer for each time-step.

        Parameters
        ----------
        momentum : float
            The momentum term for the running mean/running variance
            calculations. The closer this is to 1, the less weight will be
            given to the mean/variance of the current batch (i.e., higher
            smoothing). Default is 0.9.
        epsilon : float
            A small smoothing constant to use during computation of ``norm(X)``
            to avoid divide-by-zero errors. Default is 1e-5.
        optimizer : str, :doc:`Optimizer <numpy_ml.neural_nets.optimizers>` object, or None
            The optimization strategy to use when performing gradient updates
            within the :meth:`update` method. If None, use the :class:`SGD
            <numpy_ml.neural_nets.optimizers.SGD>` optimizer with
            default parameters. Default is None.

        Attributes
        ----------
        X : list
            Running list of inputs to the :meth:`forward
            <numpy_ml.neural_nets.LayerBase.forward>` method since the last
            call to :meth:`update <numpy_ml.neural_nets.LayerBase.update>`.
            Only updated if the `retain_derived` argument was set to True.
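
The train and test equations in the docstring above can be sketched in plain NumPy. This is a minimal illustration and not the library's own forward pass: the function names `batchnorm2d_train` and `batchnorm2d_test` are hypothetical, the channels-last input layout `(n_examples, rows, cols, channels)` is an assumption, and the exponential-moving-average update is inferred from the description of `momentum` (values closer to 1 give less weight to the current batch).

```python
import numpy as np


def batchnorm2d_train(X, scaler, intercept, running_mean, running_var,
                      momentum=0.9, epsilon=1e-5):
    """Normalize X with the statistics of the current batch (train mode).

    Assumes X has shape (n_examples, rows, cols, channels).
    """
    # Per-channel statistics, pooled over the batch and both spatial axes.
    mean = X.mean(axis=(0, 1, 2))
    var = X.var(axis=(0, 1, 2))

    # Assumed moving-average update: a momentum near 1 favors the history.
    new_running_mean = momentum * running_mean + (1.0 - momentum) * mean
    new_running_var = momentum * running_var + (1.0 - momentum) * var

    # norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)
    X_norm = (X - mean) / np.sqrt(var + epsilon)
    Y = scaler * X_norm + intercept
    return Y, new_running_mean, new_running_var


def batchnorm2d_test(X, scaler, intercept, running_mean, running_var,
                     epsilon=1e-5):
    """Normalize X with the stored running statistics (test mode)."""
    # running_norm(X) = (X - running_mean) / sqrt(running_var + epsilon)
    X_norm = (X - running_mean) / np.sqrt(running_var + epsilon)
    return scaler * X_norm + intercept


rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 3))

n_channels = X.shape[-1]
scaler = np.ones(n_channels)        # learned scale, identity here
intercept = np.zeros(n_channels)    # learned shift, zero here
running_mean = np.zeros(n_channels)
running_var = np.ones(n_channels)

Y_train, running_mean, running_var = batchnorm2d_train(
    X, scaler, intercept, running_mean, running_var
)
Y_test = batchnorm2d_test(X, scaler, intercept, running_mean, running_var)
```

In train mode each channel of `Y_train` has approximately zero mean and unit variance. After a single update the running statistics are still dominated by their initial values, so `Y_test` for the same input is not yet normalized; the two modes only agree once the running estimates have converged toward the data statistics.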

Callers 3

test_BatchNorm2D (Function) · 0.90

_init_params (Method) · 0.85

Calls

no outgoing calls

Tested by 1

test_BatchNorm2D (Function) · 0.72