hub / github.com/ddbourgin/numpy-ml / BatchNorm1D

Class BatchNorm1D

numpy_ml/neural_nets/layers/layers.py:1218–1441


class BatchNorm1D(LayerBase):
    def __init__(self, momentum=0.9, epsilon=1e-5, optimizer=None):
        """
        A batch normalization layer for 1D inputs.

        Notes
        -----
        BatchNorm is an attempt to address the problem of internal covariate
        shift (ICS) during training by normalizing layer inputs.

        ICS refers to the change in the distribution of layer inputs during
        training as a result of the changing parameters of the previous
        layer(s). ICS can make it difficult to train models with saturating
        nonlinearities, and in general can slow training by requiring a lower
        learning rate.

        Equations [train]::

            Y = scaler * norm(X) + intercept
            norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon)

        Equations [test]::

            Y = scaler * running_norm(X) + intercept
            running_norm(X) = (X - running_mean) / sqrt(running_var + epsilon)

        In contrast to :class:`LayerNorm1D`, the BatchNorm layer calculates
        the mean and var across the batch rather than the output features.
        This has two disadvantages:

        1. It is highly affected by batch size: smaller mini-batch sizes
           increase the variance of the estimates for the global mean and
           variance.

        2. It is difficult to apply in RNNs -- one must fit a separate
           BatchNorm layer for each time-step.

        Parameters
        ----------
        momentum : float
            The momentum term for the running mean/running variance
            calculations. The closer this is to 1, the less weight will be
            given to the mean/variance of the current batch (i.e., higher
            smoothing). Default is 0.9.
        epsilon : float
            A small smoothing constant to use during computation of ``norm(X)``
            to avoid divide-by-zero errors. Default is 1e-5.
        optimizer : str, :doc:`Optimizer <numpy_ml.neural_nets.optimizers>` object, or None
            The optimization strategy to use when performing gradient updates
            within the :meth:`update` method. If None, use the :class:`SGD
            <numpy_ml.neural_nets.optimizers.SGD>` optimizer with
            default parameters. Default is None.

        Attributes
        ----------
        X : list
            Running list of inputs to the :meth:`forward <numpy_ml.neural_nets.LayerBase.forward>` method since the last call to :meth:`update <numpy_ml.neural_nets.LayerBase.update>`. Only updated if the `retain_derived` argument was set to True.
        gradients : dict
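
The train and test equations in the docstring can be illustrated with a small standalone NumPy sketch. It is not the layer's actual implementation: the class name `BatchNormSketch`, the `training` flag, the initial values of the running statistics, and the exponential-moving-average update are all assumptions, chosen to match the docstring's description of `momentum` (values closer to 1 give the current batch less weight).

```python
import numpy as np


class BatchNormSketch:
    """Hypothetical stand-in for the forward pass described in the docstring."""

    def __init__(self, n_features, momentum=0.9, epsilon=1e-5):
        self.momentum = momentum
        self.epsilon = epsilon
        # Scale and shift, named after the terms in the docstring's equations.
        self.scaler = np.ones(n_features)
        self.intercept = np.zeros(n_features)
        # Running statistics used at test time (initial values are assumed).
        self.running_mean = np.zeros(n_features)
        self.running_var = np.ones(n_features)

    def forward(self, X, training=True):
        if training:
            # Statistics are computed across the batch (axis 0), per feature.
            mean, var = X.mean(axis=0), X.var(axis=0)
            # Assumed exponential moving average of the batch statistics.
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mean
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            # Test mode: reuse the running statistics, do not update them.
            mean, var = self.running_mean, self.running_var
        norm = (X - mean) / np.sqrt(var + self.epsilon)
        return self.scaler * norm + self.intercept


rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(64, 4))  # (batch, features)

layer = BatchNormSketch(n_features=4)
Y_train = layer.forward(X, training=True)   # normalized with batch statistics
Y_test = layer.forward(X, training=False)   # normalized with running statistics
```

In training mode each output feature has approximately zero mean and unit variance over the batch. After a single update the running statistics are still dominated by their initial values, so the test-mode output for the same input differs noticeably from the training-mode output.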

Callers 1

test_BatchNorm1DFunction · 0.90

Calls

no outgoing calls

Tested by 1

test_BatchNorm1DFunction · 0.72