| 967 | |
| 968 | |
| 969 | class BatchNorm2D(LayerBase): |
| 970 | def __init__(self, momentum=0.9, epsilon=1e-5, optimizer=None): |
| 971 | """ |
| 972 | A batch normalization layer for two-dimensional inputs with an |
| 973 | additional channel dimension. |
| 974 | |
| 975 | Notes |
| 976 | ----- |
| 977 | BatchNorm is an attempt to address the problem of internal covariate |
| 978 | shift (ICS) during training by normalizing layer inputs. |
| 979 | |
| 980 | ICS refers to the change in the distribution of layer inputs during |
| 981 | training as a result of the changing parameters of the previous |
| 982 | layer(s). ICS can make it difficult to train models with saturating |
| 983 | nonlinearities, and in general can slow training by requiring a lower |
| 984 | learning rate. |
| 985 | |
| 986 | Equations [train]:: |
| 987 | |
| 988 | Y = scaler * norm(X) + intercept |
| 989 | norm(X) = (X - mean(X)) / sqrt(var(X) + epsilon) |
| 990 | |
| 991 | Equations [test]:: |
| 992 | |
| 993 | Y = scaler * running_norm(X) + intercept |
| 994 | running_norm(X) = (X - running_mean) / sqrt(running_var + epsilon) |
| 995 | |
| 996 | In contrast to :class:`LayerNorm2D`, the BatchNorm layer calculates |
| 997 | the mean and variance across the *batch* rather than the output features. |
| 998 | This has two disadvantages: |
| 999 | |
| 1000 | 1. It is highly affected by batch size: smaller mini-batch sizes |
| 1001 | increase the variance of the estimates for the global mean and |
| 1002 | variance. |
| 1003 | |
| 1004 | 2. It is difficult to apply in RNNs -- one must fit a separate |
| 1005 | BatchNorm layer for *each* time-step. |
| 1006 | |
| 1007 | Parameters |
| 1008 | ---------- |
| 1009 | momentum : float |
| 1010 | The momentum term for the running mean/running variance calculations. |
| 1011 | The closer this is to 1, the less weight will be given to the |
| 1012 | mean/variance of the current batch (i.e., higher smoothing). Default |
| 1013 | is 0.9. |
| 1014 | epsilon : float |
| 1015 | A small smoothing constant to use during computation of ``norm(X)`` |
| 1016 | to avoid divide-by-zero errors. Default is 1e-5. |
| 1017 | optimizer : str, :doc:`Optimizer <numpy_ml.neural_nets.optimizers>` object, or None |
| 1018 | The optimization strategy to use when performing gradient updates |
| 1019 | within the :meth:`update` method. If None, use the :class:`SGD |
| 1020 | <numpy_ml.neural_nets.optimizers.SGD>` optimizer with |
| 1021 | default parameters. Default is None. |
| 1022 | |
| 1023 | Attributes |
| 1024 | ---------- |
| 1025 | X : list |
| 1026 | Running list of inputs to the :meth:`forward <numpy_ml.neural_nets.LayerBase.forward>` method since the last call to :meth:`update <numpy_ml.neural_nets.LayerBase.update>`. Only updated if the ``retain_derived`` argument was set to True. |