Add a layer-norm operation on a tensor. The operation applies layer normalization to its input tensor. In its simplest form, for large language models, 'normalized_shape' should be set to the hidden dimension of the activation tensor. Otherwise, it is the shape of the normalized fraction of the tensor (starting from the right-most dimension).
layer_norm(input: Tensor,
           normalized_shape: Union[int, Tuple[int]],
           weight: Optional[Tensor] = None,
           bias: Optional[Tensor] = None,
           eps: float = 1e-05,
           use_diff_of_squares: bool = True) -> Tensor
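To make the parameters concrete, here is a minimal pure-Python sketch of layer normalization over the last dimension of a 2-D list. It is not the library's implementation: `layer_norm_last_dim` is a hypothetical helper, the centered-variance branch is an assumption about what is computed when the difference of squares is disabled, and the clamp to zero is an addition made for this sketch.

```python
import math
from typing import List, Optional, Sequence


def layer_norm_last_dim(rows: Sequence[Sequence[float]],
                        weight: Optional[Sequence[float]] = None,
                        bias: Optional[Sequence[float]] = None,
                        eps: float = 1e-5,
                        use_diff_of_squares: bool = True) -> List[List[float]]:
    """Reference layer-norm over the last dimension (illustrative only)."""
    out = []
    for row in rows:
        n = len(row)
        mean = sum(row) / n
        if use_diff_of_squares:
            # Var = Mean(X^2) - Mean(X)^2
            var = sum(x * x for x in row) / n - mean * mean
            # Sketch-only guard: rounding can make this slightly negative.
            var = max(var, 0.0)
        else:
            # Assumed alternative: Var = Mean((X - Mean(X))^2)
            var = sum((x - mean) ** 2 for x in row) / n
        # 'eps' is added to the variance before taking the square root.
        inv_std = 1.0 / math.sqrt(var + eps)
        normed = [(x - mean) * inv_std for x in row]
        if weight is not None:  # 'gamma'
            normed = [v * g for v, g in zip(normed, weight)]
        if bias is not None:  # 'beta'
            normed = [v + b for v, b in zip(normed, bias)]
        out.append(normed)
    return out
```

Both variance formulas are algebraically equal. The difference of squares needs only one pass over the data, but it can lose precision when the mean is large relative to the spread, which is one plausible reason for exposing it as a flag.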
def layer_norm(input: Tensor,
               normalized_shape: Union[int, Tuple[int]],
               weight: Optional[Tensor] = None,
               bias: Optional[Tensor] = None,
               eps: float = 1e-05,
               use_diff_of_squares: bool = True) -> Tensor:
    '''
    Add a layer-norm operation on a tensor.

    That operation applies the layer-normalization to its input tensor. In its
    simplest form, for large language models, the 'normalized_shape' should be
    set to the hidden dimension of the activation tensor. Otherwise, it is the
    shape of the normalized fraction of the tensor (starting from the
    right-most dimension).

    The 'weight' tensor corresponds to 'gamma' in the layer-norm formula and
    'bias' is 'beta'. The 'eps' value is added to the variance before computing
    the squared-root.

    This implementation (when using the plugin) supports an additional flag to
    enable/disable the use of a difference of squares ('Var = Mean(X^2) -
    Mean(X)^2').

    Parameters:
        input : Tensor
            The tensor to normalize.

        normalized_shape : Union[int, Tuple[int]]
            The shape of the sub-tensor that is normalized. Use 'hidden_dim' to
            normalize the inner-most dimension of an activation tensor in LLMs.

        weight : Optional[Tensor] = None
            The 'gamma' term in layer-norm. Its shape must be
            'normalized_shape'.

        bias : Optional[Tensor] = None
            The 'beta' term in layer-norm. Its shape must be
            'normalized_shape'.

        eps : float
            The epsilon term to be added to the variance in the squared-root.

        use_diff_of_squares : bool
            Does the plugin use the difference of squares to compute the
            variance?

    Returns:
        The output tensor of that operation.
    '''
    input, weight = broadcast_helper(input, weight)
    input, bias = broadcast_helper(input, bias)
    if isinstance(normalized_shape, int):  # FIXME: better way?
        axis = input.ndim() - 1
    else:
        axis = input.ndim() - len(normalized_shape)
    axes_mask = 0
    for i in range(axis, input.ndim()):
        axes_mask |= 1 << i
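The loop that builds 'axes_mask' can be tried in isolation. The sketch below mirrors that logic in a standalone function that takes the tensor rank as a plain integer. `trailing_axes_mask` is a hypothetical name used only for illustration, and the sketch makes no claim about how the mask is consumed afterwards.

```python
from typing import Tuple, Union


def trailing_axes_mask(ndim: int,
                       normalized_shape: Union[int, Tuple[int, ...]]) -> int:
    """Return a bitmask with one bit set per normalized (trailing) axis."""
    if isinstance(normalized_shape, int):
        # A single int means only the last axis is normalized.
        first_axis = ndim - 1
    else:
        # A tuple covers as many trailing axes as it has entries.
        first_axis = ndim - len(normalized_shape)
    mask = 0
    for i in range(first_axis, ndim):
        mask |= 1 << i  # bit i marks axis i as normalized
    return mask


# Rank-3 activation [batch, seq, hidden], normalizing only 'hidden':
print(bin(trailing_axes_mask(3, 4096)))        # 0b100
# Same rank, normalizing the last two axes:
print(bin(trailing_axes_mask(3, (16, 4096))))  # 0b110
```

Only the length of 'normalized_shape' affects the mask; the sizes in the tuple are not checked against the tensor in this part of the code.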