Add a SwiGLU (`x * SiLU(gate)`) operation. This function takes a tensor, splits it into two halves along the last dimension, applies SiLU to the second half and multiplies the result by the first half. The behavior is undefined if the size of the last dimension is not even. Parameters: input : Tensor
(input: Tensor)
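The computation described above can be sketched in plain Python for a single row, that is, one slice along the last dimension. This is only an illustrative reference, not the library's implementation: `swiglu_reference` and the local `silu` are hypothetical helpers that work on Python lists, and raising an error for an odd length is a choice made here, since the documented behavior in that case is undefined.

```python
import math
from typing import List


def silu(v: float) -> float:
    """SiLU (also called swish): v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_reference(row: List[float]) -> List[float]:
    """Reference SwiGLU over one row (hypothetical helper, for illustration only)."""
    if len(row) % 2 != 0:
        # Assumption of this sketch: reject odd sizes instead of leaving them undefined.
        raise ValueError("last dimension must have an even size")
    half = len(row) // 2
    # First half is x, second half is the gate, matching chunk(input, 2, dim=-1).
    x, gate = row[:half], row[half:]
    # Element-wise SiLU(gate) * x; the output has half the input length.
    return [silu(g) * xi for xi, g in zip(x, gate)]


# x = [1.0, 2.0], gate = [0.0, 1.0]
out = swiglu_reference([1.0, 2.0, 0.0, 1.0])
```

Note that the output's last dimension is half the size of the input's, because the gate half is consumed by the activation rather than passed through.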
| 839 | |
| 840 | |
| 841 | def swiglu(input: Tensor) -> Tensor: |
| 842 | ''' |
| 843 | Add a SwiGLU (`x * SiLU(gate)`) operation. |
| 844 | |
| 845 | This function takes a tensor, splits it into two halves along the last |
| 846 | dimension, applies SiLU to the second half and multiplies the result by the |
| 847 | first half. The behavior is undefined if the size of the last dimension is not even. |
| 848 | |
| 849 | Parameters: |
| 850 | input : Tensor |
| 851 | The input tensor on which the activation function is applied. |
| 852 | |
| 853 | Returns: |
| 854 | The tensor produced by the activation layer. |
| 855 | ''' |
| 856 | x, gate = chunk(input, 2, dim=-1) |
| 857 | return silu(gate) * x |
| 858 | |
| 859 | |
| 860 | def squared_relu(x: Tensor) -> Tensor: |