geglu(x: Tensor) -> Tensor

Add a Gated-GELU operation.

This function takes a tensor, splits it into two halves along the last dimension, applies GELU to the second half, and multiplies the result by the first half. The behavior is undefined if the size of the last dimension is not even.

Parameters:
    x : Tensor
        The input tensor on which the activation function is applied.
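To make the split-and-gate arithmetic concrete, here is a minimal pure-Python sketch that works on a single vector standing in for the last dimension. The names `gelu_exact` and `geglu_reference` are hypothetical and not part of the library. The exact erf-based GELU is an assumption, since the library's `gelu` may use a different formulation such as the tanh approximation. Raising an error on odd sizes is a choice made for this sketch only; the documented behavior in that case is undefined.

```python
import math
from typing import List, Sequence


def gelu_exact(v: float) -> float:
    """Exact GELU: v * Phi(v), where Phi is the standard normal CDF.

    Assumption: the library's gelu may use another formulation.
    """
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def geglu_reference(row: Sequence[float]) -> List[float]:
    """Hypothetical reference for the gated GELU on one vector."""
    if len(row) % 2 != 0:
        # The documented function leaves this case undefined;
        # this sketch rejects it instead.
        raise ValueError("last dimension must have an even size")
    half = len(row) // 2
    a, b = row[:half], row[half:]  # mirrors chunk(x, 2, dim=-1)
    # Gate the first half with GELU of the second half, element by element.
    return [ai * gelu_exact(bi) for ai, bi in zip(a, b)]


out = geglu_reference([1.0, 2.0, 0.0, 1.0])
# out[0] is 1.0 * gelu(0.0), out[1] is 2.0 * gelu(1.0)
```

The output has half as many elements as the input along the last dimension, because the two halves are combined into one.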
    def geglu(x: Tensor) -> Tensor:
        '''
        Add a Gated-GELU operation.

        This function takes a tensor, splits it into two halves along the last
        dimension, applies GELU to the second half, and multiplies the result by
        the first half. The behavior is undefined if the size of the last
        dimension is not even.

        Parameters:
            x : Tensor
                The input tensor on which the activation function is applied.

        Returns:
            The tensor produced by the activation layer.
        '''
        a, b = chunk(x, 2, dim=-1)
        return a * gelu(b)


    def quick_gelu(x: Tensor) -> Tensor: