MoE Activation Kernels
```cpp
void trt_edgellm::kernel::swiGluActivation(
    rt::Tensor const &gateUpInput,
    rt::Tensor &output,
    int64_t numTokens,
    int64_t intermediateDim,
    cudaStream_t stream
);
```
Apply SwiGLU activation: silu(gate) * up.
This kernel applies the SwiGLU (Swish-Gated Linear Unit) activation function, used in the feed-forward and MoE expert layers of models such as Qwen, Llama, and Mistral.
Given an input of shape [N, 2*D], it:
1. Splits the input into gate [N, D] and up [N, D]
2. Applies the SiLU (Swish) activation to gate: silu(x) = x * sigmoid(x)
3. Multiplies element-wise: output = silu(gate) * up
Optimizations:
- Vectorized 128-bit loads/stores (8 FP16 elements) for better memory bandwidth
- Fused split, activation, and multiplication in a single pass
- No intermediate storage required
- Parameters:
  - gateUpInput – Input tensor [numTokens, 2*intermediateDim] (FP16, GPU)
  - output – Output tensor [numTokens, intermediateDim] (FP16, GPU)
  - numTokens – Number of tokens
  - intermediateDim – Intermediate dimension (output will be this size)
  - stream – CUDA stream
- Throws:
  - std::runtime_error – If any of the following preconditions is violated:
    - gateUpInput is not 2D with shape [numTokens, 2*intermediateDim]
    - output is not 2D with shape [numTokens, intermediateDim]
    - Either tensor is not FP16 or not on the GPU
    - intermediateDim is not a multiple of 8 (required for 128-bit vectorized access)
    - A data pointer is not 16-byte aligned