MoE Sigmoid Group TopK Kernels
void trt_edgellm::kernel::moeSigmoidGroupTopk(
    rt::Tensor const &gatingOutput,
    rt::Tensor &topkWeights,
    rt::Tensor &topkIndices,
    int32_t topK,
    int32_t nGroup,
    int32_t topkGroup,
    bool normTopkProb,
    float routedScalingFactor,
    cudaStream_t stream,
    rt::OptionalInputTensor correctionBias = std::nullopt)
MoE sigmoid group top-k kernel implementing the grouped top-k routing used by the Hugging Face NemotronH model (NemotronHMoE).
For each token, the kernel:
1. Applies a sigmoid to the router logits: scores = sigmoid(logits).
2. Adds the optional correction bias: biased = scores + correctionBias.
3. Partitions the experts into nGroup groups and sums the two highest biased scores in each group to get groupScores.
4. Selects the topkGroup groups with the highest groupScores.
5. Masks out experts that are not in a selected group.
6. Selects the topK experts with the highest masked biased scores.
7. Gathers the weights from the original sigmoid scores, not the biased ones.
8. If normTopkProb is true, renormalizes the selected weights to sum to 1.
9. Scales the weights by routedScalingFactor.
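The routing steps can be traced in a small single-token CPU reference, which may help when checking kernel outputs. This is a minimal sketch, not the trt_edgellm implementation: `sigmoidGroupTopkReference` and `RoutingResult` are hypothetical names, experts are assumed to form nGroup contiguous groups of equal size, and masked experts are assumed to be excluded by filling them with -infinity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <functional>
#include <limits>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical single-token CPU reference; not part of trt_edgellm.
struct RoutingResult {
    std::vector<float> weights;    // [topK]
    std::vector<int32_t> indices;  // [topK]
};

RoutingResult sigmoidGroupTopkReference(
    std::vector<float> const& logits, int32_t topK, int32_t nGroup, int32_t topkGroup,
    bool normTopkProb, float routedScalingFactor,
    std::optional<std::vector<float>> const& correctionBias = std::nullopt) {
    int32_t const numExperts = static_cast<int32_t>(logits.size());
    // Assumption: nGroup contiguous groups of equal size.
    assert(nGroup > 0 && numExperts % nGroup == 0);
    int32_t const groupSize = numExperts / nGroup;
    assert(groupSize >= 2 && topkGroup >= 1 && topkGroup <= nGroup);
    assert(topK >= 1 && topK <= topkGroup * groupSize);
    assert(!correctionBias || correctionBias->size() == logits.size());

    // Sigmoid scores, plus the optional bias used only for selection.
    std::vector<float> scores(numExperts), biased(numExperts);
    for (int32_t e = 0; e < numExperts; ++e) {
        scores[e] = 1.0f / (1.0f + std::exp(-logits[e]));
        biased[e] = scores[e] + (correctionBias ? (*correctionBias)[e] : 0.0f);
    }

    // Group score = sum of the two largest biased scores in the group.
    std::vector<float> groupScores(nGroup);
    for (int32_t g = 0; g < nGroup; ++g) {
        std::vector<float> members(biased.begin() + g * groupSize,
                                   biased.begin() + (g + 1) * groupSize);
        std::partial_sort(members.begin(), members.begin() + 2, members.end(),
                          std::greater<float>());
        groupScores[g] = members[0] + members[1];
    }

    // Keep the topkGroup groups with the highest group scores.
    std::vector<int32_t> groupOrder(nGroup);
    std::iota(groupOrder.begin(), groupOrder.end(), 0);
    std::partial_sort(groupOrder.begin(), groupOrder.begin() + topkGroup, groupOrder.end(),
                      [&](int32_t a, int32_t b) { return groupScores[a] > groupScores[b]; });
    std::vector<bool> keepGroup(nGroup, false);
    for (int32_t i = 0; i < topkGroup; ++i) keepGroup[groupOrder[i]] = true;

    // Exclude experts outside the kept groups (assumption: -inf fill).
    std::vector<float> masked = biased;
    for (int32_t e = 0; e < numExperts; ++e)
        if (!keepGroup[e / groupSize]) masked[e] = -std::numeric_limits<float>::infinity();

    // Pick the topK experts by masked biased score, in descending order.
    std::vector<int32_t> expertOrder(numExperts);
    std::iota(expertOrder.begin(), expertOrder.end(), 0);
    std::partial_sort(expertOrder.begin(), expertOrder.begin() + topK, expertOrder.end(),
                      [&](int32_t a, int32_t b) { return masked[a] > masked[b]; });

    RoutingResult result;
    result.indices.assign(expertOrder.begin(), expertOrder.begin() + topK);
    // Weights come from the unbiased sigmoid scores.
    for (int32_t e : result.indices) result.weights.push_back(scores[e]);

    // Optional renormalization so the selected weights sum to 1.
    if (normTopkProb) {
        float const sum = std::accumulate(result.weights.begin(), result.weights.end(), 0.0f);
        for (float& w : result.weights) w /= sum;
    }
    // Final scaling by routedScalingFactor.
    for (float& w : result.weights) w *= routedScalingFactor;
    return result;
}
```

For example, with logits {2, 1, -1, -2, 0.5, 0.4, 3, -3}, nGroup = 4, topkGroup = 2 and topK = 2, expert 6 has the largest individual score, but its group {6, 7} scores lower than groups {0, 1} and {4, 5}, so the sketch selects experts 0 and 1. It returns indices in descending score order; the kernel's output ordering is not specified here.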
Parameters:
gatingOutput – Input router logits [numTokens, numExperts] (FP32, GPU)
topkWeights – Output selected expert weights [numTokens, topK] (FP32, GPU)
topkIndices – Output selected expert indices [numTokens, topK] (INT32, GPU)
topK – Number of experts to select per token
nGroup – Number of expert groups
topkGroup – Number of highest-scoring groups to keep per token
normTopkProb – Whether to renormalize topK weights to sum to 1
routedScalingFactor – Scaling factor applied to the final weights, after the optional renormalization
stream – CUDA stream for execution
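correctionBias – Optional per-expert bias [numExperts] for expert load balancing (FP32, GPU); it affects only which experts are selected, not the returned weights. Defaults to std::nullopt.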
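The parameters are not independent. The hypothetical host-side helper below, `checkSigmoidGroupTopkArgs`, spells out constraints that the routing steps appear to imply. They are inferred from the algorithm description rather than taken from the trt_edgellm source, and the even split of experts into groups is an assumption.

```cpp
#include <cstdint>
#include <string>

// Hypothetical argument checks inferred from the routing steps; not a trt_edgellm API.
// Returns an empty string when the arguments are consistent, otherwise a reason.
// Pass correctionBiasSize = -1 when no correction bias is supplied.
std::string checkSigmoidGroupTopkArgs(int32_t numExperts, int32_t topK, int32_t nGroup,
                                      int32_t topkGroup, int32_t correctionBiasSize = -1) {
    // Assumption: experts split evenly into nGroup groups.
    if (nGroup <= 0 || numExperts % nGroup != 0)
        return "numExperts must split evenly into nGroup groups";
    int32_t const groupSize = numExperts / nGroup;
    // The group score sums the top-2 scores, so each group needs two experts.
    if (groupSize < 2)
        return "each group needs at least 2 experts for the top-2 group score";
    if (topkGroup < 1 || topkGroup > nGroup)
        return "topkGroup must be in [1, nGroup]";
    // Only experts in the kept groups are eligible for the final top-K.
    if (topK < 1 || topK > topkGroup * groupSize)
        return "topK cannot exceed the number of experts in the selected groups";
    if (correctionBiasSize >= 0 && correctionBiasSize != numExperts)
        return "correctionBias must have numExperts elements";
    return "";
}
```

For instance, with 8 experts in 4 groups and topkGroup = 2, only 4 experts survive the group mask, so topK = 5 is rejected.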