Alpha Compute

void trt_edgellm::kernel::computeFC1Alpha( float const *actGs, float const *weightGs, float *alpha, int32_t numLocalExperts, cudaStream_t stream )

Compute per-expert FC1 alpha for the grouped-GEMM FP32 epilogue.

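Forward-scale contract: actGs and weightGs are forward-direction global scale factors (SFs). actGs points to a single activation SF shared by all experts, and weightGs holds one weight SF per local expert. For each i in [0, numLocalExperts), the kernel writes alpha[i] = (*actGs) * weightGs[i]. This is exactly the scalar applied before the activation inside the FC1 kernel: out[m,n] = act(alpha[e(m)] * acc), where e(m) is the expert that row m belongs to and acc is the GEMM accumulator value.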
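To make the contract concrete, here is a minimal host-side C++ sketch of both halves: computing alpha, then applying it in the epilogue. The names referenceFC1Alpha and applyFC1Epilogue are hypothetical and are not trt_edgellm APIs. The sketch assumes actGs points to one scalar and weightGs to numLocalExperts values in host memory; it ignores device memory and the CUDA stream, and it takes the activation as a parameter because the document does not name it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side reference (not part of trt_edgellm): mirrors the
// documented contract alpha[i] = (*actGs) * weightGs[i] on CPU memory,
// without the cudaStream_t argument.
void referenceFC1Alpha(float const* actGs, float const* weightGs, float* alpha, int32_t numLocalExperts)
{
    float const act = *actGs; // one activation global SF shared by all experts
    for (int32_t i = 0; i < numLocalExperts; ++i)
    {
        alpha[i] = act * weightGs[i]; // fold in the per-expert weight global SF
    }
}

// Hypothetical epilogue for a single output element:
// out[m,n] = act(alpha[e(m)] * acc), where expertOfRow plays the role of e(m)
// and acc is the accumulator value for (m, n).
template <typename Activation>
float applyFC1Epilogue(float acc, int32_t expertOfRow, float const* alpha, int32_t numLocalExperts, Activation act)
{
    assert(expertOfRow >= 0 && expertOfRow < numLocalExperts); // e(m) must be a valid local expert
    return act(alpha[expertOfRow] * acc); // scale first, then activate
}
```

For example, with *actGs = 0.5 and weightGs = {2, 4, 8}, the reference yields alpha = {1, 2, 4}; with a ReLU stand-in, an accumulator of 3 on a row owned by expert 1 then produces 6. Folding both global SFs into one scalar per expert keeps the epilogue to a single multiply before the activation.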