Alpha Compute

void trt_edgellm::kernel::computeFC1Alpha(
float const *actGs,
float const *weightGs,
float *alpha,
int32_t numLocalExperts,
cudaStream_t stream
)

Compute per-expert FC1 alpha for the grouped-GEMM FP32 epilogue.

Forward-scale contract: actGs and weightGs are forward-direction global scale factors. The kernel writes alpha[i] = (*actGs) * weightGs[i], which is exactly the scalar applied to the accumulator before the activation inside the FC1 kernel: out[m,n] = act(alpha[e(m)] * acc), where e(m) is the expert that row m is routed to.

Parameters:
  • actGs – [1] float32 on device; forward-direction activation global scale (GS).

  • weightGs – [L] float32 on device; forward-direction per-expert weight global scale (GS), one entry per local expert.

  • alpha – [L] float32 on device (output).

  • numLocalExperts – L — number of local experts.

  • stream – CUDA stream.
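The contract above can be checked against a simple host-side reference. The following is a minimal sketch, not the library's implementation: `referenceComputeAlpha` is a hypothetical helper that mirrors the documented signature without the stream argument and assumes all pointers refer to host memory rather than device memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side reference (not part of trt_edgellm).
// Mirrors the documented contract: alpha[i] = (*actGs) * weightGs[i].
// Assumption: all pointers refer to host memory, unlike the real kernel,
// which expects device pointers and a CUDA stream.
void referenceComputeAlpha(
    float const* actGs,      // [1] activation global scale
    float const* weightGs,   // [L] per-expert weight global scale
    float* alpha,            // [L] output
    int32_t numLocalExperts) // L
{
    assert(numLocalExperts >= 0);
    float const act = *actGs; // single scalar shared by every expert
    for (int32_t i = 0; i < numLocalExperts; ++i)
    {
        alpha[i] = act * weightGs[i];
    }
}
```

A reference like this could be used in a unit test: copy the device-side alpha buffer back to the host after the kernel completes on its stream and compare it element by element with the reference output.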

void trt_edgellm::kernel::computeFC2Alpha(
float const *actGs,
float const *weightGs,
float *alpha,
int32_t numLocalExperts,
cudaStream_t stream
)

Compute per-expert FC2 alpha. Same signature and parameter shapes as computeFC1Alpha; a distinct symbol is provided for clarity at call sites.