Alpha Compute
void trt_edgellm::kernel::computeFC1Alpha(float const *actGs, float const *weightGs, float *alpha, int32_t numLocalExperts, cudaStream_t stream)
Compute per-expert FC1 alpha for the grouped-GEMM FP32 epilogue.
Forward-scale contract:
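actGs and weightGs are forward-direction global scale factors (SFs). The kernel writes alpha[i] = (*actGs) * weightGs[i], which is exactly the scalar applied before the activation inside the FC1 kernel: out[m,n] = act(alpha[e(m)] * acc), where e(m) is the local expert that row m belongs to and acc is the FP32 accumulator.
Parameters: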
actGs – [1] float32 on device; the forward-direction activation global scale (GS).
weightGs – [L] float32 on device; the forward-direction per-expert weight GS.
alpha – [L] float32 on device (output).
numLocalExperts – L, the number of local experts.
stream – CUDA stream.
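The forward-scale contract can be checked on the host with a small reference. This is a minimal sketch, not the library implementation: computeFC1AlphaReference and fc1EpilogueReference are hypothetical names, the CUDA stream and device pointers are replaced by host pointers, and the activation is passed in as a callable because this section does not say which activation FC1 uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side reference for the forward-scale contract:
// alpha[i] = (*actGs) * weightGs[i] for every local expert i.
// The real computeFC1Alpha works on device memory via a CUDA stream; this
// sketch uses host pointers so it can run anywhere.
inline void computeFC1AlphaReference(float const* actGs, float const* weightGs, float* alpha,
                                     int32_t numLocalExperts)
{
    assert(actGs != nullptr && weightGs != nullptr && alpha != nullptr);
    assert(numLocalExperts >= 0);
    for (int32_t i = 0; i < numLocalExperts; ++i)
    {
        alpha[i] = (*actGs) * weightGs[i];
    }
}

// Hypothetical reference for one FC1 output element:
// out[m,n] = act(alpha[e(m)] * acc). Here expertOfRow plays the role of e(m)
// and acc is the FP32 accumulator value. The activation is caller-supplied
// because the source does not name it.
template <typename Act>
float fc1EpilogueReference(float acc, float const* alpha, int32_t expertOfRow, Act act)
{
    return act(alpha[expertOfRow] * acc);
}
```

For example, with *actGs = 0.5 and weightGs = {2, 4, 8}, the sketch produces alpha = {1, 2, 4}, and an accumulator value of 3 on a row owned by expert 1 is scaled to 6 before act is applied. Folding both global scales into one per-expert scalar lets the epilogue do a single multiply per output element; that is a plausible reading of the design rather than one the source states.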
void trt_edgellm::kernel::computeFC2Alpha(float const *actGs, float const *weightGs, float *alpha, int32_t numLocalExperts, cudaStream_t stream)
Compute per-expert FC2 alpha. Same signature and computation as computeFC1Alpha; a distinct symbol is provided for clarity at call sites.
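One way to expose distinct FC1 and FC2 names over a single computation is a pair of thin wrappers around a shared helper. This is a hedged sketch: it assumes FC2 alpha uses the same per-expert product (*actGs) * weightGs[i] as FC1, the sketch namespace and the Host-suffixed function names are hypothetical, and the CUDA stream parameter is dropped because the code runs on the host.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch
{
// Shared per-expert product: alpha[i] = (*actGs) * weightGs[i].
inline void computeAlphaImpl(float const* actGs, float const* weightGs, float* alpha, int32_t numLocalExperts)
{
    assert(numLocalExperts >= 0);
    for (int32_t i = 0; i < numLocalExperts; ++i)
    {
        alpha[i] = (*actGs) * weightGs[i];
    }
}

// Hypothetical FC1-named entry point; forwards to the shared helper.
inline void computeFC1AlphaHost(float const* actGs, float const* weightGs, float* alpha, int32_t numLocalExperts)
{
    computeAlphaImpl(actGs, weightGs, alpha, numLocalExperts);
}

// Hypothetical FC2-named entry point; identical math, distinct name so the
// call site documents which GEMM's global scales it is combining.
inline void computeFC2AlphaHost(float const* actGs, float const* weightGs, float* alpha, int32_t numLocalExperts)
{
    computeAlphaImpl(actGs, weightGs, alpha, numLocalExperts);
}
} // namespace sketch
```

Because both wrappers forward to the same helper, they cannot drift apart numerically, while each call site still names the GEMM whose scales it is combining.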