Build Layout
void trt_edgellm::kernel::buildLayoutGpu(
    MoELayoutBuffers &buf,
    int32_t const *tokenSelectedExperts,
    int32_t numTokens,
    int32_t topK,
    int32_t localNumExperts,
    int32_t tileSize,
    cudaStream_t stream
)
GPU-side layout builder implemented as a single-CTA kernel (~3-5 us). All device pointers in
the buffers struct (buf) must be pre-allocated by the caller. tokenSelectedExperts must contain LOCAL expert indices in [0, L).
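
The index precondition above can be checked on the host before launching, for example in a debug build. The following is a minimal sketch, not part of trt_edgellm: the helper name allLocalExpertIndicesInRange is hypothetical, and it assumes that L equals localNumExperts and that tokenSelectedExperts holds numTokens * topK entries. Neither assumption is confirmed by the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical debug helper (not part of trt_edgellm).
// Assumption: the selection array has numTokens * topK entries.
// Assumption: the upper bound L is localNumExperts.
// Returns true only if every entry is a local expert index in [0, localNumExperts).
bool allLocalExpertIndicesInRange(std::vector<int32_t> const& hostSelectedExperts,
                                  int32_t numTokens,
                                  int32_t topK,
                                  int32_t localNumExperts)
{
    if (numTokens < 0 || topK < 0 || localNumExperts <= 0)
    {
        return false;
    }
    std::size_t const expected
        = static_cast<std::size_t>(numTokens) * static_cast<std::size_t>(topK);
    if (hostSelectedExperts.size() != expected)
    {
        return false;
    }
    for (int32_t const expert : hostSelectedExperts)
    {
        // A global (non-remapped) index would typically fail this bound.
        if (expert < 0 || expert >= localNumExperts)
        {
            return false;
        }
    }
    return true;
}

// Possible use on a host copy of the indices, before copying them to the device:
//   assert(allLocalExpertIndicesInRange(hostIdx, numTokens, topK, localNumExperts));
```

The check runs on a host copy of the indices, so it fits tests or debug paths rather than the hot path, where a device-to-host copy would likely cost more than the kernel itself.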