gemm

Modules

modelopt.torch.kernels.quantization.gemm.fp4_kernel

NVFP4 Fake Quantization Triton Implementation.

modelopt.torch.kernels.quantization.gemm.fp4_kernel_hopper

NVFP4 Fake Quantization Triton kernels requiring compute capability >= 8.9 (Ada Lovelace and newer, including Hopper).
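
A caller may want to check the stated compute capability requirement before selecting these kernels. The following is a minimal sketch, not the library's own dispatch logic: `meets_capability` and `current_device_supported` are hypothetical helper names, and the only external call used is `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple.

```python
from typing import Tuple

# Threshold taken from the module description above (>= 8.9).
REQUIRED_CAPABILITY: Tuple[int, int] = (8, 9)


def meets_capability(
    capability: Tuple[int, int],
    required: Tuple[int, int] = REQUIRED_CAPABILITY,
) -> bool:
    """Hypothetical helper: compare (major, minor) tuples lexicographically."""
    return tuple(capability) >= tuple(required)


def current_device_supported() -> bool:
    """Hypothetical helper: True if the active CUDA device meets the threshold."""
    try:
        import torch  # imported lazily so the pure check works without PyTorch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    return meets_capability(torch.cuda.get_device_capability())
```

Comparing tuples rather than a float such as `8.9` avoids misordering capabilities like `(8, 10)` against `(8, 9)`.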

modelopt.torch.kernels.quantization.gemm.fp8_kernel

FP8 Triton Kernel Implementations.

modelopt.torch.kernels.quantization.gemm.gptq_fused_kernel

Fused Triton kernels for GPTQ blockwise weight-update.

modelopt.torch.kernels.quantization.gemm.nvfp4_quant

Composable Triton JIT functions for NVFP4 (E2M1) fake quantization.
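
For orientation, fake quantization to E2M1 can be illustrated in plain Python. This is a simplified sketch and not the Triton code in this module: the function names are hypothetical, the block size of 16 is an assumption about the NVFP4 layout, the per-block scale is kept in full precision rather than being quantized, and ties are resolved toward the smaller magnitude instead of following a hardware rounding mode.

```python
from typing import List, Sequence

# The eight non-negative magnitudes representable in E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0


def round_to_e2m1(x: float) -> float:
    """Hypothetical helper: clamp to [-6, 6], then snap to the nearest E2M1 value."""
    sign = -1.0 if x < 0 else 1.0
    magnitude = min(abs(x), E2M1_MAX)
    # min() returns the first candidate on a tie, i.e. the smaller magnitude.
    nearest = min(E2M1_MAGNITUDES, key=lambda v: abs(v - magnitude))
    return sign * nearest


def fake_quantize(values: Sequence[float], block_size: int = 16) -> List[float]:
    """Hypothetical helper: quantize then dequantize each block with its own scale."""
    out: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto the largest E2M1 value.
        scale = amax / E2M1_MAX if amax > 0 else 1.0
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out
```

The output keeps the input's floating-point type and length; only the values are restricted to what the 4-bit format could represent, which is what makes the quantization "fake".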

Triton quantization kernels.