quantization

Modules

modelopt.torch.kernels.quantization.attention

Quantization-specific attention kernel pieces.

modelopt.torch.kernels.quantization.common

Shared composable Triton JIT fake-quantization functions.

modelopt.torch.kernels.quantization.conv

Implicit-GEMM CUDA kernel for quantized 3D convolution.

modelopt.torch.kernels.quantization.gemm

Triton quantization kernels.

Quantization kernels, including conv (implicit GEMM) and gemm (tensor_quant plus Triton FP4/FP8 kernels).
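
For readers unfamiliar with the term "fake quantization" used in the module summaries above, the following is a minimal pure-Python sketch of the idea: values are snapped to a low-precision integer grid and immediately converted back to floats, so downstream code sees the rounding error without changing dtype. This is not the modelopt implementation, which is written as Triton JIT functions. The name `fake_quantize`, its parameters, and the choice of symmetric per-tensor scaling are assumptions made for illustration only.

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to a signed integer grid, then dequantize back.

    Hypothetical helper for illustration; symmetric per-tensor scaling
    is an assumption, not a statement about the library's kernels.
    """
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    amax = max(abs(v) for v in values)        # per-tensor absolute maximum
    if amax == 0.0:
        return [0.0 for _ in values]          # nothing to scale
    scale = amax / qmax                       # float step between grid points
    out = []
    for v in values:
        q = round(v / scale)                  # snap to the nearest integer level
        q = max(-qmax, min(qmax, q))          # clamp to the representable range
        out.append(q * scale)                 # dequantize back to float
    return out


if __name__ == "__main__":
    print(fake_quantize([0.0, 0.3, -1.0]))
```

Each output value differs from its input by at most half a grid step (`scale / 2`), which is the rounding error a fake-quantization pass exposes to the rest of the model.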