conv

Modules

modelopt.torch.kernels.quantization.conv.bench_implicit_gemm

Latency benchmark: implicit GEMM (quantized and non-quantized) vs. cuDNN conv3d.

modelopt.torch.kernels.quantization.conv.implicit_gemm_cuda

Conv3D implicit GEMM on BF16 WMMA Tensor Cores, with optional fused FP4 quantization.

Implicit-GEMM CUDA kernel for quantized 3D convolution.
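
As background for the term "implicit GEMM": a convolution can be treated as a matrix multiply whose operands are indexed on the fly instead of being materialized through an explicit im2col buffer. The sketch below only computes the dimensions of that equivalent GEMM for a 3D convolution, using standard convolution arithmetic. The function names and signatures are hypothetical illustrations and are not part of the modelopt API; how the actual kernel tiles or pads these dimensions is not described here.

```python
from math import prod


def conv3d_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output extent along one spatial axis (standard convolution arithmetic)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def implicit_gemm_shape(batch, c_in, c_out, in_dhw, kernel_dhw,
                        stride=1, padding=0, dilation=1):
    """Return (M, N, K) of the GEMM that a conv3d maps onto.

    Hypothetical helper for illustration only (not a modelopt function).
    M: number of output positions (batch * D_out * H_out * W_out)
    N: number of output channels
    K: input channels times kernel volume (C_in * kD * kH * kW)
    """
    # Same stride/padding/dilation on every axis, to keep the sketch short.
    out_dhw = tuple(
        conv3d_output_size(s, k, stride, padding, dilation)
        for s, k in zip(in_dhw, kernel_dhw)
    )
    m = batch * prod(out_dhw)
    n = c_out
    k = c_in * prod(kernel_dhw)
    return m, n, k


if __name__ == "__main__":
    # 3x3x3 kernel with padding 1 keeps the 8x32x32 volume unchanged.
    print(implicit_gemm_shape(1, 16, 32, (8, 32, 32), (3, 3, 3), padding=1))
```

With these inputs the reduction dimension K is 16 * 27 = 432 and M is 8 * 32 * 32 = 8192, which gives a rough sense of the problem size a latency benchmark against cuDNN would be exercising.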