conv
Modules
Latency benchmark: implicit GEMM (quant / non-quant) vs cuDNN conv3d.
Conv3D Implicit GEMM with BF16 WMMA Tensor Cores and optional fused FP4 quantization.
Implicit-GEMM CUDA kernel for quantized 3D convolution.