Welcome to The cuDNN Blog
Hey there, fellow GPU enthusiast! This is the unofficial-feeling, totally-official corner of the internet where we talk about cuDNN — NVIDIA's GPU-accelerated library of deep learning primitives.
Whether you’re training a massive transformer, fine-tuning a convolutional network, or just trying to get GPUs to go brrr, cuDNN is the engine under the hood making it happen. This blog is where we share release notes, installation guides, and the occasional deep-dive into what makes cuDNN tick.
Check out the Installation Guides in the sidebar to get started, or read through the latest release notes below.
Latest Releases
- cuDNN Frontend v1.22.1 — PyTorch custom op for MoE Grouped GEMM, Blackwell SDPA forward kernel (head dim 256), and weight-gradient kernels.
- cuDNN Frontend v1.22.0 — PyTorch custom op for SDPA, preindexed execute, Blackwell bprop kernels, and Grouped GEMM improvements.
- cuDNN Frontend v1.21.0 — No more CUDA driver dependency, plus a wave of new Grouped GEMM fusion kernels for MoE workloads.
- cuDNN Frontend v1.20.0 — Fused RMSNorm + SiLU kernel for B200, GB300 support for GEMM kernels, and reproducer tool improvements.
- cuDNN Frontend v1.19.1 — Hotfix restoring CUDA 12 support, plus the full v1.19.0 feature set including open-source SDPA kernels.