Welcome to The cuDNN Blog
Hey there, fellow GPU enthusiast! This is the unofficial-feeling, totally official corner of the internet where we talk about cuDNN, NVIDIA’s library for accelerating deep learning primitives on GPUs.
Whether you’re training a massive transformer, fine-tuning a convolutional network, or just trying to get GPUs to go brrr, cuDNN is the engine under the hood making it happen. This blog is where we share release notes, installation guides, and the occasional deep-dive into what makes cuDNN tick.
Check out the Installation Guides in the sidebar to get started, or read through the latest release notes below.
Latest Releases
- cuDNN Frontend v1.23.0
  Causal Conv1d, expanded Graph API (transpose, strided slice, in-place concat, reshape modes, compile-time scalars), new open-source GEMM/MoE kernels, and Python 3.14t wheels.
- Watch our talk on GPU MODE: design choices in cuDNN attention
  We joined the GPU MODE channel to walk through the design choices behind cuDNN's attention kernels. Watch on YouTube.
- The 128×4 Tiled Layout for Block Scaling Factors
  How cuDNN expects MXFP8 and NVFP4 block scaling factors to be laid out in the 128×4 tiled format on Blackwell GPUs, and how to convert to/from row-major.
- How Scales Are Applied in MXFP8 Attention
  A deep dive into how cuDNN applies block-wise and fixed scaling in microscaling FP8 (MXFP8) attention for Blackwell GPUs.
- cuDNN Backend Now Has Preview Releases
  Try upcoming cuDNN backend features early with pip install --pre. Stable releases remain unchanged.
- cuDNN Frontend v1.22.1
  PyTorch custom op for MoE Grouped GEMM, Blackwell SDPA forward kernel (head dim 256), and weight-gradient kernels.
- cuDNN Frontend v1.22.0
  PyTorch custom op for SDPA, preindexed execute, Blackwell bprop kernels, and Grouped GEMM improvements.
- cuDNN Frontend v1.21.0
  No more CUDA driver dependency, plus a wave of new Grouped GEMM fusion kernels for MoE workloads.
- cuDNN Frontend v1.20.0
  Fused RMSNorm + SiLU kernel for B200, GB300 support for GEMM kernels, and reproducer tool improvements.
- cuDNN Frontend v1.19.1
  Hotfix restoring CUDA 12 support, plus the full v1.19.0 feature set including open-source SDPA kernels.