Examples
This page links to the cuda.bindings examples shipped in the
cuda-python repository.
Use it as a quick index when you want a runnable sample for a specific API area
or CUDA feature.
Introduction
clock_nvrtc.py uses NVRTC-compiled CUDA code and the device clock to time a reduction kernel.
simple_cubemap_texture.py demonstrates cubemap texture sampling and transformation.
simple_p2p.py shows peer-to-peer memory access and transfers between multiple GPUs.
simple_zero_copy.py uses zero-copy mapped host memory for vector addition.
system_wide_atomics.py demonstrates system-wide atomic operations on managed memory.
vector_add_drv.py uses the CUDA Driver API and unified virtual addressing for vector addition.
vector_add_mmap.py uses virtual memory management APIs such as cuMemCreate and cuMemMap for vector addition.
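Physical allocations made with cuMemCreate, and the virtual ranges they are mapped into with cuMemMap, have to be sized in multiples of the allocation granularity that the driver reports through cuMemGetAllocationGranularity. A minimal sketch of that host-side rounding in plain Python follows. The helper name round_up_to_granularity and the 2 MiB granularity are illustrative assumptions, not taken from the sample; the real granularity is queried per device at run time.

```python
def round_up_to_granularity(size: int, granularity: int) -> int:
    """Round size up to the next multiple of granularity (hypothetical helper)."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    # Ceiling division, then scale back up to a whole number of granules.
    return -(-size // granularity) * granularity


# Illustrative value only: query the real one with cuMemGetAllocationGranularity.
GRANULARITY = 2 * 1024 * 1024  # 2 MiB

n_elements = 1_000_000
bytes_needed = n_elements * 4  # float32 elements
alloc_size = round_up_to_granularity(bytes_needed, GRANULARITY)
print(bytes_needed, alloc_size)
```

Rounding up rather than down keeps the allocation at least as large as the vectors it holds; the unused tail of the last granule is simply never touched by the kernel.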
Concepts and techniques
stream_ordered_allocation.py demonstrates cudaMallocAsync and cudaFreeAsync together with memory-pool release thresholds.
CUDA features
global_to_shmem_async_copy.py compares asynchronous global-to-shared-memory copy strategies in matrix multiplication kernels.
simple_cuda_graphs.py shows both manual CUDA graph construction and stream-capture-based replay.
Libraries and tools
conjugate_gradient_multi_block_cg.py implements a conjugate-gradient solver with cooperative groups and multi-block synchronization.
nvidia_smi.py uses NVML to implement a subset of nvidia-smi in Python.
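conjugate_gradient_multi_block_cg.py runs the conjugate-gradient iteration on the GPU with cooperative groups and multi-block synchronization. The iteration itself can be sketched on the CPU in plain Python, for example as a reference for checking results on small symmetric positive-definite systems. This is an assumed reference version, not the sample's code, and the function name conjugate_gradient is hypothetical.

```python
import math


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A given as nested lists."""
    n = len(b)
    max_iter = max_iter if max_iter is not None else 10 * n

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)   # residual r = b - A x, with the initial guess x = 0
    p = list(r)   # first search direction is the residual itself
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break  # residual norm small enough: converged
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old       # keeps successive directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)  # approximately [1/11, 7/11]
```

Each dot product above is a reduction over the whole vector. When the vectors are spread across many thread blocks, every block needs the finished sum before it can compute alpha or beta, which is why a single-kernel GPU version needs synchronization across blocks rather than only within them.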
Advanced and interoperability
iso_fd_modelling.py runs isotropic finite-difference wave propagation across multiple GPUs with peer-to-peer halo exchange.
jit_program.py JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver API.
numba_emm_plugin.py shows how to back Numba’s External Memory Management (EMM) plugin interface with the NVIDIA CUDA Python Driver API.
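The halo exchange used by iso_fd_modelling.py can be illustrated on a much smaller problem. The sketch below is a toy under stated assumptions, not the sample's code: a 1D, second-order explicit update with one halo cell per side, where Python lists stand in for per-GPU subdomains and copying a neighbour's edge cell stands in for a peer-to-peer transfer. The function names step_global and step_decomposed are hypothetical, and the sample's real stencil, dimensionality, and halo width may differ.

```python
def step_global(u, c):
    """Reference update on the whole domain; the two end points are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + c * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def step_decomposed(u, c, parts):
    """Same update, split into `parts` subdomains that exchange one halo cell."""
    n = len(u)
    if n % parts:
        raise ValueError("len(u) must be divisible by parts")
    size = n // parts
    # Each "device" owns one contiguous slice of the domain.
    owned = [u[k * size:(k + 1) * size] for k in range(parts)]

    result = []
    for k, block in enumerate(owned):
        # Halo exchange: copy the edge cell of each neighbouring subdomain.
        has_left, has_right = k > 0, k < parts - 1
        left_halo = [owned[k - 1][-1]] if has_left else []
        right_halo = [owned[k + 1][0]] if has_right else []
        padded = left_halo + block + right_halo
        offset = 1 if has_left else 0

        for j, value in enumerate(block):
            g = k * size + j  # global index of this cell
            if g == 0 or g == n - 1:
                result.append(value)  # fixed physical boundary
                continue
            p = j + offset  # index of this cell inside the padded subdomain
            result.append(
                padded[p] + c * (padded[p - 1] - 2 * padded[p] + padded[p + 1])
            )
    return result


u0 = [float(i * i) for i in range(8)]
print(step_decomposed(u0, 0.25, 4) == step_global(u0, 0.25))
```

The decomposed update reproduces the global one exactly because every interior cell sees the same three inputs, either from its own subdomain or from a halo. In a time-stepping loop the halos go stale after each update, so the exchange has to be repeated, and completed, before every step.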