Examples
This page links to the cuda.bindings examples shipped in the
cuda-python repository.
Use it as a quick index when you want a runnable sample for a specific API area
or CUDA feature.
Introduction
clock_nvrtc_test.py uses NVRTC-compiled CUDA code and the device clock to time a reduction kernel.
simpleCubemapTexture_test.py demonstrates cubemap texture sampling and transformation.
simpleP2P_test.py shows peer-to-peer memory access and transfers between multiple GPUs.
simpleZeroCopy_test.py uses zero-copy mapped host memory for vector addition.
systemWideAtomics_test.py demonstrates system-wide atomic operations on managed memory.
vectorAddDrv_test.py uses the CUDA Driver API and unified virtual addressing for vector addition.
vectorAddMMAP_test.py uses virtual memory management APIs such as
cuMemCreate and cuMemMap for vector addition.
Concepts and techniques
streamOrderedAllocation_test.py demonstrates
cudaMallocAsync and cudaFreeAsync together with memory-pool release thresholds.
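The stream-ordered allocation pattern that streamOrderedAllocation_test.py covers can be sketched roughly as follows. This is a minimal illustration, not the shipped sample: the helper names check_cuda_errors and alloc_and_free_async, the buffer size, and the round count are hypothetical, and the sketch assumes a CUDA-capable device 0 with cuda-python installed. The cuda.bindings import is deferred into the function so the helper definitions load on machines without the package.

```python
UINT64_MAX = 2**64 - 1  # largest release threshold a memory pool accepts


def check_cuda_errors(result):
    """Unpack a cuda.bindings return tuple, raising if the status is nonzero."""
    status, *values = result
    if status.value != 0:
        raise RuntimeError(f"CUDA call failed: {status!r}")
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


def alloc_and_free_async(nbytes=1 << 20, rounds=4):
    """Allocate, clear, and free a buffer several times in stream order."""
    # Deferred import: assumes cuda-python is installed and a GPU is present.
    from cuda.bindings import driver, runtime

    device = 0
    check_cuda_errors(runtime.cudaSetDevice(device))

    # The default pool's release threshold starts at zero. Raising it lets
    # the pool keep freed memory cached instead of returning it to the OS
    # at the next synchronization point.
    pool = check_cuda_errors(runtime.cudaDeviceGetDefaultMemPool(device))
    check_cuda_errors(
        runtime.cudaMemPoolSetAttribute(
            pool,
            runtime.cudaMemPoolAttr.cudaMemPoolAttrReleaseThreshold,
            driver.cuuint64_t(UINT64_MAX),
        )
    )

    stream = check_cuda_errors(
        runtime.cudaStreamCreateWithFlags(runtime.cudaStreamNonBlocking)
    )
    for _ in range(rounds):
        # Allocation, use, and free are all ordered on the same stream,
        # so no explicit synchronization is needed between them.
        ptr = check_cuda_errors(runtime.cudaMallocAsync(nbytes, stream))
        check_cuda_errors(runtime.cudaMemsetAsync(ptr, 0, nbytes, stream))
        check_cuda_errors(runtime.cudaFreeAsync(ptr, stream))

    check_cuda_errors(runtime.cudaStreamSynchronize(stream))
    check_cuda_errors(runtime.cudaStreamDestroy(stream))
```

With the threshold raised, later iterations of the loop can typically be served from memory the pool already holds, which is the effect the sample measures.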
CUDA features
globalToShmemAsyncCopy_test.py compares asynchronous global-to-shared-memory copy strategies in matrix multiplication kernels.
simpleCudaGraphs_test.py shows both manual CUDA graph construction and stream-capture-based replay.
Libraries and tools
conjugateGradientMultiBlockCG_test.py implements a conjugate-gradient solver with cooperative groups and multi-block synchronization.
nvidia_smi.py uses NVML to reimplement a subset of nvidia-smi in Python.
Advanced and interoperability
isoFDModelling_test.py runs isotropic finite-difference wave propagation across multiple GPUs with peer-to-peer halo exchange.
jit_program_test.py JIT-compiles a SAXPY kernel with NVRTC and launches it through the Driver API.
numba_emm_plugin.py shows how to back Numba’s External Memory Management (EMM) plugin interface with the NVIDIA CUDA Python Driver API.
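As a rough illustration of the flow that jit_program_test.py exercises (compile a SAXPY kernel with NVRTC, then launch it through the Driver API), a minimal sketch might look like the following. It is not the shipped sample: the names check_cuda_errors and run_saxpy, the problem size, and the launch configuration are hypothetical, and it assumes a CUDA-capable device 0 with cuda-python and NumPy installed. Those imports are deferred into run_saxpy so the definitions load without them.

```python
import ctypes

SAXPY_SOURCE = r"""
extern "C" __global__
void saxpy(float a, const float *x, const float *y, float *out, size_t n)
{
    size_t tid = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (tid < n) {
        out[tid] = a * x[tid] + y[tid];
    }
}
"""


def check_cuda_errors(result):
    """Unpack a cuda.bindings return tuple, raising if the status is nonzero."""
    status, *values = result
    if status.value != 0:
        raise RuntimeError(f"CUDA call failed: {status!r}")
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


def run_saxpy(n=1024, a=2.0):
    """JIT-compile SAXPY, run it on device 0, and compare with NumPy."""
    # Deferred imports: assumes cuda-python and NumPy are installed.
    import numpy as np
    from cuda.bindings import driver, nvrtc

    check_cuda_errors(driver.cuInit(0))
    dev = check_cuda_errors(driver.cuDeviceGet(0))
    ctx = check_cuda_errors(driver.cuDevicePrimaryCtxRetain(dev))
    check_cuda_errors(driver.cuCtxSetCurrent(ctx))

    # Compile to PTX for the device's own compute capability.
    attr = driver.CUdevice_attribute
    major = check_cuda_errors(driver.cuDeviceGetAttribute(
        attr.CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev))
    minor = check_cuda_errors(driver.cuDeviceGetAttribute(
        attr.CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev))
    opts = [f"--gpu-architecture=compute_{major}{minor}".encode()]
    prog = check_cuda_errors(nvrtc.nvrtcCreateProgram(
        SAXPY_SOURCE.encode(), b"saxpy.cu", 0, [], []))
    check_cuda_errors(nvrtc.nvrtcCompileProgram(prog, len(opts), opts))
    ptx_size = check_cuda_errors(nvrtc.nvrtcGetPTXSize(prog))
    ptx = b" " * ptx_size
    check_cuda_errors(nvrtc.nvrtcGetPTX(prog, ptx))  # fills ptx in place

    # Load the PTX image and look up the kernel by name.
    ptx_buf = np.frombuffer(ptx, dtype=np.uint8)
    module = check_cuda_errors(driver.cuModuleLoadData(ptx_buf.ctypes.data))
    kernel = check_cuda_errors(driver.cuModuleGetFunction(module, b"saxpy"))

    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)
    out = np.zeros(n, dtype=np.float32)
    nbytes = x.nbytes
    d_x = check_cuda_errors(driver.cuMemAlloc(nbytes))
    d_y = check_cuda_errors(driver.cuMemAlloc(nbytes))
    d_out = check_cuda_errors(driver.cuMemAlloc(nbytes))
    check_cuda_errors(driver.cuMemcpyHtoD(d_x, x.ctypes.data, nbytes))
    check_cuda_errors(driver.cuMemcpyHtoD(d_y, y.ctypes.data, nbytes))

    # Kernel arguments as (values, ctypes types); None marks device pointers.
    threads = 128
    blocks = (n + threads - 1) // threads
    args = ((a, d_x, d_y, d_out, n),
            (ctypes.c_float, None, None, None, ctypes.c_size_t))
    check_cuda_errors(driver.cuLaunchKernel(
        kernel, blocks, 1, 1, threads, 1, 1, 0, 0, args, 0))
    check_cuda_errors(driver.cuCtxSynchronize())
    check_cuda_errors(driver.cuMemcpyDtoH(out.ctypes.data, d_out, nbytes))

    for ptr in (d_x, d_y, d_out):
        check_cuda_errors(driver.cuMemFree(ptr))
    check_cuda_errors(driver.cuModuleUnload(module))
    check_cuda_errors(driver.cuDevicePrimaryCtxRelease(dev))
    return bool(np.allclose(out, a * x + y))
```

On a machine that meets the assumptions above, run_saxpy() is expected to return True. The sketch uses the device's primary context rather than creating a new one, which sidesteps differences in the cuCtxCreate signature across CUDA versions.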