cuda#
Data types used by CUDA driver#
- class cuda.cuda.CUuuid_st(void_ptr _ptr=0)#
- bytes#
CUDA definition of UUID
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUmemFabricHandle_st(void_ptr _ptr=0)#
Fabric handle - An opaque handle representing a memory allocation that can be exported to processes in same or different nodes. For IPC between processes on different nodes they must be connected via the NVSwitch fabric.
- data#
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUipcEventHandle_st(void_ptr _ptr=0)#
CUDA IPC event handle
- reserved#
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUipcMemHandle_st(void_ptr _ptr=0)#
CUDA IPC mem handle
- reserved#
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUstreamBatchMemOpParams_union(void_ptr _ptr=0)#
Per-operation parameters for cuStreamBatchMemOp
- operation#
- Type:
- waitValue#
- Type:
CUstreamMemOpWaitValueParams_st
- writeValue#
- Type:
CUstreamMemOpWriteValueParams_st
- flushRemoteWrites#
- Type:
CUstreamMemOpFlushRemoteWritesParams_st
- memoryBarrier#
- Type:
CUstreamMemOpMemoryBarrierParams_st
- pad#
- Type:
List[cuuint64_t]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_BATCH_MEM_OP_NODE_PARAMS_v1_st(void_ptr _ptr=0)#
Batch memory operation node parameters
- count#
- Type:
unsigned int
- paramArray#
- Type:
- flags#
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_BATCH_MEM_OP_NODE_PARAMS_v2_st(void_ptr _ptr=0)#
Batch memory operation node parameters
- count#
Number of operations in paramArray.
- Type:
unsigned int
- paramArray#
Array of batch memory operations.
- Type:
- flags#
Flags to control the node.
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUasyncNotificationInfo_st(void_ptr _ptr=0)#
Information passed to the user via the async notification callback
- type#
- Type:
- info#
- Type:
anon_union2
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUdevprop_st(void_ptr _ptr=0)#
Legacy device properties
- maxThreadsPerBlock#
Maximum number of threads per block
- Type:
int
- maxThreadsDim#
Maximum size of each dimension of a block
- Type:
List[int]
- maxGridSize#
Maximum size of each dimension of a grid
- Type:
List[int]
- sharedMemPerBlock#
Shared memory available per block in bytes
- Type:
int
- totalConstantMemory#
Constant memory available on device in bytes
- Type:
int
- SIMDWidth#
Warp size in threads
- Type:
int
- memPitch#
Maximum pitch in bytes allowed by memory copies
- Type:
int
- regsPerBlock#
32-bit registers available per block
- Type:
int
- clockRate#
Clock frequency in kilohertz
- Type:
int
- textureAlign#
Alignment requirement for textures
- Type:
int
- getPtr()#
Get memory address of class instance
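CUdevprop_st backs the legacy cuDeviceGetProperties query; newer code usually reads individual limits with cuDeviceGetAttribute instead. A minimal sketch, assuming device ordinal 0 and reducing error handling to asserts:
```python
from cuda import cuda

# Minimal sketch: query a few device limits via cuDeviceGetAttribute
# instead of the legacy CUdevprop struct.
err, = cuda.cuInit(0)
assert err == cuda.CUresult.CUDA_SUCCESS
err, dev = cuda.cuDeviceGet(0)  # device ordinal 0 is an assumption
err, max_threads = cuda.cuDeviceGetAttribute(
    cuda.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK, dev)
err, warp_size = cuda.cuDeviceGetAttribute(
    cuda.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_WARP_SIZE, dev)
print(max_threads, warp_size)
```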
- class cuda.cuda.CUaccessPolicyWindow_st(void_ptr _ptr=0)#
Specifies an access policy for a window, a contiguous extent of memory beginning at base_ptr and ending at base_ptr + num_bytes. num_bytes is limited by CU_DEVICE_ATTRIBUTE_MAX_ACCESS_POLICY_WINDOW_SIZE. Partition into many segments and assign segments such that: sum of “hit segments” / window == approx. ratio. sum of “miss segments” / window == approx 1-ratio. Segments and ratio specifications are fitted to the capabilities of the architecture. Accesses in a hit segment apply the hitProp access policy. Accesses in a miss segment apply the missProp access policy.
- base_ptr#
Starting address of the access policy window. CUDA driver may align it.
- Type:
Any
- num_bytes#
Size in bytes of the window policy. CUDA driver may restrict the maximum size and alignment.
- Type:
size_t
- hitRatio#
hitRatio specifies percentage of lines assigned hitProp, rest are assigned missProp.
- Type:
float
- hitProp#
CUaccessProperty set for hit.
- Type:
- missProp#
CUaccessProperty set for miss. Must be either NORMAL or STREAMING
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_KERNEL_NODE_PARAMS_st(void_ptr _ptr=0)#
GPU kernel node parameters
- func#
Kernel to launch
- Type:
- gridDimX#
Width of grid in blocks
- Type:
unsigned int
- gridDimY#
Height of grid in blocks
- Type:
unsigned int
- gridDimZ#
Depth of grid in blocks
- Type:
unsigned int
- blockDimX#
X dimension of each thread block
- Type:
unsigned int
- blockDimY#
Y dimension of each thread block
- Type:
unsigned int
- blockDimZ#
Z dimension of each thread block
- Type:
unsigned int
- sharedMemBytes#
Dynamic shared-memory size per thread block in bytes
- Type:
unsigned int
- kernelParams#
Array of pointers to kernel parameters
- Type:
Any
- extra#
Extra options
- Type:
Any
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_KERNEL_NODE_PARAMS_v2_st(void_ptr _ptr=0)#
GPU kernel node parameters
- func#
Kernel to launch
- Type:
- gridDimX#
Width of grid in blocks
- Type:
unsigned int
- gridDimY#
Height of grid in blocks
- Type:
unsigned int
- gridDimZ#
Depth of grid in blocks
- Type:
unsigned int
- blockDimX#
X dimension of each thread block
- Type:
unsigned int
- blockDimY#
Y dimension of each thread block
- Type:
unsigned int
- blockDimZ#
Z dimension of each thread block
- Type:
unsigned int
- sharedMemBytes#
Dynamic shared-memory size per thread block in bytes
- Type:
unsigned int
- kernelParams#
Array of pointers to kernel parameters
- Type:
Any
- extra#
Extra options
- Type:
Any
- ctx#
Context for the kernel task to run in. The value NULL will indicate the current context should be used by the api. This field is ignored if func is set.
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_KERNEL_NODE_PARAMS_v3_st(void_ptr _ptr=0)#
GPU kernel node parameters
- func#
Kernel to launch
- Type:
- gridDimX#
Width of grid in blocks
- Type:
unsigned int
- gridDimY#
Height of grid in blocks
- Type:
unsigned int
- gridDimZ#
Depth of grid in blocks
- Type:
unsigned int
- blockDimX#
X dimension of each thread block
- Type:
unsigned int
- blockDimY#
Y dimension of each thread block
- Type:
unsigned int
- blockDimZ#
Z dimension of each thread block
- Type:
unsigned int
- sharedMemBytes#
Dynamic shared-memory size per thread block in bytes
- Type:
unsigned int
- kernelParams#
Array of pointers to kernel parameters
- Type:
Any
- extra#
Extra options
- Type:
Any
- ctx#
Context for the kernel task to run in. The value NULL will indicate the current context should be used by the api. This field is ignored if func is set.
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_MEMSET_NODE_PARAMS_st(void_ptr _ptr=0)#
Memset node parameters
- dst#
Destination device pointer
- Type:
- pitch#
Pitch of destination device pointer. Unused if height is 1
- Type:
size_t
- value#
Value to be set
- Type:
unsigned int
- elementSize#
Size of each element in bytes. Must be 1, 2, or 4.
- Type:
unsigned int
- width#
Width of the row in elements
- Type:
size_t
- height#
Number of rows
- Type:
size_t
- getPtr()#
Get memory address of class instance
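A minimal sketch of filling in these fields and adding a memset node to a graph with cuGraphAddMemsetNode; the buffer size and the use of the current context are illustrative assumptions:
```python
from cuda import cuda

# Sketch: zero a 1024 x 4-byte buffer through a graph memset node.
# Assumes cuInit and a current context were set up earlier.
err, dptr = cuda.cuMemAlloc(1024 * 4)
err, graph = cuda.cuGraphCreate(0)
err, ctx = cuda.cuCtxGetCurrent()

params = cuda.CUDA_MEMSET_NODE_PARAMS_st()
params.dst = dptr            # destination device pointer
params.value = 0             # value written into every element
params.elementSize = 4       # must be 1, 2, or 4
params.width = 1024          # elements per row
params.height = 1            # single row, so pitch is unused
params.pitch = 0

err, node = cuda.cuGraphAddMemsetNode(graph, [], 0, params, ctx)
```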
- class cuda.cuda.CUDA_MEMSET_NODE_PARAMS_v2_st(void_ptr _ptr=0)#
Memset node parameters
- dst#
Destination device pointer
- Type:
- pitch#
Pitch of destination device pointer. Unused if height is 1
- Type:
size_t
- value#
Value to be set
- Type:
unsigned int
- elementSize#
Size of each element in bytes. Must be 1, 2, or 4.
- Type:
unsigned int
- width#
Width of the row in elements
- Type:
size_t
- height#
Number of rows
- Type:
size_t
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_HOST_NODE_PARAMS_st(void_ptr _ptr=0)#
Host node parameters
- userData#
Argument to pass to the function
- Type:
Any
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_HOST_NODE_PARAMS_v2_st(void_ptr _ptr=0)#
Host node parameters
- userData#
Argument to pass to the function
- Type:
Any
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_CONDITIONAL_NODE_PARAMS(void_ptr _ptr=0)#
Conditional node parameters
- handle#
Conditional node handle. Handles must be created in advance of creating the node using cuGraphConditionalHandleCreate.
- Type:
- type#
Type of conditional node.
- size#
Size of graph output array. Must be 1.
- Type:
unsigned int
- phGraph_out#
CUDA-owned array populated with conditional node child graphs during creation of the node. Valid for the lifetime of the conditional node. The contents of the graph(s) are subject to the following constraints: - Allowed node types are kernel nodes, empty nodes, child graphs, memsets, memcopies, and conditionals. This applies recursively to child graphs and conditional bodies. - All kernels, including kernels in nested conditionals or child graphs at any level, must belong to the same CUDA context. These graphs may be populated using graph node creation APIs or cuStreamBeginCaptureToGraph.
- Type:
- ctx#
Context on which to run the node. Must match context used to create the handle and all body nodes.
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUgraphEdgeData_st(void_ptr _ptr=0)#
Optional annotation for edges in a CUDA graph. Note, all edges implicitly have annotations and default to a zero-initialized value if not specified. A zero-initialized struct indicates a standard full serialization of two nodes with memory visibility.
- from_port#
This indicates when the dependency is triggered from the upstream node on the edge. The meaning is specific to the node type. A value of 0 in all cases means full completion of the upstream node, with memory visibility to the downstream node or portion thereof (indicated by to_port). Only kernel nodes define non-zero ports. A kernel node can use the following output port types: CU_GRAPH_KERNEL_NODE_PORT_DEFAULT, CU_GRAPH_KERNEL_NODE_PORT_PROGRAMMATIC, or CU_GRAPH_KERNEL_NODE_PORT_LAUNCH_ORDER.
- Type:
bytes
- to_port#
This indicates what portion of the downstream node is dependent on the upstream node or portion thereof (indicated by from_port). The meaning is specific to the node type. A value of 0 in all cases means the entirety of the downstream node is dependent on the upstream work. Currently no node types define non-zero ports. Accordingly, this field must be set to zero.
- Type:
bytes
- type#
This should be populated with a value from CUgraphDependencyType. (It is typed as char due to compiler-specific layout of bitfields.) See CUgraphDependencyType.
- Type:
bytes
- reserved#
These bytes are unused and must be zeroed. This ensures compatibility if additional fields are added in the future.
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_GRAPH_INSTANTIATE_PARAMS_st(void_ptr _ptr=0)#
Graph instantiation parameters
- flags#
Instantiation flags
- Type:
cuuint64_t
- hErrNode_out#
The node which caused instantiation to fail, if any
- Type:
- result_out#
Whether instantiation was successful. If it failed, the reason why
- Type:
- getPtr()#
Get memory address of class instance
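A minimal sketch of instantiating a graph through this parameter struct with cuGraphInstantiateWithParams; the empty graph and the chosen flag are illustrative, and result_out/hErrNode_out can be inspected on failure:
```python
from cuda import cuda

# Sketch: instantiate a (here trivially empty) CUgraph and check the structured result.
err, graph = cuda.cuGraphCreate(0)

inst = cuda.CUDA_GRAPH_INSTANTIATE_PARAMS_st()
inst.flags = cuda.CUgraphInstantiate_flags.CUDA_GRAPH_INSTANTIATE_FLAG_AUTO_FREE_ON_LAUNCH

err, graph_exec = cuda.cuGraphInstantiateWithParams(graph, inst)
if inst.result_out != cuda.CUgraphInstantiateResult.CUDA_GRAPH_INSTANTIATE_SUCCESS:
    print("instantiation failed at node:", inst.hErrNode_out)
```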
- class cuda.cuda.CUlaunchMemSyncDomainMap_st(void_ptr _ptr=0)#
Memory Synchronization Domain map. See ::cudaLaunchMemSyncDomain. By default, kernels are launched in domain 0. Kernels launched with CU_LAUNCH_MEM_SYNC_DOMAIN_REMOTE will have a different domain ID. Users may also alter the domain ID with CUlaunchMemSyncDomainMap for a specific stream / graph node / kernel launch. See CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN_MAP. Domain ID range is available through CU_DEVICE_ATTRIBUTE_MEM_SYNC_DOMAIN_COUNT.
- default_#
The default domain ID to use for designated kernels
- Type:
bytes
- remote#
The remote domain ID to use for designated kernels
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUlaunchAttributeValue_union(void_ptr _ptr=0)#
Launch attributes union; used as value field of CUlaunchAttribute
- pad#
- Type:
bytes
- accessPolicyWindow#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW.
- Type:
- cooperative#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_COOPERATIVE. Nonzero indicates a cooperative kernel (see cuLaunchCooperativeKernel).
- Type:
int
- syncPolicy#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY. ::CUsynchronizationPolicy for work queued up in this stream
- Type:
- clusterDim#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION that represents the desired cluster dimensions for the kernel. Opaque type with the following fields: - x - The X dimension of the cluster, in blocks. Must be a divisor of the grid X dimension. - y - The Y dimension of the cluster, in blocks. Must be a divisor of the grid Y dimension. - z - The Z dimension of the cluster, in blocks. Must be a divisor of the grid Z dimension.
- Type:
anon_struct1
- clusterSchedulingPolicyPreference#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE. Cluster scheduling policy preference for the kernel.
- programmaticStreamSerializationAllowed#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION.
- Type:
int
- programmaticEvent#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_EVENT with the following fields: - CUevent event - Event to fire when all blocks trigger it. - int flags - Event record flags, see cuEventRecordWithFlags. Does not accept CU_EVENT_RECORD_EXTERNAL. - triggerAtBlockStart - If this is set to non-0, each block launch will automatically trigger the event.
- Type:
anon_struct2
- launchCompletionEvent#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_LAUNCH_COMPLETION_EVENT with the following fields: - CUevent event - Event to fire when the last block launches. - int flags - Event record flags, see cuEventRecordWithFlags. Does not accept CU_EVENT_RECORD_EXTERNAL.
- Type:
anon_struct3
- priority#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_PRIORITY. Execution priority of the kernel.
- Type:
int
- memSyncDomainMap#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN_MAP. See CUlaunchMemSyncDomainMap.
- Type:
- memSyncDomain#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN. See ::CUlaunchMemSyncDomain
- Type:
- deviceUpdatableKernelNode#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_DEVICE_UPDATABLE_KERNEL_NODE with the following fields: - int deviceUpdatable - Whether or not the resulting kernel node should be device-updatable. - CUgraphDeviceNode devNode - Returns a handle to pass to the various device-side update functions.
- Type:
anon_struct4
- sharedMemCarveout#
Value of launch attribute CU_LAUNCH_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT.
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUlaunchAttribute_st(void_ptr _ptr=0)#
Launch attribute
- id#
Attribute to set
- Type:
- value#
Value of the attribute
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUlaunchConfig_st(void_ptr _ptr=0)#
CUDA extensible launch configuration
- gridDimX#
Width of grid in blocks
- Type:
unsigned int
- gridDimY#
Height of grid in blocks
- Type:
unsigned int
- gridDimZ#
Depth of grid in blocks
- Type:
unsigned int
- blockDimX#
X dimension of each thread block
- Type:
unsigned int
- blockDimY#
Y dimension of each thread block
- Type:
unsigned int
- blockDimZ#
Z dimension of each thread block
- Type:
unsigned int
- sharedMemBytes#
Dynamic shared-memory size per thread block in bytes
- Type:
unsigned int
- attrs#
List of attributes; nullable if CUlaunchConfig::numAttrs == 0
- Type:
- numAttrs#
Number of attributes populated in CUlaunchConfig::attrs
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
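A minimal sketch of launching through CUlaunchConfig with cuLaunchKernelEx. The `kernel` handle and `stream` are assumed to exist already, the kernel is assumed to take no parameters (so 0 is passed for kernelParams and extra), hStream is a field of the underlying CUlaunchConfig struct, and the binding is assumed to accept a Python list for attrs:
```python
from cuda import cuda

# Sketch: a 64-block x 256-thread launch with one launch attribute attached.
attr = cuda.CUlaunchAttribute_st()
attr.id = cuda.CUlaunchAttributeID.CU_LAUNCH_ATTRIBUTE_PRIORITY
attr.value.priority = 0

config = cuda.CUlaunchConfig_st()
config.gridDimX, config.gridDimY, config.gridDimZ = 64, 1, 1
config.blockDimX, config.blockDimY, config.blockDimZ = 256, 1, 1
config.sharedMemBytes = 0
config.hStream = stream          # assumed to be created elsewhere
config.attrs = [attr]            # assumed list-based setter
config.numAttrs = 1

err, = cuda.cuLaunchKernelEx(config, kernel, 0, 0)  # no kernelParams / extra
```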
- class cuda.cuda.CUexecAffinitySmCount_st(void_ptr _ptr=0)#
Value for CU_EXEC_AFFINITY_TYPE_SM_COUNT
- val#
The number of SMs the context is limited to use.
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUexecAffinityParam_st(void_ptr _ptr=0)#
Execution Affinity Parameters
- type#
- Type:
- param#
- Type:
anon_union3
- getPtr()#
Get memory address of class instance
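A minimal sketch of limiting a context to a number of SMs through these structs and cuCtxCreate_v3; the SM count and device ordinal are illustrative, and the nested `param.smCount` access is an assumption about the binding:
```python
from cuda import cuda

# Sketch: create a context restricted to 8 SMs via execution affinity.
err, = cuda.cuInit(0)
err, dev = cuda.cuDeviceGet(0)

affinity = cuda.CUexecAffinityParam_st()
affinity.type = cuda.CUexecAffinityType.CU_EXEC_AFFINITY_TYPE_SM_COUNT
affinity.param.smCount.val = 8   # assumed nested-union access; 8 SMs is arbitrary

err, ctx = cuda.cuCtxCreate_v3([affinity], 1, 0, dev)
```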
- class cuda.cuda.CUctxCigParam_st(void_ptr _ptr=0)#
CIG Context Create Params
- sharedDataType#
- Type:
- sharedData#
- Type:
Any
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUctxCreateParams_st(void_ptr _ptr=0)#
Params for creating CUDA context Exactly one of execAffinityParams and cigParams must be non-NULL.
- execAffinityParams#
- Type:
- numExecAffinityParams#
- Type:
int
- cigParams#
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUlibraryHostUniversalFunctionAndDataTable_st(void_ptr _ptr=0)#
- functionTable#
- Type:
Any
- functionWindowSize#
- Type:
size_t
- dataTable#
- Type:
Any
- dataWindowSize#
- Type:
size_t
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_MEMCPY2D_st(void_ptr _ptr=0)#
2D memory copy parameters
- srcXInBytes#
Source X in bytes
- Type:
size_t
- srcY#
Source Y
- Type:
size_t
- srcMemoryType#
Source memory type (host, device, array)
- Type:
- srcHost#
Source host pointer
- Type:
Any
- srcDevice#
Source device pointer
- Type:
- srcPitch#
Source pitch (ignored when src is array)
- Type:
size_t
- dstXInBytes#
Destination X in bytes
- Type:
size_t
- dstY#
Destination Y
- Type:
size_t
- dstMemoryType#
Destination memory type (host, device, array)
- Type:
- dstHost#
Destination host pointer
- Type:
Any
- dstDevice#
Destination device pointer
- Type:
- dstPitch#
Destination pitch (ignored when dst is array)
- Type:
size_t
- WidthInBytes#
Width of 2D memory copy in bytes
- Type:
size_t
- Height#
Height of 2D memory copy
- Type:
size_t
- getPtr()#
Get memory address of class instance
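A minimal sketch of a pitched host-to-device copy driven by this struct; the 256x512 float image is arbitrary, numpy only provides host storage, and a current context is assumed:
```python
import numpy as np
from cuda import cuda

# Sketch: copy a 256-row x 512-float host image into a pitched device allocation.
# Assumes cuInit and a current context were set up earlier.
rows, cols, elem = 256, 512, 4
host = np.arange(rows * cols, dtype=np.float32).reshape(rows, cols)

err, dptr, pitch = cuda.cuMemAllocPitch(cols * elem, rows, elem)

copy = cuda.CUDA_MEMCPY2D_st()
copy.srcMemoryType = cuda.CUmemorytype.CU_MEMORYTYPE_HOST
copy.srcHost = host.ctypes.data      # host address passed as an integer
copy.srcPitch = cols * elem
copy.dstMemoryType = cuda.CUmemorytype.CU_MEMORYTYPE_DEVICE
copy.dstDevice = dptr
copy.dstPitch = pitch
copy.WidthInBytes = cols * elem
copy.Height = rows

err, = cuda.cuMemcpy2D(copy)
```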
- class cuda.cuda.CUDA_MEMCPY3D_st(void_ptr _ptr=0)#
3D memory copy parameters
- srcXInBytes#
Source X in bytes
- Type:
size_t
- srcY#
Source Y
- Type:
size_t
- srcZ#
Source Z
- Type:
size_t
- srcLOD#
Source LOD
- Type:
size_t
- srcMemoryType#
Source memory type (host, device, array)
- Type:
- srcHost#
Source host pointer
- Type:
Any
- srcDevice#
Source device pointer
- Type:
- reserved0#
Must be NULL
- Type:
Any
- srcPitch#
Source pitch (ignored when src is array)
- Type:
size_t
- srcHeight#
Source height (ignored when src is array; may be 0 if Depth==1)
- Type:
size_t
- dstXInBytes#
Destination X in bytes
- Type:
size_t
- dstY#
Destination Y
- Type:
size_t
- dstZ#
Destination Z
- Type:
size_t
- dstLOD#
Destination LOD
- Type:
size_t
- dstMemoryType#
Destination memory type (host, device, array)
- Type:
- dstHost#
Destination host pointer
- Type:
Any
- dstDevice#
Destination device pointer
- Type:
- reserved1#
Must be NULL
- Type:
Any
- dstPitch#
Destination pitch (ignored when dst is array)
- Type:
size_t
- dstHeight#
Destination height (ignored when dst is array; may be 0 if Depth==1)
- Type:
size_t
- WidthInBytes#
Width of 3D memory copy in bytes
- Type:
size_t
- Height#
Height of 3D memory copy
- Type:
size_t
- Depth#
Depth of 3D memory copy
- Type:
size_t
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_MEMCPY3D_PEER_st(void_ptr _ptr=0)#
3D memory cross-context copy parameters
- srcXInBytes#
Source X in bytes
- Type:
size_t
- srcY#
Source Y
- Type:
size_t
- srcZ#
Source Z
- Type:
size_t
- srcLOD#
Source LOD
- Type:
size_t
- srcMemoryType#
Source memory type (host, device, array)
- Type:
- srcHost#
Source host pointer
- Type:
Any
- srcDevice#
Source device pointer
- Type:
- srcPitch#
Source pitch (ignored when src is array)
- Type:
size_t
- srcHeight#
Source height (ignored when src is array; may be 0 if Depth==1)
- Type:
size_t
- dstXInBytes#
Destination X in bytes
- Type:
size_t
- dstY#
Destination Y
- Type:
size_t
- dstZ#
Destination Z
- Type:
size_t
- dstLOD#
Destination LOD
- Type:
size_t
- dstMemoryType#
Destination memory type (host, device, array)
- Type:
- dstHost#
Destination host pointer
- Type:
Any
- dstDevice#
Destination device pointer
- Type:
- dstPitch#
Destination pitch (ignored when dst is array)
- Type:
size_t
- dstHeight#
Destination height (ignored when dst is array; may be 0 if Depth==1)
- Type:
size_t
- WidthInBytes#
Width of 3D memory copy in bytes
- Type:
size_t
- Height#
Height of 3D memory copy
- Type:
size_t
- Depth#
Depth of 3D memory copy
- Type:
size_t
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_MEMCPY_NODE_PARAMS_st(void_ptr _ptr=0)#
Memcpy node parameters
- flags#
Must be zero
- Type:
int
- reserved#
Must be zero
- Type:
int
- copyParams#
Parameters for the memory copy
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_ARRAY_DESCRIPTOR_st(void_ptr _ptr=0)#
Array descriptor
- Width#
Width of array
- Type:
size_t
- Height#
Height of array
- Type:
size_t
- Format#
Array format
- Type:
- NumChannels#
Channels per array element
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
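A minimal sketch of creating a 2D CUDA array from this descriptor with cuArrayCreate; the 1024x768 single-channel float layout is arbitrary and a current context is assumed:
```python
from cuda import cuda

# Sketch: allocate a 1024x768 single-channel float CUDA array.
desc = cuda.CUDA_ARRAY_DESCRIPTOR_st()
desc.Width = 1024
desc.Height = 768
desc.Format = cuda.CUarray_format.CU_AD_FORMAT_FLOAT
desc.NumChannels = 1

err, cu_array = cuda.cuArrayCreate(desc)   # assumes a current context
assert err == cuda.CUresult.CUDA_SUCCESS
```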
- class cuda.cuda.CUDA_ARRAY3D_DESCRIPTOR_st(void_ptr _ptr=0)#
3D array descriptor
- Width#
Width of 3D array
- Type:
size_t
- Height#
Height of 3D array
- Type:
size_t
- Depth#
Depth of 3D array
- Type:
size_t
- Format#
Array format
- Type:
- NumChannels#
Channels per array element
- Type:
unsigned int
- Flags#
Flags
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_ARRAY_SPARSE_PROPERTIES_st(void_ptr _ptr=0)#
CUDA array sparse properties
- tileExtent#
- Type:
anon_struct5
- miptailFirstLevel#
First mip level at which the mip tail begins.
- Type:
unsigned int
- miptailSize#
Total size of the mip tail.
- Type:
unsigned long long
- flags#
Flags will either be zero or CU_ARRAY_SPARSE_PROPERTIES_SINGLE_MIPTAIL
- Type:
unsigned int
- reserved#
- Type:
List[unsigned int]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_ARRAY_MEMORY_REQUIREMENTS_st(void_ptr _ptr=0)#
CUDA array memory requirements
- size#
Total required memory size
- Type:
size_t
- alignment#
alignment requirement
- Type:
size_t
- reserved#
- Type:
List[unsigned int]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_RESOURCE_DESC_st(void_ptr _ptr=0)#
CUDA Resource descriptor
- resType#
Resource type
- Type:
- res#
- Type:
anon_union4
- flags#
Flags (must be zero)
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_TEXTURE_DESC_st(void_ptr _ptr=0)#
Texture descriptor
- addressMode#
Address modes
- Type:
List[CUaddress_mode]
- filterMode#
Filter mode
- Type:
- flags#
Flags
- Type:
unsigned int
- maxAnisotropy#
Maximum anisotropy ratio
- Type:
unsigned int
- mipmapFilterMode#
Mipmap filter mode
- Type:
- mipmapLevelBias#
Mipmap level bias
- Type:
float
- minMipmapLevelClamp#
Mipmap minimum level clamp
- Type:
float
- maxMipmapLevelClamp#
Mipmap maximum level clamp
- Type:
float
- borderColor#
Border Color
- Type:
List[float]
- reserved#
- Type:
List[int]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_RESOURCE_VIEW_DESC_st(void_ptr _ptr=0)#
Resource view descriptor
- format#
Resource view format
- Type:
- width#
Width of the resource view
- Type:
size_t
- height#
Height of the resource view
- Type:
size_t
- depth#
Depth of the resource view
- Type:
size_t
- firstMipmapLevel#
First defined mipmap level
- Type:
unsigned int
- lastMipmapLevel#
Last defined mipmap level
- Type:
unsigned int
- firstLayer#
First layer index
- Type:
unsigned int
- lastLayer#
Last layer index
- Type:
unsigned int
- reserved#
- Type:
List[unsigned int]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUtensorMap_st(void_ptr _ptr=0)#
Tensor map descriptor. Requires compiler support for aligning to 64 bytes.
- opaque#
- Type:
List[cuuint64_t]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_POINTER_ATTRIBUTE_P2P_TOKENS_st(void_ptr _ptr=0)#
GPU Direct v3 tokens
- p2pToken#
- Type:
unsigned long long
- vaSpaceToken#
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_LAUNCH_PARAMS_st(void_ptr _ptr=0)#
Kernel launch parameters
- function#
Kernel to launch
- Type:
- gridDimX#
Width of grid in blocks
- Type:
unsigned int
- gridDimY#
Height of grid in blocks
- Type:
unsigned int
- gridDimZ#
Depth of grid in blocks
- Type:
unsigned int
- blockDimX#
X dimension of each thread block
- Type:
unsigned int
- blockDimY#
Y dimension of each thread block
- Type:
unsigned int
- blockDimZ#
Z dimension of each thread block
- Type:
unsigned int
- sharedMemBytes#
Dynamic shared-memory size per thread block in bytes
- Type:
unsigned int
- kernelParams#
Array of pointers to kernel parameters
- Type:
Any
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EXTERNAL_MEMORY_HANDLE_DESC_st(void_ptr _ptr=0)#
External memory handle descriptor
- type#
Type of the handle
- handle#
- Type:
anon_union5
- size#
Size of the memory allocation
- Type:
unsigned long long
- flags#
Flags must either be zero or CUDA_EXTERNAL_MEMORY_DEDICATED
- Type:
unsigned int
- reserved#
- Type:
List[unsigned int]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EXTERNAL_MEMORY_BUFFER_DESC_st(void_ptr _ptr=0)#
External memory buffer descriptor
- offset#
Offset into the memory object where the buffer’s base is
- Type:
unsigned long long
- size#
Size of the buffer
- Type:
unsigned long long
- flags#
Flags reserved for future use. Must be zero.
- Type:
unsigned int
- reserved#
- Type:
List[unsigned int]
- getPtr()#
Get memory address of class instance
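A minimal sketch of importing an externally exported allocation through these descriptors; the file descriptor `fd` and byte size `nbytes` are assumed to come from another API (for example a Vulkan exporter) and are not produced here:
```python
from cuda import cuda

# Sketch: import an opaque-fd allocation and map a device-accessible buffer from it.
# `fd` and `nbytes` are assumed to be provided by the exporting API.
mem_desc = cuda.CUDA_EXTERNAL_MEMORY_HANDLE_DESC_st()
mem_desc.type = cuda.CUexternalMemoryHandleType.CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD
mem_desc.handle.fd = fd
mem_desc.size = nbytes

err, ext_mem = cuda.cuImportExternalMemory(mem_desc)

buf_desc = cuda.CUDA_EXTERNAL_MEMORY_BUFFER_DESC_st()
buf_desc.offset = 0
buf_desc.size = nbytes

err, dptr = cuda.cuExternalMemoryGetMappedBuffer(ext_mem, buf_desc)
```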
- class cuda.cuda.CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC_st(void_ptr _ptr=0)#
External memory mipmap descriptor
- offset#
Offset into the memory object where the base level of the mipmap chain is.
- Type:
unsigned long long
- arrayDesc#
Format, dimension and type of base level of the mipmap chain
- Type:
- numLevels#
Total number of levels in the mipmap chain
- Type:
unsigned int
- reserved#
- Type:
List[unsigned int]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EXTERNAL_SEMAPHORE_HANDLE_DESC_st(void_ptr _ptr=0)#
External semaphore handle descriptor
- type#
Type of the handle
- handle#
- Type:
anon_union6
- flags#
Flags reserved for the future. Must be zero.
- Type:
unsigned int
- reserved#
- Type:
List[unsigned int]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EXTERNAL_SEMAPHORE_SIGNAL_PARAMS_st(void_ptr _ptr=0)#
External semaphore signal parameters
- params#
- Type:
anon_struct15
- flags#
Only when ::CUDA_EXTERNAL_SEMAPHORE_SIGNAL_PARAMS is used to signal a CUexternalSemaphore of type CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_NVSCISYNC, the valid flag is CUDA_EXTERNAL_SEMAPHORE_SIGNAL_SKIP_NVSCIBUF_MEMSYNC which indicates that while signaling the CUexternalSemaphore, no memory synchronization operations should be performed for any external memory object imported as CU_EXTERNAL_MEMORY_HANDLE_TYPE_NVSCIBUF. For all other types of CUexternalSemaphore, flags must be zero.
- Type:
unsigned int
- reserved#
- Type:
List[unsigned int]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EXTERNAL_SEMAPHORE_WAIT_PARAMS_st(void_ptr _ptr=0)#
External semaphore wait parameters
- params#
- Type:
anon_struct18
- flags#
Only when ::CUDA_EXTERNAL_SEMAPHORE_WAIT_PARAMS is used to wait on a CUexternalSemaphore of type CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_NVSCISYNC, the valid flag is CUDA_EXTERNAL_SEMAPHORE_WAIT_SKIP_NVSCIBUF_MEMSYNC which indicates that while waiting for the CUexternalSemaphore, no memory synchronization operations should be performed for any external memory object imported as CU_EXTERNAL_MEMORY_HANDLE_TYPE_NVSCIBUF. For all other types of CUexternalSemaphore, flags must be zero.
- Type:
unsigned int
- reserved#
- Type:
List[unsigned int]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EXT_SEM_SIGNAL_NODE_PARAMS_st(void_ptr _ptr=0)#
Semaphore signal node parameters
- extSemArray#
Array of external semaphore handles.
- Type:
- paramsArray#
Array of external semaphore signal parameters.
- numExtSems#
Number of handles and parameters supplied in extSemArray and paramsArray.
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EXT_SEM_SIGNAL_NODE_PARAMS_v2_st(void_ptr _ptr=0)#
Semaphore signal node parameters
- extSemArray#
Array of external semaphore handles.
- Type:
- paramsArray#
Array of external semaphore signal parameters.
- numExtSems#
Number of handles and parameters supplied in extSemArray and paramsArray.
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EXT_SEM_WAIT_NODE_PARAMS_st(void_ptr _ptr=0)#
Semaphore wait node parameters
- extSemArray#
Array of external semaphore handles.
- Type:
- paramsArray#
Array of external semaphore wait parameters.
- numExtSems#
Number of handles and parameters supplied in extSemArray and paramsArray.
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EXT_SEM_WAIT_NODE_PARAMS_v2_st(void_ptr _ptr=0)#
Semaphore wait node parameters
- extSemArray#
Array of external semaphore handles.
- Type:
- paramsArray#
Array of external semaphore wait parameters.
- numExtSems#
Number of handles and parameters supplied in extSemArray and paramsArray.
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUarrayMapInfo_st(void_ptr _ptr=0)#
Specifies the CUDA array or CUDA mipmapped array memory mapping information
- resourceType#
Resource type
- Type:
- resource#
- Type:
anon_union9
- subresourceType#
Sparse subresource type
- subresource#
- Type:
anon_union10
- memOperationType#
Memory operation type
- Type:
- memHandleType#
Memory handle type
- Type:
- memHandle#
- Type:
anon_union11
- offset#
Offset within the mip tail, or offset within the memory
- Type:
unsigned long long
- deviceBitMask#
Device ordinal bit mask
- Type:
unsigned int
- flags#
flags for future use, must be zero now.
- Type:
unsigned int
- reserved#
Reserved for future use, must be zero now.
- Type:
List[unsigned int]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUmemLocation_st(void_ptr _ptr=0)#
Specifies a memory location.
- type#
Specifies the location type, which modifies the meaning of id.
- Type:
- id#
Identifier for the location; its meaning depends on this location's CUmemLocationType.
- Type:
int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUmemAllocationProp_st(void_ptr _ptr=0)#
Specifies the allocation properties for an allocation.
- type#
Allocation type
- Type:
- requestedHandleTypes#
requested CUmemAllocationHandleType
- location#
Location of allocation
- Type:
- win32HandleMetaData#
Windows-specific POBJECT_ATTRIBUTES required when CU_MEM_HANDLE_TYPE_WIN32 is specified. This object attributes structure includes security attributes that define the scope of which exported allocations may be transferred to other processes. In all other cases, this field is required to be zero.
- Type:
Any
- allocFlags#
- Type:
anon_struct21
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUmulticastObjectProp_st(void_ptr _ptr=0)#
Specifies the properties for a multicast object.
- numDevices#
The number of devices in the multicast team that will bind memory to this object
- Type:
unsigned int
- size#
The maximum amount of memory that can be bound to this multicast object per device
- Type:
size_t
- handleTypes#
Bitmask of exportable handle types (see CUmemAllocationHandleType) for this object
- Type:
unsigned long long
- flags#
Flags for future use, must be zero now
- Type:
unsigned long long
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUmemAccessDesc_st(void_ptr _ptr=0)#
Memory access descriptor
- location#
Location on which the request is to change its accessibility
- Type:
- flags#
::CUmemProt accessibility flags to set on the request
- Type:
- getPtr()#
Get memory address of class instance
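These structs drive the virtual memory management flow (cuMemCreate, cuMemAddressReserve, cuMemMap, cuMemSetAccess). A minimal single-device sketch, assuming cuInit and a current context and that the binding accepts a Python list of access descriptors:
```python
from cuda import cuda

# Sketch: allocate physical memory, map it into a reserved VA range, enable access.
prop = cuda.CUmemAllocationProp_st()
prop.type = cuda.CUmemAllocationType.CU_MEM_ALLOCATION_TYPE_PINNED
prop.location.type = cuda.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
prop.location.id = 0  # device ordinal 0 is an assumption

err, gran = cuda.cuMemGetAllocationGranularity(
    prop, cuda.CUmemAllocationGranularity_flags.CU_MEM_ALLOC_GRANULARITY_MINIMUM)
size = gran  # one granule, purely illustrative

err, handle = cuda.cuMemCreate(size, prop, 0)
err, ptr = cuda.cuMemAddressReserve(size, 0, 0, 0)
err, = cuda.cuMemMap(ptr, size, 0, handle, 0)

access = cuda.CUmemAccessDesc_st()
access.location.type = cuda.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
access.location.id = 0
access.flags = cuda.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_READWRITE

err, = cuda.cuMemSetAccess(ptr, size, [access], 1)
```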
- class cuda.cuda.CUgraphExecUpdateResultInfo_st(void_ptr _ptr=0)#
Result information returned by cuGraphExecUpdate
- result#
Gives more specific detail when a CUDA graph update fails.
- Type:
- errorNode#
The “to node” of the error edge when the topologies do not match. The error node when the error is associated with a specific node. NULL when the error is generic.
- Type:
- errorFromNode#
The "from node" of the error edge when the topologies do not match. Otherwise NULL.
- Type:
- getPtr()#
Get memory address of class instance
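When an update fails, this struct pinpoints the offending node. A minimal sketch, assuming the v2-style binding of cuGraphExecUpdate that returns the result info and that `graph_exec` and `modified_graph` were created elsewhere:
```python
from cuda import cuda

# Sketch: try to update an instantiated graph in place and inspect the result.
err, info = cuda.cuGraphExecUpdate(graph_exec, modified_graph)
if err != cuda.CUresult.CUDA_SUCCESS:
    print("update failed:", info.result, "node:", info.errorNode)
```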
- class cuda.cuda.CUmemPoolProps_st(void_ptr _ptr=0)#
Specifies the properties of allocations made from the pool.
- allocType#
Allocation type. Currently must be specified as CU_MEM_ALLOCATION_TYPE_PINNED
- Type:
- handleTypes#
Handle types that will be supported by allocations from the pool.
- location#
Location where allocations should reside.
- Type:
- win32SecurityAttributes#
Windows-specific LPSECURITYATTRIBUTES required when CU_MEM_HANDLE_TYPE_WIN32 is specified. This security attribute defines the scope of which exported allocations may be transferred to other processes. In all other cases, this field is required to be zero.
- Type:
Any
- maxSize#
Maximum pool size. When set to 0, defaults to a system dependent value.
- Type:
size_t
- usage#
Bitmask indicating intended usage for the pool.
- Type:
unsigned short
- reserved#
reserved for future use, must be 0
- Type:
bytes
- getPtr()#
Get memory address of class instance
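A minimal sketch of creating an explicit memory pool from these properties and allocating from it asynchronously; the pool location, device ordinal, and allocation size are illustrative, and a current context is assumed:
```python
from cuda import cuda

# Sketch: create a device-local memory pool and take a 1 MiB allocation from it.
props = cuda.CUmemPoolProps_st()
props.allocType = cuda.CUmemAllocationType.CU_MEM_ALLOCATION_TYPE_PINNED
props.location.type = cuda.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
props.location.id = 0  # device ordinal 0 is an assumption

err, pool = cuda.cuMemPoolCreate(props)
err, stream = cuda.cuStreamCreate(0)
err, dptr = cuda.cuMemAllocFromPoolAsync(1 << 20, pool, stream)
```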
- class cuda.cuda.CUmemPoolPtrExportData_st(void_ptr _ptr=0)#
Opaque data for exporting a pool allocation
- reserved#
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_MEM_ALLOC_NODE_PARAMS_v1_st(void_ptr _ptr=0)#
Memory allocation node parameters
- poolProps#
in: location where the allocation should reside (specified in ::location). ::handleTypes must be CU_MEM_HANDLE_TYPE_NONE. IPC is not supported.
- Type:
- accessDescs#
in: array of memory access descriptors. Used to describe peer GPU access
- Type:
- accessDescCount#
in: number of memory access descriptors. Must not exceed the number of GPUs.
- Type:
size_t
- bytesize#
in: size in bytes of the requested allocation
- Type:
size_t
- dptr#
out: address of the allocation returned by CUDA
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_MEM_ALLOC_NODE_PARAMS_v2_st(void_ptr _ptr=0)#
Memory allocation node parameters
- poolProps#
in: location where the allocation should reside (specified in ::location). ::handleTypes must be CU_MEM_HANDLE_TYPE_NONE. IPC is not supported.
- Type:
- accessDescs#
in: array of memory access descriptors. Used to describe peer GPU access
- Type:
- accessDescCount#
in: number of memory access descriptors. Must not exceed the number of GPUs.
- Type:
size_t
- bytesize#
in: size in bytes of the requested allocation
- Type:
size_t
- dptr#
out: address of the allocation returned by CUDA
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_MEM_FREE_NODE_PARAMS_st(void_ptr _ptr=0)#
Memory free node parameters
- dptr#
in: the pointer to free
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_CHILD_GRAPH_NODE_PARAMS_st(void_ptr _ptr=0)#
Child graph node parameters
- graph#
The child graph to clone into the node for node creation, or a handle to the graph owned by the node for node query
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EVENT_RECORD_NODE_PARAMS_st(void_ptr _ptr=0)#
Event record node parameters
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_EVENT_WAIT_NODE_PARAMS_st(void_ptr _ptr=0)#
Event wait node parameters
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUgraphNodeParams_st(void_ptr _ptr=0)#
Graph node parameters. See cuGraphAddNode.
- type#
Type of the node
- Type:
- reserved0#
Reserved. Must be zero.
- Type:
List[int]
- reserved1#
Padding. Unused bytes must be zero.
- Type:
List[long long]
- kernel#
Kernel node parameters.
- memcpy#
Memcpy node parameters.
- Type:
- memset#
Memset node parameters.
- host#
Host node parameters.
- Type:
- graph#
Child graph node parameters.
- eventWait#
Event wait node parameters.
- eventRecord#
Event record node parameters.
- extSemSignal#
External semaphore signal node parameters.
- extSemWait#
External semaphore wait node parameters.
- alloc#
Memory allocation node parameters.
- free#
Memory free node parameters.
- memOp#
MemOp node parameters.
- conditional#
Conditional node parameters.
- reserved2#
Reserved bytes. Must be zero.
- Type:
long long
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUeglFrame_st(void_ptr _ptr=0)#
CUDA EGLFrame structure descriptor - structure defining one frame of EGL. Each frame may contain one or more planes depending on whether the surface is multiplanar or not.
- frame#
- Type:
anon_union14
- width#
Width of first plane
- Type:
unsigned int
- height#
Height of first plane
- Type:
unsigned int
- depth#
Depth of first plane
- Type:
unsigned int
- pitch#
Pitch of first plane
- Type:
unsigned int
- planeCount#
Number of planes
- Type:
unsigned int
- numChannels#
Number of channels for the plane
- Type:
unsigned int
- frameType#
Array or Pitch
- Type:
- eglColorFormat#
CUDA EGL Color Format
- Type:
- cuFormat#
CUDA Array Format
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUipcMem_flags(value)#
CUDA Ipc Mem Flags
- CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS = 1#
Automatically enable peer access between remote devices as needed
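A minimal sketch of the IPC handshake these handles enable: the exporting process obtains a handle for a device allocation, and the importing process opens it with the lazy peer-access flag. Moving the handle bytes between processes (for example over a pipe) is assumed and not shown, as is a current context in each process:
```python
from cuda import cuda

# Exporting process (sketch): allocate device memory and export an IPC handle.
nbytes = 1 << 20
err, dptr = cuda.cuMemAlloc(nbytes)
err, ipc_handle = cuda.cuIpcGetMemHandle(dptr)
# ... send ipc_handle.reserved (the raw handle bytes) to the other process ...

# Importing process (sketch): open the received handle as a device pointer.
err, peer_ptr = cuda.cuIpcOpenMemHandle(
    ipc_handle, cuda.CUipcMem_flags.CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS)
```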
- class cuda.cuda.CUmemAttach_flags(value)#
CUDA Mem Attach Flags
- CU_MEM_ATTACH_GLOBAL = 1#
Memory can be accessed by any stream on any device
- CU_MEM_ATTACH_HOST = 2#
Memory cannot be accessed by any stream on any device
- CU_MEM_ATTACH_SINGLE = 4#
Memory can only be accessed by a single stream on the associated device
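A minimal sketch of how these attach flags are used with cuMemAllocManaged; the 1 MiB size is arbitrary and a current context is assumed:
```python
from cuda import cuda

# Sketch: allocate managed memory visible to any stream on any device.
err, dptr = cuda.cuMemAllocManaged(
    1 << 20, cuda.CUmemAttach_flags.CU_MEM_ATTACH_GLOBAL)
assert err == cuda.CUresult.CUDA_SUCCESS
```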
- class cuda.cuda.CUctx_flags(value)#
Context creation flags
- CU_CTX_SCHED_AUTO = 0#
Automatic scheduling
- CU_CTX_SCHED_SPIN = 1#
Set spin as default scheduling
- CU_CTX_SCHED_YIELD = 2#
Set yield as default scheduling
- CU_CTX_SCHED_BLOCKING_SYNC = 4#
Set blocking synchronization as default scheduling
- CU_CTX_BLOCKING_SYNC = 4#
Set blocking synchronization as default scheduling [Deprecated]
- CU_CTX_SCHED_MASK = 7#
- CU_CTX_MAP_HOST = 8#
[Deprecated]
- CU_CTX_LMEM_RESIZE_TO_MAX = 16#
Keep local memory allocation after launch
- CU_CTX_COREDUMP_ENABLE = 32#
Trigger coredumps from exceptions in this context
- CU_CTX_USER_COREDUMP_ENABLE = 64#
Enable user pipe to trigger coredumps in this context
- CU_CTX_SYNC_MEMOPS = 128#
Ensure synchronous memory operations on this context will synchronize
- CU_CTX_FLAGS_MASK = 255#
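A minimal sketch of basic driver initialization using one of these scheduling flags; device ordinal 0 is an assumption:
```python
from cuda import cuda

# Sketch: initialize the driver and create a blocking-sync context on device 0.
err, = cuda.cuInit(0)
err, dev = cuda.cuDeviceGet(0)
err, ctx = cuda.cuCtxCreate(cuda.CUctx_flags.CU_CTX_SCHED_BLOCKING_SYNC, dev)
```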
- class cuda.cuda.CUevent_sched_flags(value)#
Event sched flags
- CU_EVENT_SCHED_AUTO = 0#
Automatic scheduling
- CU_EVENT_SCHED_SPIN = 1#
Set spin as default scheduling
- CU_EVENT_SCHED_YIELD = 2#
Set yield as default scheduling
- CU_EVENT_SCHED_BLOCKING_SYNC = 4#
Set blocking synchronization as default scheduling
- class cuda.cuda.cl_event_flags(value)#
NVCL event scheduling flags
- NVCL_EVENT_SCHED_AUTO = 0#
Automatic scheduling
- NVCL_EVENT_SCHED_SPIN = 1#
Set spin as default scheduling
- NVCL_EVENT_SCHED_YIELD = 2#
Set yield as default scheduling
- NVCL_EVENT_SCHED_BLOCKING_SYNC = 4#
Set blocking synchronization as default scheduling
- class cuda.cuda.cl_context_flags(value)#
NVCL context scheduling flags
- NVCL_CTX_SCHED_AUTO = 0#
Automatic scheduling
- NVCL_CTX_SCHED_SPIN = 1#
Set spin as default scheduling
- NVCL_CTX_SCHED_YIELD = 2#
Set yield as default scheduling
- NVCL_CTX_SCHED_BLOCKING_SYNC = 4#
Set blocking synchronization as default scheduling
- class cuda.cuda.CUstream_flags(value)#
Stream creation flags
- CU_STREAM_DEFAULT = 0#
Default stream flag
- CU_STREAM_NON_BLOCKING = 1#
Stream does not synchronize with stream 0 (the NULL stream)
- class cuda.cuda.CUevent_flags(value)#
Event creation flags
- CU_EVENT_DEFAULT = 0#
Default event flag
- CU_EVENT_BLOCKING_SYNC = 1#
Event uses blocking synchronization
- CU_EVENT_DISABLE_TIMING = 2#
Event will not record timing data
- CU_EVENT_INTERPROCESS = 4#
Event is suitable for interprocess use. CU_EVENT_DISABLE_TIMING must be set
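A minimal sketch combining the stream and event flags above to time asynchronous work; the work itself (whatever is enqueued between the two records) is left out, and a current context is assumed:
```python
from cuda import cuda

# Sketch: time a stretch of stream work with a pair of default events.
err, stream = cuda.cuStreamCreate(cuda.CUstream_flags.CU_STREAM_NON_BLOCKING)
err, start = cuda.cuEventCreate(cuda.CUevent_flags.CU_EVENT_DEFAULT)
err, stop = cuda.cuEventCreate(cuda.CUevent_flags.CU_EVENT_DEFAULT)

err, = cuda.cuEventRecord(start, stream)
# ... enqueue kernels or copies on `stream` here ...
err, = cuda.cuEventRecord(stop, stream)

err, = cuda.cuEventSynchronize(stop)
err, ms = cuda.cuEventElapsedTime(start, stop)
print(f"elapsed: {ms:.3f} ms")
```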
- class cuda.cuda.CUevent_record_flags(value)#
Event record flags
- CU_EVENT_RECORD_DEFAULT = 0#
Default event record flag
- CU_EVENT_RECORD_EXTERNAL = 1#
When using stream capture, create an event record node instead of the default behavior. This flag is invalid when used outside of capture.
- class cuda.cuda.CUevent_wait_flags(value)#
Event wait flags
- CU_EVENT_WAIT_DEFAULT = 0#
Default event wait flag
- CU_EVENT_WAIT_EXTERNAL = 1#
When using stream capture, create an event wait node instead of the default behavior. This flag is invalid when used outside of capture.
- class cuda.cuda.CUstreamWaitValue_flags(value)#
Flags for cuStreamWaitValue32 and cuStreamWaitValue64
- CU_STREAM_WAIT_VALUE_GEQ = 0#
Wait until (int32_t)(*addr - value) >= 0 (or int64_t for 64 bit values). Note this is a cyclic comparison which ignores wraparound. (Default behavior.)
- CU_STREAM_WAIT_VALUE_NOR = 3#
Wait until ~(*addr | value) != 0. Support for this operation can be queried with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_WAIT_VALUE_NOR.
- CU_STREAM_WAIT_VALUE_FLUSH = 1073741824#
Follow the wait operation with a flush of outstanding remote writes. This means that, if a remote write operation is guaranteed to have reached the device before the wait can be satisfied, that write is guaranteed to be visible to downstream device work. The device is permitted to reorder remote writes internally. For example, this flag would be required if two remote writes arrive in a defined order, the wait is satisfied by the second write, and downstream work needs to observe the first write. Support for this operation is restricted to selected platforms and can be queried with CU_DEVICE_ATTRIBUTE_CAN_FLUSH_REMOTE_WRITES.
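A minimal sketch of pairing cuStreamWriteValue32 and cuStreamWaitValue32 on a device pointer; support should be checked first via the device attributes mentioned above, and a current context is assumed:
```python
from cuda import cuda

# Sketch: make one stream wait until a 32-bit flag reaches 1, released by a
# write enqueued on another stream.
err, stream = cuda.cuStreamCreate(0)
err, other_stream = cuda.cuStreamCreate(0)
err, flag_ptr = cuda.cuMemAlloc(4)   # 4-byte device flag, initialized elsewhere

err, = cuda.cuStreamWaitValue32(
    stream, flag_ptr, 1, cuda.CUstreamWaitValue_flags.CU_STREAM_WAIT_VALUE_GEQ)

# Releasing the waiter from the other stream:
err, = cuda.cuStreamWriteValue32(
    other_stream, flag_ptr, 1,
    cuda.CUstreamWriteValue_flags.CU_STREAM_WRITE_VALUE_DEFAULT)
```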
- class cuda.cuda.CUstreamWriteValue_flags(value)#
Flags for cuStreamWriteValue32
- CU_STREAM_WRITE_VALUE_DEFAULT = 0#
Default behavior
- CU_STREAM_WRITE_VALUE_NO_MEMORY_BARRIER = 1#
Permits the write to be reordered with writes which were issued before it, as a performance optimization. Normally, cuStreamWriteValue32 will provide a memory fence before the write, which has similar semantics to __threadfence_system() but is scoped to the stream rather than a CUDA thread. This flag is not supported in the v2 API.
- class cuda.cuda.CUstreamBatchMemOpType(value)#
Operations for cuStreamBatchMemOp
- CU_STREAM_MEM_OP_WAIT_VALUE_32 = 1#
Represents a cuStreamWaitValue32 operation
- CU_STREAM_MEM_OP_WRITE_VALUE_32 = 2#
Represents a cuStreamWriteValue32 operation
- CU_STREAM_MEM_OP_WAIT_VALUE_64 = 4#
Represents a cuStreamWaitValue64 operation
- CU_STREAM_MEM_OP_WRITE_VALUE_64 = 5#
Represents a cuStreamWriteValue64 operation
- CU_STREAM_MEM_OP_BARRIER = 6#
Insert a memory barrier of the specified type
- CU_STREAM_MEM_OP_FLUSH_REMOTE_WRITES = 3#
This has the same effect as CU_STREAM_WAIT_VALUE_FLUSH, but as a standalone operation.
- class cuda.cuda.CUstreamMemoryBarrier_flags(value)#
Flags for cuStreamMemoryBarrier
- CU_STREAM_MEMORY_BARRIER_TYPE_SYS = 0#
System-wide memory barrier.
- CU_STREAM_MEMORY_BARRIER_TYPE_GPU = 1#
Limit memory barrier scope to the GPU.
- class cuda.cuda.CUoccupancy_flags(value)#
Occupancy calculator flag
- CU_OCCUPANCY_DEFAULT = 0#
Default behavior
- CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE = 1#
Assume global caching is enabled and cannot be automatically turned off
- class cuda.cuda.CUstreamUpdateCaptureDependencies_flags(value)#
Flags for cuStreamUpdateCaptureDependencies
- CU_STREAM_ADD_CAPTURE_DEPENDENCIES = 0#
Add new nodes to the dependency set
- CU_STREAM_SET_CAPTURE_DEPENDENCIES = 1#
Replace the dependency set with the new nodes
- class cuda.cuda.CUasyncNotificationType(value)#
Types of async notification that can be sent
- CU_ASYNC_NOTIFICATION_TYPE_OVER_BUDGET = 1#
- class cuda.cuda.CUarray_format(value)#
Array formats
- CU_AD_FORMAT_UNSIGNED_INT8 = 1#
Unsigned 8-bit integers
- CU_AD_FORMAT_UNSIGNED_INT16 = 2#
Unsigned 16-bit integers
- CU_AD_FORMAT_UNSIGNED_INT32 = 3#
Unsigned 32-bit integers
- CU_AD_FORMAT_SIGNED_INT8 = 8#
Signed 8-bit integers
- CU_AD_FORMAT_SIGNED_INT16 = 9#
Signed 16-bit integers
- CU_AD_FORMAT_SIGNED_INT32 = 10#
Signed 32-bit integers
- CU_AD_FORMAT_HALF = 16#
16-bit floating point
- CU_AD_FORMAT_FLOAT = 32#
32-bit floating point
- CU_AD_FORMAT_NV12 = 176#
8-bit YUV planar format, with 4:2:0 sampling
- CU_AD_FORMAT_UNORM_INT8X1 = 192#
1 channel unsigned 8-bit normalized integer
- CU_AD_FORMAT_UNORM_INT8X2 = 193#
2 channel unsigned 8-bit normalized integer
- CU_AD_FORMAT_UNORM_INT8X4 = 194#
4 channel unsigned 8-bit normalized integer
- CU_AD_FORMAT_UNORM_INT16X1 = 195#
1 channel unsigned 16-bit normalized integer
- CU_AD_FORMAT_UNORM_INT16X2 = 196#
2 channel unsigned 16-bit normalized integer
- CU_AD_FORMAT_UNORM_INT16X4 = 197#
4 channel unsigned 16-bit normalized integer
- CU_AD_FORMAT_SNORM_INT8X1 = 198#
1 channel signed 8-bit normalized integer
- CU_AD_FORMAT_SNORM_INT8X2 = 199#
2 channel signed 8-bit normalized integer
- CU_AD_FORMAT_SNORM_INT8X4 = 200#
4 channel signed 8-bit normalized integer
- CU_AD_FORMAT_SNORM_INT16X1 = 201#
1 channel signed 16-bit normalized integer
- CU_AD_FORMAT_SNORM_INT16X2 = 202#
2 channel signed 16-bit normalized integer
- CU_AD_FORMAT_SNORM_INT16X4 = 203#
4 channel signed 16-bit normalized integer
- CU_AD_FORMAT_BC1_UNORM = 145#
4 channel unsigned normalized block-compressed (BC1 compression) format
- CU_AD_FORMAT_BC1_UNORM_SRGB = 146#
4 channel unsigned normalized block-compressed (BC1 compression) format with sRGB encoding
- CU_AD_FORMAT_BC2_UNORM = 147#
4 channel unsigned normalized block-compressed (BC2 compression) format
- CU_AD_FORMAT_BC2_UNORM_SRGB = 148#
4 channel unsigned normalized block-compressed (BC2 compression) format with sRGB encoding
- CU_AD_FORMAT_BC3_UNORM = 149#
4 channel unsigned normalized block-compressed (BC3 compression) format
- CU_AD_FORMAT_BC3_UNORM_SRGB = 150#
4 channel unsigned normalized block-compressed (BC3 compression) format with sRGB encoding
- CU_AD_FORMAT_BC4_UNORM = 151#
1 channel unsigned normalized block-compressed (BC4 compression) format
- CU_AD_FORMAT_BC4_SNORM = 152#
1 channel signed normalized block-compressed (BC4 compression) format
- CU_AD_FORMAT_BC5_UNORM = 153#
2 channel unsigned normalized block-compressed (BC5 compression) format
- CU_AD_FORMAT_BC5_SNORM = 154#
2 channel signed normalized block-compressed (BC5 compression) format
- CU_AD_FORMAT_BC6H_UF16 = 155#
3 channel unsigned half-float block-compressed (BC6H compression) format
- CU_AD_FORMAT_BC6H_SF16 = 156#
3 channel signed half-float block-compressed (BC6H compression) format
- CU_AD_FORMAT_BC7_UNORM = 157#
4 channel unsigned normalized block-compressed (BC7 compression) format
- CU_AD_FORMAT_BC7_UNORM_SRGB = 158#
4 channel unsigned normalized block-compressed (BC7 compression) format with sRGB encoding
- CU_AD_FORMAT_P010 = 159#
10-bit YUV planar format, with 4:2:0 sampling
- CU_AD_FORMAT_P016 = 161#
16-bit YUV planar format, with 4:2:0 sampling
- CU_AD_FORMAT_NV16 = 162#
8-bit YUV planar format, with 4:2:2 sampling
- CU_AD_FORMAT_P210 = 163#
10-bit YUV planar format, with 4:2:2 sampling
- CU_AD_FORMAT_P216 = 164#
16-bit YUV planar format, with 4:2:2 sampling
- CU_AD_FORMAT_YUY2 = 165#
2 channel, 8-bit YUV packed planar format, with 4:2:2 sampling
- CU_AD_FORMAT_Y210 = 166#
2 channel, 10-bit YUV packed planar format, with 4:2:2 sampling
- CU_AD_FORMAT_Y216 = 167#
2 channel, 16-bit YUV packed planar format, with 4:2:2 sampling
- CU_AD_FORMAT_AYUV = 168#
4 channel, 8-bit YUV packed planar format, with 4:4:4 sampling
- CU_AD_FORMAT_Y410 = 169#
10-bit YUV packed planar format, with 4:4:4 sampling
- CU_AD_FORMAT_Y416 = 177#
4 channel, 12-bit YUV packed planar format, with 4:4:4 sampling
- CU_AD_FORMAT_Y444_PLANAR8 = 178#
3 channel 8-bit YUV planar format, with 4:4:4 sampling
- CU_AD_FORMAT_Y444_PLANAR10 = 179#
3 channel 10-bit YUV planar format, with 4:4:4 sampling
- CU_AD_FORMAT_MAX = 2147483647#
- class cuda.cuda.CUaddress_mode(value)#
Texture reference addressing modes
- CU_TR_ADDRESS_MODE_WRAP = 0#
Wrapping address mode
- CU_TR_ADDRESS_MODE_CLAMP = 1#
Clamp to edge address mode
- CU_TR_ADDRESS_MODE_MIRROR = 2#
Mirror address mode
- CU_TR_ADDRESS_MODE_BORDER = 3#
Border address mode
- class cuda.cuda.CUfilter_mode(value)#
Texture reference filtering modes
- CU_TR_FILTER_MODE_POINT = 0#
Point filter mode
- CU_TR_FILTER_MODE_LINEAR = 1#
Linear filter mode
- class cuda.cuda.CUdevice_attribute(value)#
Device properties
- CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK = 1#
Maximum number of threads per block
- CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X = 2#
Maximum block dimension X
- CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y = 3#
Maximum block dimension Y
- CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z = 4#
Maximum block dimension Z
- CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X = 5#
Maximum grid dimension X
- CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y = 6#
Maximum grid dimension Y
- CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z = 7#
Maximum grid dimension Z
- CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK = 8#
Maximum shared memory available per block in bytes
- CU_DEVICE_ATTRIBUTE_SHARED_MEMORY_PER_BLOCK = 8#
Deprecated, use CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK
- CU_DEVICE_ATTRIBUTE_TOTAL_CONSTANT_MEMORY = 9#
Memory available on device for constant variables in a CUDA C kernel in bytes
- CU_DEVICE_ATTRIBUTE_WARP_SIZE = 10#
Warp size in threads
- CU_DEVICE_ATTRIBUTE_MAX_PITCH = 11#
Maximum pitch in bytes allowed by memory copies
- CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK = 12#
Maximum number of 32-bit registers available per block
- CU_DEVICE_ATTRIBUTE_REGISTERS_PER_BLOCK = 12#
Deprecated, use CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK
- CU_DEVICE_ATTRIBUTE_CLOCK_RATE = 13#
Typical clock frequency in kilohertz
- CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT = 14#
Alignment requirement for textures
- CU_DEVICE_ATTRIBUTE_GPU_OVERLAP = 15#
Device can possibly copy memory and execute a kernel concurrently. Deprecated. Use instead CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT.
- CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT = 16#
Number of multiprocessors on device
- CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT = 17#
Specifies whether there is a run time limit on kernels
- CU_DEVICE_ATTRIBUTE_INTEGRATED = 18#
Device is integrated with host memory
- CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY = 19#
Device can map host memory into CUDA address space
- CU_DEVICE_ATTRIBUTE_COMPUTE_MODE = 20#
Compute mode (See CUcomputemode for details)
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_WIDTH = 21#
Maximum 1D texture width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_WIDTH = 22#
Maximum 2D texture width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_HEIGHT = 23#
Maximum 2D texture height
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH = 24#
Maximum 3D texture width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT = 25#
Maximum 3D texture height
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH = 26#
Maximum 3D texture depth
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_WIDTH = 27#
Maximum 2D layered texture width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_HEIGHT = 28#
Maximum 2D layered texture height
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_LAYERS = 29#
Maximum layers in a 2D layered texture
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_WIDTH = 27#
Deprecated, use CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_WIDTH
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_HEIGHT = 28#
Deprecated, use CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_HEIGHT
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES = 29#
Deprecated, use CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LAYERED_LAYERS
- CU_DEVICE_ATTRIBUTE_SURFACE_ALIGNMENT = 30#
Alignment requirement for surfaces
- CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31#
Device can possibly execute multiple kernels concurrently
- CU_DEVICE_ATTRIBUTE_ECC_ENABLED = 32#
Device has ECC support enabled
- CU_DEVICE_ATTRIBUTE_PCI_BUS_ID = 33#
PCI bus ID of the device
- CU_DEVICE_ATTRIBUTE_PCI_DEVICE_ID = 34#
PCI device ID of the device
- CU_DEVICE_ATTRIBUTE_TCC_DRIVER = 35#
Device is using TCC driver model
- CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE = 36#
Peak memory clock frequency in kilohertz
- CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH = 37#
Global memory bus width in bits
- CU_DEVICE_ATTRIBUTE_L2_CACHE_SIZE = 38#
Size of L2 cache in bytes
- CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39#
Maximum resident threads per multiprocessor
- CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40#
Number of asynchronous engines
- CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41#
Device shares a unified address space with the host
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LAYERED_WIDTH = 42#
Maximum 1D layered texture width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LAYERED_LAYERS = 43#
Maximum layers in a 1D layered texture
- CU_DEVICE_ATTRIBUTE_CAN_TEX2D_GATHER = 44#
Deprecated, do not use.
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_GATHER_WIDTH = 45#
Maximum 2D texture width if CUDA_ARRAY3D_TEXTURE_GATHER is set
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_GATHER_HEIGHT = 46#
Maximum 2D texture height if CUDA_ARRAY3D_TEXTURE_GATHER is set
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH_ALTERNATE = 47#
Alternate maximum 3D texture width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT_ALTERNATE = 48#
Alternate maximum 3D texture height
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH_ALTERNATE = 49#
Alternate maximum 3D texture depth
- CU_DEVICE_ATTRIBUTE_PCI_DOMAIN_ID = 50#
PCI domain ID of the device
- CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT = 51#
Pitch alignment requirement for textures
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURECUBEMAP_WIDTH = 52#
Maximum cubemap texture width/height
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURECUBEMAP_LAYERED_WIDTH = 53#
Maximum cubemap layered texture width/height
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURECUBEMAP_LAYERED_LAYERS = 54#
Maximum layers in a cubemap layered texture
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE1D_WIDTH = 55#
Maximum 1D surface width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_WIDTH = 56#
Maximum 2D surface width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_HEIGHT = 57#
Maximum 2D surface height
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE3D_WIDTH = 58#
Maximum 3D surface width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE3D_HEIGHT = 59#
Maximum 3D surface height
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE3D_DEPTH = 60#
Maximum 3D surface depth
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE1D_LAYERED_WIDTH = 61#
Maximum 1D layered surface width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE1D_LAYERED_LAYERS = 62#
Maximum layers in a 1D layered surface
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_LAYERED_WIDTH = 63#
Maximum 2D layered surface width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_LAYERED_HEIGHT = 64#
Maximum 2D layered surface height
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACE2D_LAYERED_LAYERS = 65#
Maximum layers in a 2D layered surface
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACECUBEMAP_WIDTH = 66#
Maximum cubemap surface width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACECUBEMAP_LAYERED_WIDTH = 67#
Maximum cubemap layered surface width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_SURFACECUBEMAP_LAYERED_LAYERS = 68#
Maximum layers in a cubemap layered surface
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_LINEAR_WIDTH = 69#
Deprecated, do not use. Use cudaDeviceGetTexture1DLinearMaxWidth() or cuDeviceGetTexture1DLinearMaxWidth() instead.
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_WIDTH = 70#
Maximum 2D linear texture width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_HEIGHT = 71#
Maximum 2D linear texture height
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_LINEAR_PITCH = 72#
Maximum 2D linear texture pitch in bytes
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_MIPMAPPED_WIDTH = 73#
Maximum mipmapped 2D texture width
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_MIPMAPPED_HEIGHT = 74#
Maximum mipmapped 2D texture height
- CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75#
Major compute capability version number
- CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76#
Minor compute capability version number
- CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_MIPMAPPED_WIDTH = 77#
Maximum mipmapped 1D texture width
- CU_DEVICE_ATTRIBUTE_STREAM_PRIORITIES_SUPPORTED = 78#
Device supports stream priorities
- CU_DEVICE_ATTRIBUTE_GLOBAL_L1_CACHE_SUPPORTED = 79#
Device supports caching globals in L1
- CU_DEVICE_ATTRIBUTE_LOCAL_L1_CACHE_SUPPORTED = 80#
Device supports caching locals in L1
- CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR = 81#
Maximum shared memory available per multiprocessor in bytes
- CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82#
Maximum number of 32-bit registers available per multiprocessor
- CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY = 83#
Device can allocate managed memory on this system
- CU_DEVICE_ATTRIBUTE_MULTI_GPU_BOARD = 84#
Device is on a multi-GPU board
- CU_DEVICE_ATTRIBUTE_MULTI_GPU_BOARD_GROUP_ID = 85#
Unique id for a group of devices on the same multi-GPU board
- CU_DEVICE_ATTRIBUTE_HOST_NATIVE_ATOMIC_SUPPORTED = 86#
Link between the device and the host supports native atomic operations (this is a placeholder attribute, and is not supported on any current hardware)
- CU_DEVICE_ATTRIBUTE_SINGLE_TO_DOUBLE_PRECISION_PERF_RATIO = 87#
Ratio of single precision performance (in floating-point operations per second) to double precision performance
- CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS = 88#
Device supports coherently accessing pageable memory without calling cudaHostRegister on it
- CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS = 89#
Device can coherently access managed memory concurrently with the CPU
- CU_DEVICE_ATTRIBUTE_COMPUTE_PREEMPTION_SUPPORTED = 90#
Device supports compute preemption.
- CU_DEVICE_ATTRIBUTE_CAN_USE_HOST_POINTER_FOR_REGISTERED_MEM = 91#
Device can access host registered memory at the same virtual address as the CPU
- CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS_V1 = 92#
Deprecated, along with v1 MemOps API,
cuStreamBatchMemOp
and related APIs are supported.
- CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS_V1 = 93#
Deprecated, along with v1 MemOps API, 64-bit operations are supported in
cuStreamBatchMemOp
and related APIs.
- CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_WAIT_VALUE_NOR_V1 = 94#
Deprecated, along with v1 MemOps API,
CU_STREAM_WAIT_VALUE_NOR
is supported.
- CU_DEVICE_ATTRIBUTE_COOPERATIVE_LAUNCH = 95#
Device supports launching cooperative kernels via
cuLaunchCooperativeKernel
- CU_DEVICE_ATTRIBUTE_COOPERATIVE_MULTI_DEVICE_LAUNCH = 96#
Deprecated,
cuLaunchCooperativeKernelMultiDevice
is deprecated.
- CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN = 97#
Maximum optin shared memory per block
- CU_DEVICE_ATTRIBUTE_CAN_FLUSH_REMOTE_WRITES = 98#
The
CU_STREAM_WAIT_VALUE_FLUSH
flag and the CU_STREAM_MEM_OP_FLUSH_REMOTE_WRITES
MemOp are supported on the device. See Stream Memory Operations
for additional details.
- CU_DEVICE_ATTRIBUTE_HOST_REGISTER_SUPPORTED = 99#
Device supports host memory registration via
cudaHostRegister
.
- CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES = 100#
Device accesses pageable memory via the host’s page tables.
- CU_DEVICE_ATTRIBUTE_DIRECT_MANAGED_MEM_ACCESS_FROM_HOST = 101#
The host can directly access managed memory on the device without migration.
- CU_DEVICE_ATTRIBUTE_VIRTUAL_ADDRESS_MANAGEMENT_SUPPORTED = 102#
Deprecated, use CU_DEVICE_ATTRIBUTE_VIRTUAL_MEMORY_MANAGEMENT_SUPPORTED instead
- CU_DEVICE_ATTRIBUTE_VIRTUAL_MEMORY_MANAGEMENT_SUPPORTED = 102#
Device supports virtual memory management APIs like
cuMemAddressReserve
,cuMemCreate
,cuMemMap
and related APIs
- CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR_SUPPORTED = 103#
Device supports exporting memory to a posix file descriptor with
cuMemExportToShareableHandle
, if requested via cuMemCreate
- CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_WIN32_HANDLE_SUPPORTED = 104#
Device supports exporting memory to a Win32 NT handle with
cuMemExportToShareableHandle
, if requested via cuMemCreate
- CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_WIN32_KMT_HANDLE_SUPPORTED = 105#
Device supports exporting memory to a Win32 KMT handle with
cuMemExportToShareableHandle
, if requested via cuMemCreate
- CU_DEVICE_ATTRIBUTE_MAX_BLOCKS_PER_MULTIPROCESSOR = 106#
Maximum number of blocks per multiprocessor
- CU_DEVICE_ATTRIBUTE_GENERIC_COMPRESSION_SUPPORTED = 107#
Device supports compression of memory
- CU_DEVICE_ATTRIBUTE_MAX_PERSISTING_L2_CACHE_SIZE = 108#
Maximum L2 persisting lines capacity setting in bytes.
- CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_WITH_CUDA_VMM_SUPPORTED = 110#
Device supports specifying the GPUDirect RDMA flag with
cuMemCreate
- CU_DEVICE_ATTRIBUTE_RESERVED_SHARED_MEMORY_PER_BLOCK = 111#
Shared memory reserved by CUDA driver per block in bytes
- CU_DEVICE_ATTRIBUTE_SPARSE_CUDA_ARRAY_SUPPORTED = 112#
Device supports sparse CUDA arrays and sparse CUDA mipmapped arrays
- CU_DEVICE_ATTRIBUTE_READ_ONLY_HOST_REGISTER_SUPPORTED = 113#
Device supports using the
cuMemHostRegister
flag CU_MEMHOSTREGISTER_READ_ONLY
to register memory that must be mapped as read-only to the GPU
- CU_DEVICE_ATTRIBUTE_TIMELINE_SEMAPHORE_INTEROP_SUPPORTED = 114#
External timeline semaphore interop is supported on the device
- CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED = 115#
Device supports using the
cuMemAllocAsync
and cuMemPool
family of APIs
- CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_SUPPORTED = 116#
Device supports GPUDirect RDMA APIs, like nvidia_p2p_get_pages (see https://docs.nvidia.com/cuda/gpudirect-rdma for more information)
- CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_FLUSH_WRITES_OPTIONS = 117#
The returned attribute shall be interpreted as a bitmask, where the individual bits are described by the
CUflushGPUDirectRDMAWritesOptions
enum
- CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_WRITES_ORDERING = 118#
GPUDirect RDMA writes to the device do not need to be flushed for consumers within the scope indicated by the returned attribute. See
CUGPUDirectRDMAWritesOrdering
for the numerical values returned here.
- CU_DEVICE_ATTRIBUTE_MEMPOOL_SUPPORTED_HANDLE_TYPES = 119#
Handle types supported with mempool based IPC
- CU_DEVICE_ATTRIBUTE_CLUSTER_LAUNCH = 120#
Indicates device supports cluster launch
- CU_DEVICE_ATTRIBUTE_DEFERRED_MAPPING_CUDA_ARRAY_SUPPORTED = 121#
Device supports deferred mapping CUDA arrays and CUDA mipmapped arrays
- CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS = 122#
64-bit operations are supported in
cuStreamBatchMemOp
and related MemOp APIs.
- CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_WAIT_VALUE_NOR = 123#
CU_STREAM_WAIT_VALUE_NOR
is supported by MemOp APIs.
- CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED = 124#
Device supports buffer sharing with dma_buf mechanism.
- CU_DEVICE_ATTRIBUTE_IPC_EVENT_SUPPORTED = 125#
Device supports IPC Events.
- CU_DEVICE_ATTRIBUTE_MEM_SYNC_DOMAIN_COUNT = 126#
Number of memory domains the device supports.
- CU_DEVICE_ATTRIBUTE_TENSOR_MAP_ACCESS_SUPPORTED = 127#
Device supports accessing memory using Tensor Map.
- CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_FABRIC_SUPPORTED = 128#
Device supports exporting memory to a fabric handle with
cuMemExportToShareableHandle()
or requested with cuMemCreate()
- CU_DEVICE_ATTRIBUTE_UNIFIED_FUNCTION_POINTERS = 129#
Device supports unified function pointers.
- CU_DEVICE_ATTRIBUTE_NUMA_CONFIG = 130#
NUMA configuration of a device: value is of type
CUdeviceNumaConfig
enum
- CU_DEVICE_ATTRIBUTE_NUMA_ID = 131#
NUMA node ID of the GPU memory
- CU_DEVICE_ATTRIBUTE_MULTICAST_SUPPORTED = 132#
Device supports switch multicast and reduction operations.
- CU_DEVICE_ATTRIBUTE_MPS_ENABLED = 133#
Indicates if contexts created on this device will be shared via MPS
- CU_DEVICE_ATTRIBUTE_HOST_NUMA_ID = 134#
NUMA ID of the host node closest to the device. Returns -1 when system does not support NUMA.
- CU_DEVICE_ATTRIBUTE_D3D12_CIG_SUPPORTED = 135#
Device supports CIG with D3D12.
- CU_DEVICE_ATTRIBUTE_MAX = 136#
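These attribute values are read back one at a time with cuDeviceGetAttribute. A minimal sketch using these bindings (assuming a CUDA-capable device at ordinal 0; error handling is omitted for brevity):
    from cuda import cuda

    err, = cuda.cuInit(0)            # initialize the driver API
    err, dev = cuda.cuDeviceGet(0)   # take the first device

    # Query the compute capability (attributes 75 and 76 above).
    err, major = cuda.cuDeviceGetAttribute(
        cuda.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev)
    err, minor = cuda.cuDeviceGetAttribute(
        cuda.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev)
    print(f"compute capability {major}.{minor}")
Every call in these bindings returns a tuple whose first element is a CUresult; production code should check it against CUDA_SUCCESS.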
- class cuda.cuda.CUpointer_attribute(value)#
Pointer information
- CU_POINTER_ATTRIBUTE_MEMORY_TYPE = 2#
The
CUmemorytype
describing the physical location of a pointer
- CU_POINTER_ATTRIBUTE_DEVICE_POINTER = 3#
The address at which a pointer’s memory may be accessed on the device
- CU_POINTER_ATTRIBUTE_HOST_POINTER = 4#
The address at which a pointer’s memory may be accessed on the host
- CU_POINTER_ATTRIBUTE_P2P_TOKENS = 5#
A pair of tokens for use with the nv-p2p.h Linux kernel interface
- CU_POINTER_ATTRIBUTE_SYNC_MEMOPS = 6#
Synchronize every synchronous memory operation initiated on this region
- CU_POINTER_ATTRIBUTE_BUFFER_ID = 7#
A process-wide unique ID for an allocated memory region
- CU_POINTER_ATTRIBUTE_IS_MANAGED = 8#
Indicates if the pointer points to managed memory
- CU_POINTER_ATTRIBUTE_DEVICE_ORDINAL = 9#
A device ordinal of a device on which a pointer was allocated or registered
- CU_POINTER_ATTRIBUTE_IS_LEGACY_CUDA_IPC_CAPABLE = 10#
1 if this pointer maps to an allocation that is suitable for
cudaIpcGetMemHandle
, 0 otherwise
- CU_POINTER_ATTRIBUTE_RANGE_START_ADDR = 11#
Starting address for this requested pointer
- CU_POINTER_ATTRIBUTE_RANGE_SIZE = 12#
Size of the address range for this requested pointer
- CU_POINTER_ATTRIBUTE_MAPPED = 13#
1 if this pointer is in a valid address range that is mapped to a backing allocation, 0 otherwise
- CU_POINTER_ATTRIBUTE_ALLOWED_HANDLE_TYPES = 14#
Bitmask of allowed
CUmemAllocationHandleType
for this allocation
- CU_POINTER_ATTRIBUTE_IS_GPU_DIRECT_RDMA_CAPABLE = 15#
1 if the memory this pointer is referencing can be used with the GPUDirect RDMA API
- CU_POINTER_ATTRIBUTE_ACCESS_FLAGS = 16#
Returns the access flags the device associated with the current context has on the corresponding memory referenced by the pointer given
- CU_POINTER_ATTRIBUTE_MEMPOOL_HANDLE = 17#
Returns the mempool handle for the allocation if it was allocated from a mempool. Otherwise returns NULL.
- CU_POINTER_ATTRIBUTE_MAPPING_SIZE = 18#
Size of the actual underlying mapping that the pointer belongs to
- CU_POINTER_ATTRIBUTE_MAPPING_BASE_ADDR = 19#
The start address of the mapping that the pointer belongs to
- CU_POINTER_ATTRIBUTE_MEMORY_BLOCK_ID = 20#
A process-wide unique id corresponding to the physical allocation the pointer belongs to
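A pointer attribute is queried with cuPointerGetAttribute. The sketch below is illustrative only and assumes cuInit has run and a context is current; the allocation size is arbitrary:
    from cuda import cuda

    err, dptr = cuda.cuMemAlloc(1 << 20)  # 1 MiB device allocation (current context assumed)
    err, memtype = cuda.cuPointerGetAttribute(
        cuda.CUpointer_attribute.CU_POINTER_ATTRIBUTE_MEMORY_TYPE, dptr)
    err, range_size = cuda.cuPointerGetAttribute(
        cuda.CUpointer_attribute.CU_POINTER_ATTRIBUTE_RANGE_SIZE, dptr)
    # memtype should report CU_MEMORYTYPE_DEVICE for this allocation.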
- class cuda.cuda.CUfunction_attribute(value)#
Function properties
- CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK = 0#
The maximum number of threads per block, beyond which a launch of the function would fail. This number depends on both the function and the device on which the function is currently loaded.
- CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES = 1#
The size in bytes of statically-allocated shared memory required by this function. This does not include dynamically-allocated shared memory requested by the user at runtime.
- CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES = 2#
The size in bytes of user-allocated constant memory required by this function.
- CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES = 3#
The size in bytes of local memory used by each thread of this function.
- CU_FUNC_ATTRIBUTE_NUM_REGS = 4#
The number of registers used by each thread of this function.
- CU_FUNC_ATTRIBUTE_PTX_VERSION = 5#
The PTX virtual architecture version for which the function was compiled. This value is the major PTX version * 10 + the minor PTX version, so a PTX version 1.3 function would return the value 13. Note that this may return the undefined value of 0 for cubins compiled prior to CUDA 3.0.
- CU_FUNC_ATTRIBUTE_BINARY_VERSION = 6#
The binary architecture version for which the function was compiled. This value is the major binary version * 10 + the minor binary version, so a binary version 1.3 function would return the value 13. Note that this will return a value of 10 for legacy cubins that do not have a properly-encoded binary architecture version.
- CU_FUNC_ATTRIBUTE_CACHE_MODE_CA = 7#
The attribute to indicate whether the function has been compiled with the user-specified option “-Xptxas --dlcm=ca” set.
- CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES = 8#
The maximum size in bytes of dynamically-allocated shared memory that can be used by this function. If the user-specified dynamic shared memory size is larger than this value, the launch will fail. See
cuFuncSetAttribute
,cuKernelSetAttribute
- CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT = 9#
On devices where the L1 cache and shared memory use the same hardware resources, this sets the shared memory carveout preference, in percent of the total shared memory. Refer to
CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR
. This is only a hint, and the driver can choose a different ratio if required to execute the function. See cuFuncSetAttribute
,cuKernelSetAttribute
- CU_FUNC_ATTRIBUTE_CLUSTER_SIZE_MUST_BE_SET = 10#
If this attribute is set, the kernel must launch with a valid cluster size specified. See
cuFuncSetAttribute
,cuKernelSetAttribute
- CU_FUNC_ATTRIBUTE_REQUIRED_CLUSTER_WIDTH = 11#
The required cluster width in blocks. The values must either all be 0 or all be positive. The validity of the cluster dimensions is otherwise checked at launch time.
If the value is set during compile time, it cannot be set at runtime. Setting it at runtime will return CUDA_ERROR_NOT_PERMITTED. See
cuFuncSetAttribute
,cuKernelSetAttribute
- CU_FUNC_ATTRIBUTE_REQUIRED_CLUSTER_HEIGHT = 12#
The required cluster height in blocks. The values must either all be 0 or all be positive. The validity of the cluster dimensions is otherwise checked at launch time.
If the value is set during compile time, it cannot be set at runtime. Setting it at runtime should return CUDA_ERROR_NOT_PERMITTED. See
cuFuncSetAttribute
,cuKernelSetAttribute
- CU_FUNC_ATTRIBUTE_REQUIRED_CLUSTER_DEPTH = 13#
The required cluster depth in blocks. The values must either all be 0 or all be positive. The validity of the cluster dimensions is otherwise checked at launch time.
If the value is set during compile time, it cannot be set at runtime. Setting it at runtime should return CUDA_ERROR_NOT_PERMITTED. See
cuFuncSetAttribute
,cuKernelSetAttribute
- CU_FUNC_ATTRIBUTE_NON_PORTABLE_CLUSTER_SIZE_ALLOWED = 14#
Whether the function can be launched with non-portable cluster size. 1 is allowed, 0 is disallowed. A non-portable cluster size may only function on the specific SKUs the program is tested on. The launch might fail if the program is run on a different hardware platform.
CUDA API provides cudaOccupancyMaxActiveClusters to assist with checking whether the desired size can be launched on the current device.
Portable Cluster Size
A portable cluster size is guaranteed to be functional on all compute capabilities higher than the target compute capability. The portable cluster size for sm_90 is 8 blocks per cluster. This value may increase for future compute capabilities.
The specific hardware unit may support higher cluster sizes that are not guaranteed to be portable. See
cuFuncSetAttribute
,cuKernelSetAttribute
- CU_FUNC_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE = 15#
The block scheduling policy of a function. The value type is CUclusterSchedulingPolicy / cudaClusterSchedulingPolicy. See
cuFuncSetAttribute
,cuKernelSetAttribute
- CU_FUNC_ATTRIBUTE_MAX = 16#
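Function attributes are read with cuFuncGetAttribute and, for the settable ones, written with cuFuncSetAttribute. A hedged sketch; kernel is assumed to be a CUfunction obtained earlier from cuModuleGetFunction:
    from cuda import cuda

    err, max_threads = cuda.cuFuncGetAttribute(
        cuda.CUfunction_attribute.CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, kernel)
    err, = cuda.cuFuncSetAttribute(
        kernel,
        cuda.CUfunction_attribute.CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES,
        64 * 1024)  # request 64 KiB of dynamic shared memory, subject to the device limit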
- class cuda.cuda.CUfunc_cache(value)#
Function cache configurations
- CU_FUNC_CACHE_PREFER_NONE = 0#
no preference for shared memory or L1 (default)
- CU_FUNC_CACHE_PREFER_SHARED = 1#
prefer larger shared memory and smaller L1 cache
- CU_FUNC_CACHE_PREFER_L1 = 2#
prefer larger L1 cache and smaller shared memory
- CU_FUNC_CACHE_PREFER_EQUAL = 3#
prefer equal sized L1 cache and shared memory
- class cuda.cuda.CUsharedconfig(value)#
[Deprecated] Shared memory configurations
- CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE = 0#
set default shared memory bank size
- CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE = 1#
set shared memory bank width to four bytes
- CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE = 2#
set shared memory bank width to eight bytes
- class cuda.cuda.CUshared_carveout(value)#
Shared memory carveout configurations. These may be passed to
cuFuncSetAttribute
or cuKernelSetAttribute
- CU_SHAREDMEM_CARVEOUT_DEFAULT = -1#
No preference for shared memory or L1 (default)
- CU_SHAREDMEM_CARVEOUT_MAX_SHARED = 100#
Prefer maximum available shared memory, minimum L1 cache
- CU_SHAREDMEM_CARVEOUT_MAX_L1 = 0#
Prefer maximum available L1 cache, minimum shared memory
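A context-wide cache preference is applied with cuCtxSetCacheConfig, while the per-kernel carveout above is passed through CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT. A minimal sketch, assuming an active context and a kernel CUfunction obtained earlier:
    from cuda import cuda

    # Bias the L1/shared-memory split toward shared memory for subsequent launches.
    err, = cuda.cuCtxSetCacheConfig(cuda.CUfunc_cache.CU_FUNC_CACHE_PREFER_SHARED)
    # Per-kernel carveout hint: 50 percent of the unified L1/shared storage as shared memory.
    err, = cuda.cuFuncSetAttribute(
        kernel,
        cuda.CUfunction_attribute.CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT,
        50)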
- class cuda.cuda.CUmemorytype(value)#
Memory types
- CU_MEMORYTYPE_HOST = 1#
Host memory
- CU_MEMORYTYPE_DEVICE = 2#
Device memory
- CU_MEMORYTYPE_ARRAY = 3#
Array memory
- CU_MEMORYTYPE_UNIFIED = 4#
Unified device or host memory
- class cuda.cuda.CUcomputemode(value)#
Compute Modes
- CU_COMPUTEMODE_DEFAULT = 0#
Default compute mode (Multiple contexts allowed per device)
- CU_COMPUTEMODE_PROHIBITED = 2#
Compute-prohibited mode (No contexts can be created on this device at this time)
- CU_COMPUTEMODE_EXCLUSIVE_PROCESS = 3#
Compute-exclusive-process mode (Only one context used by a single process can be present on this device at a time)
- class cuda.cuda.CUmem_advise(value)#
Memory advise values
- CU_MEM_ADVISE_SET_READ_MOSTLY = 1#
Data will mostly be read and only occasionally be written to
- CU_MEM_ADVISE_UNSET_READ_MOSTLY = 2#
Undo the effect of
CU_MEM_ADVISE_SET_READ_MOSTLY
- CU_MEM_ADVISE_SET_PREFERRED_LOCATION = 3#
Set the preferred location for the data as the specified device
- CU_MEM_ADVISE_UNSET_PREFERRED_LOCATION = 4#
Clear the preferred location for the data
- CU_MEM_ADVISE_SET_ACCESSED_BY = 5#
Data will be accessed by the specified device, so prevent page faults as much as possible
- CU_MEM_ADVISE_UNSET_ACCESSED_BY = 6#
Let the Unified Memory subsystem decide on the page faulting policy for the specified device
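Advice is applied to a memory range with cuMemAdvise. A sketch under assumptions: the attach-flag spelling and the use of device 0 are illustrative, and the range must be a managed (or otherwise eligible) allocation:
    from cuda import cuda

    nbytes = 1 << 20
    err, dev = cuda.cuDeviceGet(0)
    err, managed = cuda.cuMemAllocManaged(
        nbytes, cuda.CUmemAttach_flags.CU_MEM_ATTACH_GLOBAL)
    # Mark the range as read-mostly so the driver may replicate read-only copies.
    err, = cuda.cuMemAdvise(
        managed, nbytes, cuda.CUmem_advise.CU_MEM_ADVISE_SET_READ_MOSTLY, dev)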
- class cuda.cuda.CUmem_range_attribute(value)#
- CU_MEM_RANGE_ATTRIBUTE_READ_MOSTLY = 1#
Whether the range will mostly be read and only occasionally be written to
- CU_MEM_RANGE_ATTRIBUTE_PREFERRED_LOCATION = 2#
The preferred location of the range
- CU_MEM_RANGE_ATTRIBUTE_ACCESSED_BY = 3#
Memory range has
CU_MEM_ADVISE_SET_ACCESSED_BY
set for specified device
- CU_MEM_RANGE_ATTRIBUTE_LAST_PREFETCH_LOCATION = 4#
The last location to which the range was prefetched
- CU_MEM_RANGE_ATTRIBUTE_PREFERRED_LOCATION_TYPE = 5#
The preferred location type of the range
- CU_MEM_RANGE_ATTRIBUTE_PREFERRED_LOCATION_ID = 6#
The preferred location id of the range
- CU_MEM_RANGE_ATTRIBUTE_LAST_PREFETCH_LOCATION_TYPE = 7#
The last location type to which the range was prefetched
- CU_MEM_RANGE_ATTRIBUTE_LAST_PREFETCH_LOCATION_ID = 8#
The last location id to which the range was prefetched
- class cuda.cuda.CUjit_option(value)#
Online compiler and linker options
- CU_JIT_MAX_REGISTERS = 0#
Max number of registers that a thread may use.
Option type: unsigned int
Applies to: compiler only
- CU_JIT_THREADS_PER_BLOCK = 1#
IN: Specifies minimum number of threads per block to target compilation for
OUT: Returns the number of threads the compiler actually targeted. This restricts the resource utilization of the compiler (e.g. max registers) such that a block with the given number of threads should be able to launch based on register limitations. Note, this option does not currently take into account any other resource limitations, such as shared memory utilization.
Cannot be combined with
CU_JIT_TARGET
.Option type: unsigned int
Applies to: compiler only
- CU_JIT_WALL_TIME = 2#
Overwrites the option value with the total wall clock time, in milliseconds, spent in the compiler and linker
Option type: float
Applies to: compiler and linker
- CU_JIT_INFO_LOG_BUFFER = 3#
Pointer to a buffer in which to print any log messages that are informational in nature (the buffer size is specified via option
CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES
)Option type: char *
Applies to: compiler and linker
- CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4#
IN: Log buffer size in bytes. Log messages will be capped at this size (including null terminator)
OUT: Amount of log buffer filled with messages
Option type: unsigned int
Applies to: compiler and linker
- CU_JIT_ERROR_LOG_BUFFER = 5#
Pointer to a buffer in which to print any log messages that reflect errors (the buffer size is specified via option
CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES
)Option type: char *
Applies to: compiler and linker
- CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6#
IN: Log buffer size in bytes. Log messages will be capped at this size (including null terminator)
OUT: Amount of log buffer filled with messages
Option type: unsigned int
Applies to: compiler and linker
- CU_JIT_OPTIMIZATION_LEVEL = 7#
Level of optimizations to apply to generated code (0 - 4), with 4 being the default and highest level of optimizations.
Option type: unsigned int
Applies to: compiler only
- CU_JIT_TARGET_FROM_CUCONTEXT = 8#
No option value required. Determines the target based on the current attached context (default)
Option type: No option value needed
Applies to: compiler and linker
- CU_JIT_TARGET = 9#
Target is chosen based on supplied
CUjit_target
. Cannot be combined withCU_JIT_THREADS_PER_BLOCK
.Option type: unsigned int for enumerated type
CUjit_target
Applies to: compiler and linker
- CU_JIT_FALLBACK_STRATEGY = 10#
Specifies choice of fallback strategy if matching cubin is not found. Choice is based on supplied
CUjit_fallback
. This option cannot be used with cuLink* APIs as the linker requires exact matches. Option type: unsigned int for enumerated type
CUjit_fallback
Applies to: compiler only
- CU_JIT_GENERATE_DEBUG_INFO = 11#
Specifies whether to create debug information in output (-g) (0: false, default)
Option type: int
Applies to: compiler and linker
- CU_JIT_LOG_VERBOSE = 12#
Generate verbose log messages (0: false, default)
Option type: int
Applies to: compiler and linker
- CU_JIT_GENERATE_LINE_INFO = 13#
Generate line number information (-lineinfo) (0: false, default)
Option type: int
Applies to: compiler only
- CU_JIT_CACHE_MODE = 14#
Specifies whether to enable caching explicitly (-dlcm)
Choice is based on supplied
CUjit_cacheMode_enum
.Option type: unsigned int for enumerated type
CUjit_cacheMode_enum
Applies to: compiler only
- CU_JIT_NEW_SM3X_OPT = 15#
[Deprecated]
- CU_JIT_FAST_COMPILE = 16#
This JIT option is used for internal purposes only.
- CU_JIT_GLOBAL_SYMBOL_NAMES = 17#
Array of device symbol names that will be relocated to the corresponding host addresses stored in
CU_JIT_GLOBAL_SYMBOL_ADDRESSES
.Must contain
CU_JIT_GLOBAL_SYMBOL_COUNT
entries. When loading a device module, the driver will relocate all encountered unresolved symbols to the host addresses.
It is only allowed to register symbols that correspond to unresolved global variables.
It is illegal to register the same device symbol at multiple addresses.
Option type: const char **
Applies to: dynamic linker only
- CU_JIT_GLOBAL_SYMBOL_ADDRESSES = 18#
Array of host addresses that will be used to relocate corresponding device symbols stored in
CU_JIT_GLOBAL_SYMBOL_NAMES
.Must contain
CU_JIT_GLOBAL_SYMBOL_COUNT
entries. Option type: void **
Applies to: dynamic linker only
- CU_JIT_GLOBAL_SYMBOL_COUNT = 19#
Number of entries in
CU_JIT_GLOBAL_SYMBOL_NAMES
andCU_JIT_GLOBAL_SYMBOL_ADDRESSES
arrays. Option type: unsigned int
Applies to: dynamic linker only
- CU_JIT_LTO = 20#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_FTZ = 21#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_PREC_DIV = 22#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_PREC_SQRT = 23#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_FMA = 24#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_REFERENCED_KERNEL_NAMES = 25#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_REFERENCED_KERNEL_COUNT = 26#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_REFERENCED_VARIABLE_NAMES = 27#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_REFERENCED_VARIABLE_COUNT = 28#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_OPTIMIZE_UNUSED_DEVICE_VARIABLES = 29#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_POSITION_INDEPENDENT_CODE = 30#
Generate position independent code (0: false)
Option type: int
Applies to: compiler only
- CU_JIT_MIN_CTA_PER_SM = 31#
This option hints to the JIT compiler the minimum number of CTAs from the kernel’s grid to be mapped to a SM. This option is ignored when used together with
CU_JIT_MAX_REGISTERS
or CU_JIT_THREADS_PER_BLOCK
. Optimizations based on this option need CU_JIT_MAX_THREADS_PER_BLOCK
to be specified as well. For kernels already using PTX directive .minnctapersm, this option will be ignored by default. Use CU_JIT_OVERRIDE_DIRECTIVE_VALUES
to let this option take precedence over the PTX directive. Option type: unsigned int
Applies to: compiler only
- CU_JIT_MAX_THREADS_PER_BLOCK = 32#
Maximum number of threads in a thread block, computed as the product of the maximum extent specified for each dimension of the block. This limit is guaranteed not to be exceeded in any invocation of the kernel. Exceeding the maximum number of threads results in a runtime error or kernel launch failure. For kernels already using PTX directive .maxntid, this option will be ignored by default. Use
CU_JIT_OVERRIDE_DIRECTIVE_VALUES
to let this option take precedence over the PTX directive. Option type: int
Applies to: compiler only
- CU_JIT_OVERRIDE_DIRECTIVE_VALUES = 33#
This option lets the values specified using
CU_JIT_MAX_REGISTERS
,CU_JIT_THREADS_PER_BLOCK
,CU_JIT_MAX_THREADS_PER_BLOCK
and CU_JIT_MIN_CTA_PER_SM
take precedence over any PTX directives. (0: Disable, default; 1: Enable) Option type: int
Applies to: compiler only
- CU_JIT_NUM_OPTIONS = 34#
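These options are passed as parallel options / optionValues lists to cuModuleLoadDataEx (and to the cuLink* APIs). The sketch below loads a PTX image with no options; ptx is an assumed bytes object holding PTX text, and the commented lines only indicate where CUjit_option entries and their values would go (exact value marshalling is left to the bindings):
    from cuda import cuda

    # options = [cuda.CUjit_option.CU_JIT_MAX_REGISTERS]   # hypothetical option list
    # values  = [32]                                       # one value per option, same order
    err, module = cuda.cuModuleLoadDataEx(ptx, 0, [], [])  # no JIT options supplied
    err, kernel = cuda.cuModuleGetFunction(module, b"my_kernel")  # assumed kernel name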
- class cuda.cuda.CUjit_target(value)#
Online compilation targets
- CU_TARGET_COMPUTE_30 = 30#
Compute device class 3.0
- CU_TARGET_COMPUTE_32 = 32#
Compute device class 3.2
- CU_TARGET_COMPUTE_35 = 35#
Compute device class 3.5
- CU_TARGET_COMPUTE_37 = 37#
Compute device class 3.7
- CU_TARGET_COMPUTE_50 = 50#
Compute device class 5.0
- CU_TARGET_COMPUTE_52 = 52#
Compute device class 5.2
- CU_TARGET_COMPUTE_53 = 53#
Compute device class 5.3
- CU_TARGET_COMPUTE_60 = 60#
Compute device class 6.0.
- CU_TARGET_COMPUTE_61 = 61#
Compute device class 6.1.
- CU_TARGET_COMPUTE_62 = 62#
Compute device class 6.2.
- CU_TARGET_COMPUTE_70 = 70#
Compute device class 7.0.
- CU_TARGET_COMPUTE_72 = 72#
Compute device class 7.2.
- CU_TARGET_COMPUTE_75 = 75#
Compute device class 7.5.
- CU_TARGET_COMPUTE_80 = 80#
Compute device class 8.0.
- CU_TARGET_COMPUTE_86 = 86#
Compute device class 8.6.
- CU_TARGET_COMPUTE_87 = 87#
Compute device class 8.7.
- CU_TARGET_COMPUTE_89 = 89#
Compute device class 8.9.
- CU_TARGET_COMPUTE_90 = 90#
Compute device class 9.0.
- CU_TARGET_COMPUTE_90A = 65626#
Compute device class 9.0 with accelerated features.
- class cuda.cuda.CUjit_fallback(value)#
Cubin matching fallback strategies
- CU_PREFER_PTX = 0#
Prefer to compile ptx if exact binary match not found
- CU_PREFER_BINARY = 1#
Prefer to fall back to compatible binary code if exact match not found
- class cuda.cuda.CUjit_cacheMode(value)#
Caching modes for dlcm
- CU_JIT_CACHE_OPTION_NONE = 0#
Compile with no -dlcm flag specified
- CU_JIT_CACHE_OPTION_CG = 1#
Compile with L1 cache disabled
- CU_JIT_CACHE_OPTION_CA = 2#
Compile with L1 cache enabled
- class cuda.cuda.CUjitInputType(value)#
Device code formats
- CU_JIT_INPUT_CUBIN = 0#
Compiled device-class-specific device code
Applicable options: none
- CU_JIT_INPUT_PTX = 1#
PTX source code
Applicable options: PTX compiler options
- CU_JIT_INPUT_FATBINARY = 2#
Bundle of multiple cubins and/or PTX of some device code
Applicable options: PTX compiler options,
CU_JIT_FALLBACK_STRATEGY
- CU_JIT_INPUT_OBJECT = 3#
Host object with embedded device code
Applicable options: PTX compiler options,
CU_JIT_FALLBACK_STRATEGY
- CU_JIT_INPUT_LIBRARY = 4#
Archive of host objects with embedded device code
Applicable options: PTX compiler options,
CU_JIT_FALLBACK_STRATEGY
- CU_JIT_INPUT_NVVM = 5#
[Deprecated]
Only valid with LTO-IR compiled with toolkits prior to CUDA 12.0
- CU_JIT_NUM_INPUT_TYPES = 6#
- class cuda.cuda.CUgraphicsRegisterFlags(value)#
Flags to register a graphics resource
- CU_GRAPHICS_REGISTER_FLAGS_NONE = 0#
- CU_GRAPHICS_REGISTER_FLAGS_READ_ONLY = 1#
- CU_GRAPHICS_REGISTER_FLAGS_WRITE_DISCARD = 2#
- CU_GRAPHICS_REGISTER_FLAGS_SURFACE_LDST = 4#
- CU_GRAPHICS_REGISTER_FLAGS_TEXTURE_GATHER = 8#
- class cuda.cuda.CUgraphicsMapResourceFlags(value)#
Flags for mapping and unmapping interop resources
- CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE = 0#
- CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY = 1#
- CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD = 2#
- class cuda.cuda.CUarray_cubemap_face(value)#
Array indices for cube faces
- CU_CUBEMAP_FACE_POSITIVE_X = 0#
Positive X face of cubemap
- CU_CUBEMAP_FACE_NEGATIVE_X = 1#
Negative X face of cubemap
- CU_CUBEMAP_FACE_POSITIVE_Y = 2#
Positive Y face of cubemap
- CU_CUBEMAP_FACE_NEGATIVE_Y = 3#
Negative Y face of cubemap
- CU_CUBEMAP_FACE_POSITIVE_Z = 4#
Positive Z face of cubemap
- CU_CUBEMAP_FACE_NEGATIVE_Z = 5#
Negative Z face of cubemap
- class cuda.cuda.CUlimit(value)#
Limits
- CU_LIMIT_STACK_SIZE = 0#
GPU thread stack size
- CU_LIMIT_PRINTF_FIFO_SIZE = 1#
GPU printf FIFO size
- CU_LIMIT_MALLOC_HEAP_SIZE = 2#
GPU malloc heap size
- CU_LIMIT_DEV_RUNTIME_SYNC_DEPTH = 3#
GPU device runtime launch synchronize depth
- CU_LIMIT_DEV_RUNTIME_PENDING_LAUNCH_COUNT = 4#
GPU device runtime pending launch count
- CU_LIMIT_MAX_L2_FETCH_GRANULARITY = 5#
A value between 0 and 128 that indicates the maximum fetch granularity of L2 (in Bytes). This is a hint
- CU_LIMIT_PERSISTING_L2_CACHE_SIZE = 6#
A size in bytes for L2 persisting lines cache size
- CU_LIMIT_SHMEM_SIZE = 7#
A maximum size in bytes of shared memory available to CUDA kernels on a CIG context. Can only be queried, cannot be set
- CU_LIMIT_CIG_ENABLED = 8#
A non-zero value indicates this CUDA context is a CIG-enabled context. Can only be queried, cannot be set
- CU_LIMIT_CIG_SHMEM_FALLBACK_ENABLED = 9#
When set to a non-zero value, CUDA will fail to launch a kernel on a CIG context, instead of using the fallback path, if the kernel uses more shared memory than available
- CU_LIMIT_MAX = 10#
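Limits are set and queried per context with cuCtxSetLimit / cuCtxGetLimit. A minimal sketch, assuming an active context (note that some of the limits above are query-only):
    from cuda import cuda

    err, = cuda.cuCtxSetLimit(cuda.CUlimit.CU_LIMIT_STACK_SIZE, 8 * 1024)  # 8 KiB per thread
    err, stack = cuda.cuCtxGetLimit(cuda.CUlimit.CU_LIMIT_STACK_SIZE)
    err, heap = cuda.cuCtxGetLimit(cuda.CUlimit.CU_LIMIT_MALLOC_HEAP_SIZE)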
- class cuda.cuda.CUresourcetype(value)#
Resource types
- CU_RESOURCE_TYPE_ARRAY = 0#
Array resource
- CU_RESOURCE_TYPE_MIPMAPPED_ARRAY = 1#
Mipmapped array resource
- CU_RESOURCE_TYPE_LINEAR = 2#
Linear resource
- CU_RESOURCE_TYPE_PITCH2D = 3#
Pitch 2D resource
- class cuda.cuda.CUaccessProperty(value)#
Specifies performance hint with
CUaccessPolicyWindow
for hitProp and missProp members.
- CU_ACCESS_PROPERTY_NORMAL = 0#
Normal cache persistence.
- CU_ACCESS_PROPERTY_STREAMING = 1#
Streaming access is less likely to persist in cache.
- CU_ACCESS_PROPERTY_PERSISTING = 2#
Persisting access is more likely to persist in cache.
- class cuda.cuda.CUgraphConditionalNodeType(value)#
Conditional node types
- CU_GRAPH_COND_TYPE_IF = 0#
Conditional ‘if’ Node. Body executed once if condition value is non-zero.
- CU_GRAPH_COND_TYPE_WHILE = 1#
Conditional ‘while’ Node. Body executed repeatedly while condition value is non-zero.
- class cuda.cuda.CUgraphNodeType(value)#
Graph node types
- CU_GRAPH_NODE_TYPE_KERNEL = 0#
GPU kernel node
- CU_GRAPH_NODE_TYPE_MEMCPY = 1#
Memcpy node
- CU_GRAPH_NODE_TYPE_MEMSET = 2#
Memset node
- CU_GRAPH_NODE_TYPE_HOST = 3#
Host (executable) node
- CU_GRAPH_NODE_TYPE_GRAPH = 4#
Node which executes an embedded graph
- CU_GRAPH_NODE_TYPE_EMPTY = 5#
Empty (no-op) node
- CU_GRAPH_NODE_TYPE_WAIT_EVENT = 6#
External event wait node
- CU_GRAPH_NODE_TYPE_EVENT_RECORD = 7#
External event record node
- CU_GRAPH_NODE_TYPE_EXT_SEMAS_SIGNAL = 8#
External semaphore signal node
- CU_GRAPH_NODE_TYPE_EXT_SEMAS_WAIT = 9#
External semaphore wait node
- CU_GRAPH_NODE_TYPE_MEM_ALLOC = 10#
Memory Allocation Node
- CU_GRAPH_NODE_TYPE_MEM_FREE = 11#
Memory Free Node
- CU_GRAPH_NODE_TYPE_BATCH_MEM_OP = 12#
Batch MemOp Node
- CU_GRAPH_NODE_TYPE_CONDITIONAL = 13#
Conditional Node. May be used to implement a conditional execution path or loop
inside of a graph. The graph(s) contained within the body of the conditional node
can be selectively executed or iterated upon based on the value of a conditional
variable.
Handles must be created in advance of creating the node
using
cuGraphConditionalHandleCreate
.The following restrictions apply to graphs which contain conditional nodes:
The graph cannot be used in a child node.
Only one instantiation of the graph may exist at any point in time.
The graph cannot be cloned.
To set the control value, supply a default value when creating the handle and/or
call
cudaGraphSetConditional
from device code.
- class cuda.cuda.CUgraphDependencyType(value)#
Type annotations that can be applied to graph edges as part of
CUgraphEdgeData
.
- CU_GRAPH_DEPENDENCY_TYPE_DEFAULT = 0#
This is an ordinary dependency.
- CU_GRAPH_DEPENDENCY_TYPE_PROGRAMMATIC = 1#
This dependency type allows the downstream node to use cudaGridDependencySynchronize(). It may only be used between kernel nodes, and must be used with either the
CU_GRAPH_KERNEL_NODE_PORT_PROGRAMMATIC
orCU_GRAPH_KERNEL_NODE_PORT_LAUNCH_ORDER
outgoing port.
- class cuda.cuda.CUgraphInstantiateResult(value)#
Graph instantiation results
- CUDA_GRAPH_INSTANTIATE_SUCCESS = 0#
Instantiation succeeded
- CUDA_GRAPH_INSTANTIATE_ERROR = 1#
Instantiation failed for an unexpected reason which is described in the return value of the function
- CUDA_GRAPH_INSTANTIATE_INVALID_STRUCTURE = 2#
Instantiation failed due to invalid structure, such as cycles
- CUDA_GRAPH_INSTANTIATE_NODE_OPERATION_NOT_SUPPORTED = 3#
Instantiation for device launch failed because the graph contained an unsupported operation
- CUDA_GRAPH_INSTANTIATE_MULTIPLE_CTXS_NOT_SUPPORTED = 4#
Instantiation for device launch failed due to the nodes belonging to different contexts
- class cuda.cuda.CUsynchronizationPolicy(value)#
- CU_SYNC_POLICY_AUTO = 1#
- CU_SYNC_POLICY_SPIN = 2#
- CU_SYNC_POLICY_YIELD = 3#
- CU_SYNC_POLICY_BLOCKING_SYNC = 4#
- class cuda.cuda.CUclusterSchedulingPolicy(value)#
Cluster scheduling policies. These may be passed to
cuFuncSetAttribute
orcuKernelSetAttribute
- CU_CLUSTER_SCHEDULING_POLICY_DEFAULT = 0#
the default policy
- CU_CLUSTER_SCHEDULING_POLICY_SPREAD = 1#
spread the blocks within a cluster to the SMs
- CU_CLUSTER_SCHEDULING_POLICY_LOAD_BALANCING = 2#
allow the hardware to load-balance the blocks in a cluster to the SMs
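The preference is attached to a function through CU_FUNC_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE. A hedged sketch; kernel is an assumed CUfunction built for a cluster-capable device (compute capability 9.0 or newer):
    from cuda import cuda

    err, = cuda.cuFuncSetAttribute(
        kernel,
        cuda.CUfunction_attribute.CU_FUNC_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE,
        int(cuda.CUclusterSchedulingPolicy.CU_CLUSTER_SCHEDULING_POLICY_SPREAD))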
- class cuda.cuda.CUlaunchMemSyncDomain(value)#
Memory Synchronization Domain. A kernel can be launched in a specified memory synchronization domain that affects all memory operations issued by that kernel. A memory barrier issued in one domain will only order memory operations in that domain, thus eliminating latency increase from memory barriers ordering unrelated traffic. By default, kernels are launched in domain 0. Kernels launched with
CU_LAUNCH_MEM_SYNC_DOMAIN_REMOTE
will have a different domain ID. Users may also alter the domain ID with CUlaunchMemSyncDomainMap
for a specific stream / graph node / kernel launch. See CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN
,cuStreamSetAttribute
,cuLaunchKernelEx
,cuGraphKernelNodeSetAttribute
. Memory operations done in kernels launched in different domains are considered system-scope distanced. In other words, a GPU-scoped memory synchronization is not sufficient for memory order to be observed by kernels in another memory synchronization domain even if they are on the same GPU.
- CU_LAUNCH_MEM_SYNC_DOMAIN_DEFAULT = 0#
Launch kernels in the default domain
- CU_LAUNCH_MEM_SYNC_DOMAIN_REMOTE = 1#
Launch kernels in the remote domain
- class cuda.cuda.CUlaunchAttributeID(value)#
Launch attributes enum; used as id field of
CUlaunchAttribute
- CU_LAUNCH_ATTRIBUTE_IGNORE = 0#
Ignored entry, for convenient composition
- CU_LAUNCH_ATTRIBUTE_ACCESS_POLICY_WINDOW = 1#
Valid for streams, graph nodes, launches. See
accessPolicyWindow
.
- CU_LAUNCH_ATTRIBUTE_COOPERATIVE = 2#
Valid for graph nodes, launches. See
cooperative
.
- CU_LAUNCH_ATTRIBUTE_SYNCHRONIZATION_POLICY = 3#
Valid for streams. See
syncPolicy
.
- CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION = 4#
Valid for graph nodes, launches. See
clusterDim
.
- CU_LAUNCH_ATTRIBUTE_CLUSTER_SCHEDULING_POLICY_PREFERENCE = 5#
Valid for graph nodes, launches. See
clusterSchedulingPolicyPreference
.
- CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_STREAM_SERIALIZATION = 6#
Valid for launches. Setting
programmaticStreamSerializationAllowed
to non-0 signals that the kernel will use programmatic means to resolve its stream dependency, so that the CUDA runtime should opportunistically allow the grid’s execution to overlap with the previous kernel in the stream, if that kernel requests the overlap. The dependent launches can choose to wait on the dependency using the programmatic sync (cudaGridDependencySynchronize() or equivalent PTX instructions).
- CU_LAUNCH_ATTRIBUTE_PROGRAMMATIC_EVENT = 7#
Valid for launches. Set
programmaticEvent
to record the event. An event recorded through this launch attribute is guaranteed to only trigger after all blocks in the associated kernel trigger the event. A block can trigger the event through PTX launchdep.release or CUDA builtin function cudaTriggerProgrammaticLaunchCompletion(). A trigger can also be inserted at the beginning of each block’s execution if triggerAtBlockStart is set to non-0. The dependent launches can choose to wait on the dependency using the programmatic sync (cudaGridDependencySynchronize() or equivalent PTX instructions). Note that dependents (including the CPU thread calling cuEventSynchronize()
) are not guaranteed to observe the release precisely when it is released. For example, cuEventSynchronize()
may only observe the event trigger long after the associated kernel has completed. This recording type is primarily meant for establishing programmatic dependency between device tasks. Note also this type of dependency allows, but does not guarantee, concurrent execution of tasks. The event supplied must not be an interprocess or interop event. The event must disable timing (i.e. must be created with the
CU_EVENT_DISABLE_TIMING
flag set).
- CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN_MAP = 9#
Valid for streams, graph nodes, launches. See
memSyncDomainMap
.
- CU_LAUNCH_ATTRIBUTE_MEM_SYNC_DOMAIN = 10#
Valid for streams, graph nodes, launches. See
memSyncDomain
.
- CU_LAUNCH_ATTRIBUTE_LAUNCH_COMPLETION_EVENT = 12#
Valid for launches. Set
launchCompletionEvent
to record the event.Nominally, the event is triggered once all blocks of the kernel have begun execution. Currently this is a best effort. If a kernel B has a launch completion dependency on a kernel A, B may wait until A is complete. Alternatively, blocks of B may begin before all blocks of A have begun, for example if B can claim execution resources unavailable to A (e.g. they run on different GPUs) or if B is a higher priority than A. Exercise caution if such an ordering inversion could lead to deadlock.
A launch completion event is nominally similar to a programmatic event with triggerAtBlockStart set except that it is not visible to cudaGridDependencySynchronize() and can be used with compute capability less than 9.0.
The event supplied must not be an interprocess or interop event. The event must disable timing (i.e. must be created with the
CU_EVENT_DISABLE_TIMING
flag set).
- CU_LAUNCH_ATTRIBUTE_DEVICE_UPDATABLE_KERNEL_NODE = 13#
Valid for graph nodes, launches. This attribute is graphs-only, and passing it to a launch in a non-capturing stream will result in an error.
CUlaunchAttributeValue
::deviceUpdatableKernelNode::deviceUpdatable can only be set to 0 or 1. Setting the field to 1 indicates that the corresponding kernel node should be device-updatable. On success, a handle will be returned viaCUlaunchAttributeValue
::deviceUpdatableKernelNode::devNode which can be passed to the various device-side update functions to update the node’s kernel parameters from within another kernel. For more information on the types of device updates that can be made, as well as the relevant limitations thereof, seecudaGraphKernelNodeUpdatesApply
.Nodes which are device-updatable have additional restrictions compared to regular kernel nodes. Firstly, device-updatable nodes cannot be removed from their graph via
cuGraphDestroyNode
. Additionally, once opted-in to this functionality, a node cannot opt out, and any attempt to set the deviceUpdatable attribute to 0 will result in an error. Device-updatable kernel nodes also cannot have their attributes copied to/from another kernel node viacuGraphKernelNodeCopyAttributes
. Graphs containing one or more device-updatable nodes also do not allow multiple instantiation, and neither the graph nor its instantiated version can be passed tocuGraphExecUpdate
.If a graph contains device-updatable nodes and updates those nodes from the device from within the graph, the graph must be uploaded with
cuGraphUpload
before it is launched. For such a graph, if host-side executable graph updates are made to the device-updatable nodes, the graph must be uploaded before it is launched again.
- CU_LAUNCH_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT = 14#
Valid for launches. On devices where the L1 cache and shared memory use the same hardware resources, setting
sharedMemCarveout
to a percentage between 0-100 signals the CUDA driver to set the shared memory carveout preference, in percent of the total shared memory for that kernel launch. This attribute takes precedence overCU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT
. This is only a hint, and the CUDA driver can choose a different configuration if required for the launch.
- class cuda.cuda.CUstreamCaptureStatus(value)#
Possible stream capture statuses returned by
cuStreamIsCapturing
- CU_STREAM_CAPTURE_STATUS_NONE = 0#
Stream is not capturing
- CU_STREAM_CAPTURE_STATUS_ACTIVE = 1#
Stream is actively capturing
- CU_STREAM_CAPTURE_STATUS_INVALIDATED = 2#
Stream is part of a capture sequence that has been invalidated, but not terminated
- class cuda.cuda.CUstreamCaptureMode(value)#
Possible modes for stream capture thread interactions. For more details see
cuStreamBeginCapture
andcuThreadExchangeStreamCaptureMode
- CU_STREAM_CAPTURE_MODE_GLOBAL = 0#
- CU_STREAM_CAPTURE_MODE_THREAD_LOCAL = 1#
- CU_STREAM_CAPTURE_MODE_RELAXED = 2#
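Capture is driven by cuStreamBeginCapture / cuStreamEndCapture, with cuStreamIsCapturing reporting one of the statuses above. A sketch, assuming stream is an existing non-default CUstream:
    from cuda import cuda

    err, = cuda.cuStreamBeginCapture(
        stream, cuda.CUstreamCaptureMode.CU_STREAM_CAPTURE_MODE_GLOBAL)
    err, status = cuda.cuStreamIsCapturing(stream)  # CU_STREAM_CAPTURE_STATUS_ACTIVE here
    # ... enqueue asynchronous work on `stream` ...
    err, graph = cuda.cuStreamEndCapture(stream)    # returns the captured CUgraph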
- class cuda.cuda.CUdriverProcAddress_flags(value)#
Flags to specify search options. For more details see
cuGetProcAddress
- CU_GET_PROC_ADDRESS_DEFAULT = 0#
Default search mode for driver symbols.
- CU_GET_PROC_ADDRESS_LEGACY_STREAM = 1#
Search for legacy versions of driver symbols.
- CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM = 2#
Search for per-thread versions of driver symbols.
- class cuda.cuda.CUdriverProcAddressQueryResult(value)#
Flags to indicate search status. For more details see
cuGetProcAddress
- CU_GET_PROC_ADDRESS_SUCCESS = 0#
Symbol was successfully found
- CU_GET_PROC_ADDRESS_SYMBOL_NOT_FOUND = 1#
Symbol was not found in search
- CU_GET_PROC_ADDRESS_VERSION_NOT_SUFFICIENT = 2#
Symbol was found but version supplied was not sufficient
- class cuda.cuda.CUexecAffinityType(value)#
Execution Affinity Types
- CU_EXEC_AFFINITY_TYPE_SM_COUNT = 0#
Create a context with limited SMs.
- CU_EXEC_AFFINITY_TYPE_MAX = 1#
- class cuda.cuda.CUlibraryOption(value)#
Library options to be specified with
cuLibraryLoadData()
orcuLibraryLoadFromFile()
- CU_LIBRARY_HOST_UNIVERSAL_FUNCTION_AND_DATA_TABLE = 0#
- CU_LIBRARY_BINARY_IS_PRESERVED = 1#
Specifies that the argument code passed to
cuLibraryLoadData()
will be preserved. Specifying this option will let the driver know that code can be accessed at any point until cuLibraryUnload()
. The default behavior is for the driver to allocate and maintain its own copy of code. Note that this is only a memory usage optimization hint and the driver can choose to ignore it if required. Specifying this option with cuLibraryLoadFromFile()
is invalid and will returnCUDA_ERROR_INVALID_VALUE
.
- CU_LIBRARY_NUM_OPTIONS = 2#
- class cuda.cuda.CUresult(value)#
Error codes
- CUDA_SUCCESS = 0#
The API call returned with no errors. In the case of query calls, this also means that the operation being queried is complete (see
cuEventQuery()
andcuStreamQuery()
).
- CUDA_ERROR_INVALID_VALUE = 1#
This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.
- CUDA_ERROR_OUT_OF_MEMORY = 2#
The API call failed because it was unable to allocate enough memory or other resources to perform the requested operation.
- CUDA_ERROR_NOT_INITIALIZED = 3#
This indicates that the CUDA driver has not been initialized with
cuInit()
or that initialization has failed.
- CUDA_ERROR_DEINITIALIZED = 4#
This indicates that the CUDA driver is in the process of shutting down.
- CUDA_ERROR_PROFILER_DISABLED = 5#
This indicates that the profiler is not initialized for this run. This can happen when the application is running with external profiling tools like visual profiler.
- CUDA_ERROR_PROFILER_NOT_INITIALIZED = 6#
[Deprecated]
- CUDA_ERROR_PROFILER_ALREADY_STARTED = 7#
[Deprecated]
- CUDA_ERROR_PROFILER_ALREADY_STOPPED = 8#
[Deprecated]
- CUDA_ERROR_STUB_LIBRARY = 34#
This indicates that the CUDA driver that the application has loaded is a stub library. Applications that run with the stub rather than a real driver loaded will result in the CUDA API returning this error.
- CUDA_ERROR_DEVICE_UNAVAILABLE = 46#
This indicates that requested CUDA device is unavailable at the current time. Devices are often unavailable due to use of
CU_COMPUTEMODE_EXCLUSIVE_PROCESS
orCU_COMPUTEMODE_PROHIBITED
.
- CUDA_ERROR_NO_DEVICE = 100#
This indicates that no CUDA-capable devices were detected by the installed CUDA driver.
- CUDA_ERROR_INVALID_DEVICE = 101#
This indicates that the device ordinal supplied by the user does not correspond to a valid CUDA device or that the action requested is invalid for the specified device.
- CUDA_ERROR_DEVICE_NOT_LICENSED = 102#
This error indicates that the Grid license is not applied.
- CUDA_ERROR_INVALID_IMAGE = 200#
This indicates that the device kernel image is invalid. This can also indicate an invalid CUDA module.
- CUDA_ERROR_INVALID_CONTEXT = 201#
This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had
cuCtxDestroy()
invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). SeecuCtxGetApiVersion()
for more details. This can also be returned if the green context passed to an API call was not converted to aCUcontext
usingcuCtxFromGreenCtx
API.
- CUDA_ERROR_CONTEXT_ALREADY_CURRENT = 202#
This indicated that the context being supplied as a parameter to the API call was already the active context. [Deprecated]
- CUDA_ERROR_MAP_FAILED = 205#
This indicates that a map or register operation has failed.
- CUDA_ERROR_UNMAP_FAILED = 206#
This indicates that an unmap or unregister operation has failed.
- CUDA_ERROR_ARRAY_IS_MAPPED = 207#
This indicates that the specified array is currently mapped and thus cannot be destroyed.
- CUDA_ERROR_ALREADY_MAPPED = 208#
This indicates that the resource is already mapped.
- CUDA_ERROR_NO_BINARY_FOR_GPU = 209#
This indicates that there is no kernel image available that is suitable for the device. This can occur when a user specifies code generation options for a particular CUDA source file that do not include the corresponding device configuration.
- CUDA_ERROR_ALREADY_ACQUIRED = 210#
This indicates that a resource has already been acquired.
- CUDA_ERROR_NOT_MAPPED = 211#
This indicates that a resource is not mapped.
- CUDA_ERROR_NOT_MAPPED_AS_ARRAY = 212#
This indicates that a mapped resource is not available for access as an array.
- CUDA_ERROR_NOT_MAPPED_AS_POINTER = 213#
This indicates that a mapped resource is not available for access as a pointer.
- CUDA_ERROR_ECC_UNCORRECTABLE = 214#
This indicates that an uncorrectable ECC error was detected during execution.
- CUDA_ERROR_UNSUPPORTED_LIMIT = 215#
This indicates that the
CUlimit
passed to the API call is not supported by the active device.
- CUDA_ERROR_CONTEXT_ALREADY_IN_USE = 216#
This indicates that the
CUcontext
passed to the API call can only be bound to a single CPU thread at a time but is already bound to a CPU thread.
- CUDA_ERROR_PEER_ACCESS_UNSUPPORTED = 217#
This indicates that peer access is not supported across the given devices.
- CUDA_ERROR_INVALID_PTX = 218#
This indicates that a PTX JIT compilation failed.
- CUDA_ERROR_INVALID_GRAPHICS_CONTEXT = 219#
This indicates an error with OpenGL or DirectX context.
- CUDA_ERROR_NVLINK_UNCORRECTABLE = 220#
This indicates that an uncorrectable NVLink error was detected during the execution.
- CUDA_ERROR_JIT_COMPILER_NOT_FOUND = 221#
This indicates that the PTX JIT compiler library was not found.
- CUDA_ERROR_UNSUPPORTED_PTX_VERSION = 222#
This indicates that the provided PTX was compiled with an unsupported toolchain.
- CUDA_ERROR_JIT_COMPILATION_DISABLED = 223#
This indicates that the PTX JIT compilation was disabled.
- CUDA_ERROR_UNSUPPORTED_EXEC_AFFINITY = 224#
This indicates that the
CUexecAffinityType
passed to the API call is not supported by the active device.
- CUDA_ERROR_UNSUPPORTED_DEVSIDE_SYNC = 225#
This indicates that the code to be compiled by the PTX JIT contains an unsupported call to cudaDeviceSynchronize().
- CUDA_ERROR_INVALID_SOURCE = 300#
This indicates that the device kernel source is invalid. This includes compilation/linker errors encountered in device code or user error.
- CUDA_ERROR_FILE_NOT_FOUND = 301#
This indicates that the file specified was not found.
- CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND = 302#
This indicates that a link to a shared object failed to resolve.
- CUDA_ERROR_SHARED_OBJECT_INIT_FAILED = 303#
This indicates that initialization of a shared object failed.
- CUDA_ERROR_OPERATING_SYSTEM = 304#
This indicates that an OS call failed.
- CUDA_ERROR_INVALID_HANDLE = 400#
This indicates that a resource handle passed to the API call was not valid. Resource handles are opaque types like
CUstream
andCUevent
.
- CUDA_ERROR_ILLEGAL_STATE = 401#
This indicates that a resource required by the API call is not in a valid state to perform the requested operation.
- CUDA_ERROR_LOSSY_QUERY = 402#
This indicates an attempt was made to introspect an object in a way that would discard semantically important information. This is either due to the object using functionality newer than the API version used to introspect it or omission of optional return arguments.
- CUDA_ERROR_NOT_FOUND = 500#
This indicates that a named symbol was not found. Examples of symbols are global/constant variable names, driver function names, texture names, and surface names.
- CUDA_ERROR_NOT_READY = 600#
This indicates that asynchronous operations issued previously have not completed yet. This result is not actually an error, but must be indicated differently than
CUDA_SUCCESS
(which indicates completion). Calls that may return this value includecuEventQuery()
andcuStreamQuery()
.
- CUDA_ERROR_ILLEGAL_ADDRESS = 700#
While executing a kernel, the device encountered a load or store instruction on an invalid memory address. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
- CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES = 701#
This indicates that a launch did not occur because it did not have appropriate resources. This error usually indicates that the user has attempted to pass too many arguments to the device kernel, or the kernel launch specifies too many threads for the kernel’s register count. Passing arguments of the wrong size (i.e. a 64-bit pointer when a 32-bit int is expected) is equivalent to passing too many arguments and can also result in this error.
- CUDA_ERROR_LAUNCH_TIMEOUT = 702#
This indicates that the device kernel took too long to execute. This can only occur if timeouts are enabled - see the device attribute
CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT
for more information. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
- CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING = 703#
This error indicates a kernel launch that uses an incompatible texturing mode.
- CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED = 704#
This error indicates that a call to
cuCtxEnablePeerAccess()
is trying to re-enable peer access to a context which has already had peer access to it enabled.
- CUDA_ERROR_PEER_ACCESS_NOT_ENABLED = 705#
This error indicates that
cuCtxDisablePeerAccess()
is trying to disable peer access which has not been enabled yet viacuCtxEnablePeerAccess()
.
- CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE = 708#
This error indicates that the primary context for the specified device has already been initialized.
- CUDA_ERROR_CONTEXT_IS_DESTROYED = 709#
This error indicates that the context current to the calling thread has been destroyed using
cuCtxDestroy
, or is a primary context which has not yet been initialized.
- CUDA_ERROR_ASSERT = 710#
A device-side assert triggered during kernel execution. The context cannot be used anymore, and must be destroyed. All existing device memory allocations from this context are invalid and must be reconstructed if the program is to continue using CUDA.
- CUDA_ERROR_TOO_MANY_PEERS = 711#
This error indicates that the hardware resources required to enable peer access have been exhausted for one or more of the devices passed to
cuCtxEnablePeerAccess()
.
- CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED = 712#
This error indicates that the memory range passed to
cuMemHostRegister()
has already been registered.
- CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED = 713#
This error indicates that the pointer passed to
cuMemHostUnregister()
does not correspond to any currently registered memory region.
- CUDA_ERROR_HARDWARE_STACK_ERROR = 714#
While executing a kernel, the device encountered a stack error. This can be due to stack corruption or exceeding the stack size limit. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
- CUDA_ERROR_ILLEGAL_INSTRUCTION = 715#
While executing a kernel, the device encountered an illegal instruction. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
- CUDA_ERROR_MISALIGNED_ADDRESS = 716#
While executing a kernel, the device encountered a load or store instruction on a memory address which is not aligned. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
- CUDA_ERROR_INVALID_ADDRESS_SPACE = 717#
While executing a kernel, the device encountered an instruction which can only operate on memory locations in certain address spaces (global, shared, or local), but was supplied a memory address not belonging to an allowed address space. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
- CUDA_ERROR_INVALID_PC = 718#
While executing a kernel, the device program counter wrapped its address space. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
- CUDA_ERROR_LAUNCH_FAILED = 719#
An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. Less common cases can be system specific - more information about these cases can be found in the system specific user guide. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
- CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE = 720#
This error indicates that the number of blocks launched per grid for a kernel that was launched via either
cuLaunchCooperativeKernel
orcuLaunchCooperativeKernelMultiDevice
exceeds the maximum number of blocks as allowed bycuOccupancyMaxActiveBlocksPerMultiprocessor
orcuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags
times the number of multiprocessors as specified by the device attributeCU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT
.
- CUDA_ERROR_NOT_PERMITTED = 800#
This error indicates that the attempted operation is not permitted.
- CUDA_ERROR_NOT_SUPPORTED = 801#
This error indicates that the attempted operation is not supported on the current system or device.
- CUDA_ERROR_SYSTEM_NOT_READY = 802#
This error indicates that the system is not yet ready to start any CUDA work. To continue using CUDA, verify the system configuration is in a valid state and all required driver daemons are actively running. More information about this error can be found in the system specific user guide.
- CUDA_ERROR_SYSTEM_DRIVER_MISMATCH = 803#
This error indicates that there is a mismatch between the versions of the display driver and the CUDA driver. Refer to the compatibility documentation for supported versions.
- CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE = 804#
This error indicates that the system was upgraded to run with forward compatibility but the visible hardware detected by CUDA does not support this configuration. Refer to the compatibility documentation for the supported hardware matrix or ensure that only supported hardware is visible during initialization via the CUDA_VISIBLE_DEVICES environment variable.
- CUDA_ERROR_MPS_CONNECTION_FAILED = 805#
This error indicates that the MPS client failed to connect to the MPS control daemon or the MPS server.
- CUDA_ERROR_MPS_RPC_FAILURE = 806#
This error indicates that the remote procedural call between the MPS server and the MPS client failed.
- CUDA_ERROR_MPS_SERVER_NOT_READY = 807#
This error indicates that the MPS server is not ready to accept new MPS client requests. This error can be returned when the MPS server is in the process of recovering from a fatal failure.
- CUDA_ERROR_MPS_MAX_CLIENTS_REACHED = 808#
This error indicates that the hardware resources required to create an MPS client have been exhausted.
- CUDA_ERROR_MPS_MAX_CONNECTIONS_REACHED = 809#
This error indicates that the hardware resources required to support device connections have been exhausted.
- CUDA_ERROR_MPS_CLIENT_TERMINATED = 810#
This error indicates that the MPS client has been terminated by the server. To continue using CUDA, the process must be terminated and relaunched.
- CUDA_ERROR_CDP_NOT_SUPPORTED = 811#
This error indicates that the module is using CUDA Dynamic Parallelism, but the current configuration, like MPS, does not support it.
- CUDA_ERROR_CDP_VERSION_MISMATCH = 812#
This error indicates that a module contains an unsupported interaction between different versions of CUDA Dynamic Parallelism.
- CUDA_ERROR_STREAM_CAPTURE_UNSUPPORTED = 900#
This error indicates that the operation is not permitted when the stream is capturing.
- CUDA_ERROR_STREAM_CAPTURE_INVALIDATED = 901#
This error indicates that the current capture sequence on the stream has been invalidated due to a previous error.
- CUDA_ERROR_STREAM_CAPTURE_MERGE = 902#
This error indicates that the operation would have resulted in a merge of two independent capture sequences.
- CUDA_ERROR_STREAM_CAPTURE_UNMATCHED = 903#
This error indicates that the capture was not initiated in this stream.
- CUDA_ERROR_STREAM_CAPTURE_UNJOINED = 904#
This error indicates that the capture sequence contains a fork that was not joined to the primary stream.
- CUDA_ERROR_STREAM_CAPTURE_ISOLATION = 905#
This error indicates that a dependency would have been created which crosses the capture sequence boundary. Only implicit in-stream ordering dependencies are allowed to cross the boundary.
- CUDA_ERROR_STREAM_CAPTURE_IMPLICIT = 906#
This error indicates a disallowed implicit dependency on a current capture sequence from cudaStreamLegacy.
- CUDA_ERROR_CAPTURED_EVENT = 907#
This error indicates that the operation is not permitted on an event which was last recorded in a capturing stream.
- CUDA_ERROR_STREAM_CAPTURE_WRONG_THREAD = 908#
A stream capture sequence not initiated with the CU_STREAM_CAPTURE_MODE_RELAXED argument to cuStreamBeginCapture was passed to cuStreamEndCapture in a different thread.
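For context, a minimal stream-capture sketch follows; `stream` is a hypothetical non-default CUstream created earlier. With CU_STREAM_CAPTURE_MODE_GLOBAL (or the default thread-local mode), the capture must be ended in the thread that began it.

```python
from cuda import cuda

# Hypothetical: `stream` is a non-default CUstream created earlier.
err, = cuda.cuStreamBeginCapture(
    stream, cuda.CUstreamCaptureMode.CU_STREAM_CAPTURE_MODE_GLOBAL)
# ... enqueue asynchronous work into `stream` here ...
err, graph = cuda.cuStreamEndCapture(stream)  # same thread unless RELAXED mode was used
```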
- CUDA_ERROR_TIMEOUT = 909#
This error indicates that the timeout specified for the wait operation has lapsed.
- CUDA_ERROR_GRAPH_EXEC_UPDATE_FAILURE = 910#
This error indicates that the graph update was not performed because it included changes which violated constraints specific to instantiated graph update.
- CUDA_ERROR_EXTERNAL_DEVICE = 911#
This indicates that an async error has occurred in a device outside of CUDA. If CUDA was waiting for an external device’s signal before consuming shared data, the external device signaled an error indicating that the data is not valid for consumption. This leaves the process in an inconsistent state and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched.
- CUDA_ERROR_INVALID_CLUSTER_SIZE = 912#
Indicates a kernel launch error due to cluster misconfiguration.
- CUDA_ERROR_FUNCTION_NOT_LOADED = 913#
Indicates that a function handle is not loaded when calling an API that requires a loaded function.
- CUDA_ERROR_INVALID_RESOURCE_TYPE = 914#
This error indicates one or more resources passed in are not valid resource types for the operation.
- CUDA_ERROR_INVALID_RESOURCE_CONFIGURATION = 915#
This error indicates one or more resources are insufficient or non-applicable for the operation.
- CUDA_ERROR_UNKNOWN = 999#
This indicates that an unknown internal error has occurred.
- class cuda.cuda.CUdevice_P2PAttribute(value)#
P2P Attributes
- CU_DEVICE_P2P_ATTRIBUTE_PERFORMANCE_RANK = 1#
A relative value indicating the performance of the link between two devices
- CU_DEVICE_P2P_ATTRIBUTE_ACCESS_SUPPORTED = 2#
P2P access is enabled
- CU_DEVICE_P2P_ATTRIBUTE_NATIVE_ATOMIC_SUPPORTED = 3#
Atomic operations over the link are supported
- CU_DEVICE_P2P_ATTRIBUTE_ACCESS_ACCESS_SUPPORTED = 4#
[Deprecated] Use CU_DEVICE_P2P_ATTRIBUTE_CUDA_ARRAY_ACCESS_SUPPORTED instead
- CU_DEVICE_P2P_ATTRIBUTE_CUDA_ARRAY_ACCESS_SUPPORTED = 4#
Accessing CUDA arrays over the link supported
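A brief sketch of how these attributes are queried, assuming at least two devices are present:

```python
from cuda import cuda

cuda.cuInit(0)
err, dev0 = cuda.cuDeviceGet(0)
err, dev1 = cuda.cuDeviceGet(1)

# 1 if dev0 can access dev1 over P2P, 0 otherwise.
err, access = cuda.cuDeviceGetP2PAttribute(
    cuda.CUdevice_P2PAttribute.CU_DEVICE_P2P_ATTRIBUTE_ACCESS_SUPPORTED, dev0, dev1)
```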
- class cuda.cuda.CUresourceViewFormat(value)#
Resource view format
- CU_RES_VIEW_FORMAT_NONE = 0#
No resource view format (use underlying resource format)
- CU_RES_VIEW_FORMAT_UINT_1X8 = 1#
1 channel unsigned 8-bit integers
- CU_RES_VIEW_FORMAT_UINT_2X8 = 2#
2 channel unsigned 8-bit integers
- CU_RES_VIEW_FORMAT_UINT_4X8 = 3#
4 channel unsigned 8-bit integers
- CU_RES_VIEW_FORMAT_SINT_1X8 = 4#
1 channel signed 8-bit integers
- CU_RES_VIEW_FORMAT_SINT_2X8 = 5#
2 channel signed 8-bit integers
- CU_RES_VIEW_FORMAT_SINT_4X8 = 6#
4 channel signed 8-bit integers
- CU_RES_VIEW_FORMAT_UINT_1X16 = 7#
1 channel unsigned 16-bit integers
- CU_RES_VIEW_FORMAT_UINT_2X16 = 8#
2 channel unsigned 16-bit integers
- CU_RES_VIEW_FORMAT_UINT_4X16 = 9#
4 channel unsigned 16-bit integers
- CU_RES_VIEW_FORMAT_SINT_1X16 = 10#
1 channel signed 16-bit integers
- CU_RES_VIEW_FORMAT_SINT_2X16 = 11#
2 channel signed 16-bit integers
- CU_RES_VIEW_FORMAT_SINT_4X16 = 12#
4 channel signed 16-bit integers
- CU_RES_VIEW_FORMAT_UINT_1X32 = 13#
1 channel unsigned 32-bit integers
- CU_RES_VIEW_FORMAT_UINT_2X32 = 14#
2 channel unsigned 32-bit integers
- CU_RES_VIEW_FORMAT_UINT_4X32 = 15#
4 channel unsigned 32-bit integers
- CU_RES_VIEW_FORMAT_SINT_1X32 = 16#
1 channel signed 32-bit integers
- CU_RES_VIEW_FORMAT_SINT_2X32 = 17#
2 channel signed 32-bit integers
- CU_RES_VIEW_FORMAT_SINT_4X32 = 18#
4 channel signed 32-bit integers
- CU_RES_VIEW_FORMAT_FLOAT_1X16 = 19#
1 channel 16-bit floating point
- CU_RES_VIEW_FORMAT_FLOAT_2X16 = 20#
2 channel 16-bit floating point
- CU_RES_VIEW_FORMAT_FLOAT_4X16 = 21#
4 channel 16-bit floating point
- CU_RES_VIEW_FORMAT_FLOAT_1X32 = 22#
1 channel 32-bit floating point
- CU_RES_VIEW_FORMAT_FLOAT_2X32 = 23#
2 channel 32-bit floating point
- CU_RES_VIEW_FORMAT_FLOAT_4X32 = 24#
4 channel 32-bit floating point
- CU_RES_VIEW_FORMAT_UNSIGNED_BC1 = 25#
Block compressed 1
- CU_RES_VIEW_FORMAT_UNSIGNED_BC2 = 26#
Block compressed 2
- CU_RES_VIEW_FORMAT_UNSIGNED_BC3 = 27#
Block compressed 3
- CU_RES_VIEW_FORMAT_UNSIGNED_BC4 = 28#
Block compressed 4 unsigned
- CU_RES_VIEW_FORMAT_SIGNED_BC4 = 29#
Block compressed 4 signed
- CU_RES_VIEW_FORMAT_UNSIGNED_BC5 = 30#
Block compressed 5 unsigned
- CU_RES_VIEW_FORMAT_SIGNED_BC5 = 31#
Block compressed 5 signed
- CU_RES_VIEW_FORMAT_UNSIGNED_BC6H = 32#
Block compressed 6 unsigned half-float
- CU_RES_VIEW_FORMAT_SIGNED_BC6H = 33#
Block compressed 6 signed half-float
- CU_RES_VIEW_FORMAT_UNSIGNED_BC7 = 34#
Block compressed 7
- class cuda.cuda.CUtensorMapDataType(value)#
Tensor map data type
- CU_TENSOR_MAP_DATA_TYPE_UINT8 = 0#
- CU_TENSOR_MAP_DATA_TYPE_UINT16 = 1#
- CU_TENSOR_MAP_DATA_TYPE_UINT32 = 2#
- CU_TENSOR_MAP_DATA_TYPE_INT32 = 3#
- CU_TENSOR_MAP_DATA_TYPE_UINT64 = 4#
- CU_TENSOR_MAP_DATA_TYPE_INT64 = 5#
- CU_TENSOR_MAP_DATA_TYPE_FLOAT16 = 6#
- CU_TENSOR_MAP_DATA_TYPE_FLOAT32 = 7#
- CU_TENSOR_MAP_DATA_TYPE_FLOAT64 = 8#
- CU_TENSOR_MAP_DATA_TYPE_BFLOAT16 = 9#
- CU_TENSOR_MAP_DATA_TYPE_FLOAT32_FTZ = 10#
- CU_TENSOR_MAP_DATA_TYPE_TFLOAT32 = 11#
- CU_TENSOR_MAP_DATA_TYPE_TFLOAT32_FTZ = 12#
- class cuda.cuda.CUtensorMapInterleave(value)#
Tensor map interleave layout type
- CU_TENSOR_MAP_INTERLEAVE_NONE = 0#
- CU_TENSOR_MAP_INTERLEAVE_16B = 1#
- CU_TENSOR_MAP_INTERLEAVE_32B = 2#
- class cuda.cuda.CUtensorMapSwizzle(value)#
Tensor map swizzling mode of shared memory banks
- CU_TENSOR_MAP_SWIZZLE_NONE = 0#
- CU_TENSOR_MAP_SWIZZLE_32B = 1#
- CU_TENSOR_MAP_SWIZZLE_64B = 2#
- CU_TENSOR_MAP_SWIZZLE_128B = 3#
- class cuda.cuda.CUtensorMapL2promotion(value)#
Tensor map L2 promotion type
- CU_TENSOR_MAP_L2_PROMOTION_NONE = 0#
- CU_TENSOR_MAP_L2_PROMOTION_L2_64B = 1#
- CU_TENSOR_MAP_L2_PROMOTION_L2_128B = 2#
- CU_TENSOR_MAP_L2_PROMOTION_L2_256B = 3#
- class cuda.cuda.CUtensorMapFloatOOBfill(value)#
Tensor map out-of-bounds fill type
- CU_TENSOR_MAP_FLOAT_OOB_FILL_NONE = 0#
- CU_TENSOR_MAP_FLOAT_OOB_FILL_NAN_REQUEST_ZERO_FMA = 1#
- class cuda.cuda.CUDA_POINTER_ATTRIBUTE_ACCESS_FLAGS(value)#
Access flags that specify the level of access the current context’s device has on the memory referenced.
- CU_POINTER_ATTRIBUTE_ACCESS_FLAG_NONE = 0#
No access, meaning the device cannot access this memory at all, thus must be staged through accessible memory in order to complete certain operations
- CU_POINTER_ATTRIBUTE_ACCESS_FLAG_READ = 1#
Read-only access, meaning writes to this memory are considered invalid accesses and thus return error in that case.
- CU_POINTER_ATTRIBUTE_ACCESS_FLAG_READWRITE = 3#
Read-write access, the device has full read-write access to the memory
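A minimal sketch of querying these flags for a pointer; `dptr` is a hypothetical CUdeviceptr allocated in the current context.

```python
from cuda import cuda

# Hypothetical: `dptr` is a CUdeviceptr allocated in the current context.
err, flags = cuda.cuPointerGetAttribute(
    cuda.CUpointer_attribute.CU_POINTER_ATTRIBUTE_ACCESS_FLAGS, dptr)
# `flags` holds one of the CU_POINTER_ATTRIBUTE_ACCESS_FLAG_* values listed above.
```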
- class cuda.cuda.CUexternalMemoryHandleType(value)#
External memory handle types
- CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD = 1#
Handle is an opaque file descriptor
- CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32 = 2#
Handle is an opaque shared NT handle
- CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_WIN32_KMT = 3#
Handle is an opaque, globally shared handle
- CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_HEAP = 4#
Handle is a D3D12 heap object
- CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE = 5#
Handle is a D3D12 committed resource
- CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D11_RESOURCE = 6#
Handle is a shared NT handle to a D3D11 resource
- CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D11_RESOURCE_KMT = 7#
Handle is a globally shared handle to a D3D11 resource
- CU_EXTERNAL_MEMORY_HANDLE_TYPE_NVSCIBUF = 8#
Handle is an NvSciBuf object
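A hedged sketch of importing external memory using the OPAQUE_FD handle type; `fd` (a POSIX file descriptor exported by another API such as Vulkan) and `nbytes` (its allocation size) are hypothetical inputs, and the descriptor field names follow the CUDA_EXTERNAL_MEMORY_HANDLE_DESC structure.

```python
from cuda import cuda

# Hypothetical inputs: `fd` is a POSIX file descriptor exported by another API,
# `nbytes` is the size of that allocation.
desc = cuda.CUDA_EXTERNAL_MEMORY_HANDLE_DESC()
desc.type = cuda.CUexternalMemoryHandleType.CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD
desc.handle.fd = fd
desc.size = nbytes
err, ext_mem = cuda.cuImportExternalMemory(desc)
```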
- class cuda.cuda.CUexternalSemaphoreHandleType(value)#
External semaphore handle types
- CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD = 1#
Handle is an opaque file descriptor
- CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_WIN32 = 2#
Handle is an opaque shared NT handle
- CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_WIN32_KMT = 3#
Handle is an opaque, globally shared handle
- CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_D3D12_FENCE = 4#
Handle is a shared NT handle referencing a D3D12 fence object
- CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_D3D11_FENCE = 5#
Handle is a shared NT handle referencing a D3D11 fence object
- CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_NVSCISYNC = 6#
Opaque handle to NvSciSync Object
- CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_D3D11_KEYED_MUTEX = 7#
Handle is a shared NT handle referencing a D3D11 keyed mutex object
- CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_D3D11_KEYED_MUTEX_KMT = 8#
Handle is a globally shared handle referencing a D3D11 keyed mutex object
- CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_TIMELINE_SEMAPHORE_FD = 9#
Handle is an opaque file descriptor referencing a timeline semaphore
- CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_TIMELINE_SEMAPHORE_WIN32 = 10#
Handle is an opaque shared NT handle referencing a timeline semaphore
- class cuda.cuda.CUmemAllocationHandleType(value)#
Flags for specifying particular handle types
- CU_MEM_HANDLE_TYPE_NONE = 0#
Does not allow any export mechanism.
- CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR = 1#
Allows a file descriptor to be used for exporting. Permitted only on POSIX systems. (int)
- CU_MEM_HANDLE_TYPE_WIN32 = 2#
Allows a Win32 NT handle to be used for exporting. (HANDLE)
- CU_MEM_HANDLE_TYPE_WIN32_KMT = 4#
Allows a Win32 KMT handle to be used for exporting. (D3DKMT_HANDLE)
- CU_MEM_HANDLE_TYPE_FABRIC = 8#
Allows a fabric handle to be used for exporting. (CUmemFabricHandle)
- CU_MEM_HANDLE_TYPE_MAX = 2147483647#
- class cuda.cuda.CUmemAccess_flags(value)#
Specifies the memory protection flags for mapping.
- CU_MEM_ACCESS_FLAGS_PROT_NONE = 0#
Default, make the address range not accessible
- CU_MEM_ACCESS_FLAGS_PROT_READ = 1#
Make the address range read accessible
- CU_MEM_ACCESS_FLAGS_PROT_READWRITE = 3#
Make the address range read-write accessible
- CU_MEM_ACCESS_FLAGS_PROT_MAX = 2147483647#
- class cuda.cuda.CUmemLocationType(value)#
Specifies the type of location
- CU_MEM_LOCATION_TYPE_INVALID = 0#
- CU_MEM_LOCATION_TYPE_DEVICE = 1#
Location is a device location, thus id is a device ordinal
- CU_MEM_LOCATION_TYPE_HOST = 2#
Location is host, id is ignored
- CU_MEM_LOCATION_TYPE_HOST_NUMA = 3#
Location is a host NUMA node, thus id is a host NUMA node id
- CU_MEM_LOCATION_TYPE_HOST_NUMA_CURRENT = 4#
Location is a host NUMA node of the current thread, id is ignored
- CU_MEM_LOCATION_TYPE_MAX = 2147483647#
- class cuda.cuda.CUmemAllocationType(value)#
Defines the allocation types available
- CU_MEM_ALLOCATION_TYPE_INVALID = 0#
- CU_MEM_ALLOCATION_TYPE_PINNED = 1#
This allocation type is ‘pinned’, i.e. cannot migrate from its current location while the application is actively using it
- CU_MEM_ALLOCATION_TYPE_MAX = 2147483647#
- class cuda.cuda.CUmemAllocationGranularity_flags(value)#
Flag for requesting different optimal and required granularities for an allocation.
- CU_MEM_ALLOC_GRANULARITY_MINIMUM = 0#
Minimum required granularity for allocation
- CU_MEM_ALLOC_GRANULARITY_RECOMMENDED = 1#
Recommended granularity for allocation for best performance
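The handle type, access, location, allocation type, and granularity enums above parameterize the virtual memory management APIs. A minimal sketch of that flow, assuming cuInit has been called and a context is current on device ordinal 0:

```python
from cuda import cuda

# Sketch of the virtual memory management flow these enums parameterize
# (assumes an initialized context on device ordinal 0).
prop = cuda.CUmemAllocationProp()
prop.type = cuda.CUmemAllocationType.CU_MEM_ALLOCATION_TYPE_PINNED
prop.location.type = cuda.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
prop.location.id = 0
prop.requestedHandleTypes = cuda.CUmemAllocationHandleType.CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR

err, gran = cuda.cuMemGetAllocationGranularity(
    prop, cuda.CUmemAllocationGranularity_flags.CU_MEM_ALLOC_GRANULARITY_MINIMUM)
size = gran  # allocation sizes must be a multiple of the granularity

err, handle = cuda.cuMemCreate(size, prop, 0)       # physical allocation
err, ptr = cuda.cuMemAddressReserve(size, 0, 0, 0)  # virtual address range
err, = cuda.cuMemMap(ptr, size, 0, handle, 0)       # map physical into virtual

access = cuda.CUmemAccessDesc()
access.location.type = cuda.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
access.location.id = 0
access.flags = cuda.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_READWRITE
err, = cuda.cuMemSetAccess(ptr, size, [access], 1)
```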
- class cuda.cuda.CUmemRangeHandleType(value)#
Specifies the handle type for address range
- CU_MEM_RANGE_HANDLE_TYPE_DMA_BUF_FD = 1#
- CU_MEM_RANGE_HANDLE_TYPE_MAX = 2147483647#
- class cuda.cuda.CUarraySparseSubresourceType(value)#
Sparse subresource types
- CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_SPARSE_LEVEL = 0#
- CU_ARRAY_SPARSE_SUBRESOURCE_TYPE_MIPTAIL = 1#
- class cuda.cuda.CUmemOperationType(value)#
Memory operation types
- CU_MEM_OPERATION_TYPE_MAP = 1#
- CU_MEM_OPERATION_TYPE_UNMAP = 2#
- class cuda.cuda.CUmemAllocationCompType(value)#
Specifies compression attribute for an allocation.
- CU_MEM_ALLOCATION_COMP_NONE = 0#
Allocating non-compressible memory
- CU_MEM_ALLOCATION_COMP_GENERIC = 1#
Allocating compressible memory
- class cuda.cuda.CUmulticastGranularity_flags(value)#
Flags for querying different granularities for a multicast object
- CU_MULTICAST_GRANULARITY_MINIMUM = 0#
Minimum required granularity
- CU_MULTICAST_GRANULARITY_RECOMMENDED = 1#
Recommended granularity for best performance
- class cuda.cuda.CUgraphExecUpdateResult(value)#
CUDA Graph Update error types
- CU_GRAPH_EXEC_UPDATE_SUCCESS = 0#
The update succeeded
- CU_GRAPH_EXEC_UPDATE_ERROR = 1#
The update failed for an unexpected reason which is described in the return value of the function
- CU_GRAPH_EXEC_UPDATE_ERROR_TOPOLOGY_CHANGED = 2#
The update failed because the topology changed
- CU_GRAPH_EXEC_UPDATE_ERROR_NODE_TYPE_CHANGED = 3#
The update failed because a node type changed
- CU_GRAPH_EXEC_UPDATE_ERROR_FUNCTION_CHANGED = 4#
The update failed because the function of a kernel node changed (CUDA driver < 11.2)
- CU_GRAPH_EXEC_UPDATE_ERROR_PARAMETERS_CHANGED = 5#
The update failed because the parameters changed in a way that is not supported
- CU_GRAPH_EXEC_UPDATE_ERROR_NOT_SUPPORTED = 6#
The update failed because something about the node is not supported
- CU_GRAPH_EXEC_UPDATE_ERROR_UNSUPPORTED_FUNCTION_CHANGE = 7#
The update failed because the function of a kernel node changed in an unsupported way
- CU_GRAPH_EXEC_UPDATE_ERROR_ATTRIBUTES_CHANGED = 8#
The update failed because the node attributes changed in a way that is not supported
- class cuda.cuda.CUmemPool_attribute(value)#
CUDA memory pool attributes
- CU_MEMPOOL_ATTR_REUSE_FOLLOW_EVENT_DEPENDENCIES = 1#
(value type = int) Allow cuMemAllocAsync to use memory asynchronously freed in another stream as long as a stream ordering dependency of the allocating stream on the free action exists. CUDA events and null stream interactions can create the required stream ordered dependencies. (default enabled)
- CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC = 2#
(value type = int) Allow reuse of already completed frees when there is no dependency between the free and allocation. (default enabled)
- CU_MEMPOOL_ATTR_REUSE_ALLOW_INTERNAL_DEPENDENCIES = 3#
(value type = int) Allow cuMemAllocAsync to insert new stream dependencies in order to establish the stream ordering required to reuse a piece of memory released by cuMemFreeAsync (default enabled).
- CU_MEMPOOL_ATTR_RELEASE_THRESHOLD = 4#
(value type = cuuint64_t) Amount of reserved memory in bytes to hold onto before trying to release memory back to the OS. When more than the release threshold bytes of memory are held by the memory pool, the allocator will try to release memory back to the OS on the next call to stream, event or context synchronize. (default 0)
- CU_MEMPOOL_ATTR_RESERVED_MEM_CURRENT = 5#
(value type = cuuint64_t) Amount of backing memory currently allocated for the mempool.
- CU_MEMPOOL_ATTR_RESERVED_MEM_HIGH = 6#
(value type = cuuint64_t) High watermark of backing memory allocated for the mempool since the last time it was reset. High watermark can only be reset to zero.
- CU_MEMPOOL_ATTR_USED_MEM_CURRENT = 7#
(value type = cuuint64_t) Amount of memory from the pool that is currently in use by the application.
- CU_MEMPOOL_ATTR_USED_MEM_HIGH = 8#
(value type = cuuint64_t) High watermark of the amount of memory from the pool that was in use by the application since the last time it was reset. High watermark can only be reset to zero.
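A short sketch of setting and reading these attributes on a device's default memory pool (assumes an initialized context on device 0):

```python
from cuda import cuda

cuda.cuInit(0)
err, dev = cuda.cuDeviceGet(0)
err, pool = cuda.cuDeviceGetDefaultMemPool(dev)

# Keep up to 64 MiB cached in the pool before releasing memory back to the OS.
err, = cuda.cuMemPoolSetAttribute(
    pool, cuda.CUmemPool_attribute.CU_MEMPOOL_ATTR_RELEASE_THRESHOLD,
    cuda.cuuint64_t(64 * 1024 * 1024))

# Read back the reserved-memory high watermark.
err, high = cuda.cuMemPoolGetAttribute(
    pool, cuda.CUmemPool_attribute.CU_MEMPOOL_ATTR_RESERVED_MEM_HIGH)
```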
- class cuda.cuda.CUgraphMem_attribute(value)#
- CU_GRAPH_MEM_ATTR_USED_MEM_CURRENT = 0#
(value type = cuuint64_t) Amount of memory, in bytes, currently associated with graphs
- CU_GRAPH_MEM_ATTR_USED_MEM_HIGH = 1#
(value type = cuuint64_t) High watermark of memory, in bytes, associated with graphs since the last time it was reset. High watermark can only be reset to zero.
- CU_GRAPH_MEM_ATTR_RESERVED_MEM_CURRENT = 2#
(value type = cuuint64_t) Amount of memory, in bytes, currently allocated for use by the CUDA graphs asynchronous allocator.
- CU_GRAPH_MEM_ATTR_RESERVED_MEM_HIGH = 3#
(value type = cuuint64_t) High watermark of memory, in bytes, currently allocated for use by the CUDA graphs asynchronous allocator.
- class cuda.cuda.CUflushGPUDirectRDMAWritesOptions(value)#
Bitmasks for CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_FLUSH_WRITES_OPTIONS
- CU_FLUSH_GPU_DIRECT_RDMA_WRITES_OPTION_HOST = 1#
cuFlushGPUDirectRDMAWrites() and its CUDA Runtime API counterpart are supported on the device.
- CU_FLUSH_GPU_DIRECT_RDMA_WRITES_OPTION_MEMOPS = 2#
The CU_STREAM_WAIT_VALUE_FLUSH flag and the CU_STREAM_MEM_OP_FLUSH_REMOTE_WRITES MemOp are supported on the device.
- class cuda.cuda.CUGPUDirectRDMAWritesOrdering(value)#
Platform native ordering for GPUDirect RDMA writes
- CU_GPU_DIRECT_RDMA_WRITES_ORDERING_NONE = 0#
The device does not natively support ordering of remote writes. cuFlushGPUDirectRDMAWrites() can be leveraged if supported.
- CU_GPU_DIRECT_RDMA_WRITES_ORDERING_OWNER = 100#
Natively, the device can consistently consume remote writes, although other CUDA devices may not.
- CU_GPU_DIRECT_RDMA_WRITES_ORDERING_ALL_DEVICES = 200#
Any CUDA device in the system can consistently consume remote writes to this device.
- class cuda.cuda.CUflushGPUDirectRDMAWritesScope(value)#
The scopes for cuFlushGPUDirectRDMAWrites
- CU_FLUSH_GPU_DIRECT_RDMA_WRITES_TO_OWNER = 100#
Blocks until remote writes are visible to the CUDA device context owning the data.
- CU_FLUSH_GPU_DIRECT_RDMA_WRITES_TO_ALL_DEVICES = 200#
Blocks until remote writes are visible to all CUDA device contexts.
- class cuda.cuda.CUflushGPUDirectRDMAWritesTarget(value)#
The targets for cuFlushGPUDirectRDMAWrites
- CU_FLUSH_GPU_DIRECT_RDMA_WRITES_TARGET_CURRENT_CTX = 0#
Sets the target for cuFlushGPUDirectRDMAWrites() to the currently active CUDA device context.
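A hedged sketch tying the options, scope, and target enums together; `dev` is a hypothetical CUdevice with a current context.

```python
from cuda import cuda

# Hypothetical: `dev` is the CUdevice in use and a context is current.
err, options = cuda.cuDeviceGetAttribute(
    cuda.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_FLUSH_WRITES_OPTIONS, dev)
host_flush = int(cuda.CUflushGPUDirectRDMAWritesOptions.CU_FLUSH_GPU_DIRECT_RDMA_WRITES_OPTION_HOST)
if options & host_flush:
    # Block until pending remote writes are visible to the owning context.
    err, = cuda.cuFlushGPUDirectRDMAWrites(
        cuda.CUflushGPUDirectRDMAWritesTarget.CU_FLUSH_GPU_DIRECT_RDMA_WRITES_TARGET_CURRENT_CTX,
        cuda.CUflushGPUDirectRDMAWritesScope.CU_FLUSH_GPU_DIRECT_RDMA_WRITES_TO_OWNER)
```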
- class cuda.cuda.CUgraphDebugDot_flags(value)#
The additional write options for cuGraphDebugDotPrint
- CU_GRAPH_DEBUG_DOT_FLAGS_VERBOSE = 1#
Output all debug data as if every debug flag is enabled
- CU_GRAPH_DEBUG_DOT_FLAGS_RUNTIME_TYPES = 2#
Use CUDA Runtime structures for output
- CU_GRAPH_DEBUG_DOT_FLAGS_KERNEL_NODE_PARAMS = 4#
Adds CUDA_KERNEL_NODE_PARAMS values to output
- CU_GRAPH_DEBUG_DOT_FLAGS_MEMCPY_NODE_PARAMS = 8#
Adds CUDA_MEMCPY3D values to output
- CU_GRAPH_DEBUG_DOT_FLAGS_MEMSET_NODE_PARAMS = 16#
Adds CUDA_MEMSET_NODE_PARAMS values to output
- CU_GRAPH_DEBUG_DOT_FLAGS_HOST_NODE_PARAMS = 32#
Adds CUDA_HOST_NODE_PARAMS values to output
- CU_GRAPH_DEBUG_DOT_FLAGS_EVENT_NODE_PARAMS = 64#
Adds CUevent handle from record and wait nodes to output
- CU_GRAPH_DEBUG_DOT_FLAGS_EXT_SEMAS_SIGNAL_NODE_PARAMS = 128#
Adds CUDA_EXT_SEM_SIGNAL_NODE_PARAMS values to output
- CU_GRAPH_DEBUG_DOT_FLAGS_EXT_SEMAS_WAIT_NODE_PARAMS = 256#
Adds CUDA_EXT_SEM_WAIT_NODE_PARAMS values to output
- CU_GRAPH_DEBUG_DOT_FLAGS_KERNEL_NODE_ATTRIBUTES = 512#
Adds CUkernelNodeAttrValue values to output
- CU_GRAPH_DEBUG_DOT_FLAGS_HANDLES = 1024#
Adds node handles and every kernel function handle to output
- CU_GRAPH_DEBUG_DOT_FLAGS_MEM_ALLOC_NODE_PARAMS = 2048#
Adds memory alloc node parameters to output
- CU_GRAPH_DEBUG_DOT_FLAGS_MEM_FREE_NODE_PARAMS = 4096#
Adds memory free node parameters to output
- CU_GRAPH_DEBUG_DOT_FLAGS_BATCH_MEM_OP_NODE_PARAMS = 8192#
Adds batch mem op node parameters to output
- CU_GRAPH_DEBUG_DOT_FLAGS_EXTRA_TOPO_INFO = 16384#
Adds edge numbering information
- CU_GRAPH_DEBUG_DOT_FLAGS_CONDITIONAL_NODE_PARAMS = 32768#
Adds conditional node parameters to output
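These flags are combined as a bitmask. A minimal sketch, where `graph` is a hypothetical CUgraph built earlier (e.g. via stream capture):

```python
from cuda import cuda

# Hypothetical: `graph` is a CUgraph built earlier (e.g. via stream capture).
flags = (int(cuda.CUgraphDebugDot_flags.CU_GRAPH_DEBUG_DOT_FLAGS_KERNEL_NODE_PARAMS)
         | int(cuda.CUgraphDebugDot_flags.CU_GRAPH_DEBUG_DOT_FLAGS_MEMCPY_NODE_PARAMS))
err, = cuda.cuGraphDebugDotPrint(graph, b"/tmp/graph.dot", flags)
```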
- class cuda.cuda.CUuserObject_flags(value)#
Flags for user objects for graphs
- CU_USER_OBJECT_NO_DESTRUCTOR_SYNC = 1#
Indicates the destructor execution is not synchronized by any CUDA handle.
- class cuda.cuda.CUuserObjectRetain_flags(value)#
Flags for retaining user object references for graphs
- CU_GRAPH_USER_OBJECT_MOVE = 1#
Transfer references from the caller rather than creating new references.
- class cuda.cuda.CUgraphInstantiate_flags(value)#
Flags for instantiating a graph
- CUDA_GRAPH_INSTANTIATE_FLAG_AUTO_FREE_ON_LAUNCH = 1#
Automatically free memory allocated in a graph before relaunching.
- CUDA_GRAPH_INSTANTIATE_FLAG_UPLOAD = 2#
Automatically upload the graph after instantiation. Only supported by cuGraphInstantiateWithParams. The upload will be performed using the stream provided in instantiateParams.
- CUDA_GRAPH_INSTANTIATE_FLAG_DEVICE_LAUNCH = 4#
Instantiate the graph to be launchable from the device. This flag can only be used on platforms which support unified addressing. This flag cannot be used in conjunction with CUDA_GRAPH_INSTANTIATE_FLAG_AUTO_FREE_ON_LAUNCH.
- CUDA_GRAPH_INSTANTIATE_FLAG_USE_NODE_PRIORITY = 8#
Run the graph using the per-node priority attributes rather than the priority of the stream it is launched into.
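A minimal instantiation sketch using one of the flags above; `graph` (a previously constructed CUgraph) and `stream` (a CUstream) are hypothetical handles.

```python
from cuda import cuda

# Hypothetical: `graph` is a previously constructed CUgraph and `stream` a CUstream.
err, graph_exec = cuda.cuGraphInstantiateWithFlags(
    graph, int(cuda.CUgraphInstantiate_flags.CUDA_GRAPH_INSTANTIATE_FLAG_AUTO_FREE_ON_LAUNCH))
err, = cuda.cuGraphLaunch(graph_exec, stream)
```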
- class cuda.cuda.CUdeviceNumaConfig(value)#
CUDA device NUMA configuration
- CU_DEVICE_NUMA_CONFIG_NONE = 0#
The GPU is not a NUMA node
- CU_DEVICE_NUMA_CONFIG_NUMA_NODE = 1#
The GPU is a NUMA node, CU_DEVICE_ATTRIBUTE_NUMA_ID contains its NUMA ID
- class cuda.cuda.CUeglFrameType(value)#
CUDA EglFrame type - array or pointer
- CU_EGL_FRAME_TYPE_ARRAY = 0#
Frame type CUDA array
- CU_EGL_FRAME_TYPE_PITCH = 1#
Frame type pointer
- class cuda.cuda.CUeglResourceLocationFlags(value)#
Resource location flags - sysmem or vidmem. For a CUDA context on an iGPU, video and system memory are equivalent, so these flags have no effect on execution. For a CUDA context on a dGPU, applications can use CUeglResourceLocationFlags to give a hint about the desired location. CU_EGL_RESOURCE_LOCATION_SYSMEM - the frame data is made resident in system memory to be accessed by CUDA. CU_EGL_RESOURCE_LOCATION_VIDMEM - the frame data is made resident in dedicated video memory to be accessed by CUDA. There may be additional latency due to new allocation and data migration if the frame is produced in a different memory.
- CU_EGL_RESOURCE_LOCATION_SYSMEM = 0#
Resource location sysmem
- CU_EGL_RESOURCE_LOCATION_VIDMEM = 1#
Resource location vidmem
- class cuda.cuda.CUeglColorFormat(value)#
CUDA EGL Color Format - The different planar and multiplanar formats currently supported for CUDA_EGL interops. Three channel formats are currently not supported for CU_EGL_FRAME_TYPE_ARRAY.
- CU_EGL_COLOR_FORMAT_YUV420_PLANAR = 0#
Y, U, V in three surfaces, each in a separate surface, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR = 1#
Y, UV in two surfaces (UV as one surface) with VU byte ordering, width, height ratio same as YUV420Planar.
- CU_EGL_COLOR_FORMAT_YUV422_PLANAR = 2#
Y, U, V each in a separate surface, U/V width = 1/2 Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YUV422_SEMIPLANAR = 3#
Y, UV in two surfaces with VU byte ordering, width, height ratio same as YUV422Planar.
- CU_EGL_COLOR_FORMAT_RGB = 4#
R/G/B three channels in one surface with BGR byte ordering. Only pitch linear format supported.
- CU_EGL_COLOR_FORMAT_BGR = 5#
R/G/B three channels in one surface with RGB byte ordering. Only pitch linear format supported.
- CU_EGL_COLOR_FORMAT_ARGB = 6#
R/G/B/A four channels in one surface with BGRA byte ordering.
- CU_EGL_COLOR_FORMAT_RGBA = 7#
R/G/B/A four channels in one surface with ABGR byte ordering.
- CU_EGL_COLOR_FORMAT_L = 8#
single luminance channel in one surface.
- CU_EGL_COLOR_FORMAT_R = 9#
single color channel in one surface.
- CU_EGL_COLOR_FORMAT_YUV444_PLANAR = 10#
Y, U, V in three surfaces, each in a separate surface, U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YUV444_SEMIPLANAR = 11#
Y, UV in two surfaces (UV as one surface) with VU byte ordering, width, height ratio same as YUV444Planar.
- CU_EGL_COLOR_FORMAT_YUYV_422 = 12#
Y, U, V in one surface, interleaved as UYVY in one channel.
- CU_EGL_COLOR_FORMAT_UYVY_422 = 13#
Y, U, V in one surface, interleaved as YUYV in one channel.
- CU_EGL_COLOR_FORMAT_ABGR = 14#
R/G/B/A four channels in one surface with RGBA byte ordering.
- CU_EGL_COLOR_FORMAT_BGRA = 15#
R/G/B/A four channels in one surface with ARGB byte ordering.
- CU_EGL_COLOR_FORMAT_A = 16#
Alpha color format - one channel in one surface.
- CU_EGL_COLOR_FORMAT_RG = 17#
R/G color format - two channels in one surface with GR byte ordering
- CU_EGL_COLOR_FORMAT_AYUV = 18#
Y, U, V, A four channels in one surface, interleaved as VUYA.
- CU_EGL_COLOR_FORMAT_YVU444_SEMIPLANAR = 19#
Y, VU in two surfaces (VU as one surface) with UV byte ordering, U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YVU422_SEMIPLANAR = 20#
Y, VU in two surfaces (VU as one surface) with UV byte ordering, U/V width = 1/2 Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YVU420_SEMIPLANAR = 21#
Y, VU in two surfaces (VU as one surface) with UV byte ordering, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_Y10V10U10_444_SEMIPLANAR = 22#
Y10, V10U10 in two surfaces (VU as one surface) with UV byte ordering, U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_Y10V10U10_420_SEMIPLANAR = 23#
Y10, V10U10 in two surfaces (VU as one surface) with UV byte ordering, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_Y12V12U12_444_SEMIPLANAR = 24#
Y12, V12U12 in two surfaces (VU as one surface) with UV byte ordering, U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_Y12V12U12_420_SEMIPLANAR = 25#
Y12, V12U12 in two surfaces (VU as one surface) with UV byte ordering, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_VYUY_ER = 26#
Extended Range Y, U, V in one surface, interleaved as YVYU in one channel.
- CU_EGL_COLOR_FORMAT_UYVY_ER = 27#
Extended Range Y, U, V in one surface, interleaved as YUYV in one channel.
- CU_EGL_COLOR_FORMAT_YUYV_ER = 28#
Extended Range Y, U, V in one surface, interleaved as UYVY in one channel.
- CU_EGL_COLOR_FORMAT_YVYU_ER = 29#
Extended Range Y, U, V in one surface, interleaved as VYUY in one channel.
- CU_EGL_COLOR_FORMAT_YUV_ER = 30#
Extended Range Y, U, V three channels in one surface, interleaved as VUY. Only pitch linear format supported.
- CU_EGL_COLOR_FORMAT_YUVA_ER = 31#
Extended Range Y, U, V, A four channels in one surface, interleaved as AVUY.
- CU_EGL_COLOR_FORMAT_AYUV_ER = 32#
Extended Range Y, U, V, A four channels in one surface, interleaved as VUYA.
- CU_EGL_COLOR_FORMAT_YUV444_PLANAR_ER = 33#
Extended Range Y, U, V in three surfaces, U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YUV422_PLANAR_ER = 34#
Extended Range Y, U, V in three surfaces, U/V width = 1/2 Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YUV420_PLANAR_ER = 35#
Extended Range Y, U, V in three surfaces, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YUV444_SEMIPLANAR_ER = 36#
Extended Range Y, UV in two surfaces (UV as one surface) with VU byte ordering, U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YUV422_SEMIPLANAR_ER = 37#
Extended Range Y, UV in two surfaces (UV as one surface) with VU byte ordering, U/V width = 1/2 Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR_ER = 38#
Extended Range Y, UV in two surfaces (UV as one surface) with VU byte ordering, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YVU444_PLANAR_ER = 39#
Extended Range Y, V, U in three surfaces, U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YVU422_PLANAR_ER = 40#
Extended Range Y, V, U in three surfaces, U/V width = 1/2 Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YVU420_PLANAR_ER = 41#
Extended Range Y, V, U in three surfaces, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YVU444_SEMIPLANAR_ER = 42#
Extended Range Y, VU in two surfaces (VU as one surface) with UV byte ordering, U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YVU422_SEMIPLANAR_ER = 43#
Extended Range Y, VU in two surfaces (VU as one surface) with UV byte ordering, U/V width = 1/2 Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YVU420_SEMIPLANAR_ER = 44#
Extended Range Y, VU in two surfaces (VU as one surface) with UV byte ordering, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_BAYER_RGGB = 45#
Bayer format - one channel in one surface with interleaved RGGB ordering.
- CU_EGL_COLOR_FORMAT_BAYER_BGGR = 46#
Bayer format - one channel in one surface with interleaved BGGR ordering.
- CU_EGL_COLOR_FORMAT_BAYER_GRBG = 47#
Bayer format - one channel in one surface with interleaved GRBG ordering.
- CU_EGL_COLOR_FORMAT_BAYER_GBRG = 48#
Bayer format - one channel in one surface with interleaved GBRG ordering.
- CU_EGL_COLOR_FORMAT_BAYER10_RGGB = 49#
Bayer10 format - one channel in one surface with interleaved RGGB ordering. Out of 16 bits, 10 bits used 6 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER10_BGGR = 50#
Bayer10 format - one channel in one surface with interleaved BGGR ordering. Out of 16 bits, 10 bits used 6 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER10_GRBG = 51#
Bayer10 format - one channel in one surface with interleaved GRBG ordering. Out of 16 bits, 10 bits used 6 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER10_GBRG = 52#
Bayer10 format - one channel in one surface with interleaved GBRG ordering. Out of 16 bits, 10 bits used 6 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER12_RGGB = 53#
Bayer12 format - one channel in one surface with interleaved RGGB ordering. Out of 16 bits, 12 bits used 4 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER12_BGGR = 54#
Bayer12 format - one channel in one surface with interleaved BGGR ordering. Out of 16 bits, 12 bits used 4 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER12_GRBG = 55#
Bayer12 format - one channel in one surface with interleaved GRBG ordering. Out of 16 bits, 12 bits used 4 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER12_GBRG = 56#
Bayer12 format - one channel in one surface with interleaved GBRG ordering. Out of 16 bits, 12 bits used 4 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER14_RGGB = 57#
Bayer14 format - one channel in one surface with interleaved RGGB ordering. Out of 16 bits, 14 bits used 2 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER14_BGGR = 58#
Bayer14 format - one channel in one surface with interleaved BGGR ordering. Out of 16 bits, 14 bits used 2 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER14_GRBG = 59#
Bayer14 format - one channel in one surface with interleaved GRBG ordering. Out of 16 bits, 14 bits used 2 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER14_GBRG = 60#
Bayer14 format - one channel in one surface with interleaved GBRG ordering. Out of 16 bits, 14 bits used 2 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER20_RGGB = 61#
Bayer20 format - one channel in one surface with interleaved RGGB ordering. Out of 32 bits, 20 bits used 12 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER20_BGGR = 62#
Bayer20 format - one channel in one surface with interleaved BGGR ordering. Out of 32 bits, 20 bits used 12 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER20_GRBG = 63#
Bayer20 format - one channel in one surface with interleaved GRBG ordering. Out of 32 bits, 20 bits used 12 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER20_GBRG = 64#
Bayer20 format - one channel in one surface with interleaved GBRG ordering. Out of 32 bits, 20 bits used 12 bits No-op.
- CU_EGL_COLOR_FORMAT_YVU444_PLANAR = 65#
Y, V, U in three surfaces, each in a separate surface, U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YVU422_PLANAR = 66#
Y, V, U in three surfaces, each in a separate surface, U/V width = 1/2 Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_YVU420_PLANAR = 67#
Y, V, U in three surfaces, each in a separate surface, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_BAYER_ISP_RGGB = 68#
Nvidia proprietary Bayer ISP format - one channel in one surface with interleaved RGGB ordering and mapped to opaque integer datatype.
- CU_EGL_COLOR_FORMAT_BAYER_ISP_BGGR = 69#
Nvidia proprietary Bayer ISP format - one channel in one surface with interleaved BGGR ordering and mapped to opaque integer datatype.
- CU_EGL_COLOR_FORMAT_BAYER_ISP_GRBG = 70#
Nvidia proprietary Bayer ISP format - one channel in one surface with interleaved GRBG ordering and mapped to opaque integer datatype.
- CU_EGL_COLOR_FORMAT_BAYER_ISP_GBRG = 71#
Nvidia proprietary Bayer ISP format - one channel in one surface with interleaved GBRG ordering and mapped to opaque integer datatype.
- CU_EGL_COLOR_FORMAT_BAYER_BCCR = 72#
Bayer format - one channel in one surface with interleaved BCCR ordering.
- CU_EGL_COLOR_FORMAT_BAYER_RCCB = 73#
Bayer format - one channel in one surface with interleaved RCCB ordering.
- CU_EGL_COLOR_FORMAT_BAYER_CRBC = 74#
Bayer format - one channel in one surface with interleaved CRBC ordering.
- CU_EGL_COLOR_FORMAT_BAYER_CBRC = 75#
Bayer format - one channel in one surface with interleaved CBRC ordering.
- CU_EGL_COLOR_FORMAT_BAYER10_CCCC = 76#
Bayer10 format - one channel in one surface with interleaved CCCC ordering. Out of 16 bits, 10 bits used 6 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER12_BCCR = 77#
Bayer12 format - one channel in one surface with interleaved BCCR ordering. Out of 16 bits, 12 bits used 4 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER12_RCCB = 78#
Bayer12 format - one channel in one surface with interleaved RCCB ordering. Out of 16 bits, 12 bits used 4 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER12_CRBC = 79#
Bayer12 format - one channel in one surface with interleaved CRBC ordering. Out of 16 bits, 12 bits used 4 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER12_CBRC = 80#
Bayer12 format - one channel in one surface with interleaved CBRC ordering. Out of 16 bits, 12 bits used 4 bits No-op.
- CU_EGL_COLOR_FORMAT_BAYER12_CCCC = 81#
Bayer12 format - one channel in one surface with interleaved CCCC ordering. Out of 16 bits, 12 bits used 4 bits No-op.
- CU_EGL_COLOR_FORMAT_Y = 82#
Color format for single Y plane.
- CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR_2020 = 83#
Y, UV in two surfaces (UV as one surface) U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YVU420_SEMIPLANAR_2020 = 84#
Y, VU in two surfaces (VU as one surface) U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YUV420_PLANAR_2020 = 85#
Y, U, V each in a separate surface, U/V width = 1/2 Y width, U/V height= 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YVU420_PLANAR_2020 = 86#
Y, V, U each in a separate surface, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR_709 = 87#
Y, UV in two surfaces (UV as one surface) U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YVU420_SEMIPLANAR_709 = 88#
Y, VU in two surfaces (VU as one surface) U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YUV420_PLANAR_709 = 89#
Y, U, V each in a separate surface, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_YVU420_PLANAR_709 = 90#
Y, V, U each in a separate surface, U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_Y10V10U10_420_SEMIPLANAR_709 = 91#
Y10, V10U10 in two surfaces (VU as one surface), U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_Y10V10U10_420_SEMIPLANAR_2020 = 92#
Y10, V10U10 in two surfaces (VU as one surface), U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_Y10V10U10_422_SEMIPLANAR_2020 = 93#
Y10, V10U10 in two surfaces(VU as one surface) U/V width = 1/2 Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_Y10V10U10_422_SEMIPLANAR = 94#
Y10, V10U10 in two surfaces(VU as one surface) U/V width = 1/2 Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_Y10V10U10_422_SEMIPLANAR_709 = 95#
Y10, V10U10 in two surfaces(VU as one surface) U/V width = 1/2 Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_Y_ER = 96#
Extended Range Color format for single Y plane.
- CU_EGL_COLOR_FORMAT_Y_709_ER = 97#
Extended Range Color format for single Y plane.
- CU_EGL_COLOR_FORMAT_Y10_ER = 98#
Extended Range Color format for single Y10 plane.
- CU_EGL_COLOR_FORMAT_Y10_709_ER = 99#
Extended Range Color format for single Y10 plane.
- CU_EGL_COLOR_FORMAT_Y12_ER = 100#
Extended Range Color format for single Y12 plane.
- CU_EGL_COLOR_FORMAT_Y12_709_ER = 101#
Extended Range Color format for single Y12 plane.
- CU_EGL_COLOR_FORMAT_YUVA = 102#
Y, U, V, A four channels in one surface, interleaved as AVUY.
- CU_EGL_COLOR_FORMAT_YUV = 103#
Y, U, V three channels in one surface, interleaved as VUY. Only pitch linear format supported.
- CU_EGL_COLOR_FORMAT_YVYU = 104#
Y, U, V in one surface, interleaved as YVYU in one channel.
- CU_EGL_COLOR_FORMAT_VYUY = 105#
Y, U, V in one surface, interleaved as VYUY in one channel.
- CU_EGL_COLOR_FORMAT_Y10V10U10_420_SEMIPLANAR_ER = 106#
Extended Range Y10, V10U10 in two surfaces(VU as one surface) U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_Y10V10U10_420_SEMIPLANAR_709_ER = 107#
Extended Range Y10, V10U10 in two surfaces(VU as one surface) U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_Y10V10U10_444_SEMIPLANAR_ER = 108#
Extended Range Y10, V10U10 in two surfaces (VU as one surface) U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_Y10V10U10_444_SEMIPLANAR_709_ER = 109#
Extended Range Y10, V10U10 in two surfaces (VU as one surface) U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_Y12V12U12_420_SEMIPLANAR_ER = 110#
Extended Range Y12, V12U12 in two surfaces (VU as one surface) U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_Y12V12U12_420_SEMIPLANAR_709_ER = 111#
Extended Range Y12, V12U12 in two surfaces (VU as one surface) U/V width = 1/2 Y width, U/V height = 1/2 Y height.
- CU_EGL_COLOR_FORMAT_Y12V12U12_444_SEMIPLANAR_ER = 112#
Extended Range Y12, V12U12 in two surfaces (VU as one surface) U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_Y12V12U12_444_SEMIPLANAR_709_ER = 113#
Extended Range Y12, V12U12 in two surfaces (VU as one surface) U/V width = Y width, U/V height = Y height.
- CU_EGL_COLOR_FORMAT_MAX = 114#
- class cuda.cuda.CUdeviceptr_v2#
CUDA device pointer CUdeviceptr is defined as an unsigned integer type whose size matches the size of a pointer on the target platform.
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUdeviceptr#
CUDA device pointer CUdeviceptr is defined as an unsigned integer type whose size matches the size of a pointer on the target platform.
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUcontext(*args, **kwargs)#
A regular context handle
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUmodule(*args, **kwargs)#
CUDA module
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUfunction(*args, **kwargs)#
CUDA function
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUlibrary(*args, **kwargs)#
CUDA library
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUkernel(*args, **kwargs)#
CUDA kernel
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUmipmappedArray(*args, **kwargs)#
CUDA mipmapped array
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUtexref(*args, **kwargs)#
CUDA texture reference
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUsurfref(*args, **kwargs)#
CUDA surface reference
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUstream(*args, **kwargs)#
CUDA stream
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUgraphicsResource(*args, **kwargs)#
CUDA graphics interop resource
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUtexObject_v1#
An opaque value that represents a CUDA texture object
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUtexObject#
An opaque value that represents a CUDA texture object
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUsurfObject_v1#
An opaque value that represents a CUDA surface object
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUsurfObject#
An opaque value that represents a CUDA surface object
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUexternalMemory(*args, **kwargs)#
CUDA external memory
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUexternalSemaphore(*args, **kwargs)#
CUDA external semaphore
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUgraphNode(*args, **kwargs)#
CUDA graph node
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUgraphExec(*args, **kwargs)#
CUDA executable graph
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUmemoryPool(*args, **kwargs)#
CUDA memory pool
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUuserObject(*args, **kwargs)#
CUDA user object for graphs
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUgraphDeviceNode(*args, **kwargs)#
CUDA graph device node handle
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUasyncCallbackHandle(*args, **kwargs)#
CUDA async notification callback handle
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUgreenCtx(*args, **kwargs)#
A green context handle. This handle can be used safely from only one CPU thread at a time. Created via cuGreenCtxCreate
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUuuid#
- bytes#
< CUDA definition of UUID
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUmemFabricHandle_v1#
Fabric handle - An opaque handle representing a memory allocation that can be exported to processes in same or different nodes. For IPC between processes on different nodes they must be connected via the NVSwitch fabric.
- data#
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUmemFabricHandle#
Fabric handle - An opaque handle representing a memory allocation that can be exported to processes in same or different nodes. For IPC between processes on different nodes they must be connected via the NVSwitch fabric.
- data#
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUipcEventHandle_v1#
CUDA IPC event handle
- reserved#
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUipcEventHandle#
CUDA IPC event handle
- reserved#
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUipcMemHandle_v1#
CUDA IPC mem handle
- reserved#
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUipcMemHandle#
CUDA IPC mem handle
- reserved#
- Type:
bytes
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUstreamBatchMemOpParams_v1#
Per-operation parameters for cuStreamBatchMemOp
- operation#
- Type:
- waitValue#
- Type:
CUstreamMemOpWaitValueParams_st
- writeValue#
- Type:
CUstreamMemOpWriteValueParams_st
- flushRemoteWrites#
- Type:
CUstreamMemOpFlushRemoteWritesParams_st
- memoryBarrier#
- Type:
CUstreamMemOpMemoryBarrierParams_st
- pad#
- Type:
List[cuuint64_t]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUstreamBatchMemOpParams#
Per-operation parameters for cuStreamBatchMemOp
- operation#
- Type:
- waitValue#
- Type:
CUstreamMemOpWaitValueParams_st
- writeValue#
- Type:
CUstreamMemOpWriteValueParams_st
- flushRemoteWrites#
- Type:
CUstreamMemOpFlushRemoteWritesParams_st
- memoryBarrier#
- Type:
CUstreamMemOpMemoryBarrierParams_st
- pad#
- Type:
List[cuuint64_t]
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_BATCH_MEM_OP_NODE_PARAMS_v1#
-
- count#
- Type:
unsigned int
- paramArray#
- Type:
- flags#
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_BATCH_MEM_OP_NODE_PARAMS#
-
- count#
- Type:
unsigned int
- paramArray#
- Type:
- flags#
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_BATCH_MEM_OP_NODE_PARAMS_v2#
Batch memory operation node parameters
- count#
Number of operations in paramArray.
- Type:
unsigned int
- paramArray#
Array of batch memory operations.
- Type:
- flags#
Flags to control the node.
- Type:
unsigned int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUasyncNotificationInfo#
Information passed to the user via the async notification callback
- type#
- Type:
- info#
- Type:
anon_union2
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUdevprop_v1#
Legacy device properties
- maxThreadsPerBlock#
Maximum number of threads per block
- Type:
int
- maxThreadsDim#
Maximum size of each dimension of a block
- Type:
List[int]
- maxGridSize#
Maximum size of each dimension of a grid
- Type:
List[int]
- sharedMemPerBlock#
Shared memory available per block in bytes
- Type:
int
- totalConstantMemory#
Constant memory available on device in bytes
- Type:
int
- SIMDWidth#
Warp size in threads
- Type:
int
- memPitch#
Maximum pitch in bytes allowed by memory copies
- Type:
int
- regsPerBlock#
32-bit registers available per block
- Type:
int
- clockRate#
Clock frequency in kilohertz
- Type:
int
- textureAlign#
Alignment requirement for textures
- Type:
int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUdevprop#
Legacy device properties
- maxThreadsPerBlock#
Maximum number of threads per block
- Type:
int
- maxThreadsDim#
Maximum size of each dimension of a block
- Type:
List[int]
- maxGridSize#
Maximum size of each dimension of a grid
- Type:
List[int]
- sharedMemPerBlock#
Shared memory available per block in bytes
- Type:
int
- totalConstantMemory#
Constant memory available on device in bytes
- Type:
int
- SIMDWidth#
Warp size in threads
- Type:
int
- memPitch#
Maximum pitch in bytes allowed by memory copies
- Type:
int
- regsPerBlock#
32-bit registers available per block
- Type:
int
- clockRate#
Clock frequency in kilohertz
- Type:
int
- textureAlign#
Alignment requirement for textures
- Type:
int
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUaccessPolicyWindow_v1#
Specifies an access policy for a window, a contiguous extent of memory beginning at base_ptr and ending at base_ptr + num_bytes. num_bytes is limited by CU_DEVICE_ATTRIBUTE_MAX_ACCESS_POLICY_WINDOW_SIZE. Partition into many segments and assign segments such that: sum of “hit segments” / window == approx. ratio. sum of “miss segments” / window == approx 1-ratio. Segments and ratio specifications are fitted to the capabilities of the architecture. Accesses in a hit segment apply the hitProp access policy. Accesses in a miss segment apply the missProp access policy.
- base_ptr#
Starting address of the access policy window. CUDA driver may align it.
- Type:
Any
- num_bytes#
Size in bytes of the window policy. CUDA driver may restrict the maximum size and alignment.
- Type:
size_t
- hitRatio#
hitRatio specifies percentage of lines assigned hitProp, rest are assigned missProp.
- Type:
float
- hitProp#
CUaccessProperty set for hit.
- Type:
- missProp#
CUaccessProperty set for miss. Must be either NORMAL or STREAMING
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUaccessPolicyWindow#
Specifies an access policy for a window, a contiguous extent of memory beginning at base_ptr and ending at base_ptr + num_bytes. num_bytes is limited by CU_DEVICE_ATTRIBUTE_MAX_ACCESS_POLICY_WINDOW_SIZE. Partition into many segments and assign segments such that: sum of “hit segments” / window == approx. ratio. sum of “miss segments” / window == approx 1-ratio. Segments and ratio specifications are fitted to the capabilities of the architecture. Accesses in a hit segment apply the hitProp access policy. Accesses in a miss segment apply the missProp access policy.
- base_ptr#
Starting address of the access policy window. CUDA driver may align it.
- Type:
Any
- num_bytes#
Size in bytes of the window policy. CUDA driver may restrict the maximum size and alignment.
- Type:
size_t
- hitRatio#
hitRatio specifies percentage of lines assigned hitProp, rest are assigned missProp.
- Type:
float
- hitProp#
CUaccessProperty set for hit.
- Type:
- missProp#
CUaccessProperty set for miss. Must be either NORMAL or STREAMING
- Type:
- getPtr()#
Get memory address of class instance
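A hedged sketch of applying an access policy window to a stream via its attribute; `stream` (a CUstream), `dptr` (a CUdeviceptr), and `nbytes` (the window size) are hypothetical, and the nested-field assignments assume the CUstreamAttrValue wrapper exposes accessPolicyWindow as shown.

```python
from cuda import cuda

# Hypothetical: `stream` is a CUstream, `dptr` a CUdeviceptr, `nbytes` the window size.
value = cuda.CUstreamAttrValue()
value.accessPolicyWindow.base_ptr = int(dptr)
value.accessPolicyWindow.num_bytes = nbytes
value.accessPolicyWindow.hitRatio = 0.6
value.accessPolicyWindow.hitProp = cuda.CUaccessProperty.CU_ACCESS_PROPERTY_PERSISTING
value.accessPolicyWindow.missProp = cuda.CUaccessProperty.CU_ACCESS_PROPERTY_STREAMING
err, = cuda.cuStreamSetAttribute(
    stream, cuda.CUstreamAttrID.CU_STREAM_ATTRIBUTE_ACCESS_POLICY_WINDOW, value)
```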
- class cuda.cuda.CUDA_KERNEL_NODE_PARAMS_v1#
GPU kernel node parameters
- func#
Kernel to launch
- Type:
- gridDimX#
Width of grid in blocks
- Type:
unsigned int
- gridDimY#
Height of grid in blocks
- Type:
unsigned int
- gridDimZ#
Depth of grid in blocks
- Type:
unsigned int
- blockDimX#
X dimension of each thread block
- Type:
unsigned int
- blockDimY#
Y dimension of each thread block
- Type:
unsigned int
- blockDimZ#
Z dimension of each thread block
- Type:
unsigned int
- sharedMemBytes#
Dynamic shared-memory size per thread block in bytes
- Type:
unsigned int
- kernelParams#
Array of pointers to kernel parameters
- Type:
Any
- extra#
Extra options
- Type:
Any
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_KERNEL_NODE_PARAMS_v2#
GPU kernel node parameters
- func#
Kernel to launch
- Type:
- gridDimX#
Width of grid in blocks
- Type:
unsigned int
- gridDimY#
Height of grid in blocks
- Type:
unsigned int
- gridDimZ#
Depth of grid in blocks
- Type:
unsigned int
- blockDimX#
X dimension of each thread block
- Type:
unsigned int
- blockDimY#
Y dimension of each thread block
- Type:
unsigned int
- blockDimZ#
Z dimension of each thread block
- Type:
unsigned int
- sharedMemBytes#
Dynamic shared-memory size per thread block in bytes
- Type:
unsigned int
- kernelParams#
Array of pointers to kernel parameters
- Type:
Any
- extra#
Extra options
- Type:
Any
- ctx#
Context for the kernel task to run in. The value NULL indicates that the current context should be used by the API. This field is ignored if func is set.
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_KERNEL_NODE_PARAMS#
GPU kernel node parameters
- func#
Kernel to launch
- Type:
- gridDimX#
Width of grid in blocks
- Type:
unsigned int
- gridDimY#
Height of grid in blocks
- Type:
unsigned int
- gridDimZ#
Depth of grid in blocks
- Type:
unsigned int
- blockDimX#
X dimension of each thread block
- Type:
unsigned int
- blockDimY#
Y dimension of each thread block
- Type:
unsigned int
- blockDimZ#
Z dimension of each thread block
- Type:
unsigned int
- sharedMemBytes#
Dynamic shared-memory size per thread block in bytes
- Type:
unsigned int
- kernelParams#
Array of pointers to kernel parameters
- Type:
Any
- extra#
Extra options
- Type:
Any
- ctx#
Context for the kernel task to run in. The value NULL indicates that the current context should be used by the API. This field is ignored if func is set.
- Type:
- getPtr()#
Get memory address of class instance
- class cuda.cuda.CUDA_KERNEL_NODE_PARAMS_v3#
GPU kernel node parameters
- func#
Kernel to launch
- Type:
- gridDimX#
Width of grid in blocks
- Type:
unsigned int
- gridDimY#
Height of grid in blocks
- Type:
unsigned int
- gridDimZ#
Depth of grid in blocks
- Type:
unsigned int
- blockDimX#
X dimension of each thread block
- Type:
unsigned int