cuda.core.system.Device#

Representation of a device.

cuda.core.system.Device provides access to various pieces of metadata about devices and their topology, as provided by the NVIDIA Management Library (NVML). To use CUDA with a device, use cuda.core.Device.

Creating a device instance causes NVML to initialize the target GPU. NVML may initialize additional GPUs if the target GPU is an SLI slave.

Parameters:

index (int, optional) –
Integer representing the CUDA device index to get a handle to. Valid values are between 0 and cuda.core.system.get_num_devices() - 1.

The order in which devices are enumerated is not guaranteed to be consistent between reboots. For that reason, it is recommended that devices be looked up by their PCI IDs or UUID.
uuid (bytes or str, optional) – UUID of a CUDA device to get a handle to.
pci_bus_id (bytes or str, optional) – PCI bus ID of a CUDA device to get a handle to.

Raises:

ValueError – If anything other than exactly one of index, uuid or pci_bus_id is specified.
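The one-selector rule can be checked before a device is constructed. The following is a minimal sketch and not part of cuda.core.system: device_selector is a hypothetical helper that mirrors the documented ValueError condition and returns keyword arguments that could then be passed on as Device(**kwargs). It assumes that passing no selector at all also counts as an error, which the text above suggests but does not state outright.

```python
def device_selector(index=None, uuid=None, pci_bus_id=None):
    """Return constructor kwargs, requiring exactly one selector.

    Hypothetical helper: it only mirrors the documented rule that exactly
    one of index, uuid or pci_bus_id must be given.
    """
    candidates = {"index": index, "uuid": uuid, "pci_bus_id": pci_bus_id}
    # Keep only the selectors the caller actually supplied.
    chosen = {key: value for key, value in candidates.items() if value is not None}
    if len(chosen) != 1:
        given = ", ".join(sorted(chosen)) or "nothing"
        raise ValueError(
            f"specify exactly one of index, uuid or pci_bus_id (got: {given})"
        )
    return chosen
```

Checking the arguments up front keeps the error close to the call site that built them.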

Methods

__init__(*args, **kwargs)#

clear_cpu_affinity(self)#

Clear all affinity bindings for the calling thread.

For Kepler™ or newer fully supported devices.

Supported on Linux only.

clear_field_values(self, field_ids: list[int | tuple[int, int]]) → None#

Clear multiple field values from the device.

Parameters:

field_ids (list of int or tuple of (int, int)) –

List of field IDs to clear.

Each item may be either a single value from the FieldId enum, or a pair of (FieldId, scope ID).
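The two accepted item shapes can be illustrated with a small validator. This is a sketch under stated assumptions: check_field_ids is a hypothetical helper, the integers used with it are placeholders rather than real FieldId values, and items without a scope ID are reported with a scope of None because this reference does not say which scope the library applies by default.

```python
def check_field_ids(field_ids):
    """Validate a field_ids list and return (field_id, scope_id) pairs.

    Hypothetical helper. Items given as a bare int get scope_id None.
    """
    pairs = []
    for item in field_ids:
        if isinstance(item, tuple):
            # A tuple must be exactly (field_id, scope_id), both ints.
            if len(item) != 2 or not all(isinstance(part, int) for part in item):
                raise TypeError(f"expected (field_id, scope_id) of ints, got {item!r}")
            pairs.append(item)
        elif isinstance(item, int):
            pairs.append((item, None))
        else:
            raise TypeError(f"expected int or (int, int), got {item!r}")
    return pairs
```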

clock(self, clock_type: ClockType) → ClockInfo#: Get information about and manage a specific clock on a device.

fan(self, fan: int = 0) → FanInfo#: Get information about and manage a specific fan on a device.

classmethod get_all_devices(cls) → Iterable[Device]#

Query the available device instances.

Returns:: An iterator over available devices.
Return type:: Iterator of Device
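Because the result is an iterator, it can be consumed only once, so it is often convenient to collect what is needed in a single pass. The sketch below uses only the index, name and uuid attributes documented on this page. summarize_devices is a hypothetical helper, and the SimpleNamespace objects are stand-ins so that the example runs without a GPU; real code would pass Device.get_all_devices() instead.

```python
from types import SimpleNamespace


def summarize_devices(devices):
    """Return one (index, name, uuid) tuple per device, sorted by UUID."""
    # Consume the iterable once, then sort by UUID because enumeration
    # order is not guaranteed to be stable between reboots.
    rows = [(dev.index, dev.name, dev.uuid) for dev in devices]
    return sorted(rows, key=lambda row: row[2])


# Stand-in objects (not real devices) with made-up names and UUIDs.
fake_devices = [
    SimpleNamespace(index=0, name="GPU B", uuid="uuid-b"),
    SimpleNamespace(index=1, name="GPU A", uuid="uuid-a"),
]
summary = summarize_devices(fake_devices)
```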

classmethod get_all_devices_with_cpu_affinity(cls, cpu_index: int) → Iterable[Device]#

Retrieve the set of GPUs that have a CPU affinity with the given CPU number.

Supported on Linux only.

Parameters:: cpu_index (int) – The CPU index.
Returns:: An iterator over the devices that have a CPU affinity with the given CPU.
Return type:: Iterator of Device

get_auto_boosted_clocks_enabled(self) → tuple[bool, bool]#

Retrieve the current state of auto boosted clocks on a device.

For Kepler™ or newer fully supported devices.

Auto Boosted clocks are enabled by default on some hardware, allowing the GPU to run at higher clock rates to maximize performance as thermal limits allow.

On Pascal™ and newer hardware, Auto Boosted clocks are controlled through application clocks. Use set_application_clocks() and reset_application_clocks() to control Auto Boost behavior.

Returns:

bool – The current state of Auto Boosted clocks
bool – The default Auto Boosted clocks behavior

get_cpu_affinity(self, scope: AffinityScope = AffinityScope.NODE) → list[int]#

Retrieves a list of indices of NUMA nodes or CPU sockets with the ideal CPU affinity for the device.

For Kepler™ or newer fully supported devices.

Supported on Linux only.

If the requested scope is not applicable to the target topology, the API falls back to reporting the CPU affinity for the immediate non-I/O ancestor of the device.

get_current_clock_event_reasons(self) → list[ClocksEventReasons]#

Retrieves the current clocks event reasons.

For all fully supported products.

classmethod get_device_count(cls) → int#

Get the number of available devices.

Returns:: The number of available devices.
Return type:: int

get_field_values(self, field_ids: list[int | tuple[int, int]]) → FieldValues#

Get multiple field values from the device.

Each value specified can raise its own exception. That exception will be raised when attempting to access the corresponding value from the returned FieldValues container.

To confirm that there are no exceptions in the entire container, call FieldValues.validate().

Parameters:

field_ids (list of int or tuple of (int, int)) –

List of field IDs to query.

Each item may be either a single value from the FieldId enum, or a pair of (FieldId, scope ID).

Returns:

Container of field values corresponding to the requested field IDs.

Return type:

FieldValues
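The deferred-error behaviour described above can be illustrated with a toy container. This is only a sketch: DeferredValues is a stand-in written for this example and is not the real FieldValues class, and positional indexing and len() support are assumptions made for illustration. Only the validate() name and the raise-on-access behaviour come from the description above.

```python
class DeferredValues:
    """Toy stand-in for a container that defers per-item errors.

    NOT the real FieldValues class; it only mimics the described behaviour.
    """

    def __init__(self, results):
        # Each entry is either a plain value or an Exception instance.
        self._results = list(results)

    def __len__(self):
        return len(self._results)

    def __getitem__(self, position):
        result = self._results[position]
        if isinstance(result, Exception):
            raise result  # the stored error surfaces only on access
        return result

    def validate(self):
        """Raise the first stored error, if there is one."""
        for result in self._results:
            if isinstance(result, Exception):
                raise result


def read_all(values):
    """Read every slot, recording successes and error type names."""
    collected = []
    for position in range(len(values)):
        try:
            collected.append(("ok", values[position]))
        except Exception as exc:
            collected.append(("error", type(exc).__name__))
    return collected


values = DeferredValues([42, RuntimeError("not supported"), 7])
outcome = read_all(values)
```

Reading slot by slot keeps the usable values when one field fails, whereas a single validate() call suits code that needs every field to succeed.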

get_memory_affinity(self, scope: AffinityScope = AffinityScope.NODE) → list[int]#

Retrieves a list of indices of NUMA nodes or CPU sockets with the ideal memory affinity for the device.

For Kepler™ or newer fully supported devices.

Supported on Linux only.

If the requested scope is not applicable to the target topology, the API falls back to reporting the memory affinity for the immediate non-I/O ancestor of the device.

get_supported_clock_event_reasons(self) → list[ClocksEventReasons]#

Retrieves supported clocks event reasons that can be returned by get_current_clock_event_reasons().

For all fully supported products.

This method is not supported in virtual machines running virtual GPU (vGPU).

get_supported_event_types(self) → list[EventType]#

Get the list of event types supported by this device.

For Fermi™ or newer fully supported devices. For Linux only (returns an empty list on Windows).

Returns:: The list of supported event types.
Return type:: list[EventType]

get_supported_pstates(self) → list[Pstates]#

Get all supported Performance States (P-States) for the device.

The returned list is a contiguous sequence of the valid P-States supported by the device.

get_topology_nearest_gpus(self, level: GpuTopologyLevel) → Iterable[Device]#

Retrieve the GPUs that are nearest to this device at a specific interconnectivity level.

Supported on Linux only.

Parameters:: level (GpuTopologyLevel) – The topology level.
Returns:: The nearest devices at the given topology level.
Return type:: Iterable of Device
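Calling this method on every device gives a simple neighbour map for one topology level. The sketch below is illustrative only: topology_map and make_fake are hypothetical helpers, the stand-in objects replace real devices so that the example runs without a GPU, and the string "same-board" is a placeholder rather than an actual GpuTopologyLevel member.

```python
from types import SimpleNamespace


def topology_map(devices, level):
    """Map each device's UUID to the sorted UUIDs of its nearest GPUs."""
    return {
        dev.uuid: sorted(peer.uuid for peer in dev.get_topology_nearest_gpus(level))
        for dev in devices
    }


def make_fake(uuid, peers_by_level):
    """Build a stand-in with a uuid and a get_topology_nearest_gpus method."""
    return SimpleNamespace(
        uuid=uuid,
        get_topology_nearest_gpus=lambda level: iter(
            [SimpleNamespace(uuid=peer) for peer in peers_by_level.get(level, [])]
        ),
    )


# Placeholder level name and made-up UUIDs, for illustration only.
fakes = [
    make_fake("gpu-0", {"same-board": ["gpu-1"]}),
    make_fake("gpu-1", {"same-board": ["gpu-0"]}),
]
neighbours = topology_map(fakes, "same-board")
```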

register_events(self, events: EventType | int | list[EventType | int]) → DeviceEvents#

Starts recording events on this device.

For Fermi™ or newer fully supported devices. For Linux only.

ECC events are available only on ECC-enabled devices (see Device.get_total_ecc_errors()). Power capping events are available only on Power Management enabled devices (see Device.get_power_management_mode()).

This call starts recording events on the specified device. Events that occurred before this call are not recorded. Wait for events using the DeviceEvents.wait() method on the result.

Examples

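>>> # Import location of EventType is assumed to be cuda.core.system.
>>> from cuda.core.system import Device, EventType
>>> device = Device(index=0)
>>> # Register for critical XID error events on this device.
>>> events = device.register_events([
...     EventType.EVENT_TYPE_XID_CRITICAL_ERROR,
... ])
>>> # Wait up to 10 seconds for each event and report it.
>>> while event := events.wait(timeout_ms=10000):
...     print(f"Event {event.event_type} occurred on device {event.device.uuid}")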

Parameters:: events (EventType, int, or list of EventType or int) – The event type or list of event types to register for this device.
Returns:: An object representing the registered events. Call DeviceEvents.wait() on this object to wait for events.
Return type:: DeviceEvents
Raises:: cuda.core.system.NotSupportedError – If none of the requested event types can be registered.

set_cpu_affinity(self)#

Sets the ideal affinity for the calling thread and device.

For Kepler™ or newer fully supported devices.

Supported on Linux only.

to_cuda_device(self) → 'cuda.core.Device'#

Get the corresponding cuda.core.Device (which is used for CUDA access) for this cuda.core.system.Device (which is used for NVIDIA Management Library (NVML) access).

The devices are mapped to one another by their UUID.

Returns:: The corresponding CUDA device.
Return type:: cuda.core.Device
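Matching on UUID rather than on position matters because the two APIs may enumerate devices in different orders. The following sketch shows the idea on stand-in objects; it is not how to_cuda_device() is implemented. pair_by_uuid is a hypothetical helper, and it assumes that both kinds of object expose a directly comparable uuid attribute, which this page states only for cuda.core.system.Device.

```python
from types import SimpleNamespace


def pair_by_uuid(system_devices, cuda_devices):
    """Pair objects from two collections that share the same uuid."""
    cuda_by_uuid = {dev.uuid: dev for dev in cuda_devices}
    pairs = []
    for sys_dev in system_devices:
        match = cuda_by_uuid.get(sys_dev.uuid)
        if match is not None:  # skip devices with no counterpart
            pairs.append((sys_dev, match))
    return pairs


# Stand-ins with made-up UUIDs; note the differing enumeration order.
system_side = [
    SimpleNamespace(index=0, uuid="uuid-a"),
    SimpleNamespace(index=1, uuid="uuid-b"),
]
cuda_side = [
    SimpleNamespace(device_id=0, uuid="uuid-b"),
    SimpleNamespace(device_id=1, uuid="uuid-a"),
]
pairs = pair_by_uuid(system_side, cuda_side)
index_to_device_id = {s.index: c.device_id for s, c in pairs}
```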

Attributes

addressing_mode#

AddressingMode

Get the addressing mode of the device.

Addressing modes can be one of:

AddressingMode.DEVICE_ADDRESSING_MODE_HMM: System allocated memory (malloc, mmap) is addressable from the device (GPU), via software-based mirroring of the CPU’s page tables, on the GPU.
AddressingMode.DEVICE_ADDRESSING_MODE_ATS: System allocated memory (malloc, mmap) is addressable from the device (GPU), via Address Translation Services. This means that there is (effectively) a single set of page tables, and the CPU and GPU both use them.
AddressingMode.DEVICE_ADDRESSING_MODE_NONE: Neither HMM nor ATS is active.

Type:: Device.addressing_mode

architecture#

DeviceArchitecture

Device architecture. For example, a Tesla V100 reports DeviceArchitecture.name == "Volta", and an RTX A6000 reports DeviceArchitecture.name == "Ampere". If the device returns an architecture that is unknown to NVML, DeviceArchitecture.name == "Unknown" is reported, whereas an architecture that is unknown to cuda.core.system is reported as DeviceArchitecture.name == "Unlisted".

Type:: Device.architecture
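The two sentinel names can be told apart from a recognized architecture with a small check. This is a sketch: architecture_status is a hypothetical helper that takes the name string described above, and the returned labels are chosen for this example only.

```python
def architecture_status(name):
    """Classify an architecture name according to the two sentinel values."""
    if name == "Unknown":
        return "unknown to NVML"
    if name == "Unlisted":
        return "unlisted in cuda.core.system"
    return "recognized"
```

In real code the argument would be device.architecture.name.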

attributes#

DeviceAttributes

Get various device attributes.

For Ampere™ or newer fully supported devices. Only available on Linux systems.

Type:: Device.attributes

bar1_memory_info#

BAR1MemoryInfo

Get information about BAR1 memory.

BAR1 is used to map the FB (device memory) so that it can be accessed directly by the CPU or by third-party devices (peer-to-peer on the PCIe bus).

Type:: Device.bar1_memory_info

brand#

BrandType

Brand of the device.

Type:: Device.brand

cooler#

CoolerInfo

Get information about the cooler on a device.

Type:: Device.cooler

cuda_compute_capability#

tuple[int, int]

CUDA compute capability of the device, e.g. (7, 0) for a Tesla V100.

Returns a tuple (major, minor).

Type:: Device.cuda_compute_capability
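Because the value is a (major, minor) tuple, ordinary tuple comparison is enough to test for a minimum capability. The helpers below are a minimal sketch with hypothetical names. sm_name formats the tuple in the sm_XY style used for nvcc architecture names, a convention that is not described on this page.

```python
def meets_compute_capability(capability, minimum):
    """Return True if a (major, minor) tuple is at least `minimum`."""
    # Tuples compare element by element: major first, then minor.
    return tuple(capability) >= tuple(minimum)


def sm_name(capability):
    """Format a (major, minor) tuple as an sm_XY architecture name."""
    major, minor = capability
    return f"sm_{major}{minor}"
```

In real code the first argument would be device.cuda_compute_capability.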

display_active#

bool

The display active status for this device.

Indicates whether a display is initialized on the device, for example whether an X server is attached to this device and has allocated memory for the screen.

Display can be active even when no monitor is physically attached.

Type:: Device.display_active

display_mode#

bool

The display mode for this device.

Indicates whether a physical display (e.g. monitor) is currently connected to any of the device’s connectors.

Type:: Device.display_mode

dynamic_pstates_info#

GpuDynamicPstatesInfo

Retrieve performance monitor samples from the associated subdevice.

Type:: Device.dynamic_pstates_info

index#

int

The NVML index of this device.

Valid indices are derived from the count returned by Device.get_device_count(). For example, if get_device_count() returns 2, the valid indices are 0 and 1, corresponding to GPU 0 and GPU 1.

The order in which NVML enumerates devices is not guaranteed to be consistent between reboots. For that reason, it is recommended that devices be looked up by their PCI IDs or GPU UUID.

Note: The NVML index may not correlate with other APIs, such as the CUDA device index.

Type:: Device.index
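The sketch below illustrates both points: how the valid indices follow from the device count, and why a stored UUID is a safer key than a stored index. valid_indices and index_for_uuid are hypothetical helpers, and the SimpleNamespace objects are stand-ins simulating two boots in which the enumeration order changed.

```python
from types import SimpleNamespace


def valid_indices(device_count):
    """Return the valid device indices for a given device count."""
    return list(range(device_count))


def index_for_uuid(devices, uuid):
    """Return the current index of the device with this UUID, or None."""
    for dev in devices:
        if dev.uuid == uuid:
            return dev.index
    return None


# Stand-ins with made-up UUIDs: the same two GPUs, enumerated in a
# different order after a reboot.
first_boot = [
    SimpleNamespace(index=0, uuid="uuid-a"),
    SimpleNamespace(index=1, uuid="uuid-b"),
]
second_boot = [
    SimpleNamespace(index=0, uuid="uuid-b"),
    SimpleNamespace(index=1, uuid="uuid-a"),
]
```

A configuration file that stored index 0 would silently point at a different GPU after the second boot, whereas looking up "uuid-a" still finds the intended device.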

inforom#

InforomInfo

Accessor for InfoROM information.

For all products with an InfoROM.

Type:: Device.inforom

is_c2c_mode_enabled#

bool

Whether the C2C (Chip-to-Chip) mode is enabled for this device.

Type:: Device.is_c2c_mode_enabled

memory_info#

MemoryInfo

Object with memory information.

Type:: Device.memory_info

minor_number#

int

The minor number of this device.

For Linux only.

The Linux device driver uses the minor number to identify the device node, which has the form /dev/nvidiaX, where X is the minor number.

Type:: Device.minor_number
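The mapping from minor number to device node can be written as a one-line helper. This is a sketch: device_node_path is a hypothetical function that only builds the path string, and it does not check that the node exists.

```python
def device_node_path(minor_number):
    """Build the Linux device node path for a given minor number."""
    if minor_number < 0:
        raise ValueError("minor number must be non-negative")
    return f"/dev/nvidia{minor_number}"
```

In real code the argument would be device.minor_number.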

module_id#

int

Get a unique identifier for the device module on the baseboard.

This API retrieves a unique identifier for each GPU module that exists on a given baseboard. For non-baseboard products, this ID is always 0.

Type:: Device.module_id

name#

str

Name of the device, e.g. “Tesla V100-SXM2-32GB”.

Type:: Device.name

num_fans#

int

The number of fans on the device.

Type:: Device.num_fans

numa_node_id#

int

The NUMA node of the given GPU device.

This only applies to platforms where the GPUs are NUMA nodes.

Type:: Device.numa_node_id

pci_info#

PciInfo

The PCI attributes of this device.

Type:: Device.pci_info

performance_state#

Pstates

The current performance state of the device.

For Fermi™ or newer fully supported devices.

See Pstates for possible performance states.

Type:: Device.performance_state

persistence_mode_enabled#

bool

Whether persistence mode is enabled for this device.

For Linux only.

Type:: Device.persistence_mode_enabled

repair_status#

RepairStatus

Get the repair status for TPC/Channel repair.

For Ampere™ or newer fully supported devices.

Type:: Device.repair_status

serial#

str

Retrieves the globally unique board serial number associated with this device’s board.

Type:: Device.serial

temperature#

Temperature

Get information about temperatures on a device.

Type:: Device.temperature

uuid#

str

Retrieves the globally unique, immutable UUID associated with this device as a five-part hexadecimal string. The UUID augments the immutable board serial identifier.

Type:: Device.uuid
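A five-part hexadecimal string can be split into its parts with a regular expression. The sketch below rests on two assumptions that this page does not confirm: that the parts follow the usual 8-4-4-4-12 UUID grouping, and that the string may carry a "GPU-" prefix. split_uuid is a hypothetical helper, and the UUID in the usage line is made up.

```python
import re

# Assumed layout: optional "GPU-" prefix, then 8-4-4-4-12 hex digits.
_UUID_PATTERN = re.compile(
    r"(?:GPU-)?([0-9a-f]{8})-([0-9a-f]{4})-([0-9a-f]{4})-([0-9a-f]{4})-([0-9a-f]{12})",
    re.IGNORECASE,
)


def split_uuid(text):
    """Return the five hexadecimal parts of a UUID string, lowercased."""
    match = _UUID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a five-part hexadecimal UUID: {text!r}")
    return tuple(part.lower() for part in match.groups())


parts = split_uuid("GPU-01234567-89AB-cdef-0123-456789abcdef")
```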