cuda.core.experimental.ProgramOptions

class cuda.core.experimental.ProgramOptions(arch: str | None = None, relocatable_device_code: bool | None = None, extensible_whole_program: bool | None = None, debug: bool | None = None, lineinfo: bool | None = None, device_code_optimize: bool | None = None, ptxas_options: str | List[str] | Tuple[str] | None = None, max_register_count: int | None = None, ftz: bool | None = None, prec_sqrt: bool | None = None, prec_div: bool | None = None, fma: bool | None = None, use_fast_math: bool | None = None, extra_device_vectorization: bool | None = None, link_time_optimization: bool | None = None, gen_opt_lto: bool | None = None, define_macro: str | Tuple[str, str] | List[str | Tuple[str, str]] | Tuple[str | Tuple[str, str]] | None = None, undefine_macro: str | List[str] | Tuple[str] | None = None, include_path: str | List[str] | Tuple[str] | None = None, pre_include: str | List[str] | Tuple[str] | None = None, no_source_include: bool | None = None, std: str | None = None, builtin_move_forward: bool | None = None, builtin_initializer_list: bool | None = None, disable_warnings: bool | None = None, restrict: bool | None = None, device_as_default_execution_space: bool | None = None, device_int128: bool | None = None, optimization_info: str | None = None, no_display_error_number: bool | None = None, diag_error: int | List[int] | Tuple[int] | None = None, diag_suppress: int | List[int] | Tuple[int] | None = None, diag_warn: int | List[int] | Tuple[int] | None = None, brief_diagnostics: bool | None = None, time: str | None = None, split_compile: int | None = None, fdevice_syntax_only: bool | None = None, minimal: bool | None = None)

Customizable options for configuring Program.

arch

Pass the SM architecture value, such as sm_<CC> (for generating CUBIN) or compute_<CC> (for generating PTX). If not provided, the current device’s architecture will be used.

Type:

str, optional
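
As an illustration of the two value forms described above, the following sketch builds an arch string from a compute capability pair. The helper name arch_string and its ptx parameter are hypothetical and not part of cuda.core; the compute capability 8.0 is only an example.

```python
def arch_string(major: int, minor: int, ptx: bool = False) -> str:
    """Format a compute capability as an arch value (hypothetical helper)."""
    # sm_<CC> targets CUBIN generation, compute_<CC> targets PTX generation.
    prefix = "compute_" if ptx else "sm_"
    return f"{prefix}{major}{minor}"


# Compute capability 8.0, once for CUBIN output and once for PTX output.
cubin_arch = arch_string(8, 0)
ptx_arch = arch_string(8, 0, ptx=True)
```

Either string could then be passed as the arch argument.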

relocatable_device_code

Enable (disable) the generation of relocatable device code. Default: False Maps to: --relocatable-device-code={true|false} (-rdc)

Type:

bool, optional

extensible_whole_program

Do extensible whole program compilation of device code. Default: False Maps to: --extensible-whole-program (-ewp)

Type:

bool, optional

debug

Generate debug information. If --dopt is not specified, then turns off all optimizations. Default: False Maps to: --device-debug (-G)

Type:

bool, optional

lineinfo

Generate line-number information. Default: False Maps to: --generate-line-info (-lineinfo)

Type:

bool, optional

device_code_optimize

Enable device code optimization. When specified along with -G, enables limited debug information generation for optimized device code. Default: None Maps to: --dopt on (-dopt)

Type:

bool, optional

ptxas_options

Specify one or more options directly to ptxas, the PTX optimizing assembler. Options should be strings, for example ["-v", "-O2"]. Default: None Maps to: --ptxas-options <options> (-Xptxas)

Type:

Union[str, List[str]], optional

max_register_count

Specify the maximum number of registers that GPU functions can use. Default: None Maps to: --maxrregcount=<N> (-maxrregcount)

Type:

int, optional

ftz

When performing single-precision floating-point operations, flush denormal values to zero or preserve denormal values. Default: False Maps to: --ftz={true|false} (-ftz)

Type:

bool, optional

prec_sqrt

For single-precision floating-point square root, use IEEE round-to-nearest mode or use a faster approximation. Default: True Maps to: --prec-sqrt={true|false} (-prec-sqrt)

Type:

bool, optional

prec_div

For single-precision floating-point division and reciprocals, use IEEE round-to-nearest mode or use a faster approximation. Default: True Maps to: --prec-div={true|false} (-prec-div)

Type:

bool, optional

fma

Enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations. Default: True Maps to: --fmad={true|false} (-fmad)

Type:

bool, optional

use_fast_math

Make use of fast math operations. Default: False Maps to: --use_fast_math (-use_fast_math)

Type:

bool, optional

extra_device_vectorization

Enables more aggressive device code vectorization in the NVVM optimizer. Default: False Maps to: --extra-device-vectorization (-extra-device-vectorization)

Type:

bool, optional

link_time_optimization

Generate intermediate code for later link-time optimization. Default: False Maps to: --dlink-time-opt (-dlto)

Type:

bool, optional

gen_opt_lto

Run the optimizer passes before generating the LTO IR. Default: False Maps to: --gen-opt-lto (-gen-opt-lto)

Type:

bool, optional

define_macro

Predefine a macro. Can be a string, in which case that macro will be set to 1; a two-element tuple of strings, in which case the first element is defined as the second; or a list of strings or tuples. Default: None Maps to: --define-macro=<def> (-D)

Type:

Union[str, Tuple[str, str], List[Union[str, Tuple[str, str]]]], optional
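
The accepted shapes can be easier to see in code. The sketch below is not the library's implementation; define_macro_flags is a hypothetical helper, and it assumes that a (name, value) pair is spelled name=value after --define-macro=. The macro names are illustrative.

```python
from typing import List, Tuple, Union

MacroDef = Union[str, Tuple[str, str]]


def define_macro_flags(macros: Union[MacroDef, List[MacroDef]]) -> List[str]:
    """Turn the accepted define_macro shapes into flags (hypothetical helper)."""
    # A bare string or a single (name, value) tuple is one definition.
    if isinstance(macros, (str, tuple)):
        macros = [macros]
    flags = []
    for macro in macros:
        if isinstance(macro, tuple):
            name, value = macro
            flags.append(f"--define-macro={name}={value}")
        else:
            # No explicit value given: the macro ends up defined as 1.
            flags.append(f"--define-macro={macro}")
    return flags


flags = define_macro_flags(["NDEBUG", ("BLOCK_SIZE", "256")])
```

Here flags holds one entry per macro, in the order given.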

undefine_macro

Cancel any previous definition of a macro, or list of macros. Default: None Maps to: --undefine-macro=<def> (-U)

Type:

Union[str, List[str]], optional

include_path

Add the directory or directories to the list of directories to be searched for headers. Default: None Maps to: --include-path=<dir> (-I)

Type:

Union[str, List[str]], optional

pre_include

Preinclude one or more headers during preprocessing. Can be either a string or a list of strings. Default: None Maps to: --pre-include=<header> (-include)

Type:

Union[str, List[str]], optional

no_source_include

Disable the default behavior of adding the directory of each input source to the include path. Default: False Maps to: --no-source-include (-no-source-include)

Type:

bool, optional

std

Set language dialect to C++03, C++11, C++14, C++17 or C++20. Default: c++17 Maps to: --std={c++03|c++11|c++14|c++17|c++20} (-std)

Type:

str, optional
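
Because only the listed dialects are accepted, a caller may want to check the value before building options. A minimal sketch, assuming a hypothetical helper std_flag that is not part of cuda.core:

```python
ALLOWED_STD = ("c++03", "c++11", "c++14", "c++17", "c++20")


def std_flag(std: str = "c++17") -> str:
    """Validate a dialect name and format it as a flag (hypothetical helper)."""
    if std not in ALLOWED_STD:
        raise ValueError(f"unsupported dialect {std!r}; expected one of {ALLOWED_STD}")
    return f"--std={std}"


default_flag = std_flag()
cxx20_flag = std_flag("c++20")
```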

builtin_move_forward

Provide builtin definitions of std::move and std::forward. Default: True Maps to: --builtin-move-forward={true|false} (-builtin-move-forward)

Type:

bool, optional

builtin_initializer_list

Provide builtin definitions of std::initializer_list class and member functions. Default: True Maps to: --builtin-initializer-list={true|false} (-builtin-initializer-list)

Type:

bool, optional

disable_warnings

Inhibit all warning messages. Default: False Maps to: --disable-warnings (-w)

Type:

bool, optional

restrict

Programmer assertion that all kernel pointer parameters are restrict pointers. Default: False Maps to: --restrict (-restrict)

Type:

bool, optional

device_as_default_execution_space

Treat entities with no execution space annotation as __device__ entities. Default: False Maps to: --device-as-default-execution-space (-default-device)

Type:

bool, optional

device_int128

Allow the __int128 type in device code. Default: False Maps to: --device-int128 (-device-int128)

Type:

bool, optional

optimization_info

Provide optimization reports for the specified kind of optimization. Default: None Maps to: --optimization-info=<kind> (-opt-info)

Type:

str, optional

no_display_error_number

Disable the display of a diagnostic number for warning messages. Default: False Maps to: --no-display-error-number (-no-err-no)

Type:

bool, optional

diag_error

Emit an error for a specified diagnostic message number or comma-separated list of numbers. Default: None Maps to: --diag-error=<error-number>,... (-diag-error)

Type:

Union[int, List[int]], optional

diag_suppress

Suppress a specified diagnostic message number or comma-separated list of numbers. Default: None Maps to: --diag-suppress=<error-number>,... (-diag-suppress)

Type:

Union[int, List[int]], optional

diag_warn

Emit a warning for a specified diagnostic message number or comma-separated list of numbers. Default: None Maps to: --diag-warn=<error-number>,... (-diag-warn)

Type:

Union[int, List[int]], optional
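
The three diag_* options above share the same value shape: a single number or a list that is joined with commas. The sketch below shows that joining step; diag_flag is a hypothetical helper rather than the library's code, and the diagnostic numbers are arbitrary examples.

```python
from typing import List, Union


def diag_flag(option: str, numbers: Union[int, List[int]]) -> str:
    """Join diagnostic numbers into one comma-separated flag (hypothetical helper)."""
    if isinstance(numbers, int):
        numbers = [numbers]
    return f"--{option}=" + ",".join(str(n) for n in numbers)


single = diag_flag("diag-suppress", 177)
several = diag_flag("diag-error", [177, 550])
```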

brief_diagnostics

Disable or enable showing source line and column info in a diagnostic. Default: False Maps to: --brief-diagnostics={true|false} (-brief-diag)

Type:

bool, optional

time

Generate a CSV table with the time taken by each compilation phase. Default: None Maps to: --time=<file-name> (-time)

Type:

str, optional

split_compile

Perform compiler optimizations in parallel. Default: 1 Maps to: --split-compile=<number of threads> (-split-compile)

Type:

int, optional

fdevice_syntax_only

Ends device compilation after front-end syntax checking. Default: False Maps to: --fdevice-syntax-only (-fdevice-syntax-only)

Type:

bool, optional

minimal

Omit certain language features to reduce compile time for small programs. Default: False Maps to: --minimal (-minimal)

Type:

bool, optional
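
Putting several of the attributes above together, a construction call might look like the following sketch. The keyword names come from the signature at the top of this page; the values are illustrative only. The import is guarded so the snippet still runs where cuda.core is not installed.

```python
# Keyword arguments named after the attributes documented above.
# All values are illustrative assumptions, not recommended settings.
option_kwargs = {
    "arch": "sm_80",
    "std": "c++17",
    "lineinfo": True,
    "max_register_count": 64,
    "define_macro": [("BLOCK_SIZE", "256"), "NDEBUG"],
    "include_path": ["include"],
}

try:
    from cuda.core.experimental import ProgramOptions
except ImportError:
    # cuda.core is not available; keep the dictionary as a record of intent.
    ProgramOptions = None

options = ProgramOptions(**option_kwargs) if ProgramOptions is not None else None
```

Any attribute left out keeps its None default from the signature.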