cuda.core.experimental.ProgramOptions
- class cuda.core.experimental.ProgramOptions(arch: str | None = None, relocatable_device_code: bool | None = None, extensible_whole_program: bool | None = None, debug: bool | None = None, lineinfo: bool | None = None, device_code_optimize: bool | None = None, ptxas_options: str | List[str] | Tuple[str] | None = None, max_register_count: int | None = None, ftz: bool | None = None, prec_sqrt: bool | None = None, prec_div: bool | None = None, fma: bool | None = None, use_fast_math: bool | None = None, extra_device_vectorization: bool | None = None, link_time_optimization: bool | None = None, gen_opt_lto: bool | None = None, define_macro: str | Tuple[str, str] | List[str | Tuple[str, str]] | Tuple[str | Tuple[str, str]] | None = None, undefine_macro: str | List[str] | Tuple[str] | None = None, include_path: str | List[str] | Tuple[str] | None = None, pre_include: str | List[str] | Tuple[str] | None = None, no_source_include: bool | None = None, std: str | None = None, builtin_move_forward: bool | None = None, builtin_initializer_list: bool | None = None, disable_warnings: bool | None = None, restrict: bool | None = None, device_as_default_execution_space: bool | None = None, device_int128: bool | None = None, optimization_info: str | None = None, no_display_error_number: bool | None = None, diag_error: int | List[int] | Tuple[int] | None = None, diag_suppress: int | List[int] | Tuple[int] | None = None, diag_warn: int | List[int] | Tuple[int] | None = None, brief_diagnostics: bool | None = None, time: str | None = None, split_compile: int | None = None, fdevice_syntax_only: bool | None = None, minimal: bool | None = None)
Customizable options for configuring Program.
- arch
Pass the SM architecture value, such as sm_<CC> (for generating CUBIN) or compute_<CC> (for generating PTX). If not provided, the current device’s architecture will be used.
- Type:
str, optional
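As an illustration of the sm_<CC> and compute_<CC> naming above, the following sketch builds an arch string from a compute capability given as a (major, minor) pair. The helper name arch_string and its ptx parameter are hypothetical and not part of cuda.core; the sketch also ignores architecture suffix variants.

```python
def arch_string(major: int, minor: int, *, ptx: bool = False) -> str:
    """Build an arch value from a compute capability.

    Hypothetical helper for illustration only; not a cuda.core function.
    """
    if major < 0 or minor < 0:
        raise ValueError("compute capability components must be non-negative")
    # sm_<CC> targets CUBIN generation, compute_<CC> targets PTX generation.
    prefix = "compute" if ptx else "sm"
    return f"{prefix}_{major}{minor}"


print(arch_string(8, 0))            # sm_80
print(arch_string(9, 0, ptx=True))  # compute_90
```

The resulting string is the kind of value that would be passed as arch, for example ProgramOptions(arch="sm_80").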
- relocatable_device_code
Enable (disable) the generation of relocatable device code. Default: False. Maps to: --relocatable-device-code={true|false} (-rdc).
- Type:
bool, optional
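Several boolean attributes on this page map to a {true|false} style compiler flag. A minimal sketch of that mapping, assuming that None means "do not pass the flag and keep the compiler default", could look like the following. The function bool_flag is hypothetical and is not how cuda.core is necessarily implemented.

```python
from typing import List, Optional


def bool_flag(name: str, value: Optional[bool]) -> List[str]:
    """Render a {true|false} style option as a list of flag strings.

    Hypothetical helper for illustration only. Assumption: None means the
    option was not set, so no flag is emitted.
    """
    if value is None:
        return []
    return [f"--{name}={'true' if value else 'false'}"]


print(bool_flag("relocatable-device-code", True))
print(bool_flag("ftz", False))
print(bool_flag("prec-div", None))
```

Returning a list rather than a single string makes it easy to concatenate the results for many options into one argument list.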
- extensible_whole_program
Do extensible whole program compilation of device code. Default: False. Maps to: --extensible-whole-program (-ewp).
- Type:
bool, optional
- debug
Generate debug information. If --dopt is not specified, all optimizations are turned off. Default: False. Maps to: --device-debug (-G).
- Type:
bool, optional
- lineinfo
Generate line-number information. Default: False. Maps to: --generate-line-info (-lineinfo).
- Type:
bool, optional
- device_code_optimize
Enable device code optimization. When specified along with -G, enables limited debug information generation for optimized device code. Default: None. Maps to: --dopt on (-dopt).
- Type:
bool, optional
- ptxas_options
Specify one or more options directly to ptxas, the PTX optimizing assembler. Options should be strings, for example ["-v", "-O2"]. Default: None. Maps to: --ptxas-options <options> (-Xptxas).
- max_register_count
Specify the maximum number of registers that GPU functions can use. Default: None. Maps to: --maxrregcount=<N> (-maxrregcount).
- Type:
int, optional
- ftz
When performing single-precision floating-point operations, flush denormal values to zero (True) or preserve denormal values (False). Default: False. Maps to: --ftz={true|false} (-ftz).
- Type:
bool, optional
- prec_sqrt
For single-precision floating-point square root, use IEEE round-to-nearest mode (True) or a faster approximation (False). Default: True. Maps to: --prec-sqrt={true|false} (-prec-sqrt).
- Type:
bool, optional
- prec_div
For single-precision floating-point division and reciprocals, use IEEE round-to-nearest mode (True) or a faster approximation (False). Default: True. Maps to: --prec-div={true|false} (-prec-div).
- Type:
bool, optional
- fma
Enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations. Default: True. Maps to: --fmad={true|false} (-fmad).
- Type:
bool, optional
- use_fast_math
Make use of fast math operations. Default: False. Maps to: --use_fast_math (-use_fast_math).
- Type:
bool, optional
- extra_device_vectorization
Enables more aggressive device code vectorization in the NVVM optimizer. Default: False. Maps to: --extra-device-vectorization (-extra-device-vectorization).
- Type:
bool, optional
- link_time_optimization
Generate intermediate code for later link-time optimization. Default: False. Maps to: --dlink-time-opt (-dlto).
- Type:
bool, optional
- gen_opt_lto
Run the optimizer passes before generating the LTO IR. Default: False. Maps to: --gen-opt-lto (-gen-opt-lto).
- Type:
bool, optional
- define_macro
Predefine a macro. Can be a string, in which case the macro is set to 1; a 2-element tuple of strings, in which case the first element is defined as the second; or a list of strings or tuples. Default: None. Maps to: --define-macro=<def> (-D).
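The accepted shapes for define_macro can be sketched as a small normalization routine that turns each shape into --define-macro=<def> flags. The function define_macro_flags is hypothetical and only illustrates the description above; it assumes that a 2-element tuple of strings is always one (name, value) pair rather than two separate macro names.

```python
from typing import List


def define_macro_flags(define_macro) -> List[str]:
    """Turn a define_macro value into a list of --define-macro flags.

    Hypothetical helper for illustration only; not a cuda.core function.
    """
    if define_macro is None:
        return []
    is_pair = (
        isinstance(define_macro, tuple)
        and len(define_macro) == 2
        and all(isinstance(part, str) for part in define_macro)
    )
    # A bare string or a single (name, value) pair describes one macro;
    # anything else is treated as a sequence of such entries.
    if isinstance(define_macro, str) or is_pair:
        specs = [define_macro]
    else:
        specs = list(define_macro)

    flags = []
    for spec in specs:
        if isinstance(spec, str):
            # A name with no value: the compiler defines it as 1.
            flags.append(f"--define-macro={spec}")
        else:
            name, value = spec
            flags.append(f"--define-macro={name}={value}")
    return flags


print(define_macro_flags("DEBUG"))
print(define_macro_flags(("BLOCK_SIZE", "256")))
print(define_macro_flags(["DEBUG", ("BLOCK_SIZE", "256")]))
```

The macro names DEBUG and BLOCK_SIZE are placeholders chosen for the example.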
- undefine_macro
Cancel any previous definition of a macro, or list of macros. Default: None. Maps to: --undefine-macro=<def> (-U).
- include_path
Add the directory or directories to the list of directories to be searched for headers. Default: None. Maps to: --include-path=<dir> (-I).
- pre_include
Preinclude one or more headers during preprocessing. Can be either a string or a list of strings. Default: None. Maps to: --pre-include=<header> (-include).
- no_source_include
Disable the default behavior of adding the directory of each input source to the include path. Default: False. Maps to: --no-source-include (-no-source-include).
- Type:
bool, optional
- std
Set language dialect to C++03, C++11, C++14, C++17 or C++20. Default: c++17. Maps to: --std={c++03|c++11|c++14|c++17|c++20} (-std).
- Type:
str, optional
- builtin_move_forward
Provide builtin definitions of std::move and std::forward. Default: True. Maps to: --builtin-move-forward={true|false} (-builtin-move-forward).
- Type:
bool, optional
- builtin_initializer_list
Provide builtin definitions of the std::initializer_list class and its member functions. Default: True. Maps to: --builtin-initializer-list={true|false} (-builtin-initializer-list).
- Type:
bool, optional
- disable_warnings
Inhibit all warning messages. Default: False. Maps to: --disable-warnings (-w).
- Type:
bool, optional
- restrict
Programmer assertion that all kernel pointer parameters are restrict pointers. Default: False. Maps to: --restrict (-restrict).
- Type:
bool, optional
- device_as_default_execution_space
Treat entities with no execution space annotation as __device__ entities. Default: False. Maps to: --device-as-default-execution-space (-default-device).
- Type:
bool, optional
- device_int128
Allow the __int128 type in device code. Default: False. Maps to: --device-int128 (-device-int128).
- Type:
bool, optional
- optimization_info
Provide optimization reports for the specified kind of optimization. Default: None. Maps to: --optimization-info=<kind> (-opt-info).
- Type:
str, optional
- no_display_error_number
Disable the display of a diagnostic number for warning messages. Default: False. Maps to: --no-display-error-number (-no-err-no).
- Type:
bool, optional
- diag_error
Emit an error for a specified diagnostic message number or comma-separated list of numbers. Default: None. Maps to: --diag-error=<error-number>,... (-diag-error).
- diag_suppress
Suppress a specified diagnostic message number or comma-separated list of numbers. Default: None. Maps to: --diag-suppress=<error-number>,... (-diag-suppress).
- diag_warn
Emit a warning for a specified diagnostic message number or comma-separated list of numbers. Default: None. Maps to: --diag-warn=<error-number>,... (-diag-warn).
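The three diag_* attributes accept either a single diagnostic number or a sequence of numbers. The sketch below shows one way such a value could be rendered in the comma-separated form described above. The helper diag_flag is hypothetical, and joining all numbers into a single flag is an assumption; an implementation could just as well emit one flag per number.

```python
from typing import List


def diag_flag(kind: str, numbers) -> List[str]:
    """Render a diag_error / diag_suppress / diag_warn value as a flag.

    Hypothetical helper for illustration only; not a cuda.core function.
    Assumption: all numbers are joined into one comma-separated flag.
    """
    if kind not in ("error", "suppress", "warn"):
        raise ValueError(f"unknown diagnostic kind: {kind!r}")
    if numbers is None:
        return []
    # Accept a single int as well as a list or tuple of ints.
    values = [numbers] if isinstance(numbers, int) else list(numbers)
    if not values:
        return []
    return [f"--diag-{kind}=" + ",".join(str(n) for n in values)]


print(diag_flag("suppress", 177))
print(diag_flag("error", [177, 550]))
```

The diagnostic numbers 177 and 550 are placeholders chosen for the example.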
- brief_diagnostics
Disable or enable showing source line and column info in a diagnostic. Default: False. Maps to: --brief-diagnostics={true|false} (-brief-diag).
- Type:
bool, optional
- time
Generate a CSV table with the time taken by each compilation phase. Default: None. Maps to: --time=<file-name> (-time).
- Type:
str, optional
- split_compile
Perform compiler optimizations in parallel. Default: 1. Maps to: --split-compile=<number of threads> (-split-compile).
- Type:
int, optional