cuda.core.LinkerOptions#

Customizable options for configuring Linker.

Since the linker may choose either nvJitLink or the driver’s cuLink* APIs as the backend, not every option is applicable to both backends. The backend is decided per-Linker instance from the installed CUDA driver major version, nvJitLink’s availability and major version, the input code types, and whether link-time optimization is requested:

- nvJitLink is used when its major version matches the driver’s.
- The driver linker is used when nvJitLink is unavailable or too old (<12.3), or when its major version differs from the driver’s (and no LTO step is required).
- Linking LTO IRs, or requesting link_time_optimization / ptx, with nvJitLink unavailable or with mismatched nvJitLink and driver majors is unsupported and raises RuntimeError at Linker construction time.
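
The selection rules above can be condensed into a small decision function. This is only a sketch: `choose_backend`, its parameters, and the choice to treat a too-old nvJitLink the same as a missing one are assumptions made for illustration, not cuda.core's actual implementation.

```python
from typing import Optional, Tuple

# Oldest nvJitLink version treated as usable, per the rules above.
MIN_NVJITLINK = (12, 3)


def choose_backend(
    driver_major: int,
    nvjitlink_version: Optional[Tuple[int, int]],
    needs_lto: bool,
) -> str:
    """Return "nvjitlink" or "driver" (hypothetical helper, not a cuda.core API).

    nvjitlink_version is (major, minor), or None when nvJitLink is not installed.
    needs_lto is True when any input is LTO IR, or when link_time_optimization
    or ptx is requested.
    """
    usable = nvjitlink_version is not None and nvjitlink_version >= MIN_NVJITLINK
    if usable and nvjitlink_version[0] == driver_major:
        return "nvjitlink"
    if needs_lto:
        # Unsupported combination: the text says this fails when the Linker is built.
        raise RuntimeError(
            "LTO requires nvJitLink with a major version matching the driver"
        )
    return "driver"
```

For example, under these assumptions `choose_backend(13, (12, 9), needs_lto=False)` falls back to the driver linker, while the same call with `needs_lto=True` raises `RuntimeError`.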

name#

Name of the linker. If the linking succeeds, the name is passed down to the generated ObjectCode.

Type:: str, optional

arch#

The target SM architecture, such as sm_<CC> (for generating CUBIN) or compute_<CC> (for generating PTX), where <CC> is the compute capability written without the dot. If not provided, the current device’s architecture will be used.

Type:: str, optional
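
As an illustration of the two spellings, the sketch below builds an arch string from a compute capability pair. `arch_string` is a hypothetical helper introduced here, not part of cuda.core.

```python
from typing import Tuple


def arch_string(compute_capability: Tuple[int, int], ptx: bool = False) -> str:
    """Format (major, minor) as sm_<CC>, or compute_<CC> when PTX is wanted."""
    major, minor = compute_capability
    prefix = "compute" if ptx else "sm"
    return f"{prefix}_{major}{minor}"


cubin_arch = arch_string((8, 0))           # "sm_80"
ptx_arch = arch_string((9, 0), ptx=True)   # "compute_90"
```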

max_register_count#

Maximum register count.

Type:: int, optional

time#

Print timing information to the info log. Default: False.

Type:: bool, optional

verbose#

Print verbose messages to the info log. Default: False.

Type:: bool, optional

link_time_optimization#

Perform link time optimization. Default: False.

Type:: bool, optional

ptx#

Emit PTX after linking instead of CUBIN; only supported with link_time_optimization=True. Default: False.

Type:: bool, optional

optimization_level#

Set optimization level. Only 0 and 3 are accepted.

Type:: int, optional
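
The constraints stated for ptx and optimization_level can be checked before any linker is built. The sketch below uses a stand-in dataclass, `OptionsSketch`, and a hypothetical `validate` function. Neither is part of cuda.core, and whether the real class raises on these inputs, and with which exception type, is not stated in this reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionsSketch:
    """Stand-in for a few LinkerOptions fields; not the real class."""

    link_time_optimization: bool = False
    ptx: bool = False
    optimization_level: Optional[int] = None


def validate(opts: OptionsSketch) -> None:
    """Raise ValueError when the documented constraints are violated."""
    # ptx is only supported together with link-time optimization.
    if opts.ptx and not opts.link_time_optimization:
        raise ValueError("ptx=True requires link_time_optimization=True")
    # Only levels 0 and 3 are accepted; None means "not set".
    if opts.optimization_level not in (None, 0, 3):
        raise ValueError("optimization_level must be 0 or 3")


validate(OptionsSketch(link_time_optimization=True, ptx=True, optimization_level=3))
```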

debug#

Generate debug information. Default: False.

Type:: bool, optional

lineinfo#

Generate line information. Default: False.

Type:: bool, optional

ftz#

Flush denormal values to zero. Default: False.

Type:: bool, optional

prec_div#

Use precise division. Default: True.

Type:: bool, optional

prec_sqrt#

Use precise square root. Default: True.

Type:: bool, optional

fma#

Use fused multiply-add. Default: True.

Type:: bool, optional

kernels_used#

The name, or a sequence of names, of the kernels that are used; kernels not in the list can be removed.

Type:: [str | tuple[str] | list[str]], optional

variables_used#

The name, or a sequence of names, of the variables that are used; variables not in the list can be removed.

Type:: [str | tuple[str] | list[str]], optional
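
Because kernels_used and variables_used accept a single string, a tuple, or a list, code that consumes them usually normalizes the value first. A minimal sketch follows; `as_name_list` is a hypothetical helper, not a cuda.core function.

```python
from typing import List, Optional, Sequence, Union


def as_name_list(names: Optional[Union[str, Sequence[str]]]) -> List[str]:
    """Normalize None, a single name, or a sequence of names to a list."""
    if names is None:
        return []
    if isinstance(names, str):
        # A bare string is one name, not a sequence of characters.
        return [names]
    return list(names)


single = as_name_list("my_kernel")
several = as_name_list(("kernel_a", "kernel_b"))
```

The explicit `isinstance(names, str)` check matters: without it, `list("my_kernel")` would split the name into individual characters.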

optimize_unused_variables#

Assume that if a variable is not referenced in device code, it can be removed. Default: False.

Type:: bool, optional

ptxas_options#

Pass options to PTXAS.

Type:: [str | tuple[str] | list[str]], optional

split_compile#

Maximum number of threads for split compilation. Use 0 to use all available processors; a value of 1 disables split compilation. Default: 1.

Type:: int, optional

split_compile_extended#

A more aggressive form of split compilation, available in LTO mode only. Accepts a maximum thread count: use 0 to use all available processors; a value of 1 disables extended split compilation. Note: this option can potentially impact the performance of the compiled binary. Default: 1.

Type:: int, optional
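
The meaning of the thread-count values for split_compile and split_compile_extended can be sketched as follows. `effective_threads` is a hypothetical helper that only mirrors the documented semantics; the actual resolution happens inside the linker backend. Rejecting negative values is an assumption of this sketch.

```python
import os


def effective_threads(value: int = 1) -> int:
    """Map a split-compile setting to the thread count it stands for.

    0 means "all available processors"; 1 means split compilation is disabled.
    """
    if value < 0:
        # Assumption: negative counts are treated as invalid here.
        raise ValueError("thread count must be >= 0")
    if value == 0:
        # os.cpu_count() can return None, so fall back to a single thread.
        return os.cpu_count() or 1
    return value


threads = effective_threads(0)
```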

no_cache#

Do not cache the intermediate steps of nvJitLink. Default: False.

Type:: bool, optional
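
To show how such options typically reach the nvJitLink backend, the sketch below translates a subset of them into nvJitLink-style flag strings. `to_nvjitlink_flags` is a hypothetical function written for this illustration. The flag spellings follow nvJitLink's documented option names as best understood here, and the exact translation cuda.core performs internally may differ.

```python
from typing import List


def to_nvjitlink_flags(**opts) -> List[str]:
    """Translate a subset of the options above into flag strings (illustrative)."""
    flags = []
    if opts.get("arch"):
        flags.append(f"-arch={opts['arch']}")
    if opts.get("max_register_count") is not None:
        flags.append(f"-maxrregcount={opts['max_register_count']}")
    if opts.get("link_time_optimization"):
        flags.append("-lto")
    if opts.get("ptx"):
        flags.append("-ptx")
    if opts.get("optimization_level") is not None:
        flags.append(f"-O{opts['optimization_level']}")
    if opts.get("debug"):
        flags.append("-g")
    if opts.get("lineinfo"):
        flags.append("-lineinfo")
    # Floating-point controls take an explicit true/false value.
    for key, flag in (
        ("ftz", "-ftz"),
        ("prec_div", "-prec-div"),
        ("prec_sqrt", "-prec-sqrt"),
        ("fma", "-fma"),
    ):
        if opts.get(key) is not None:
            flags.append(f"{flag}={'true' if opts[key] else 'false'}")
    if opts.get("no_cache"):
        flags.append("-no-cache")
    return flags


flags = to_nvjitlink_flags(
    arch="sm_80", link_time_optimization=True, ptx=True, optimization_level=3, fma=False
)
```

Options left unset produce no flag at all in this sketch, so the backend's own defaults apply.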