cuda.core.experimental.LinkerOptions

class cuda.core.experimental.LinkerOptions(arch: str | None = None, max_register_count: int | None = None, time: bool | None = None, verbose: bool | None = None, link_time_optimization: bool | None = None, ptx: bool | None = None, optimization_level: int | None = None, debug: bool | None = None, lineinfo: bool | None = None, ftz: bool | None = None, prec_div: bool | None = None, prec_sqrt: bool | None = None, fma: bool | None = None, kernels_used: List[str] | None = None, variables_used: List[str] | None = None, optimize_unused_variables: bool | None = None, xptxas: List[str] | None = None, split_compile: int | None = None, split_compile_extended: int | None = None, no_cache: bool | None = None)

Customizable linker options.

The linker uses either nvJitLink or the driver APIs (cuLink) as the linking backend, so not all options apply in every configuration. When the system’s installed nvJitLink is too old (<12.3) or not installed, the driver APIs are used instead.
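The fallback rule above can be sketched as a small helper. This is only an illustration: `choose_backend` and its tuple-based version argument are hypothetical names, not part of cuda.core, which performs this check internally.

```python
from typing import Optional, Tuple


def choose_backend(nvjitlink_version: Optional[Tuple[int, int]]) -> str:
    """Return the backend implied by the rule described above.

    nvjitlink_version is (major, minor), or None when nvJitLink is not
    installed. Hypothetical helper for illustration only.
    """
    # Missing or older than 12.3: fall back to the driver APIs (cuLink).
    if nvjitlink_version is None or nvjitlink_version < (12, 3):
        return "driver"
    return "nvJitLink"
```

Options that exist only for nvJitLink may be unavailable when the helper would return "driver".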

arch

Pass the SM architecture value, such as sm_<CC> (for generating CUBIN) or compute_<CC> (for generating PTX). If not provided, the current device’s architecture will be used.

Type:

str, optional
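As a rough sketch of how an arch value is formed from a compute capability, the hypothetical helper below (not part of cuda.core) joins the major and minor numbers and picks the prefix according to whether CUBIN or PTX output is wanted.

```python
def arch_string(major: int, minor: int, ptx: bool = False) -> str:
    """Build an arch value such as "sm_80" or "compute_80".

    Hypothetical helper: "sm_" targets CUBIN, "compute_" targets PTX.
    """
    prefix = "compute_" if ptx else "sm_"
    return f"{prefix}{major}{minor}"


# Compute capability 8.0, CUBIN output.
cubin_arch = arch_string(8, 0)
# Compute capability 9.0, PTX output.
ptx_arch = arch_string(9, 0, ptx=True)
```

The resulting string is what would be passed as `arch`.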

max_register_count

Maximum register count. Maps to -maxrregcount=<N>.

Type:

int, optional

time

Print timing information to the info log. Maps to -time. Default: False.

Type:

bool, optional

verbose

Print verbose messages to the info log. Maps to -verbose. Default: False.

Type:

bool, optional

link_time_optimization

Perform link time optimization. Maps to -lto. Default: False.

Type:

bool, optional

ptx

Emit PTX after linking instead of CUBIN; only supported with -lto. Maps to -ptx. Default: False.

Type:

bool, optional

optimization_level

Set optimization level. Only 0 and 3 are accepted. Maps to -O<N>.

Type:

int, optional
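The two constraints stated so far (ptx requires link time optimization, and only optimization levels 0 and 3 are accepted) could be checked before constructing the options. The function below is a hypothetical pre-check written for illustration; it is not how cuda.core validates its inputs, and the error messages are made up.

```python
from typing import Optional


def check_options(
    link_time_optimization: Optional[bool] = None,
    ptx: Optional[bool] = None,
    optimization_level: Optional[int] = None,
) -> None:
    """Raise ValueError if the documented constraints are violated.

    Hypothetical pre-check, not part of cuda.core.
    """
    # -ptx is only supported together with -lto.
    if ptx and not link_time_optimization:
        raise ValueError("ptx=True requires link_time_optimization=True")
    # Only -O0 and -O3 are accepted.
    if optimization_level is not None and optimization_level not in (0, 3):
        raise ValueError("optimization_level must be 0 or 3")
```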

debug

Generate debug information. Maps to -g. Default: False.

Type:

bool, optional

lineinfo

Generate line information. Maps to -lineinfo. Default: False.

Type:

bool, optional

ftz

Flush denormal values to zero. Maps to -ftz=<n>. Default: False.

Type:

bool, optional

prec_div

Use precise division. Maps to -prec-div=<n>. Default: True.

Type:

bool, optional

prec_sqrt

Use precise square root. Maps to -prec-sqrt=<n>. Default: True.

Type:

bool, optional

fma

Use fast multiply-add. Maps to -fma=<n>. Default: True.

Type:

bool, optional

kernels_used

Pass a list of kernels that are used; any not in the list can be removed. This option can be specified multiple times. Maps to -kernels-used=<name>.

Type:

List[str], optional

variables_used

Pass a list of variables that are used; any not in the list can be removed. Maps to -variables-used=<name>.

Type:

List[str], optional
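To illustrate how the list-valued options relate to the flags they map to, the sketch below expands each list entry into one flag. `used_symbol_flags` is a hypothetical helper, and emitting one flag per name is an assumption based on the note that -kernels-used can be specified multiple times; the same treatment for -variables-used is assumed, not confirmed by the text above.

```python
from typing import List, Optional


def used_symbol_flags(
    kernels_used: Optional[List[str]] = None,
    variables_used: Optional[List[str]] = None,
) -> List[str]:
    """Expand the two lists into individual linker flags.

    Hypothetical helper; one flag per name is an assumption.
    """
    flags: List[str] = []
    for name in kernels_used or []:
        flags.append(f"-kernels-used={name}")
    for name in variables_used or []:
        flags.append(f"-variables-used={name}")
    return flags


# The kernel and variable names here are placeholders.
flags = used_symbol_flags(["saxpy", "reduce"], ["lookup_table"])
```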

optimize_unused_variables

Assume that if a variable is not referenced in device code, it can be removed. Maps to -optimize-unused-variables. Default: False.

Type:

bool, optional

xptxas

Pass options to PTXAS. Maps to -Xptxas=<opt>.

Type:

List[str], optional

split_compile

Split compilation maximum thread count. Use 0 to use all available processors. Value of 1 disables split compilation (default). Maps to -split-compile=<N>. Default: 1.

Type:

int, optional

split_compile_extended

A more aggressive form of split compilation available in LTO mode only. Accepts a maximum thread count value. Use 0 to use all available processors. Value of 1 disables extended split compilation (default). Note: This option can potentially impact performance of the compiled binary. Maps to -split-compile-extended=<N>. Default: 1.

Type:

int, optional
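The meaning of the thread-count value shared by split_compile and split_compile_extended can be sketched as follows. `effective_threads` is a hypothetical helper that only mirrors the description above (0 means all available processors, 1 disables splitting); it assumes a non-negative input and uses `os.cpu_count()` as a stand-in for "all available processors", which may differ from what the linker actually detects.

```python
import os


def effective_threads(n: int) -> int:
    """Resolve a split-compile value to a maximum thread count.

    Hypothetical helper: 0 means "use all available processors",
    1 means split compilation is disabled (a single thread).
    """
    if n == 0:
        # os.cpu_count() can return None, so fall back to 1.
        return os.cpu_count() or 1
    return n
```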

no_cache

Do not cache the intermediate steps of nvJitLink. Maps to -no-cache. Default: False.

Type:

bool, optional