cuda.core.experimental.LinkerOptions
- class cuda.core.experimental.LinkerOptions(arch: str | None = None, max_register_count: int | None = None, time: bool | None = None, verbose: bool | None = None, link_time_optimization: bool | None = None, ptx: bool | None = None, optimization_level: int | None = None, debug: bool | None = None, lineinfo: bool | None = None, ftz: bool | None = None, prec_div: bool | None = None, prec_sqrt: bool | None = None, fma: bool | None = None, kernels_used: str | Tuple[str] | List[str] | None = None, variables_used: str | Tuple[str] | List[str] | None = None, optimize_unused_variables: bool | None = None, ptxas_options: str | Tuple[str] | List[str] | None = None, split_compile: int | None = None, split_compile_extended: int | None = None, no_cache: bool | None = None)
Customizable Linker options. Since the linker uses either nvJitLink or the driver APIs as the linking backend, not all options are applicable to both backends. When the system’s installed nvJitLink is too old (<12.3), or not installed, the driver APIs (cuLink) will be used instead.
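The fallback rule described above can be sketched in a few lines. This is only an illustration of the documented rule, not the library's internal logic; `choose_backend` and `MIN_NVJITLINK_VERSION` are hypothetical names introduced here.

```python
from typing import Optional, Tuple

# Minimum nvJitLink version named in the description above.
MIN_NVJITLINK_VERSION = (12, 3)


def choose_backend(nvjitlink_version: Optional[Tuple[int, int]]) -> str:
    """Hypothetical helper: return the backend implied by the fallback rule.

    Pass None when nvJitLink is not installed.
    """
    if nvjitlink_version is None or nvjitlink_version < MIN_NVJITLINK_VERSION:
        return "driver"  # the driver APIs (cuLink)
    return "nvJitLink"
```

Under this sketch, `choose_backend((12, 2))` and `choose_backend(None)` both fall back to the driver APIs, while `choose_backend((12, 3))` selects nvJitLink.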
- arch
Pass the SM architecture value, such as sm_<CC> (for generating CUBIN) or compute_<CC> (for generating PTX). If not provided, the current device’s architecture will be used.
- Type:
str, optional
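As a small illustration of the two value shapes, the sketch below formats a compute capability into an `arch` string. `arch_string` is a hypothetical helper, not part of cuda.core, and it assumes `<CC>` is the major and minor digits written together (suffixed variants are not handled).

```python
def arch_string(major: int, minor: int, ptx: bool = False) -> str:
    """Hypothetical helper: format a compute capability as an arch value."""
    # "sm_" targets CUBIN output, "compute_" targets PTX output.
    prefix = "compute" if ptx else "sm"
    return f"{prefix}_{major}{minor}"
```

The result is a plain string and could then be passed as the `arch` argument, for example `arch=arch_string(8, 0)`.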
- ptx
Emit PTX after linking instead of CUBIN; only supported with link_time_optimization=True. Default: False.
- Type:
bool, optional
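Because `ptx` depends on `link_time_optimization`, it can help to check the combination before building the options object. The following is a minimal sketch of such a pre-check on a plain dictionary of keyword arguments; `check_ptx_requires_lto` is a hypothetical helper, and how the library itself reacts to an unsupported combination is not assumed here.

```python
from typing import Any, Dict


def check_ptx_requires_lto(options: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical pre-check for keyword arguments meant for LinkerOptions."""
    if options.get("ptx") and not options.get("link_time_optimization"):
        raise ValueError(
            "ptx=True is only supported with link_time_optimization=True"
        )
    return options


# Valid combination: PTX output with link-time optimization enabled.
ptx_options = check_ptx_requires_lto(
    {"ptx": True, "link_time_optimization": True}
)
```

The checked dictionary could then be unpacked into the constructor as `LinkerOptions(**ptx_options)`.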
- kernels_used
Pass a kernel or sequence of kernels that are used; any not in the list can be removed.
- variables_used
Pass a variable or sequence of variables that are used; any not in the list can be removed.
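Per the class signature, both `kernels_used` and `variables_used` accept a single string, a tuple, or a list. The sketch below shows one way calling code might normalize those shapes before passing them on; `as_name_list` is a hypothetical helper and says nothing about how the library handles the values internally.

```python
from typing import List, Sequence, Union


def as_name_list(names: Union[str, Sequence[str], None]) -> List[str]:
    """Hypothetical helper: normalize a name or sequence of names to a list."""
    if names is None:
        return []
    if isinstance(names, str):
        return [names]  # a single kernel or variable name
    return list(names)
```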
- optimize_unused_variables
Assume that if a variable is not referenced in device code, it can be removed. Default: False.
- Type:
bool, optional
- split_compile
Maximum thread count for split compilation. Use 0 to use all available processors. A value of 1 disables split compilation. Default: 1.
- Type:
int, optional
- split_compile_extended
A more aggressive form of split compilation, available in LTO mode only. Accepts a maximum thread count. Use 0 to use all available processors. A value of 1 disables extended split compilation. Note: this option can potentially impact the performance of the compiled binary. Default: 1.
- Type:
int, optional
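The thread-count convention shared by `split_compile` and `split_compile_extended` can be read as follows. This is a sketch of that reading only; `resolve_thread_limit` is a hypothetical helper, rejecting negative values is an assumption made here, and the actual scheduling is decided by the linker backend.

```python
def resolve_thread_limit(value: int, available_processors: int) -> int:
    """Hypothetical helper: interpret a split-compile thread count."""
    if value < 0:
        # Assumption for this sketch: negative counts are treated as invalid.
        raise ValueError("thread count must be non-negative")
    if value == 0:
        return available_processors  # 0 means "use all available processors"
    return value  # 1 disables splitting; larger values cap the thread count
```

For example, on a machine with 16 processors, a setting of 0 resolves to a limit of 16, while the default of 1 leaves split compilation disabled.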