cuda.core.experimental.LinkerOptions
- class cuda.core.experimental.LinkerOptions(arch: str | None = None, max_register_count: int | None = None, time: bool | None = None, verbose: bool | None = None, link_time_optimization: bool | None = None, ptx: bool | None = None, optimization_level: int | None = None, debug: bool | None = None, lineinfo: bool | None = None, ftz: bool | None = None, prec_div: bool | None = None, prec_sqrt: bool | None = None, fma: bool | None = None, kernels_used: str | List[str] | Tuple[str] | None = None, variables_used: str | List[str] | Tuple[str] | None = None, optimize_unused_variables: bool | None = None, ptxas_options: str | List[str] | Tuple[str] | None = None, split_compile: int | None = None, split_compile_extended: int | None = None, no_cache: bool | None = None)
Customizable Linker options. Because the linker uses either nvJitLink or the driver APIs as the linking backend, not all options are applicable. When the system’s installed nvJitLink is too old (<12.3), or not installed, the driver APIs (cuLink) will be used instead.
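The fallback rule above can be illustrated with a small, self-contained sketch. The function name `choose_backend` and the `(major, minor)` version tuple are hypothetical and exist only for this illustration; this is not the library's internal detection code.

```python
from typing import Optional, Tuple

# Minimum nvJitLink version stated in the description above.
MIN_NVJITLINK_VERSION = (12, 3)


def choose_backend(nvjitlink_version: Optional[Tuple[int, int]]) -> str:
    """Return the backend implied by the documented fallback rule.

    `nvjitlink_version` is a hypothetical (major, minor) tuple, or None
    when nvJitLink is not installed.
    """
    if nvjitlink_version is None:
        return "driver"  # nvJitLink not installed: cuLink is used
    if nvjitlink_version < MIN_NVJITLINK_VERSION:
        return "driver"  # nvJitLink older than 12.3: cuLink is used
    return "nvJitLink"


print(choose_backend((12, 4)))  # nvJitLink
print(choose_backend((12, 2)))  # driver
print(choose_backend(None))     # driver
```

Since the backend is chosen at run time, an option that only one backend understands may not take effect on every system.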
- arch
Pass the SM architecture value, such as sm_<CC> (for generating CUBIN) or compute_<CC> (for generating PTX). If not provided, the current device’s architecture will be used.
- Type:
str, optional
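As a minimal sketch of the two naming patterns, the helper below builds an `arch` string from a compute capability. The helper name `arch_string` and its parameters are assumptions made for this example and are not part of cuda.core.

```python
def arch_string(major: int, minor: int, ptx: bool = False) -> str:
    """Format a compute capability as sm_<CC> or compute_<CC>.

    Hypothetical helper: `ptx=False` gives the sm_<CC> form (CUBIN),
    `ptx=True` gives the compute_<CC> form (PTX).
    """
    prefix = "compute" if ptx else "sm"
    return f"{prefix}_{major}{minor}"


print(arch_string(9, 0))            # sm_90
print(arch_string(8, 6, ptx=True))  # compute_86
```

The resulting string could then be supplied as the `arch` argument, for example `LinkerOptions(arch=arch_string(9, 0))`.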
- ptx
Emit PTX after linking instead of CUBIN; only supported with link_time_optimization=True. Default: False.
- Type:
bool, optional
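The dependency between `ptx` and `link_time_optimization` can be checked before any options object is built. The following is a sketch only; the function name `check_ptx_requires_lto` is hypothetical, and the library may report this condition differently or not at all.

```python
from typing import Any, Dict


def check_ptx_requires_lto(options: Dict[str, Any]) -> Dict[str, Any]:
    """Reject ptx=True unless link_time_optimization=True is also set.

    `options` is a plain dict of keyword arguments; this hypothetical
    check mirrors the documented constraint and nothing more.
    """
    if options.get("ptx") and not options.get("link_time_optimization"):
        raise ValueError(
            "ptx=True is only supported with link_time_optimization=True"
        )
    return options


# Valid combination: both flags enabled.
opts = check_ptx_requires_lto({"link_time_optimization": True, "ptx": True})
print(opts)

# Invalid combination: ptx without LTO.
try:
    check_ptx_requires_lto({"ptx": True})
except ValueError as exc:
    print("rejected:", exc)
```

A dict validated this way could then be expanded into the constructor as `LinkerOptions(**opts)`.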
- kernels_used
Pass a kernel or sequence of kernels that are used; any not in the list can be removed.
- variables_used
Pass a variable or sequence of variables that are used; any not in the list can be removed.
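Per the class signature, `kernels_used` and `variables_used` each accept a single string, a list, or a tuple. The sketch below shows one way such inputs might be normalized to a list; `as_name_list` and the kernel and variable names are hypothetical, and the library does its own handling internally.

```python
from typing import List, Sequence, Union

# The accepted shapes, following the signature: str, list, tuple, or None.
Names = Union[str, Sequence[str], None]


def as_name_list(names: Names) -> List[str]:
    """Normalize a name, a sequence of names, or None to a list."""
    if names is None:
        return []
    if isinstance(names, str):
        return [names]  # a single name, not a sequence of characters
    return list(names)


print(as_name_list("my_kernel"))         # ['my_kernel']
print(as_name_list(("k_add", "k_mul")))  # ['k_add', 'k_mul']
print(as_name_list(None))                # []
```

The string check comes first because a Python `str` is itself a sequence; without it, a single name would be split into characters.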
- optimize_unused_variables
Assume that if a variable is not referenced in device code, it can be removed. Default: False.
- Type:
bool, optional
- split_compile
Maximum thread count for split compilation. Use 0 to use all available processors; a value of 1 disables split compilation. Default: 1.
- Type:
int, optional
- split_compile_extended
A more aggressive form of split compilation, available in LTO mode only. Accepts a maximum thread count. Use 0 to use all available processors; a value of 1 disables extended split compilation. Note: this option can potentially impact the performance of the compiled binary. Default: 1.
- Type:
int, optional
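The meaning of the thread-count values shared by `split_compile` and `split_compile_extended` can be sketched as follows. The helper names are hypothetical, and using `os.cpu_count()` for "all available processors" is an assumption made for illustration; the linker backend decides the actual count.

```python
import os


def max_threads(value: int = 1) -> int:
    """Interpret a split-compile value as a maximum thread count.

    0 means "all available processors" (approximated here with
    os.cpu_count(), an assumption); any other value is the maximum itself.
    """
    if value < 0:
        raise ValueError("thread count must be non-negative")
    if value == 0:
        return os.cpu_count() or 1
    return value


def split_enabled(value: int = 1) -> bool:
    """A value of 1 (the default) disables split compilation."""
    return value != 1


print(max_threads(4), split_enabled(4))  # 4 True
print(max_threads(1), split_enabled(1))  # 1 False
print(split_enabled(0))                  # True
```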