nvrtc¶
Error Handling¶
NVRTC defines the following enumeration type and function for API call error handling.
- class cuda.bindings.nvrtc.nvrtcResult(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)¶
The enumerated type nvrtcResult defines API call result codes. NVRTC API functions return nvrtcResult to indicate the call result.
- NVRTC_SUCCESS = 0¶
- NVRTC_ERROR_OUT_OF_MEMORY = 1¶
- NVRTC_ERROR_PROGRAM_CREATION_FAILURE = 2¶
- NVRTC_ERROR_INVALID_INPUT = 3¶
- NVRTC_ERROR_INVALID_PROGRAM = 4¶
- NVRTC_ERROR_INVALID_OPTION = 5¶
- NVRTC_ERROR_COMPILATION = 6¶
- NVRTC_ERROR_BUILTIN_OPERATION_FAILURE = 7¶
- NVRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION = 8¶
- NVRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION = 9¶
- NVRTC_ERROR_NAME_EXPRESSION_NOT_VALID = 10¶
- NVRTC_ERROR_INTERNAL_ERROR = 11¶
- NVRTC_ERROR_TIME_FILE_WRITE_FAILED = 12¶
- NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED = 13¶
- NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED = 14¶
- NVRTC_ERROR_PCH_CREATE = 15¶
- NVRTC_ERROR_CANCELLED = 16¶
- cuda.bindings.nvrtc.nvrtcGetErrorString(result: nvrtcResult)¶
nvrtcGetErrorString is a helper function that returns a string describing the given nvrtcResult code, e.g., NVRTC_SUCCESS to “NVRTC_SUCCESS”. For unrecognized enumeration values, it returns “NVRTC_ERROR unknown”.
- Parameters:
result (nvrtcResult) – CUDA Runtime Compilation API result code.
- Returns:
nvrtcResult.NVRTC_SUCCESS – nvrtcResult.NVRTC_SUCCESS
bytes – Message string for the given nvrtcResult code.
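Since the functions in this module report their status as the first element of the returned tuple, a small checking helper can keep call sites short. The following is a minimal sketch of one possible helper; `check` and `describe` are hypothetical names that are not part of cuda.bindings, and the guarded import is only there so the helper can be loaded on a machine without cuda-python.

```python
try:
    from cuda.bindings import nvrtc  # provided by the cuda-python package
except ImportError:
    nvrtc = None  # fallback so the helper still imports without CUDA


def describe(err):
    """Return a readable description for an nvrtcResult (or plain int)."""
    if nvrtc is not None:
        try:
            # nvrtcGetErrorString returns (NVRTC_SUCCESS, bytes)
            _, msg = nvrtc.nvrtcGetErrorString(nvrtc.nvrtcResult(int(err)))
            return msg.decode()
        except Exception:
            pass  # unknown code, or the NVRTC library could not be loaded
    return f"NVRTC error code {int(err)}"


def check(result):
    """Unpack a binding result tuple, raising if the call failed."""
    err, *values = result
    if int(err) != 0:  # NVRTC_SUCCESS = 0
        raise RuntimeError(describe(err))
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)
```

With such a helper, a call like `major, minor = check(nvrtc.nvrtcVersion())` either yields the outputs or raises with the message from nvrtcGetErrorString.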
General Information Query¶
NVRTC defines the following function for general information query.
- cuda.bindings.nvrtc.nvrtcVersion()¶
nvrtcVersion sets the output parameters major and minor with the CUDA Runtime Compilation version number.
- Returns:
nvrtcResult –
major (int) – CUDA Runtime Compilation major version number.
minor (int) – CUDA Runtime Compilation minor version number.
- cuda.bindings.nvrtc.nvrtcGetNumSupportedArchs()¶
nvrtcGetNumSupportedArchs sets the output parameter numArchs with the number of architectures supported by NVRTC. This can then be used to pass an array to nvrtcGetSupportedArchs to get the supported architectures.
- Returns:
nvrtcResult –
numArchs (int) – number of supported architectures.
- cuda.bindings.nvrtc.nvrtcGetSupportedArchs()¶
nvrtcGetSupportedArchs populates the array passed via the output parameter supportedArchs with the architectures supported by NVRTC. The array is sorted in ascending order. The size of the array to be passed can be determined using nvrtcGetNumSupportedArchs.
- Returns:
nvrtcResult –
supportedArchs (List[int]) – sorted array of supported architectures.
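The query functions above can be combined to choose a compilation target. The sketch below assumes that the returned list holds integers such as 75 for compute capability 7.5; `pick_arch`, `arch_option`, and `query_nvrtc` are hypothetical helper names, and only `query_nvrtc` needs an NVRTC installation.

```python
try:
    from cuda.bindings import nvrtc
except ImportError:
    nvrtc = None  # the pure-Python helpers below still work


def pick_arch(supported, wanted):
    """Return the highest supported architecture not above `wanted`."""
    candidates = [arch for arch in supported if arch <= wanted]
    if not candidates:
        raise ValueError(f"no supported architecture <= {wanted}")
    return max(candidates)


def arch_option(arch):
    """Format an architecture number as a virtual-architecture option."""
    return f"--gpu-architecture=compute_{arch}".encode()


def query_nvrtc():
    """Print the NVRTC version and return the supported architectures."""
    ok = nvrtc.nvrtcResult.NVRTC_SUCCESS
    err, major, minor = nvrtc.nvrtcVersion()
    if err != ok:
        raise RuntimeError(f"nvrtcVersion failed: {err}")
    err, archs = nvrtc.nvrtcGetSupportedArchs()
    if err != ok:
        raise RuntimeError(f"nvrtcGetSupportedArchs failed: {err}")
    print(f"NVRTC {major}.{minor}, architectures: {archs}")
    return archs
```

For example, `arch_option(pick_arch(query_nvrtc(), 80))` would produce an option string for the newest supported architecture that does not exceed 80.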
Compilation¶
NVRTC defines the following type and functions for actual compilation.
- class cuda.bindings.nvrtc.nvrtcProgram(*args, **kwargs)¶
nvrtcProgram is the unit of compilation, and an opaque handle for a program.
To compile a CUDA program string, an instance of nvrtcProgram must be created first with nvrtcCreateProgram, then compiled with nvrtcCompileProgram.
- getPtr()¶
Get memory address of class instance
- cuda.bindings.nvrtc.nvrtcCreateProgram(char *src, char *name, int numHeaders, headers: Optional[Tuple[bytes] | List[bytes]], includeNames: Optional[Tuple[bytes] | List[bytes]])¶
nvrtcCreateProgram creates an instance of nvrtcProgram with the given input parameters, and sets the output parameter prog with it.
- Parameters:
src (bytes) – CUDA program source.
name (bytes) – CUDA program name. name can be NULL; “default_program” is used when name is NULL or “”.
numHeaders (int) – Number of headers used. numHeaders must be greater than or equal to 0.
headers (List[bytes]) – Sources of the headers. headers can be NULL when numHeaders is 0.
includeNames (List[bytes]) – Name of each header by which they can be included in the CUDA program source. includeNames can be NULL when numHeaders is 0. These headers must be included with the exact names specified here.
- Returns:
nvrtcResult –
prog (nvrtcProgram) – CUDA Runtime Compilation program.
See also
- cuda.bindings.nvrtc.nvrtcDestroyProgram(prog)¶
nvrtcDestroyProgram destroys the given program.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
- Returns:
- Return type:
See also
- cuda.bindings.nvrtc.nvrtcCompileProgram(prog, int numOptions, options: Optional[Tuple[bytes] | List[bytes]])¶
nvrtcCompileProgram compiles the given program.
It supports compile options listed in Supported Compile Options.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
numOptions (int) – Number of compiler options passed.
options (List[bytes]) – Compiler options in the form of C string array. options can be NULL when numOptions is 0.
- Returns:
- Return type:
- cuda.bindings.nvrtc.nvrtcGetPTXSize(prog)¶
nvrtcGetPTXSize sets the value of ptxSizeRet with the size of the PTX generated by the previous compilation of prog (including the trailing NULL).
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
- Returns:
nvrtcResult –
ptxSizeRet (int) – Size of the generated PTX (including the trailing NULL).
See also
- cuda.bindings.nvrtc.nvrtcGetPTX(prog, char *ptx)¶
nvrtcGetPTX stores the PTX generated by the previous compilation of prog in the memory pointed by ptx.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
ptx (bytes) – Compiled result.
- Returns:
- Return type:
See also
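The size query and the retrieval call form a two-step pattern that recurs for PTX, cubin, LTO IR, OptiX IR, and the program log. The following sketch captures that pattern in one hypothetical helper, `fetch_output`; it assumes the retrieval function fills a preallocated bytes buffer in place and returns a one-element tuple.

```python
def fetch_output(prog, size_fn, get_fn):
    """Query the size of a compilation output, then fetch it into a buffer."""
    err, size = size_fn(prog)
    if int(err) != 0:  # NVRTC_SUCCESS = 0
        raise RuntimeError(f"size query failed with code {int(err)}")
    buf = b" " * size  # preallocated buffer that the binding fills in place
    (err,) = get_fn(prog, buf)
    if int(err) != 0:
        raise RuntimeError(f"retrieval failed with code {int(err)}")
    return buf
```

With the real bindings this would be called as `fetch_output(prog, nvrtc.nvrtcGetPTXSize, nvrtc.nvrtcGetPTX)` after a successful compilation.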
- cuda.bindings.nvrtc.nvrtcGetCUBINSize(prog)¶
nvrtcGetCUBINSize sets the value of cubinSizeRet with the size of the cubin generated by the previous compilation of prog. The value of cubinSizeRet is set to 0 if the value specified to -arch is a virtual architecture instead of an actual architecture.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
- Returns:
nvrtcResult –
cubinSizeRet (int) – Size of the generated cubin.
See also
- cuda.bindings.nvrtc.nvrtcGetCUBIN(prog, char *cubin)¶
nvrtcGetCUBIN stores the cubin generated by the previous compilation of prog in the memory pointed by cubin. No cubin is available if the value specified to -arch is a virtual architecture instead of an actual architecture.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
cubin (bytes) – Compiled and assembled result.
- Returns:
- Return type:
See also
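Because the reported cubin size is 0 when -arch names a virtual architecture, callers may want to treat that case explicitly. A minimal sketch, assuming a program that has already been compiled; `get_cubin` is a hypothetical helper, and its `api` parameter exists only so that the logic can be exercised with a stand-in object when NVRTC is not installed.

```python
try:
    from cuda.bindings import nvrtc
except ImportError:
    nvrtc = None  # pass a stand-in via `api` when NVRTC is unavailable


def get_cubin(prog, api=nvrtc):
    """Return the cubin for `prog`, or None if no cubin was generated."""
    err, size = api.nvrtcGetCUBINSize(prog)
    if int(err) != 0:  # NVRTC_SUCCESS = 0
        raise RuntimeError(f"nvrtcGetCUBINSize failed with code {int(err)}")
    if size == 0:
        # No cubin, e.g. -arch named a virtual architecture (compute_XX)
        return None
    cubin = b" " * size  # buffer filled in place by the binding
    (err,) = api.nvrtcGetCUBIN(prog, cubin)
    if int(err) != 0:
        raise RuntimeError(f"nvrtcGetCUBIN failed with code {int(err)}")
    return cubin
```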
- cuda.bindings.nvrtc.nvrtcGetNVVMSize(prog)¶
DEPRECATION NOTICE: This function will be removed in a future release. Please use nvrtcGetLTOIRSize (and nvrtcGetLTOIR) instead.
- Parameters:
prog (nvrtcProgram) – None
- Returns:
nvrtcResult
nvvmSizeRet (int) – None
- cuda.bindings.nvrtc.nvrtcGetNVVM(prog, char *nvvm)¶
DEPRECATION NOTICE: This function will be removed in a future release. Please use nvrtcGetLTOIR (and nvrtcGetLTOIRSize) instead.
- Parameters:
prog (nvrtcProgram) – None
nvvm (bytes) – None
- Return type:
- cuda.bindings.nvrtc.nvrtcGetLTOIRSize(prog)¶
nvrtcGetLTOIRSize sets the value of LTOIRSizeRet with the size of the LTO IR generated by the previous compilation of prog. The value of LTOIRSizeRet is set to 0 if the program was not compiled with -dlto.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
- Returns:
nvrtcResult –
LTOIRSizeRet (int) – Size of the generated LTO IR.
See also
- cuda.bindings.nvrtc.nvrtcGetLTOIR(prog, char *LTOIR)¶
nvrtcGetLTOIR stores the LTO IR generated by the previous compilation of prog in the memory pointed by LTOIR. No LTO IR is available if the program was compiled without -dlto.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
LTOIR (bytes) – Compiled result.
- Returns:
- Return type:
See also
- cuda.bindings.nvrtc.nvrtcGetOptiXIRSize(prog)¶
nvrtcGetOptiXIRSize sets the value of optixirSizeRet with the size of the OptiX IR generated by the previous compilation of prog. The value of optixirSizeRet is set to 0 if the program was compiled with options incompatible with OptiX IR generation.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
- Returns:
nvrtcResult –
optixirSizeRet (int) – Size of the generated OptiX IR.
See also
- cuda.bindings.nvrtc.nvrtcGetOptiXIR(prog, char *optixir)¶
nvrtcGetOptiXIR stores the OptiX IR generated by the previous compilation of prog in the memory pointed by optixir. No OptiX IR is available if the program was compiled with options incompatible with OptiX IR generation.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
optixir (bytes) – OptiX IR compiled result.
- Returns:
- Return type:
See also
- cuda.bindings.nvrtc.nvrtcGetProgramLogSize(prog)¶
nvrtcGetProgramLogSize sets logSizeRet with the size of the log generated by the previous compilation of prog (including the trailing NULL).
Note that compilation log may be generated with warnings and informative messages, even when the compilation of prog succeeds.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
- Returns:
nvrtcResult –
logSizeRet (int) – Size of the compilation log (including the trailing NULL).
See also
- cuda.bindings.nvrtc.nvrtcGetProgramLog(prog, char *log)¶
nvrtcGetProgramLog stores the log generated by the previous compilation of prog in the memory pointed by log.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
log (bytes) – Compilation log.
- Returns:
- Return type:
See also
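Putting the functions above together, a typical sequence is: create a program, compile it, read the log, fetch the PTX, and destroy the program. The following is a sketch under stated assumptions rather than a definitive implementation: `compile_to_ptx` and the `SAXPY` source are illustrative, the `api` parameter is a hypothetical hook for substituting a stand-in object, and the output buffers are assumed to be filled in place.

```python
try:
    from cuda.bindings import nvrtc
except ImportError:
    nvrtc = None  # pass a stand-in via `api` when NVRTC is unavailable

SAXPY = b"""
extern "C" __global__ void saxpy(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
"""


def _require(err, what):
    """Raise if an nvrtcResult (or int) is not NVRTC_SUCCESS (0)."""
    if int(err) != 0:
        raise RuntimeError(f"{what} failed with code {int(err)}")


def compile_to_ptx(src, options=(), name=b"kernel.cu", api=nvrtc):
    """Compile CUDA source and return (ptx, log) as bytes."""
    err, prog = api.nvrtcCreateProgram(src, name, 0, [], [])
    _require(err, "nvrtcCreateProgram")
    try:
        (compile_err,) = api.nvrtcCompileProgram(prog, len(options), list(options))
        # The log may contain warnings even when compilation succeeds.
        err, log_size = api.nvrtcGetProgramLogSize(prog)
        _require(err, "nvrtcGetProgramLogSize")
        log = b" " * log_size
        (err,) = api.nvrtcGetProgramLog(prog, log)
        _require(err, "nvrtcGetProgramLog")
        if int(compile_err) != 0:
            raise RuntimeError(log.decode(errors="replace"))
        err, ptx_size = api.nvrtcGetPTXSize(prog)
        _require(err, "nvrtcGetPTXSize")
        ptx = b" " * ptx_size
        (err,) = api.nvrtcGetPTX(prog, ptx)
        _require(err, "nvrtcGetPTX")
        return ptx, log
    finally:
        # Always release the program, including on compilation failure.
        api.nvrtcDestroyProgram(prog)
```

With NVRTC installed, `ptx, log = compile_to_ptx(SAXPY, [b"--gpu-architecture=compute_75"])` would be a typical call; the architecture value here is only an example.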
- cuda.bindings.nvrtc.nvrtcAddNameExpression(prog, char *name_expression)¶
nvrtcAddNameExpression notes the given name expression denoting the address of a __global__ function or __device__/__constant__ variable.
The identical name expression string must be provided on a subsequent call to nvrtcGetLoweredName to extract the lowered name.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
name_expression (bytes) – constant expression denoting the address of a __global__ function or __device__/__constant__ variable.
- Returns:
- Return type:
See also
- cuda.bindings.nvrtc.nvrtcGetLoweredName(prog, char *name_expression)¶
nvrtcGetLoweredName extracts the lowered (mangled) name for a __global__ function or __device__/__constant__ variable, and updates lowered_name to point to it. The memory containing the name is released when the NVRTC program is destroyed by nvrtcDestroyProgram. The identical name expression must have been previously provided to nvrtcAddNameExpression.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
name_expression (bytes) – constant expression denoting the address of a __global__ function or __device__/__constant__ variable.
- Returns:
nvrtcResult – NVRTC_SUCCESS NVRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION NVRTC_ERROR_NAME_EXPRESSION_NOT_VALID
lowered_name (bytes) – initialized by the function to point to a C string containing the lowered (mangled) name corresponding to the provided name expression.
See also
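The ordering requirement can be sketched as follows: name expressions are registered before compilation, and lowered names are looked up afterwards with the identical strings. `lowered_names` and the `SRC` kernel are illustrative, and the `api` parameter is a hypothetical hook for substituting a stand-in object when NVRTC is not installed.

```python
try:
    from cuda.bindings import nvrtc
except ImportError:
    nvrtc = None  # pass a stand-in via `api` when NVRTC is unavailable

SRC = b"""
template <typename T>
__global__ void scale(T *data, T factor) { data[threadIdx.x] *= factor; }
"""


def _require(err, what):
    """Raise if an nvrtcResult (or int) is not NVRTC_SUCCESS (0)."""
    if int(err) != 0:
        raise RuntimeError(f"{what} failed with code {int(err)}")


def lowered_names(src, name_expressions, api=nvrtc):
    """Compile `src` and map each name expression to its lowered name."""
    err, prog = api.nvrtcCreateProgram(src, b"names.cu", 0, [], [])
    _require(err, "nvrtcCreateProgram")
    try:
        # Register every name expression before compiling.
        for expr in name_expressions:
            (err,) = api.nvrtcAddNameExpression(prog, expr)
            _require(err, "nvrtcAddNameExpression")
        (err,) = api.nvrtcCompileProgram(prog, 0, [])
        _require(err, "nvrtcCompileProgram")
        # Look up lowered names with the identical strings, while the
        # program still exists.
        result = {}
        for expr in name_expressions:
            err, lowered = api.nvrtcGetLoweredName(prog, expr)
            _require(err, "nvrtcGetLoweredName")
            result[expr] = lowered
        return result
    finally:
        api.nvrtcDestroyProgram(prog)
```

A call such as `lowered_names(SRC, [b"scale<float>"])` would then return a dictionary whose value is the mangled kernel name for that template instantiation.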
- cuda.bindings.nvrtc.nvrtcSetFlowCallback(prog, callback, payload)¶
nvrtcSetFlowCallback registers a callback function that the compiler will invoke at different points during a call to nvrtcCompileProgram, and the callback function can decide whether to cancel compilation by returning specific values.
The callback function must satisfy the following constraints:
(1) Its signature should be:
View CUDA Toolkit Documentation for a C++ code example
When invoking the callback, the compiler will always pass payload to param1 so that the callback may make decisions based on payload. It will always pass NULL to param2 for now, which is reserved for future extensions.
(2) It must return 1 to cancel compilation or 0 to continue. Other return values are reserved for future use.
(3) It must return consistent values. Once it returns 1 at one point, it must return 1 in all following invocations during the current nvrtcCompileProgram call in progress.
(4) It must be thread-safe.
(5) It must not invoke any nvrtc/libnvvm/ptx APIs.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
callback (Any) – the callback that issues cancellation signal.
payload (Any) – to be passed as a parameter when invoking the callback.
- Returns:
- Return type:
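The return-value constraints can be illustrated with a callback built on a sticky, thread-safe flag. This is only a sketch: the ctypes prototype below (an int return with two pointer parameters) is an assumption based on the param1/param2 description above, `FLOW_CALLBACK`, `cancel_requested`, and `flow_callback` are hypothetical names, and how the resulting object is handed to nvrtcSetFlowCallback is not shown here.

```python
import ctypes
import threading

# Assumed C prototype: int callback(void *param1, void *param2)
FLOW_CALLBACK = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p, ctypes.c_void_p)

# threading.Event is thread-safe and stays set once set() has been called.
cancel_requested = threading.Event()


@FLOW_CALLBACK
def flow_callback(param1, param2):
    """Return 1 to cancel compilation, 0 to continue."""
    # The flag is never cleared here, so once 1 has been returned every
    # later invocation returns 1 as well. No NVRTC APIs are called.
    return 1 if cancel_requested.is_set() else 0
```

Another thread can call `cancel_requested.set()` to request cancellation; from that point on the callback keeps returning 1.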
Precompiled header (PCH) (CUDA 12.8+)¶
NVRTC defines the following function related to PCH. Also see PCH related flags passed to nvrtcCompileProgram.
- cuda.bindings.nvrtc.nvrtcGetPCHHeapSize()¶
Retrieve the current size of the PCH heap.
- Returns:
nvrtcResult –
ret (int) – Current size of the PCH heap.
- cuda.bindings.nvrtc.nvrtcSetPCHHeapSize(size_t size)¶
Set the size of the PCH heap.
The requested size may be rounded up to a platform dependent alignment (e.g. page size). If the PCH Heap has already been allocated, the heap memory will be freed and a new PCH Heap will be allocated.
- Parameters:
size (size_t) – requested size of the PCH Heap, in bytes
- Returns:
- Return type:
- cuda.bindings.nvrtc.nvrtcGetPCHCreateStatus(prog)¶
Returns the PCH creation status.
NVRTC_SUCCESS indicates that the PCH was successfully created. NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED indicates that no PCH creation was attempted, either because PCH functionality was not requested during the preceding nvrtcCompileProgram call, or because automatic PCH processing was requested and the compiler chose not to create a PCH file. NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED indicates that a PCH file could potentially have been created, but the compiler ran out of space in the PCH heap. In this scenario, nvrtcGetPCHHeapSizeRequired() can be used to query the required heap size, the heap can be reallocated for this size with nvrtcSetPCHHeapSize(), and PCH creation may be reattempted by invoking nvrtcCompileProgram() with a new NVRTC program instance. NVRTC_ERROR_PCH_CREATE indicates that an error condition prevented the PCH file from being created.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
- Returns:
- Return type:
- cuda.bindings.nvrtc.nvrtcGetPCHHeapSizeRequired(prog)¶
Retrieve the size of the PCH heap required to compile the given program.
- Parameters:
prog (nvrtcProgram) – CUDA Runtime Compilation program.
- Returns:
nvrtcResult –
NVRTC_ERROR_INVALID_INPUT
The size retrieved using this function is only valid if nvrtcGetPCHCreateStatus() returned NVRTC_SUCCESS or NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED.
size (int) – Required size of the PCH heap.
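The recovery path described for NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED can be sketched as a helper that is called after a compilation. `grow_pch_heap_if_needed` is a hypothetical name, the numeric constants are taken from the nvrtcResult enumeration above, and the `api` parameter is only a hook for substituting a stand-in object when NVRTC is not installed.

```python
try:
    from cuda.bindings import nvrtc
except ImportError:
    nvrtc = None  # pass a stand-in via `api` when NVRTC is unavailable

# Values from the nvrtcResult enumeration.
NVRTC_SUCCESS = 0
NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED = 14


def grow_pch_heap_if_needed(prog, api=nvrtc):
    """Enlarge the PCH heap if compiling `prog` exhausted it.

    Returns True when the caller should retry the compilation with a
    new program instance, False otherwise.
    """
    (status,) = api.nvrtcGetPCHCreateStatus(prog)
    if int(status) != NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED:
        return False
    err, required = api.nvrtcGetPCHHeapSizeRequired(prog)
    if int(err) != NVRTC_SUCCESS:
        raise RuntimeError(f"nvrtcGetPCHHeapSizeRequired failed: {int(err)}")
    # Reallocates the heap; the size may be rounded up by NVRTC.
    (err,) = api.nvrtcSetPCHHeapSize(required)
    if int(err) != NVRTC_SUCCESS:
        raise RuntimeError(f"nvrtcSetPCHHeapSize failed: {int(err)}")
    return True
```

When the helper returns True, the caller would create a fresh program with nvrtcCreateProgram and invoke nvrtcCompileProgram again with the same PCH options.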
Supported Compile Options¶
NVRTC supports the compile options below. Option names with two preceding dashes (--) are long option names and option names with one preceding dash (-) are short option names. Short option names can be used instead of long option names. When a compile option takes an argument, an assignment operator (=) is used to separate the compile option argument from the compile option name, e.g., "--gpu-architecture=compute_60". Alternatively, the compile option name and the argument can be specified in separate strings without an assignment operator, e.g., "--gpu-architecture" "compute_60". Single-character short option names, such as -D, -U, and -I, do not require an assignment operator, and the compile option name and the argument can be present in the same string with or without spaces between them. For instance, "-D=<def>", "-D<def>", and "-D <def>" are all supported.
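In the Python bindings, options are passed as a list of bytes objects, so the forms described above translate into list entries. A minimal sketch, where `build_options` is a hypothetical helper and the architecture, macro names, and dialect are only example values:

```python
def build_options(arch, defines=None, std=None):
    """Build an options list for nvrtcCompileProgram as bytes strings."""
    opts = [f"--gpu-architecture={arch}"]  # long name, '=' separates the argument
    if std is not None:
        opts.append(f"-std={std}")  # short name used instead of --std
    for name, value in (defines or {}).items():
        # Single-character options need no '=': -D<name> or -D<name>=<value>
        opts.append(f"-D{name}" if value is None else f"-D{name}={value}")
    return [opt.encode() for opt in opts]


# The same architecture option written as two separate strings:
split_form = [b"--gpu-architecture", b"compute_60"]
```

The resulting list would be passed together with its length, for example `nvrtcCompileProgram(prog, len(opts), opts)`.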
The valid compiler options are:
Compilation targets
--gpu-architecture=<arch>
(-arch
)
Specify the name of the class of GPU architectures for which the input must be compiled.
Separate compilation / whole-program compilation
--device-c
(-dc
)
Generate relocatable code that can be linked with other relocatable device code. It is equivalent to --relocatable-device-code=true
.
--device-w
(-dw
)
Generate non-relocatable code. It is equivalent to --relocatable-device-code=false
.
--relocatable-device-code={true|false}
(-rdc
)
Enable (disable) the generation of relocatable device code.
--extensible-whole-program
(-ewp
)
Do extensible whole program compilation of device code.
Debugging support
--device-debug
(-G
)
Generate debug information. If --dopt
is not specified, this option turns off all optimizations.
--generate-line-info
(-lineinfo
)
Generate line-number information.
Code generation
--dopt on
(-dopt
)
--dopt=on
Enable device code optimization. When specified along with -G
, enables limited debug information generation for optimized device code (currently, only line number information). When -G
is not specified, -dopt=on
is implicit.
--ptxas-options
<options> (-Xptxas
)
--ptxas-options=<options>
Specify options directly to ptxas, the PTX optimizing assembler.
--maxrregcount=<N>
(-maxrregcount
)
Specify the maximum number of registers that GPU functions can use. Up to a function-specific limit, a higher value will generally increase the performance of individual GPU threads that execute this function. However, because thread registers are allocated from a global register pool on each GPU, a higher value of this option will also reduce the maximum thread block size, thereby reducing the amount of thread parallelism. Hence, a good maxrregcount value is the result of a trade-off. If this option is not specified, then no maximum is assumed. A value less than the minimum number of registers required by the ABI will be bumped up by the compiler to the ABI minimum limit.
--ftz={true|false}
(-ftz
)
When performing single-precision floating-point operations, flush denormal values to zero or preserve denormal values.
--use_fast_math
implies --ftz=true
.
--prec-sqrt={true|false}
(-prec-sqrt
)
For single-precision floating-point square root, use IEEE round-to-nearest mode or use a faster approximation. --use_fast_math
implies --prec-sqrt=false
.
--prec-div={true|false}
(-prec-div
)
For single-precision floating-point division and reciprocals, use IEEE round-to-nearest mode or use a faster approximation. --use_fast_math
implies --prec-div=false
.
Default:
true
--fmad={true|false}
(-fmad
)
Enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA). --use_fast_math
implies --fmad=true
.
--use_fast_math
(-use_fast_math
)
Make use of fast math operations. --use_fast_math
implies --ftz=true
--prec-div=false
--prec-sqrt=false
--fmad=true
.
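The implication listed for --use_fast_math can be made explicit in code, for instance when logging the effective floating-point settings of a build. The sketch below only rewrites an options list; `expand_fast_math` and `FAST_MATH_IMPLIES` are hypothetical names, and the sketch makes no claim about how NVRTC resolves conflicting flags.

```python
# Flags implied by --use_fast_math, as listed in the option description.
FAST_MATH_IMPLIES = (
    b"--ftz=true",
    b"--prec-div=false",
    b"--prec-sqrt=false",
    b"--fmad=true",
)


def expand_fast_math(options):
    """Replace --use_fast_math with the individual flags it implies."""
    expanded = []
    for opt in options:
        if opt in (b"--use_fast_math", b"-use_fast_math"):
            expanded.extend(FAST_MATH_IMPLIES)
        else:
            expanded.append(opt)
    return expanded
```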
--extra-device-vectorization
(-extra-device-vectorization
)
Enables more aggressive device code vectorization in the NVVM optimizer.
--modify-stack-limit={true|false}
(-modify-stack-limit
)
On Linux, during compilation, use setrlimit()
to increase stack size to maximum allowed. The limit is reset to the previous value at the end of compilation. Note: setrlimit()
changes the value for the entire process.
--dlink-time-opt
(-dlto
)
Generate intermediate code for later link-time optimization. It implies -rdc=true
. Note: when this option is used the nvrtcGetLTOIR
API should be used, as PTX or Cubin will not be generated.
--gen-opt-lto
(-gen-opt-lto
)
Run the optimizer passes before generating the LTO IR.
--optix-ir
(-optix-ir
)
Generate OptiX IR. The OptiX IR is only intended for consumption by OptiX through appropriate APIs. This feature is not supported with link-time-optimization (-dlto
).
Note: when this option is used the nvrtcGetOptiXIR API should be used, as PTX or Cubin will not be generated.
--jump-table-density=
[0-101] (-jtd
)
Specify the case density percentage in switch statements, and use it as a minimal threshold to determine whether a jump table (brx.idx instruction) will be used to implement a switch statement. The default value is 101. The percentage ranges from 0 to 101 inclusive.
--device-stack-protector={true|false}
(-device-stack-protector
)
Enable (disable) the generation of stack canaries in device code.
Preprocessing
--define-macro=<def>
(-D
)
<def>
can be either <name>
or <name=definitions>
.
--undefine-macro=<def>
(-U
)
Cancel any previous definition of <def>
.
--include-path=<dir>
(-I
)
Add the directory <dir>
to the list of directories to be searched for headers. These paths are searched after the list of headers given to nvrtcCreateProgram.
--pre-include=<header>
(-include
)
Preinclude <header>
during preprocessing.
--no-source-include
(-no-source-include
)
By default, the preprocessor adds the directory of each input source to the include path. This option disables this feature and only considers the paths specified explicitly.
Language Dialect
--std={c++03|c++11|c++14|c++17|c++20}
(-std
)
Set language dialect to C++03, C++11, C++14, C++17 or C++20
--builtin-move-forward={true|false}
(-builtin-move-forward
)
Provide builtin definitions of std::move
and std::forward
, when C++11 or later language dialect is selected.
--builtin-initializer-list={true|false}
(-builtin-initializer-list
)
Provide builtin definitions of std::initializer_list
class and member functions when C++11 or later language dialect is selected.
Precompiled header support (CUDA 12.8+)
--pch
(-pch
)
Enable automatic PCH processing.
--create-pch=<file-name>
(-create-pch
)
Create a PCH file.
--use-pch=<file-name>
(-use-pch
)
Use the specified PCH file.
--pch-dir=<directory-name>
(-pch-dir
)
When using automatic PCH (-pch
), look for and create PCH files in the specified directory. When using explicit PCH (-create-pch
or -use-pch
), the directory name is prefixed before the specified file name, unless the file name is an absolute path name.
--pch-verbose={true|false}
(-pch-verbose
)
In automatic PCH mode, for each PCH file that could not be used in current compilation, print the reason in the compilation log.
--pch-messages={true|false}
(-pch-messages
)
Print a message in the compilation log, if a PCH file was created or used in the current compilation.
--instantiate-templates-in-pch={true|false}
(-instantiate-templates-in-pch
)
Enable or disable instantiation of templates before PCH creation. Instantiating templates may increase the size of the PCH file, while reducing the compilation cost when using the PCH file (since some template instantiations can be skipped).
Misc.
--disable-warnings
(-w
)
Inhibit all warning messages.
--restrict
(-restrict
)
Programmer assertion that all kernel pointer parameters are restrict pointers.
--device-as-default-execution-space
(-default-device
)
Treat entities with no execution space annotation as __device__
entities.
--device-int128
(-device-int128
)
Allow the __int128
type in device code. Also causes the macro __CUDACC_RTC_INT128__
to be defined.
--device-float128
(-device-float128
)
Allow the __float128
and _Float128
types in device code. Also causes the macro __CUDACC_RTC_FLOAT128__
to be defined.
--optimization-info=<kind>
(-opt-info
)
Provide optimization reports for the specified kind of optimization. The following kind tags are supported:
--display-error-number
(-err-no
)
Display diagnostic number for warning messages. (Default)
--no-display-error-number
(-no-err-no
)
Disables the display of a diagnostic number for warning messages.
--diag-error=<error-number>
,… (-diag-error
)
Emit error for specified diagnostic message number(s). Message numbers can be separated by comma.
--diag-suppress=<error-number>
,… (-diag-suppress
)
Suppress specified diagnostic message number(s). Message numbers can be separated by comma.
--diag-warn=<error-number>
,… (-diag-warn
)
Emit warning for specified diagnostic message number(s). Message numbers can be separated by comma.
--brief-diagnostics={true|false}
(-brief-diag
)
This option disables or enables showing source line and column info in a diagnostic. Specifying --brief-diagnostics=true
suppresses the source line and column info.
--time=<file-name>
(-time
)
Generate a comma separated value table with the time taken by each compilation phase, and append it at the end of the file given as the option argument. If the file does not exist, the column headings are generated in the first row of the table. If the file name is ‘-’, the timing data is written to the compilation log.
--split-compile=<number-of-threads>
(-split-compile=<number-of-threads>
)
Perform compiler optimizations in parallel. Split compilation attempts to reduce compile time by enabling the compiler to run certain optimization passes concurrently. This option accepts a numerical value that specifies the maximum number of threads the compiler can use. One can also allow the compiler to use the maximum threads available on the system by setting --split-compile=0
. Setting --split-compile=1
will cause this option to be ignored.
--fdevice-syntax-only
(-fdevice-syntax-only
)
Ends device compilation after front-end syntax checking. This option does not generate valid device code.
--minimal
(-minimal
)
Omit certain language features to reduce compile time for small programs. In particular, the following are omitted:
--device-stack-protector
(-device-stack-protector
)
Enable stack canaries in device code. Stack canaries make it more difficult to exploit certain types of memory safety bugs involving stack-local variables. The compiler uses heuristics to assess the risk of such a bug in each function. Only those functions which are deemed high-risk make use of a stack canary.
--fdevice-time-trace=<file-name>
(-fdevice-time-trace=<file-name>
)
Enables the time profiler, outputting a JSON file based on the given <file-name>. Results can be analyzed on chrome://tracing for a flamegraph visualization.