nvrtc#

Error Handling#

NVRTC defines the following enumeration type and function for API call error handling.

class cuda.bindings.nvrtc.nvrtcResult(value: int)#
The enumerated type nvrtcResult defines API call result codes.
NVRTC API functions return nvrtcResult to indicate the call result.
NVRTC_SUCCESS = <nvrtcResult.NVRTC_SUCCESS: 0>#
NVRTC_ERROR_OUT_OF_MEMORY = <nvrtcResult.NVRTC_ERROR_OUT_OF_MEMORY: 1>#
NVRTC_ERROR_PROGRAM_CREATION_FAILURE = <nvrtcResult.NVRTC_ERROR_PROGRAM_CREATION_FAILURE: 2>#
NVRTC_ERROR_INVALID_INPUT = <nvrtcResult.NVRTC_ERROR_INVALID_INPUT: 3>#
NVRTC_ERROR_INVALID_PROGRAM = <nvrtcResult.NVRTC_ERROR_INVALID_PROGRAM: 4>#
NVRTC_ERROR_INVALID_OPTION = <nvrtcResult.NVRTC_ERROR_INVALID_OPTION: 5>#
NVRTC_ERROR_COMPILATION = <nvrtcResult.NVRTC_ERROR_COMPILATION: 6>#
NVRTC_ERROR_BUILTIN_OPERATION_FAILURE = <nvrtcResult.NVRTC_ERROR_BUILTIN_OPERATION_FAILURE: 7>#
NVRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION = <nvrtcResult.NVRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION: 8>#
NVRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION = <nvrtcResult.NVRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION: 9>#
NVRTC_ERROR_NAME_EXPRESSION_NOT_VALID = <nvrtcResult.NVRTC_ERROR_NAME_EXPRESSION_NOT_VALID: 10>#
NVRTC_ERROR_INTERNAL_ERROR = <nvrtcResult.NVRTC_ERROR_INTERNAL_ERROR: 11>#
NVRTC_ERROR_TIME_FILE_WRITE_FAILED = <nvrtcResult.NVRTC_ERROR_TIME_FILE_WRITE_FAILED: 12>#
NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED = <nvrtcResult.NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED: 13>#
NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED = <nvrtcResult.NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED: 14>#
NVRTC_ERROR_PCH_CREATE = <nvrtcResult.NVRTC_ERROR_PCH_CREATE: 15>#
NVRTC_ERROR_CANCELLED = <nvrtcResult.NVRTC_ERROR_CANCELLED: 16>#
NVRTC_ERROR_TIME_TRACE_FILE_WRITE_FAILED = <nvrtcResult.NVRTC_ERROR_TIME_TRACE_FILE_WRITE_FAILED: 17>#
cuda.bindings.nvrtc.nvrtcGetErrorString(result: nvrtcResult)#
nvrtcGetErrorString is a helper function that returns a string describing the given nvrtcResult code; e.g., NVRTC_SUCCESS maps to `"NVRTC_SUCCESS"`. For unrecognized enumeration values, it returns `"NVRTC_ERROR unknown"`.

:param result: CUDA Runtime Compilation API result code.
:type result: :py:obj:`~.nvrtcResult`

:returns: * *nvrtcResult* -- :py:obj:`~.NVRTC_SUCCESS`
          * *bytes* -- Message string for the given :py:obj:`~.nvrtcResult` code.
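As a sketch, a result code can be checked with a small helper that looks up the message only on failure. The helper name `check_nvrtc` is our own, not part of the bindings; the lazy import is a deliberate choice so the helper can be defined without a CUDA install.

```python
def check_nvrtc(err) -> None:
    """Raise on any nvrtcResult other than NVRTC_SUCCESS (== 0)."""
    if int(err) != 0:  # NVRTC_SUCCESS is 0, per the enumeration above
        # Imported lazily so the helper can be defined without CUDA installed.
        from cuda.bindings import nvrtc
        _, msg = nvrtc.nvrtcGetErrorString(err)
        raise RuntimeError(f"NVRTC error: {msg.decode()}")
```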

General Information Query#

NVRTC defines the following functions for general information queries.

cuda.bindings.nvrtc.nvrtcVersion()#
nvrtcVersion sets the output parameters `major` and `minor` with the CUDA Runtime Compilation version number.

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          * **major** (*int*) -- CUDA Runtime Compilation major version number.
          * **minor** (*int*) -- CUDA Runtime Compilation minor version number.
cuda.bindings.nvrtc.nvrtcGetNumSupportedArchs()#
nvrtcGetNumSupportedArchs sets the output parameter `numArchs` with the number of architectures supported by NVRTC. This can then be used to pass an array to :py:obj:`~.nvrtcGetSupportedArchs` to get the supported architectures.

.. seealso:: :py:obj:`~.nvrtcGetSupportedArchs`

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          * **numArchs** (*int*) -- number of supported architectures.
cuda.bindings.nvrtc.nvrtcGetSupportedArchs()#
nvrtcGetSupportedArchs populates the array passed via the output parameter `supportedArchs` with the architectures supported by NVRTC. The array is sorted in ascending order. The size of the array to be passed can be determined using :py:obj:`~.nvrtcGetNumSupportedArchs`.

.. seealso:: :py:obj:`~.nvrtcGetNumSupportedArchs`

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          * **supportedArchs** (*list[int]*) -- sorted array of supported architectures.
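A minimal sketch tying the three query calls together. The module is passed in as a parameter so the flow reads independently of import details (and can be exercised against a stand-in); error codes are left unchecked for brevity.

```python
def query_nvrtc(nvrtc):
    """Return ((major, minor), supported_archs) via the query calls above.

    `nvrtc` is the imported `cuda.bindings.nvrtc` module, passed in so this
    sketch can also be exercised against a stand-in object.
    """
    err, major, minor = nvrtc.nvrtcVersion()
    err, num_archs = nvrtc.nvrtcGetNumSupportedArchs()
    err, archs = nvrtc.nvrtcGetSupportedArchs()
    assert len(archs) == num_archs  # the count sizes the returned array
    return (major, minor), archs
```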

Compilation#

NVRTC defines the following type and functions for actual compilation.

class cuda.bindings.nvrtc.nvrtcProgram(*args, **kwargs)#
nvrtcProgram is the unit of compilation, and an opaque handle for a program.

To compile a CUDA program string, an instance of nvrtcProgram must be created first with nvrtcCreateProgram, then compiled with nvrtcCompileProgram.

.. method:: getPtr()

   Get memory address of class instance
cuda.bindings.nvrtc.nvrtcCreateProgram(char *src, char *name, int numHeaders, headers: Optional[tuple[bytes] | list[bytes]], includeNames: Optional[tuple[bytes] | list[bytes]])#
nvrtcCreateProgram creates an instance of nvrtcProgram with the given input parameters, and sets the output parameter `prog` with it.

:param src: CUDA program source.
:type src: bytes
:param name: CUDA program name.  `name` can be `NULL`; `"default_program"` is
             used when `name` is `NULL` or "".
:type name: bytes
:param numHeaders: Number of headers used.  `numHeaders` must be greater than or equal
                   to 0.
:type numHeaders: int
:param headers: Sources of the headers.  `headers` can be `NULL` when `numHeaders`
                is 0.
:type headers: list[bytes]
:param includeNames: Name of each header by which they can be included in the CUDA
                     program source.  `includeNames` can be `NULL` when `numHeaders` is
                     0. These headers must be included with the exact names specified
                     here.
:type includeNames: list[bytes]

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_OUT_OF_MEMORY`
            - :py:obj:`~.NVRTC_ERROR_PROGRAM_CREATION_FAILURE`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
            - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
          * **prog** (:py:obj:`~.nvrtcProgram`) -- CUDA Runtime Compilation program.

.. seealso:: :py:obj:`~.nvrtcDestroyProgram`
cuda.bindings.nvrtc.nvrtcDestroyProgram(prog)#
nvrtcDestroyProgram destroys the given program.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
          - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
:rtype: nvrtcResult

.. seealso:: :py:obj:`~.nvrtcCreateProgram`
cuda.bindings.nvrtc.nvrtcCompileProgram(prog, int numOptions, options: Optional[tuple[bytes] | list[bytes]])#
nvrtcCompileProgram compiles the given program.

It supports compile options listed in :py:obj:`~.Supported Compile
Options`.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`
:param numOptions: Number of compiler options passed.
:type numOptions: int
:param options: Compiler options in the form of C string array.  `options` can be
                `NULL` when `numOptions` is 0.
:type options: list[bytes]

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
          - :py:obj:`~.NVRTC_ERROR_OUT_OF_MEMORY`
          - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
          - :py:obj:`~.NVRTC_ERROR_INVALID_OPTION`
          - :py:obj:`~.NVRTC_ERROR_COMPILATION`
          - :py:obj:`~.NVRTC_ERROR_BUILTIN_OPERATION_FAILURE`
          - :py:obj:`~.NVRTC_ERROR_TIME_FILE_WRITE_FAILED`
          - :py:obj:`~.NVRTC_ERROR_CANCELLED`
:rtype: nvrtcResult
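The two-step flow above (nvrtcCreateProgram, then nvrtcCompileProgram) can be sketched as follows. The tuple-unpacking shapes `(err, prog)` and `(err,)` follow the return descriptions above; the module is passed in as a parameter, and error handling is reduced to bare checks.

```python
def create_and_compile(nvrtc, src: bytes, name: bytes, options: list):
    """Create an nvrtcProgram from source and compile it (sketch).

    `options` is a list of bytes option strings, e.g. [b"--std=c++17"].
    """
    # No headers in this sketch: numHeaders=0, empty headers/includeNames.
    err, prog = nvrtc.nvrtcCreateProgram(src, name, 0, [], [])
    if int(err) != 0:
        raise RuntimeError(f"nvrtcCreateProgram failed: {err}")
    err, = nvrtc.nvrtcCompileProgram(prog, len(options), options)
    if int(err) != 0:
        raise RuntimeError(f"nvrtcCompileProgram failed: {err}")
    return prog
```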
cuda.bindings.nvrtc.nvrtcGetPTXSize(prog)#
nvrtcGetPTXSize sets the value of `ptxSizeRet` with the size of the PTX generated by the previous compilation of `prog`.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
            - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
          * **ptxSizeRet** (*int*) -- Size of the generated PTX (including the trailing `NULL`).

.. seealso:: :py:obj:`~.nvrtcGetPTX`
cuda.bindings.nvrtc.nvrtcGetPTX(prog, char *ptx)#
nvrtcGetPTX stores the PTX generated by the previous compilation of `prog` in the memory pointed by `ptx`.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`
:param ptx: Compiled result.
:type ptx: bytes

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
          - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
:rtype: nvrtcResult

.. seealso:: :py:obj:`~.nvrtcGetPTXSize`
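The size-then-fetch pair above can be sketched as below; the byte-buffer idiom (allocate a bytes object of the queried size and let the binding fill it) follows the pattern used in NVIDIA's cuda-python examples. The same shape applies to the CUBIN, LTO IR, and OptiX IR getters that follow.

```python
def get_ptx(nvrtc, prog) -> bytes:
    """Fetch the generated PTX via nvrtcGetPTXSize + nvrtcGetPTX (sketch)."""
    err, size = nvrtc.nvrtcGetPTXSize(prog)  # size includes the trailing NULL
    buf = b" " * size                        # buffer the binding fills in place
    err, = nvrtc.nvrtcGetPTX(prog, buf)
    return buf
```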
cuda.bindings.nvrtc.nvrtcGetCUBINSize(prog)#
nvrtcGetCUBINSize sets the value of `cubinSizeRet` with the size of the cubin generated by the previous compilation of `prog`. The value of cubinSizeRet is set to 0 if the value specified to `-arch` is a virtual architecture instead of an actual architecture.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
            - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
          * **cubinSizeRet** (*int*) -- Size of the generated cubin.

.. seealso:: :py:obj:`~.nvrtcGetCUBIN`
cuda.bindings.nvrtc.nvrtcGetCUBIN(prog, char *cubin)#
nvrtcGetCUBIN stores the cubin generated by the previous compilation of `prog` in the memory pointed by `cubin`. No cubin is available if the value specified to `-arch` is a virtual architecture instead of an actual architecture.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`
:param cubin: Compiled and assembled result.
:type cubin: bytes

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
          - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
:rtype: nvrtcResult

.. seealso:: :py:obj:`~.nvrtcGetCUBINSize`
cuda.bindings.nvrtc.nvrtcGetLTOIRSize(prog)#
nvrtcGetLTOIRSize sets the value of `LTOIRSizeRet` with the size of the LTO IR generated by the previous compilation of `prog`. The value of LTOIRSizeRet is set to 0 if the program was not compiled with `-dlto`.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
            - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
          * **LTOIRSizeRet** (*int*) -- Size of the generated LTO IR.

.. seealso:: :py:obj:`~.nvrtcGetLTOIR`
cuda.bindings.nvrtc.nvrtcGetLTOIR(prog, char *LTOIR)#
nvrtcGetLTOIR stores the LTO IR generated by the previous compilation of `prog` in the memory pointed by `LTOIR`. No LTO IR is available if the program was compiled without `-dlto`.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`
:param LTOIR: Compiled result.
:type LTOIR: bytes

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
          - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
:rtype: nvrtcResult

.. seealso:: :py:obj:`~.nvrtcGetLTOIRSize`
cuda.bindings.nvrtc.nvrtcGetOptiXIRSize(prog)#
nvrtcGetOptiXIRSize sets the value of `optixirSizeRet` with the size of the OptiX IR generated by the previous compilation of `prog`. The value of `optixirSizeRet` is set to 0 if the program was compiled with options incompatible with OptiX IR generation.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
            - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
          * **optixirSizeRet** (*int*) -- Size of the generated OptiX IR.

.. seealso:: :py:obj:`~.nvrtcGetOptiXIR`
cuda.bindings.nvrtc.nvrtcGetOptiXIR(prog, char *optixir)#
nvrtcGetOptiXIR stores the OptiX IR generated by the previous compilation of `prog` in the memory pointed by `optixir`. No OptiX IR is available if the program was compiled with options incompatible with OptiX IR generation.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`
:param optixir: Optix IR Compiled result.
:type optixir: bytes

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
          - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
:rtype: nvrtcResult

.. seealso:: :py:obj:`~.nvrtcGetOptiXIRSize`
cuda.bindings.nvrtc.nvrtcGetProgramLogSize(prog)#
nvrtcGetProgramLogSize sets `logSizeRet` with the size of the log generated by the previous compilation of `prog`.

Note that the compilation log may contain warnings and informative
messages even when the compilation of `prog` succeeds.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
            - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
          * **logSizeRet** (*int*) -- Size of the compilation log (including the trailing `NULL`).

.. seealso:: :py:obj:`~.nvrtcGetProgramLog`
cuda.bindings.nvrtc.nvrtcGetProgramLog(prog, char *log)#
nvrtcGetProgramLog stores the log generated by the previous compilation of `prog` in the memory pointed by `log`.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`
:param log: Compilation log.
:type log: bytes

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
          - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
:rtype: nvrtcResult

.. seealso:: :py:obj:`~.nvrtcGetProgramLogSize`
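Retrieving the log follows the same size-then-fetch shape as the PTX calls; a sketch (worth running even on success, since the log may contain warnings):

```python
def get_log(nvrtc, prog) -> str:
    """Fetch the compilation log via the size/log call pair above (sketch)."""
    err, size = nvrtc.nvrtcGetProgramLogSize(prog)  # includes trailing NULL
    buf = b" " * size                               # buffer filled in place
    err, = nvrtc.nvrtcGetProgramLog(prog, buf)
    return buf.decode()
```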
cuda.bindings.nvrtc.nvrtcAddNameExpression(prog, char *name_expression)#
nvrtcAddNameExpression notes the given name expression denoting the address of a global function or device/__constant__ variable.

The identical name expression string must be provided on a subsequent
call to nvrtcGetLoweredName to extract the lowered name.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`
:param name_expression: constant expression denoting the address of a global function or
                        device/__constant__ variable.
:type name_expression: bytes

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
          - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
          - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          - :py:obj:`~.NVRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION`
:rtype: nvrtcResult

.. seealso:: :py:obj:`~.nvrtcGetLoweredName`
cuda.bindings.nvrtc.nvrtcGetLoweredName(prog, char *name_expression)#
nvrtcGetLoweredName extracts the lowered (mangled) name for a name expression previously registered with :py:obj:`~.nvrtcAddNameExpression`, after `prog` has been compiled.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`
:param name_expression: constant expression denoting the address of a global function or
                        device/__constant__ variable.
:type name_expression: bytes

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION`
            - :py:obj:`~.NVRTC_ERROR_NAME_EXPRESSION_NOT_VALID`
          * **lowered_name** (*bytes*) -- initialized by the function to point to a C string containing the
            lowered (mangled) name corresponding to the provided name
            expression.

.. seealso:: :py:obj:`~.nvrtcAddNameExpression`
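The ordering constraint (register name expressions before compilation, extract lowered names after) can be sketched as below; `compile_fn` stands in for the nvrtcCompileProgram step.

```python
def mangled_names(nvrtc, prog, exprs, compile_fn):
    """Register name expressions, compile, then extract lowered names (sketch).

    `exprs` is a list of bytes name expressions; each must be registered
    BEFORE compilation and looked up with the identical string AFTER it.
    """
    for e in exprs:
        err, = nvrtc.nvrtcAddNameExpression(prog, e)
    compile_fn(prog)  # e.g. a wrapper around nvrtcCompileProgram
    out = {}
    for e in exprs:
        err, lowered = nvrtc.nvrtcGetLoweredName(prog, e)
        out[e] = lowered
    return out
```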
cuda.bindings.nvrtc.nvrtcSetFlowCallback(prog, callback, payload)#
nvrtcSetFlowCallback registers a callback function that the compiler will invoke at different points during a call to nvrtcCompileProgram, and the callback function can decide whether to cancel compilation by returning specific values.

The callback function must satisfy the following constraints:

(1) Its signature should be:

**View CUDA Toolkit Documentation for a C++ code example**

When invoking the callback, the compiler always passes `payload` as the
first parameter, so the callback can make decisions based on `payload`.
The second parameter is currently always NULL and is reserved for
future extensions.

(2) It must return 1 to cancel compilation or 0 to continue. Other
return values are reserved for future use.

(3) It must return consistent values. Once it returns 1 at one point,
it must return 1 in all following invocations during the current
nvrtcCompileProgram call in progress.

(4) It must be thread-safe.

(5) It must not invoke any nvrtc/libnvvm/ptx APIs.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`
:param callback: the callback that issues cancellation signal.
:type callback: Any
:param payload: to be passed as a parameter when invoking the callback.
:type payload: Any

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
          - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
          - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
:rtype: nvrtcResult

Precompiled header (PCH) (CUDA 12.8+)#

NVRTC defines the following functions related to PCH. Also see the PCH-related flags passed to nvrtcCompileProgram.

cuda.bindings.nvrtc.nvrtcGetPCHHeapSize()#
retrieve the current size of the PCH Heap.

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          * **ret** (*int*) -- pointer to location where the size of the PCH Heap will be stored
cuda.bindings.nvrtc.nvrtcSetPCHHeapSize(size_t size)#
set the size of the PCH Heap.

The requested size may be rounded up to a platform dependent alignment
(e.g. page size). If the PCH Heap has already been allocated, the heap
memory will be freed and a new PCH Heap will be allocated.

:param size: requested size of the PCH Heap, in bytes
:type size: size_t

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
:rtype: nvrtcResult
cuda.bindings.nvrtc.nvrtcGetPCHCreateStatus(prog)#
returns the PCH creation status.

NVRTC_SUCCESS indicates that the PCH was successfully created.
NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED indicates that no PCH creation was
attempted, either because PCH functionality was not requested during
the preceding nvrtcCompileProgram call, or because automatic PCH
processing was requested and the compiler chose not to create a PCH
file. NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED indicates that a PCH file
could potentially have been created, but the compiler ran out of space
in the PCH heap. In this scenario,
:py:obj:`~.nvrtcGetPCHHeapSizeRequired()` can be used to query the
required heap size, the heap can be reallocated to this size with
:py:obj:`~.nvrtcSetPCHHeapSize()`, and PCH creation can be reattempted
by invoking :py:obj:`~.nvrtcCompileProgram()` with a new NVRTC program
instance. NVRTC_ERROR_PCH_CREATE indicates that an error condition
prevented the PCH file from being created.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`

:returns:

          - :py:obj:`~.NVRTC_SUCCESS`
          - :py:obj:`~.NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED`
          - :py:obj:`~.NVRTC_ERROR_PCH_CREATE`
          - :py:obj:`~.NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED`
          - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
:rtype: nvrtcResult
cuda.bindings.nvrtc.nvrtcGetPCHHeapSizeRequired(prog)#
retrieve the required size of the PCH heap required to compile the given program.

:param prog: CUDA Runtime Compilation program.
:type prog: :py:obj:`~.nvrtcProgram`

:returns: * *nvrtcResult* --

            - :py:obj:`~.NVRTC_SUCCESS`
            - :py:obj:`~.NVRTC_ERROR_INVALID_PROGRAM`
            - :py:obj:`~.NVRTC_ERROR_INVALID_INPUT`
          * **size** (*int*) -- pointer to location where the required size of the PCH Heap will be
            stored

The size retrieved using this function is only valid if :py:obj:`~.nvrtcGetPCHCreateStatus()` returned NVRTC_SUCCESS or NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED.
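The recovery flow described under nvrtcGetPCHCreateStatus (query the required size on heap exhaustion, grow the heap, then retry with a fresh program) can be sketched as:

```python
def pch_heap_ok(nvrtc, prog) -> bool:
    """Check PCH creation; grow the heap if it was exhausted (sketch).

    Returns True when the PCH was created. On heap exhaustion, resizes the
    heap so a retry with a new program instance can succeed.
    """
    HEAP_EXHAUSTED = 14  # NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED
    err, = nvrtc.nvrtcGetPCHCreateStatus(prog)
    if int(err) == 0:  # NVRTC_SUCCESS
        return True
    if int(err) == HEAP_EXHAUSTED:
        _, size = nvrtc.nvrtcGetPCHHeapSizeRequired(prog)
        nvrtc.nvrtcSetPCHHeapSize(size)  # caller should then recompile
    return False
```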

Supported Compile Options#

NVRTC supports the compile options below. Option names with two preceding dashes (--) are long option names and option names with one preceding dash (-) are short option names. Short option names can be used instead of long option names. When a compile option takes an argument, an assignment operator (=) is used to separate the compile option argument from the compile option name, e.g., "--gpu-architecture=compute_100". Alternatively, the compile option name and the argument can be specified in separate strings without an assignment operator, e.g., "--gpu-architecture" "compute_100". Single-character short option names, such as -D, -U, and -I, do not require an assignment operator, and the compile option name and the argument can be present in the same string with or without spaces between them. For instance, "-D=<def>", "-D<def>", and "-D <def>" are all supported.
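For instance, these option lists (bytes strings, as the bindings accept) spell the same target and macro definition in the equivalent forms described above:

```python
# Long name with '=' and the three accepted -D spellings:
opts_a = [b"--gpu-architecture=compute_100", b"-D=MY_FLAG"]
opts_b = [b"--gpu-architecture", b"compute_100", b"-DMY_FLAG"]  # separate strings
opts_c = [b"--gpu-architecture=compute_100", b"-D MY_FLAG"]     # space, one string
```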

The valid compiler options are:

  • Compilation targets

    • --gpu-architecture=<arch> (-arch)

Specify the name of the class of GPU architectures for which the input must be compiled.

  • Separate compilation / whole-program compilation

    • --device-c (-dc)

Generate relocatable code that can be linked with other relocatable device code. It is equivalent to --relocatable-device-code=true.

  • --device-w (-dw)

Generate non-relocatable code. It is equivalent to --relocatable-device-code=false.

  • --relocatable-device-code={true|false} (-rdc)

Enable (disable) the generation of relocatable device code.

  • --extensible-whole-program (-ewp)

Do extensible whole program compilation of device code.

  • Debugging support

    • --device-debug (-G)

Generate debug information. If --dopt is not specified, then turns off all optimizations.

  • --generate-line-info (-lineinfo)

Generate line-number information.

  • Code generation

    • --dopt on (-dopt)

    • --dopt=on

Enable device code optimization. When specified along with -G, enables limited debug information generation for optimized device code (currently, only line number information). When -G is not specified, -dopt=on is implicit.

  • --Ofast-compile={0|min|mid|max} (-Ofc)

Specify the fast-compile level for device code, which controls the tradeoff between compilation speed and runtime performance by disabling certain optimizations at varying levels.

  • --ptxas-options <options> (-Xptxas)

  • --ptxas-options=<options>

Specify options directly to ptxas, the PTX optimizing assembler.

  • --maxrregcount=<N> (-maxrregcount)

Specify the maximum amount of registers that GPU functions can use. Until a function-specific limit, a higher value will generally increase the performance of individual GPU threads that execute this function. However, because thread registers are allocated from a global register pool on each GPU, a higher value of this option will also reduce the maximum thread block size, thereby reducing the amount of thread parallelism. Hence, a good maxrregcount value is the result of a trade-off. If this option is not specified, then no maximum is assumed. A value less than the minimum number of registers required by the ABI will be bumped up by the compiler to the ABI minimum limit.

  • --ftz={true|false} (-ftz)

When performing single-precision floating-point operations, flush denormal values to zero or preserve denormal values.

--use_fast_math implies --ftz=true.

  • --prec-sqrt={true|false} (-prec-sqrt)

For single-precision floating-point square root, use IEEE round-to-nearest mode or use a faster approximation. --use_fast_math implies --prec-sqrt=false.

  • --prec-div={true|false} (-prec-div)

For single-precision floating-point division and reciprocals, use IEEE round-to-nearest mode or use a faster approximation. --use_fast_math implies --prec-div=false.

    • Default: true

  • --fmad={true|false} (-fmad)

Enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA). --use_fast_math implies --fmad=true.

  • --use_fast_math (-use_fast_math)

Make use of fast math operations. --use_fast_math implies --ftz=true --prec-div=false --prec-sqrt=false --fmad=true.

  • --extra-device-vectorization (-extra-device-vectorization)

Enables more aggressive device code vectorization in the NVVM optimizer.

  • --modify-stack-limit={true|false} (-modify-stack-limit)

On Linux, during compilation, use setrlimit() to increase stack size to maximum allowed. The limit is reset to the previous value at the end of compilation. Note: setrlimit() changes the value for the entire process.

  • --dlink-time-opt (-dlto)

Generate intermediate code for later link-time optimization. It implies -rdc=true. Note: when this option is used the nvrtcGetLTOIR API should be used, as PTX or Cubin will not be generated.

  • --gen-opt-lto (-gen-opt-lto)

Run the optimizer passes before generating the LTO IR.

  • --optix-ir (-optix-ir)

Generate OptiX IR. The Optix IR is only intended for consumption by OptiX through appropriate APIs. This feature is not supported with link-time-optimization (-dlto).

Note: when this option is used the nvrtcGetOptiX API should be used, as PTX or Cubin will not be generated.

  • --jump-table-density=[0-101] (-jtd)

Specify the case density percentage in switch statements, and use it as a minimal threshold to determine whether a jump table (brx.idx instruction) will be used to implement a switch statement. The default value is 101. The percentage ranges from 0 to 101 inclusive.

  • --device-stack-protector={true|false} (-device-stack-protector)

Enable (disable) the generation of stack canaries in device code.

  • --no-cache (-no-cache)

Disable the use of cache for both ptx and cubin code generation.

  • --frandom-seed (-frandom-seed)

The user-specified random seed will be used to replace random numbers used in generating symbol names and variable names. This option can be used to generate deterministically identical ptx and object files. If the input value is a valid number (decimal, octal, or hex), it will be used directly as the random seed. Otherwise, the CRC value of the passed string will be used instead.

  • Preprocessing

    • --define-macro=<def> (-D)

<def> can be either <name> or <name=definitions>.

  • --undefine-macro=<def> (-U)

Cancel any previous definition of <def>.

  • --include-path=<dir> (-I)

Add the directory <dir> to the list of directories to be searched for headers. These paths are searched after the list of headers given to nvrtcCreateProgram.

  • --pre-include=<header> (-include)

Preinclude <header> during preprocessing.

  • --no-source-include (-no-source-include)

By default, the preprocessor adds the directory of each input source to the include path. This option disables that behavior and only considers paths specified explicitly.

  • Language Dialect

    • --std={c++03|c++11|c++14|c++17|c++20} (-std)

Set language dialect to C++03, C++11, C++14, C++17 or C++20

  • --builtin-move-forward={true|false} (-builtin-move-forward)

Provide builtin definitions of std::move and std::forward, when C++11 or later language dialect is selected.

  • --builtin-initializer-list={true|false} (-builtin-initializer-list)

Provide builtin definitions of std::initializer_list class and member functions when C++11 or later language dialect is selected.

  • Precompiled header support (CUDA 12.8+)

    • --pch (-pch)

Enable automatic PCH processing.

  • --create-pch=<file-name> (-create-pch)

Create a PCH file.

  • --use-pch=<file-name> (-use-pch)

Use the specified PCH file.

  • --pch-dir=<directory-name> (-pch-dir)

When using automatic PCH (-pch), look for and create PCH files in the specified directory. When using explicit PCH (-create-pch or -use-pch), the directory name is prefixed before the specified file name, unless the file name is an absolute path name.

  • --pch-verbose={true|false} (-pch-verbose)

In automatic PCH mode, for each PCH file that could not be used in current compilation, print the reason in the compilation log.

  • --pch-messages={true|false} (-pch-messages)

Print a message in the compilation log, if a PCH file was created or used in the current compilation.

  • --instantiate-templates-in-pch={true|false} (-instantiate-templates-in-pch)

Enable or disable instantiation of templates before PCH creation. Instantiating templates may increase the size of the PCH file, while reducing the compilation cost when using the PCH file (since some template instantiations can be skipped).

  • Misc.

    • --disable-warnings (-w)

Inhibit all warning messages.

  • --Wreorder (-Wreorder)

Generate warnings when member initializers are reordered.

  • --warning-as-error=<kind>,… (-Werror)

Make warnings of the specified kinds into errors. The following is the list of warning kinds accepted by this option:

  • --restrict (-restrict)

Programmer assertion that all kernel pointer parameters are restrict pointers.

  • --device-as-default-execution-space (-default-device)

Treat entities with no execution space annotation as __device__ entities.

  • --device-int128 (-device-int128)

Allow the __int128 type in device code. Also causes the macro __CUDACC_RTC_INT128__ to be defined.

  • --device-float128 (-device-float128)

Allow the __float128 and _Float128 types in device code. Also causes the macro __CUDACC_RTC_FLOAT128__ to be defined.

  • --optimization-info=<kind> (-opt-info)

Provide optimization reports for the specified kind of optimization. The following kind tags are supported:

  • --display-error-number (-err-no)

Display diagnostic number for warning messages. (Default)

  • --no-display-error-number (-no-err-no)

Disables the display of a diagnostic number for warning messages.

  • --diag-error=<error-number>,… (-diag-error)

Emit error for specified diagnostic message number(s). Message numbers can be separated by comma.

  • --diag-suppress=<error-number>,… (-diag-suppress)

Suppress specified diagnostic message number(s). Message numbers can be separated by comma.

  • --diag-warn=<error-number>,… (-diag-warn)

Emit warning for specified diagnostic message number(s). Message numbers can be separated by comma.

  • --brief-diagnostics={true|false} (-brief-diag)

This option enables or disables showing the source line and column info in a diagnostic. With --brief-diagnostics=true, the source line and column info is not shown.

  • --time=<file-name> (-time)

Generate a comma separated value table with the time taken by each compilation phase, and append it at the end of the file given as the option argument. If the file does not exist, the column headings are generated in the first row of the table. If the file name is ‘-’, the timing data is written to the compilation log.

  • --split-compile=<number-of-threads> (-split-compile=<number-of-threads>)

Perform compiler optimizations in parallel. Split compilation attempts to reduce compile time by enabling the compiler to run certain optimization passes concurrently. This option accepts a numerical value that specifies the maximum number of threads the compiler can use. One can also allow the compiler to use the maximum threads available on the system by setting --split-compile=0. Setting --split-compile=1 will cause this option to be ignored.

  • --fdevice-syntax-only (-fdevice-syntax-only)

Ends device compilation after front-end syntax checking. This option does not generate valid device code.

  • --minimal (-minimal)

Omit certain language features to reduce compile time for small programs. In particular, the following are omitted:

  • --device-stack-protector (-device-stack-protector)

Enable stack canaries in device code. Stack canaries make it more difficult to exploit certain types of memory safety bugs involving stack-local variables. The compiler uses heuristics to assess the risk of such a bug in each function. Only those functions which are deemed high-risk make use of a stack canary.

  • --fdevice-time-trace=<file-name> (-fdevice-time-trace=<file-name>)

Enables the time profiler, outputting a JSON file based on the given <file-name>. Results can be analyzed on chrome://tracing for a flamegraph visualization.