nvrtc#

Error Handling#

NVRTC defines the following enumeration type and function for API call error handling.

class cuda.nvrtc.nvrtcResult(value)#

The enumerated type nvrtcResult defines API call result codes. NVRTC API functions return nvrtcResult to indicate the call result.

NVRTC_SUCCESS = 0#
NVRTC_ERROR_OUT_OF_MEMORY = 1#
NVRTC_ERROR_PROGRAM_CREATION_FAILURE = 2#
NVRTC_ERROR_INVALID_INPUT = 3#
NVRTC_ERROR_INVALID_PROGRAM = 4#
NVRTC_ERROR_INVALID_OPTION = 5#
NVRTC_ERROR_COMPILATION = 6#
NVRTC_ERROR_BUILTIN_OPERATION_FAILURE = 7#
NVRTC_ERROR_NO_NAME_EXPRESSIONS_AFTER_COMPILATION = 8#
NVRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION = 9#
NVRTC_ERROR_NAME_EXPRESSION_NOT_VALID = 10#
NVRTC_ERROR_INTERNAL_ERROR = 11#
cuda.nvrtc.nvrtcGetErrorString(result: nvrtcResult)#

nvrtcGetErrorString is a helper function that returns a string describing the given nvrtcResult code, e.g., NVRTC_SUCCESS to “NVRTC_SUCCESS”. For unrecognized enumeration values, it returns “NVRTC_ERROR unknown”.

Parameters

result (nvrtcResult) – CUDA Runtime Compilation API result code.

Returns

  • nvrtcResult.NVRTC_SUCCESS – nvrtcResult.NVRTC_SUCCESS

  • bytes – Message string for the given nvrtcResult code.

General Information Query#

NVRTC defines the following function for general information query.

cuda.nvrtc.nvrtcVersion()#

nvrtcVersion sets the output parameters major and minor with the CUDA Runtime Compilation version number.

Returns

cuda.nvrtc.nvrtcGetNumSupportedArchs()#

nvrtcGetNumSupportedArchs sets the output parameter numArchs with the number of architectures supported by NVRTC. This can then be used to pass an array to nvrtcGetSupportedArchs to get the supported architectures.

see nvrtcGetSupportedArchs

Returns

cuda.nvrtc.nvrtcGetSupportedArchs()#

nvrtcGetSupportedArchs populates the array passed via the output parameter supportedArchs with the architectures supported by NVRTC. The array is sorted in the ascending order. The size of the array to be passed can be determined using nvrtcGetNumSupportedArchs.

see nvrtcGetNumSupportedArchs

Returns

Compilation#

NVRTC defines the following type and functions for actual compilation.

class cuda.nvrtc.nvrtcProgram(*args, **kwargs)#

nvrtcProgram is the unit of compilation, and an opaque handle for a program.

To compile a CUDA program string, an instance of nvrtcProgram must be created first with nvrtcCreateProgram, then compiled with nvrtcCompileProgram.

getPtr()#

Get memory address of class instance

cuda.nvrtc.nvrtcCreateProgram(char *src, char *name, int numHeaders, list headers, list includeNames)#

nvrtcCreateProgram creates an instance of nvrtcProgram with the given input parameters, and sets the output parameter prog with it.

Parameters
  • src (bytes) – CUDA program source.

  • name (bytes) – CUDA program name. name can be NULL; “default_program” is used when name is NULL or “”.

  • numHeaders (int) – Number of headers used. numHeaders must be greater than or equal to 0.

  • headers (List[bytes]) – Sources of the headers. headers can be NULL when numHeaders is 0.

  • includeNames (List[bytes]) – Name of each header by which they can be included in the CUDA program source. includeNames can be NULL when numHeaders is 0.

Returns

cuda.nvrtc.nvrtcDestroyProgram(prog)#

nvrtcDestroyProgram destroys the given program.

Parameters

prog (nvrtcProgram) – CUDA Runtime Compilation program.

Returns

Return type

nvrtcResult

cuda.nvrtc.nvrtcCompileProgram(prog, int numOptions, list options)#

nvrtcCompileProgram compiles the given program.

It supports compile options listed in Supported Compile Options.

Parameters
  • prog (nvrtcProgram) – CUDA Runtime Compilation program.

  • numOptions (int) – Number of compiler options passed.

  • options (List[bytes]) – Compiler options in the form of C string array. options can be NULL when numOptions is 0.

Returns

Return type

nvrtcResult

cuda.nvrtc.nvrtcGetPTXSize(prog)#

nvrtcGetPTXSize sets ptxSizeRet with the size of the PTX generated by the previous compilation of prog (including the trailing NULL).

Parameters

prog (nvrtcProgram) – CUDA Runtime Compilation program.

Returns

See also

nvrtcGetPTX

cuda.nvrtc.nvrtcGetPTX(prog, char *ptx)#

nvrtcGetPTX stores the PTX generated by the previous compilation of prog in the memory pointed by ptx.

Parameters
  • prog (nvrtcProgram) – CUDA Runtime Compilation program.

  • ptx (bytes) – Compiled result.

Returns

Return type

nvrtcResult

See also

nvrtcGetPTXSize

cuda.nvrtc.nvrtcGetCUBINSize(prog)#

nvrtcGetCUBINSize sets cubinSizeRet with the size of the cubin generated by the previous compilation of prog. The value of cubinSizeRet is set to 0 if the value specified to -arch is a virtual architecture instead of an actual architecture.

Parameters

prog (nvrtcProgram) – CUDA Runtime Compilation program.

Returns

See also

nvrtcGetCUBIN

cuda.nvrtc.nvrtcGetCUBIN(prog, char *cubin)#

nvrtcGetCUBIN stores the cubin generated by the previous compilation of prog in the memory pointed by cubin. No cubin is available if the value specified to -arch is a virtual architecture instead of an actual architecture.

Parameters
  • prog (nvrtcProgram) – CUDA Runtime Compilation program.

  • cubin (bytes) – Compiled and assembled result.

Returns

Return type

nvrtcResult

cuda.nvrtc.nvrtcGetNVVMSize(prog)#

nvrtcGetNVVMSize sets nvvmSizeRet with the size of the NVVM generated by the previous compilation of prog. The value of nvvmSizeRet is set to 0 if the program was not compiled with -dlto.

Parameters

prog (nvrtcProgram) – CUDA Runtime Compilation program.

Returns

See also

nvrtcGetNVVM

cuda.nvrtc.nvrtcGetNVVM(prog, char *nvvm)#

nvrtcGetNVVM stores the NVVM generated by the previous compilation of prog in the memory pointed by nvvm. The program must have been compiled with -dlto, otherwise will return an error.

Parameters
  • prog (nvrtcProgram) – CUDA Runtime Compilation program.

  • nvvm (bytes) – Compiled result.

Returns

Return type

nvrtcResult

See also

nvrtcGetNVVMSize

cuda.nvrtc.nvrtcGetProgramLogSize(prog)#

nvrtcGetProgramLogSize sets logSizeRet with the size of the log generated by the previous compilation of prog (including the trailing NULL).

Note that compilation log may be generated with warnings and informative messages, even when the compilation of prog succeeds.

Parameters

prog (nvrtcProgram) – CUDA Runtime Compilation program.

Returns

cuda.nvrtc.nvrtcGetProgramLog(prog, char *log)#

nvrtcGetProgramLog stores the log generated by the previous compilation of prog in the memory pointed by log.

Parameters
  • prog (nvrtcProgram) – CUDA Runtime Compilation program.

  • log (bytes) – Compilation log.

Returns

Return type

nvrtcResult

cuda.nvrtc.nvrtcAddNameExpression(prog, char *name_expression)#

nvrtcAddNameExpression notes the given name expression denoting the address of a global function or device/__constant__ variable.

The identical name expression string must be provided on a subsequent call to nvrtcGetLoweredName to extract the lowered name.

Parameters
  • prog (nvrtcProgram) – CUDA Runtime Compilation program.

  • name_expression (bytes) – constant expression denoting the address of a global function or device/__constant__ variable.

Returns

Return type

nvrtcResult

cuda.nvrtc.nvrtcGetLoweredName(prog, char *name_expression)#

nvrtcGetLoweredName extracts the lowered (mangled) name for a global function or device/__constant__ variable, and updates lowered_name to point to it. The memory containing the name is released when the NVRTC program is destroyed by nvrtcDestroyProgram. The identical name expression must have been previously provided to nvrtcAddNameExpression.

Parameters
  • prog (nvrtcProgram) – CUDA Runtime Compilation program.

  • name_expression (bytes) – constant expression denoting the address of a global function or device/__constant__ variable.

Returns

  • nvrtcResult – NVRTC_SUCCESS NVRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION NVRTC_ERROR_NAME_EXPRESSION_NOT_VALID

  • lowered_name (bytes) – initialized by the function to point to a C string containing the lowered (mangled) name corresponding to the provided name expression.

Supported Compile Options#

NVRTC supports the compile options below. Option names with two preceding dashs (--) are long option names and option names with one preceding dash (-) are short option names. Short option names can be used instead of long option names. When a compile option takes an argument, an assignment operator (=) is used to separate the compile option argument from the compile option name, e.g., "--gpu-architecture=compute_60". Alternatively, the compile option name and the argument can be specified in separate strings without an assignment operator, .e.g, "--gpu-architecture" "compute_60". Single-character short option names, such as -D, -U, and -I, do not require an assignment operator, and the compile option name and the argument can be present in the same string with or without spaces between them. For instance, "-D=<def>", "-D<def>", and "-D <def>" are all supported.

The valid compiler options are:

  • Compilation targets

    • --gpu-architecture=<arch> (-arch)

      Specify the name of the class of GPU architectures for which the input must be compiled.

      • Valid <arch>s:

        • compute_35

        • compute_37

        • compute_50

        • compute_52

        • compute_53

        • compute_60

        • compute_61

        • compute_62

        • compute_70

        • compute_72

        • compute_75

        • compute_80

        • sm_35

        • sm_37

        • sm_50

        • sm_52

        • sm_53

        • sm_60

        • sm_61

        • sm_62

        • sm_70

        • sm_72

        • sm_75

        • sm_80

      • Default: compute_52

  • Separate compilation / whole-program compilation

    • --device-c (-dc)

      Generate relocatable code that can be linked with other relocatable device code. It is equivalent to –relocatable-device-code=true.

    • --device-w (-dw)

      Generate non-relocatable code. It is equivalent to --relocatable-device-code=false.

    • --relocatable-device-code={true|false} (-rdc)

      Enable (disable) the generation of relocatable device code.

      • Default: false

    • --extensible-whole-program (-ewp)

      Do extensible whole program compilation of device code.

      • Default: false

  • Debugging support

    • --device-debug (-G)

      Generate debug information. If –dopt is not specified, then turns off all optimizations.

    • --generate-line-info (-lineinfo)

      Generate line-number information.

  • Code generation

    • --dopt on (-dopt)

    • --dopt=on

      Enable device code optimization. When specified along with ‘-G’, enables limited debug information generation for optimized device code. When ‘-G’ is not specified, ‘-dopt=on’ is implicit.

    • --ptxas-options <options> (-Xptxas)

    • --ptxas-options=<options>

      Specify options directly to ptxas, the PTX optimizing assembler.

    • --maxrregcount=<N> (-maxrregcount)

      Specify the maximum amount of registers that GPU functions can use. Until a function-specific limit, a higher value will generally increase the performance of individual GPU threads that execute this function. However, because thread registers are allocated from a global register pool on each GPU, a higher value of this option will also reduce the maximum thread block size, thereby reducing the amount of thread parallelism. Hence, a good maxrregcount value is the result of a trade-off. If this option is not specified, then no maximum is assumed. Value less than the minimum registers required by ABI will be bumped up by the compiler to ABI minimum limit.

    • --ftz={true|false} (-ftz)

      When performing single-precision floating-point operations, flush denormal values to zero or preserve denormal values. --use_fast_math implies --ftz=true.

      • Default: false

    • --prec-sqrt={true|false} (-prec-sqrt)

      For single-precision floating-point square root, use IEEE round-to-nearest mode or use a faster approximation. --use_fast_math implies --prec-sqrt=false.

      • Default: true

    • --prec-div={true|false} (-prec-div)

      For single-precision floating-point division and reciprocals, use IEEE round-to-nearest mode or use a faster approximation. --use_fast_math implies --prec-div=false.

      • Default: true

    • --fmad={true|false} (-fmad)

      Enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA). --use_fast_math implies --fmad=true.

      • Default: true

    • --use_fast_math (-use_fast_math)

      Make use of fast math operations. --use_fast_math implies --ftz=true --prec-div=false --prec-sqrt=false --fmad=true.

    • --extra-device-vectorization (-extra-device-vectorization)

      Enables more aggressive device code vectorization in the NVVM optimizer.

    • --modify-stack-limit={true|false} (-modify-stack-limit)

      On Linux, during compilation, use setrlimit() to increase stack size to maximum allowed. The limit is reset to the previous value at the end of compilation. Note: setrlimit() changes the value for the entire process.

      • Default: true

    • --dlink-time-opt (-dlto)

      Generate intermediate code for later link-time optimization. It implies -rdc=true.

      Note: when this is used the nvvmGetNVVM API should be used, as PTX or Cubin will not be generated.

  • Preprocessing

    • --define-macro=<def> (-D)

      <def> can be either <name> or <name=definitions>.

      • <name>

        Predefine <name> as a macro with definition 1.

      • <name>=<definition>

        The contents of <definition> are tokenized and preprocessed as if they appeared during translation phase three in a #define directive. In particular, the definition will be truncated by embedded new line characters.

    • --undefine-macro=<def> (-U)

      Cancel any previous definition of <def>.

    • --include-path=<dir> (-I)

      Add the directory <dir> to the list of directories to be searched for headers. These paths are searched after the list of headers given to nvrtcCreateProgram.

    • --pre-include=<header> (-include)

      Preinclude <header> during preprocessing.

    • --no-source-include (-no-source-include) The preprocessor by default adds the directory of each input sources to the include path. This option disables this feature and only considers the path specified explicitly.

  • Language Dialect

    • --std={c++03|c++11|c++14|c++17|c++20} (-std={c++11|c++14|c++17|c++20})

      Set language dialect to C++03, C++11, C++14, C++17 or C++20

    • --builtin-move-forward={true|false} (-builtin-move-forward)

      Provide builtin definitions of std::move and std::forward, when C++11 language dialect is selected.

      • Default: true

    • --builtin-initializer-list={true|false} (-builtin-initializer-list)

      Provide builtin definitions of std::initializer_list class and member functions when C++11 language dialect is selected.

      • Default: true

  • Misc.

    • --disable-warnings (-w)

      Inhibit all warning messages.

    • --restrict (-restrict)

      Programmer assertion that all kernel pointer parameters are restrict pointers.

    • --device-as-default-execution-space (-default-device)

      Treat entities with no execution space annotation as device entities.

    • --device-int128 (-device-int128)

      Allow the __int128 type in device code. Also causes the macro CUDACC_RTC_INT128 to be defined.

    • --optimization-info=<kind> (-opt-info)

      Provide optimization reports for the specified kind of optimization. The following kind tags are supported:

      • inline : emit a remark when a function is inlined.

    • --version-ident={true|false} (-dQ)

      Embed used compiler’s version info into generated PTX/CUBIN

      • Default: false

    • --display-error-number (-err-no)

      Display diagnostic number for warning messages. (Default)

    • --no-display-error-number (-no-err-no)

      Disables the display of a diagnostic number for warning messages.

    • --diag-error=<error-number>,… (-diag-error)

      Emit error for specified diagnostic message number(s). Message numbers can be separated by comma.

    • --diag-suppress=<error-number>,… (-diag-suppress)

      Suppress specified diagnostic message number(s). Message numbers can be separated by comma.

    • --diag-warn=<error-number>,… (-diag-warn)

      Emit warning for specified diagnostic message number(s). Message numbers can be separated by comma.