Using CUDA and CUDA-Q in a Project
A project that uses CUDA-Q kernels may also need CUDA code to do computation on a GPU. This is possible by using the CUDA Toolkit together with the CUDA-Q tools. More about programming GPUs in CUDA can be found in the Quick Start Guide.
Once the nvcc compiler is installed, it is possible to write CUDA kernels and have them execute on the system GPU. See NVIDIA's An Easy Introduction to CUDA C and C++ for more information on getting started with CUDA.
CUDA code uses a unique syntax and is typically saved in a file with the extension .cu. For our example, assume we have written our CUDA code in the file my_proj.cu.
CUDA-Q code is a library-based extension of C++ and uses standard-conforming C++ syntax. Typically, a quantum kernel would be saved in a file with the .cpp extension. Again for our example, let's assume that we've written quantum kernels and saved them in the file my_proj_quantum.cpp.
By default, CUDA-Q uses C++20 and builds source code against the LLVM C++ standard library (libc++). To create a CUDA library that can link against CUDA-Q code, make sure that every function intended to be called from CUDA-Q has an API that does not rely on C++ data structures tied to a specific C++ toolchain (see also Interfacing between binaries compiled with a different toolchain). For example, if you define a CUDA kernel in my_proj.cu
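template <typename CudaDataType>
__global__ void cudaSetFirstNElements(CudaDataType *sv,
                                      const CudaDataType *__restrict__ sv2,
                                      int64_t N) {
  // Global index of this thread across the whole grid.
  int64_t i = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < N) {
    // Copy the first N elements from sv2.
    sv[i].x = sv2[i].x;
    sv[i].y = sv2[i].y;
  } else {
    // Zero the rest. Every launched thread writes one element, so sv must
    // hold at least gridDim.x * blockDim.x elements.
    sv[i].x = 0.0;
    sv[i].y = 0.0;
  }
}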
then define the following host-side wrapper template and explicitly instantiate it for every data type you need
template <typename CudaDataType>
void setFirstNElements(uint32_t n_blocks,
                       int32_t threads_per_block,
                       void *newDeviceStateVector,
                       void *deviceStateVector,
                       std::size_t previousStateDimension) {
  // Only raw pointers and integers cross the API boundary; the casts to the
  // CUDA data type happen on this side.
  cudaSetFirstNElements<<<n_blocks, threads_per_block>>>(
      reinterpret_cast<CudaDataType *>(newDeviceStateVector),
      reinterpret_cast<CudaDataType *>(deviceStateVector),
      previousStateDimension);
}

// Explicit instantiations: these emit the symbols that the CUDA-Q code links
// against.
template void
setFirstNElements<cuFloatComplex>(uint32_t n_blocks,
                                  int32_t threads_per_block,
                                  void *newDeviceStateVector,
                                  void *deviceStateVector,
                                  std::size_t previousStateDimension);

template void
setFirstNElements<cuDoubleComplex>(uint32_t n_blocks,
                                   int32_t threads_per_block,
                                   void *newDeviceStateVector,
                                   void *deviceStateVector,
                                   std::size_t previousStateDimension);
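The wrapper above leaves the choice of n_blocks and threads_per_block to the caller. Because the kernel as written lets every launched thread write one element of the destination, the destination buffer has to hold at least n_blocks * threads_per_block elements. The following is a minimal host-side sketch of one way to pick a consistent pair; LaunchShape and launchShapeFor are hypothetical names introduced here for illustration and are not part of CUDA or CUDA-Q.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: a grid shape plus the buffer size it implies.
struct LaunchShape {
  std::uint32_t n_blocks;
  std::int32_t threads_per_block;
  std::size_t padded_elements; // elements the destination buffer must hold
};

// Hypothetical helper: cover `new_dimension` elements with blocks of
// `threads_per_block` threads, rounding the block count up.
inline LaunchShape launchShapeFor(std::size_t new_dimension,
                                  std::int32_t threads_per_block) {
  assert(threads_per_block > 0);
  const std::size_t tpb = static_cast<std::size_t>(threads_per_block);
  const std::size_t blocks = (new_dimension + tpb - 1) / tpb; // ceiling division
  // blocks * tpb is the total number of threads that will be launched.
  return {static_cast<std::uint32_t>(blocks), threads_per_block, blocks * tpb};
}
```

With this sketch, the caller would allocate padded_elements * sizeof(CudaDataType) bytes for the new state vector and pass n_blocks and threads_per_block on to setFirstNElements. An alternative design would be to pass the destination size into the kernel and skip out-of-range threads, at the cost of changing the kernel's signature.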
Create the corresponding header file my_proj.h declaring the setFirstNElements template. You can then call setFirstNElements from within my_proj_quantum.cpp by including my_proj.h, for example
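#include "cuComplex.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "my_proj.h"

using namespace my_kernels; // the namespace where you defined setFirstNElements

int main() {
  const uint32_t n_blocks = 10;
  const uint32_t threads_per_block = 5;
  // Every launched thread writes one element of the new state vector.
  const uint32_t n_threads = n_blocks * threads_per_block;

  void *deviceStateVector;
  cudaMalloc((void **)&deviceStateVector, 2 * sizeof(cuDoubleComplex));
  // ...
  void *newDeviceStateVector;
  // Sized to the grid so that the kernel's zero-fill stays in bounds.
  cudaMalloc((void **)&newDeviceStateVector, n_threads * sizeof(cuDoubleComplex));
  setFirstNElements<cuDoubleComplex>(n_blocks, threads_per_block,
                                     newDeviceStateVector, deviceStateVector,
                                     2);
  cudaDeviceSynchronize(); // wait for the kernel to finish
  cudaFree(newDeviceStateVector);
  cudaFree(deviceStateVector);
  return 0;
}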
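The header my_proj.h included above is not shown in full. As a rough sketch, it might look like the following. The include guard name MY_PROJ_H is an assumption, and the namespace my_kernels is taken from the using directive in the example; if you use it, the definition and the explicit instantiations in my_proj.cu need to sit in the same namespace.

```cpp
// my_proj.h (sketch; guard name and layout are assumptions)
#ifndef MY_PROJ_H
#define MY_PROJ_H

#include <cstddef>
#include <cstdint>

namespace my_kernels {

// Declaration only. The definition and its explicit instantiations live in
// my_proj.cu and are compiled by nvcc. The signature uses only raw pointers
// and integer types, so it does not depend on a particular C++ standard
// library implementation.
template <typename CudaDataType>
void setFirstNElements(std::uint32_t n_blocks, std::int32_t threads_per_block,
                       void *newDeviceStateVector, void *deviceStateVector,
                       std::size_t previousStateDimension);

} // namespace my_kernels

#endif // MY_PROJ_H
```

Since the header contains no CUDA-specific syntax, it can be included from both my_proj.cu and my_proj_quantum.cpp. Calls resolve at link time against the instantiations emitted in my_proj.o, so only the data types instantiated there can be used.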
To get an executable, compile the code with
nvcc -c -std=c++17 -Xcompiler -fPIC my_proj.cu -o my_proj.o
nvq++ my_proj_quantum.cpp my_proj.o -I"${CUDA_HOME}/include/" -L"${CUDA_HOME}/lib64" -lcudart -o my_executable
Above, nvq++ is used for the link step and will make sure the CUDA-Q runtime libraries are linked correctly to the executable program. The CUDA runtime is explicitly added to this command.