FAQ#
CUDA headers and include paths#
Why does AST Canopy need to find CUDA C/C++ headers?
AST Canopy uses Clang’s CUDA mode to parse CUDA C++ headers. This mode requires the CUDA installation directory to be passed via the --cuda-path flag and the CUDA include directories via the -I flag.
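For concreteness, the sketch below shows the kind of argument list this implies. The path is a placeholder and the flag set is illustrative; it is not the exact argument list AST Canopy constructs.

```python
# Illustrative only: the flags Clang's CUDA mode needs, per the FAQ above.
# The path is a placeholder; AST Canopy discovers the real one automatically.
cuda_home = "/usr/local/cuda"
clang_args = [
    f"--cuda-path={cuda_home}",   # CUDA Toolkit root
    f"-I{cuda_home}/include",     # CUDA include directory
    "-x", "cuda",                 # treat the input as CUDA C++
]
```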
How does AST Canopy find CUDA C/C++ headers?
AST Canopy relies on cuda.pathfinder to
locate the CUDA Toolkit header directory (e.g., the folder containing cuda.h). Internally this uses
cuda.pathfinder.find_nvidia_header_directory(), which probes several common locations and environment
variables so you usually don’t need to configure anything.
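A minimal sketch of that lookup is shown below; the "cudart" library-name argument is an assumption for illustration and may differ from what AST Canopy passes internally.

```python
# Sketch: locate a CUDA header directory via cuda.pathfinder.
# The "cudart" libname argument is an assumption for illustration.
from cuda.pathfinder import find_nvidia_header_directory

include_dir = find_nvidia_header_directory("cudart")
print(include_dir)  # e.g. .../site-packages/nvidia/cu13/include, or None if not found
```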
What is searched (high level)#
This is a high-level overview of the search paths; for more details, please refer to the cuda.pathfinder documentation.
- Pip-installed CUDA Toolkit: If you installed the CUDA Toolkit via pip (e.g., cuda-toolkit[cudart,nvcc]), headers are typically in site-packages/nvidia/.
- Conda environments: If you installed the Toolkit via Conda (e.g., cudatoolkit), headers are typically in $CONDA_PREFIX/include.
- Explicit environment variables: If CUDA_HOME or CUDA_PATH is set, their include subdirectory is used (e.g., $CUDA_HOME/include).
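The sketch below mirrors this fallback order in simplified form; it is illustrative only and omits the additional cases cuda.pathfinder actually handles.

```python
# Simplified illustration of the search order described above.
import os
import pathlib

def candidate_include_dirs():
    # 1. pip wheels: site-packages/nvidia/... (delegated to cuda.pathfinder)
    # 2. Conda: $CONDA_PREFIX/include
    if prefix := os.environ.get("CONDA_PREFIX"):
        yield pathlib.Path(prefix) / "include"
    # 3. Explicit env vars: $CUDA_HOME/include or $CUDA_PATH/include
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if root := os.environ.get(var):
            yield pathlib.Path(root) / "include"
```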
Conda vs. pip nuances#
- pip (runtime-focused): Packages like cuda-toolkit[cudart,nvcc] ship the CUDA headers, but the layout differs between CUDA 12 and CUDA 13. See the next section for details.
- Conda (recommended for headers): The cuda-toolkit package commonly includes header files, so cuda.pathfinder will find them in $CONDA_PREFIX/include.
Note
We recommend using only one package management system for a single CUDA installation.
CUDA 12 vs. CUDA 13 considerations#
The main difference between CUDA 12 and CUDA 13 is the layout of headers in the pip-installed CUDA Toolkit.
Runtime headers:
- CUDA 12: headers are found in site-packages/nvidia/cuda_runtime/include
- CUDA 13: headers are found in site-packages/nvidia/cu13/include
CCCL headers:
- CUDA 12: headers are found in site-packages/nvidia/cuda_cccl/include
- CUDA 13: headers are found in site-packages/nvidia/cu13/include/cccl
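The helper below is a sketch that makes these two layouts concrete; the function name and the version argument are hypothetical, not part of AST Canopy's API.

```python
# Sketch of the pip-wheel header layouts listed above (hypothetical helper).
import pathlib
import sysconfig

def pip_header_dirs(cuda_major: int) -> dict[str, pathlib.Path]:
    site = pathlib.Path(sysconfig.get_paths()["purelib"])
    if cuda_major >= 13:
        base = site / "nvidia" / "cu13" / "include"
        return {"runtime": base, "cccl": base / "cccl"}
    return {
        "runtime": site / "nvidia" / "cuda_runtime" / "include",
        "cccl": site / "nvidia" / "cuda_cccl" / "include",
    }
```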
Note
For CUDA 12, the site-packages layout is non-standard, so clang++ cannot use the site-packages directory
as --cuda-path. Instead, please install a system-wide CUDA Toolkit of the corresponding version and set
CUDA_HOME to the system-wide Toolkit root.
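For example, one hypothetical way to apply this workaround from Python is sketched below; the version and path are placeholders.

```python
# Hypothetical: point discovery at a system-wide CUDA 12 Toolkit
# (set before AST Canopy runs; the path is an example).
import os
os.environ["CUDA_HOME"] = "/usr/local/cuda-12.6"
```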
Note
texture_fetch_functions.h was removed in CUDA 13. Upstream Clang 22 adds a guard in
__clang_cuda_runtime_wrapper.h to avoid including it when compiling against CUDA 13+. Until that
change lands in your host Clang, AST Canopy ships a small shim header that conditionally forwards to
texture_fetch_functions.h only for CUDA < 13. This avoids errors when using Clang 20/21 with CUDA 13.
Clang requirements (host headers and resources)#
What are the minimum host-side requirements to parse CUDA headers?
AST Canopy drives Clang in CUDA mode. Even for device-only parsing, Clang needs two host-side components:
- libstdc++ C++ headers (for headers like <cstdlib> and <cmath> used via the CUDA wrappers). In principle, use the supported libstdc++ versions listed by Clang.
- Clang “resource directory” headers (<resource>/include and <resource>/include/cuda_wrappers, containing __clang_cuda_runtime_wrapper.h and friends).
You do not need to link libstdc++ for device-only parsing, but its headers must be discoverable.
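One quick way to sanity-check the resource-directory requirement is sketched below; the clang++-20 binary name is an assumption for illustration.

```python
# Check that the host Clang resource directory and its CUDA wrappers exist.
# "clang++-20" is an assumed binary name; adjust for your installation.
import pathlib
import subprocess

resource_dir = pathlib.Path(
    subprocess.check_output(["clang++-20", "-print-resource-dir"], text=True).strip()
)
print((resource_dir / "include" / "cuda_wrappers").is_dir())  # should be True
```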
Environment-specific discovery#
We use Clang’s driver logic (in-process) to compute the same include search paths that clang++ would pass to
cc1. How the headers are found depends on your environment:
1. Pip wheel/bare-metal (system toolchain)#
- Install system C++ headers (e.g., libstdc++-<version>-dev on Debian/Ubuntu) and Clang resource headers (e.g., libclang-common-20-dev).
- AST Canopy invokes Clang’s driver API with your host triple to discover C++ standard library include directories and system C headers. On Linux this is typically libstdc++ by default (e.g., /usr/include/c++/<ver>, multiarch directories, /usr/include), along with the resource includes.
2. Conda environments (conda-forge toolchains)#
- Conda packages (e.g., clangdev + cxx-compiler) may include a GCC with its own libstdc++ headers.
- We point the driver to the Conda clang++ (via the program path/InstalledDir) so it reads the adjacent config and discovers sibling GCC/libstdc++ directories automatically. This reproduces the same include list you see from running clang++ -### inside the environment.
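To compare against what the driver computes, the sketch below dumps the cc1 invocation via the clang++ -### check mentioned above; the binary path assumes a Conda environment.

```python
# Sketch: dump the cc1 invocation that clang++ would use, mirroring
# "clang++ -###" as described above (assumes a Conda environment).
import os
import subprocess

clang = os.path.join(os.environ["CONDA_PREFIX"], "bin", "clang++")
proc = subprocess.run(
    [clang, "-###", "-x", "c++", "-c", "/dev/null"],
    capture_output=True, text=True,
)
print(proc.stderr)  # the cc1 line lists the -internal-isystem include directories
```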
3. Custom Clang binary#
The clang++ binary location usually includes a Clang config file that instructs the driver on how to discover host resources. You may also point driver discovery at a custom clang++ binary explicitly:
- Set ASTCANOPY_CLANG_BIN to a specific clang++ path if you want to direct discovery through a custom installation.
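For example (the path below is a placeholder, not a recommended location):

```python
# Hypothetical usage: direct discovery through a custom clang++ installation.
import os
os.environ["ASTCANOPY_CLANG_BIN"] = "/opt/llvm-20/bin/clang++"  # example path
```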
Notes#
- -resource-dir only affects Clang’s builtin headers (including cuda_wrappers); it does not provide C++ standard library headers.
- AST Canopy does not enforce a specific standard library: the driver chooses based on the toolchain and flags. To use libc++, ensure its headers are installed (e.g., /usr/include/c++/v1 or Conda libcxx-devel) and pass -stdlib=libc++; otherwise libstdc++ is typically selected by default on Linux.
- AST Canopy is linked against Clang 20. For host resources, please use the corresponding Clang version.
Generated bindings and Numba-CUDA version requirements#
What version of numba-cuda do generated bindings require?
Bindings generated by Numbast have specific version requirements for numba-cuda at runtime. The version of
Numbast used to generate the bindings determines the compatible numba-cuda versions.
| Numbast Version | Required numba-cuda Version |
|---|---|
| 0.6.0 (current dev) | |
| 0.5.x | |
Why do generated bindings have version requirements?
Numbast generates Python code that uses Numba-CUDA’s internal APIs. These APIs can change between releases, so
bindings generated with a specific version of Numbast are tested against a specific range of numba-cuda versions.
How do I ensure compatibility?
For dynamic binding generation:
The correct numba-cuda version constraints are automatically enforced at the package dependency level and managed by your package manager (pip or conda). When you install Numbast, compatible versions of numba-cuda are installed automatically via the dependencies specified in pyproject.toml and Conda environment files.
For static binding generation:
- When distributing generated bindings, document the required numba-cuda version range in your package dependencies so users can install a compatible version.
- Generated static bindings (see Static binding generation) can be regenerated with newer versions of Numbast if you need to support newer numba-cuda releases.
Note
These version restrictions may be relaxed or removed once numba-cuda releases a stable 1.0 version with
stabilized public APIs. Until then, bindings are tested against specific version ranges to ensure compatibility.
C++ Enum Binding Generation Notes#
Why do Numbast bindings treat C++ enums as int64 in Numba?
Numba represents Python IntEnum values using IntEnumMember(..., int64) (i.e., enum values are lowered as
64-bit integers). Numbast follows this convention in both dynamic and static binding generation so that Python-side
typing and lowering are consistent.
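A small sketch of this convention is shown below, assuming an installed Numba version with IntEnum typing support; the Color enum is a made-up example.

```python
# Sketch: Numba types an IntEnum member as IntEnumMember(..., int64).
from enum import IntEnum
from numba import typeof

class Color(IntEnum):
    RED = 1
    GREEN = 2

print(typeof(Color.RED))  # typically prints IntEnumMember(<enum 'Color'>, int64)
```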
But C++ enums can have different underlying integer types. Why don’t we track and truncate to that type in lowering?
Numbast does not keep a per-enum “underlying integer type” registry and does not perform explicit truncation during lowering because the device-side shim is compiled by NVRTC, and the shim call site is where C++ type checking happens. Even though the Python/Numba side lowers enum values as 64-bit integers, NVRTC can resolve the target enum type and emit the appropriate conversion when the shim calls the original function that takes the C++ enum parameter. This means we don’t need to track per-enum underlying integer types in Python or add special-case truncation/casting logic in Numbast lowering.
If you are binding code that depends on unusual enum representations or non-standard ABIs, you may need a custom adapter. For typical CUDA device code, this approach keeps the implementation simpler and avoids maintaining extra metadata for every enum type.