Installation Guide#

ALCHEMI Toolkit-Ops is intended to be a low-footprint library of low-level, high-performance kernels, so the number of external dependencies is deliberately kept small to keep the package lightweight and modular.

Prerequisites#

For the most part, ALCHEMI Toolkit-Ops shares the minimum prerequisites with NVIDIA Warp: the kernels can be run on a variety of CPU platforms (x86, ARM including Apple Silicon), with best performance provided on CUDA-capable NVIDIA GPUs running on the following operating systems:

  • Linux-based distributions with recent kernels, CUDA versions, drivers, and firmware

  • Windows, through WSL2

  • macOS (Apple Silicon only)

When running on CUDA-capable NVIDIA GPUs, we recommend:

  • CUDA Toolkit: 12 or higher

  • GPU Compute Capability: 8.0 or higher (A100 and newer)

  • Driver: NVIDIA driver 570.xx.xx or newer
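
The recommendations above can be turned into a quick check. The sketch below is illustrative rather than part of ALCHEMI Toolkit-Ops: the helper names `parse_version` and `check_recommendations` are hypothetical, and the version strings are assumed to be supplied by you (for example, copied from the output of `nvidia-smi`).

```python
def parse_version(text):
    """Turn a dotted version such as '570.86.15' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_recommendations(cuda, compute_capability, driver):
    """Compare reported versions against the recommended minimums.

    Hypothetical helper: thresholds mirror the list above
    (CUDA Toolkit 12, compute capability 8.0, driver 570).
    """
    return {
        "cuda_toolkit": parse_version(cuda) >= (12,),
        "compute_capability": parse_version(compute_capability) >= (8, 0),
        "driver": parse_version(driver) >= (570,),
    }


if __name__ == "__main__":
    # Illustrative values only; substitute what your system reports.
    for name, ok in check_recommendations("12.8", "8.0", "570.86.15").items():
        print(f"{name}: {'ok' if ok else 'below recommendation'}")
```

Tuples compare element by element, so `(570, 86, 15) >= (570,)` holds while `(565, 57, 1) >= (570,)` does not.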

CUDA 13#

Blackwell GPUs require CUDA 13. The default PyPI warp-lang package ships with CUDA 12, so a CUDA 13 variant must be installed in its place. torch>=2.11.0 and jax[cuda13] publish CUDA 13 wheels on the default PyPI index for x86; on Arm platforms (e.g. NVIDIA DGX Spark), an extra index URL is required for PyTorch. See CUDA 13 Installation below for detailed steps.

Installation Methods#

From PyPI#

The most straightforward way to install ALCHEMI Toolkit-Ops is via PyPI:

$ pip install nvalchemi-toolkit-ops

Note

We recommend using uv for virtual environments, package management, and dependency resolution. uv can be obtained through its installation page.

Backend Extras#

ALCHEMI Toolkit-Ops provides optional extras for framework-specific bindings. Install the extra matching your deep learning backend:

$ pip install 'nvalchemi-toolkit-ops[torch]'

Verify the PyTorch bindings are available:

$ python -c "from nvalchemiops.torch import neighbors; print('PyTorch bindings available')"

$ pip install 'nvalchemi-toolkit-ops[jax]'

This installs JAX with CUDA 12 support. Verify the JAX bindings are available:

$ python -c "from nvalchemiops.jax import neighbors; print('JAX bindings available')"
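
When both extras may be present, the two one-line checks can be folded into a single script. This is a minimal sketch: the function name `available_backends` is hypothetical, and it only assumes that the bindings live under `nvalchemiops.torch` and `nvalchemiops.jax`, as the commands above suggest.

```python
import importlib


def available_backends():
    """Report which optional nvalchemiops backend packages can be imported."""
    status = {}
    for backend in ("torch", "jax"):
        try:
            # Imports the backend package; a missing extra (or a missing
            # nvalchemiops install) surfaces as ImportError.
            importlib.import_module(f"nvalchemiops.{backend}")
            status[backend] = True
        except ImportError:
            status[backend] = False
    return status


if __name__ == "__main__":
    for backend, ok in available_backends().items():
        print(f"{backend} bindings: {'available' if ok else 'not installed'}")
```

Only `ImportError` is caught here, so other failures (for example, a framework that is installed but fails while initializing) still surface with their full traceback.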

From GitHub Source#

This approach is useful for obtaining nightly builds by installing directly from the source repository:

$ pip install git+https://www.github.com/NVIDIA/nvalchemi-toolkit-ops.git

Installation via uv#

The maintainers generally use uv, which is the most reliable (and fastest) way to spin up a virtual environment for ALCHEMI Toolkit-Ops. Assuming uv is on your PATH, here are a few ways to get started:

Stable, without cloning

This method is recommended for production use-cases, and when using ALCHEMI Toolkit-Ops as a dependency for your project. The Python version can be substituted for any other version supported by ALCHEMI Toolkit-Ops.

$ uv venv --seed --python 3.12
$ uv pip install nvalchemi-toolkit-ops

Nightly, with cloning

This method is recommended for local development and testing.

$ git clone git@github.com:NVIDIA/nvalchemi-toolkit-ops.git
$ cd nvalchemi-toolkit-ops
$ uv sync
# include torch backend
$ uv sync --extra torch
# include jax backend
$ uv sync --extra jax
# include both backends
$ uv sync --all-extras

Nightly, without cloning

Warning

Installing nightly versions without cloning the codebase is not recommended for production settings!

$ uv venv --seed --python 3.12
$ uv pip install git+https://www.github.com/NVIDIA/nvalchemi-toolkit-ops.git

Includes Sphinx and related tools for building documentation.

Adding nvalchemi-toolkit-ops as a dependency#

Nightly

Warning

Installing nightly versions without cloning the codebase is not recommended for production settings! We recommend pinning this to a release tag or commit hash.

$ uv add "nvalchemi-toolkit-ops @ git+https://www.github.com/NVIDIA/nvalchemi-toolkit-ops.git"

Stable

$ uv add nvalchemi-toolkit-ops

CUDA 13 Installation#

Blackwell GPUs require dependencies that are built for CUDA 13, including warp-lang, jax, and torch. For torch, only versions 2.11.0 and later provide x86 CUDA 13 wheels on the default PyPI index, without needing to specify an extra index URL.

Warp#

The PyPI warp-lang package ships with CUDA 12, and as of writing CUDA 13 wheels can be obtained from the Warp GitHub Releases page. Copy the URL of the appropriate +cu13 wheel for your platform and pass it to pip install. Select the wheel matching your architecture:

  • x86: manylinux_2_34_x86_64 variant

  • Arm (e.g. DGX Spark): manylinux_2_34_aarch64 variant

$ uv pip install https://github.com/NVIDIA/warp/releases/download/v1.12.1/warp_lang-1.12.1+cu13-py3-none-manylinux_2_34_aarch64.whl
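
Choosing the wheel by architecture can also be scripted. The sketch below assumes the URL pattern of the v1.12.1 example above carries over to other releases, which this guide does not confirm; `warp_cu13_wheel_url` is a hypothetical helper name, and the result should be checked against the Warp releases page.

```python
import platform
from typing import Optional

# Version used in the example above; newer releases may exist.
WARP_VERSION = "1.12.1"

# Maps platform.machine() values on Linux to the wheel variants listed above.
ARCH_TAGS = {
    "x86_64": "manylinux_2_34_x86_64",
    "aarch64": "manylinux_2_34_aarch64",
}


def warp_cu13_wheel_url(machine: Optional[str] = None, version: str = WARP_VERSION) -> str:
    """Build the +cu13 wheel URL for the given (or current) architecture."""
    machine = machine or platform.machine()
    if machine not in ARCH_TAGS:
        raise ValueError(f"no +cu13 wheel variant listed for architecture {machine!r}")
    return (
        "https://github.com/NVIDIA/warp/releases/download/"
        f"v{version}/warp_lang-{version}+cu13-py3-none-{ARCH_TAGS[machine]}.whl"
    )


if __name__ == "__main__":
    # Pass an explicit architecture so the example runs on any host.
    print(warp_cu13_wheel_url("aarch64"))
```

The printed URL can be passed to `uv pip install` exactly as in the command above.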

Tip

Check the Warp releases page for newer versions. See the Warp installation guide for full details on installing specific versions from GitHub releases. The --force-reinstall flag may be needed to overwrite a previous installation.

PyTorch#

Starting with version 2.11.0, PyTorch publishes CUDA 13 (cu130) wheels on the default PyPI index for x86 platforms:

$ uv pip install torch==2.11.0

On Arm platforms (e.g. DGX Spark), cu130 wheels are not on the default index and must be pulled from the PyTorch wheel repository. PyTorch versions 2.9.0, 2.9.1, 2.10.0, and 2.11.0 provide Arm cu130 wheels:

$ uv pip install torch==2.11.0+cu130 \
    --extra-index-url https://download.pytorch.org/whl/cu130
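
The difference between the two platforms comes down to the version suffix and the extra index. As an illustrative sketch (the helper `torch_cu13_install_args` is hypothetical and not part of any package), the arguments for each architecture can be assembled like this:

```python
import shlex

PYTORCH_CU130_INDEX = "https://download.pytorch.org/whl/cu130"


def torch_cu13_install_args(machine, version="2.11.0"):
    """Return the `uv pip install` arguments for a CUDA 13 PyTorch build."""
    if machine == "x86_64":
        # x86: cu130 wheels resolve from the default PyPI index.
        return [f"torch=={version}"]
    if machine == "aarch64":
        # Arm: the +cu130 build must come from the PyTorch wheel repository.
        return [f"torch=={version}+cu130", "--extra-index-url", PYTORCH_CU130_INDEX]
    raise ValueError(f"unsupported architecture: {machine!r}")


if __name__ == "__main__":
    for machine in ("x86_64", "aarch64"):
        command = ["uv", "pip", "install", *torch_cu13_install_args(machine)]
        print(f"{machine}: {shlex.join(command)}")
```

The default version follows the examples above; on x86 the default-index behavior applies only from 2.11.0 onward, so the helper should not be used with older versions there.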

JAX#

jax[cuda13] resolves from the default PyPI index on both x86 and Arm platforms:

$ uv pip install 'jax[cuda13]'

Full installation examples#

With cloning (for developers)#

x86

$ git clone git@github.com:NVIDIA/nvalchemi-toolkit-ops.git
$ cd nvalchemi-toolkit-ops
$ uv sync --group dev
# Replace the default CUDA 12 wheels with CUDA 13 builds
$ uv pip install \
    https://github.com/NVIDIA/warp/releases/download/v1.12.1/warp_lang-1.12.1+cu13-py3-none-manylinux_2_34_x86_64.whl \
    torch==2.11.0 \
    'jax[cuda13]' \
    --force-reinstall

Arm (e.g. DGX Spark)

$ git clone git@github.com:NVIDIA/nvalchemi-toolkit-ops.git
$ cd nvalchemi-toolkit-ops
$ uv sync --group dev
# Replace the default CUDA 12 wheels with CUDA 13 builds
$ uv pip install \
    https://github.com/NVIDIA/warp/releases/download/v1.12.1/warp_lang-1.12.1+cu13-py3-none-manylinux_2_34_aarch64.whl \
    torch==2.11.0+cu130 \
    'jax[cuda13]' \
    --force-reinstall \
    --extra-index-url https://download.pytorch.org/whl/cu130

# Remove the CUDA 12 JAX plugins to avoid a plugin conflict
$ uv pip uninstall jax-cuda12-pjrt jax-cuda12-plugin

Note

The --force-reinstall flag is needed in the developer flow because uv sync will have already installed default CUDA 12 wheels. On x86, torch>=2.11.0 and jax[cuda13] resolve from the default PyPI index. On Arm, the --extra-index-url is required for PyTorch; versions 2.9.0, 2.9.1, 2.10.0, and 2.11.0 provide Arm cu130 wheels. jax[cuda13] resolves from the default index on both architectures.

The jax extra in pyproject.toml pulls in jax[cuda12], so uv sync --all-extras installs the CUDA 12 PJRT plugin and shared library. These must be removed after installing jax[cuda13], otherwise JAX will error with ALREADY_EXISTS: PJRT_Api already exists for device type cuda at import time. This only applies to the developer flow — a clean venv (as in the “without cloning” path) does not have the CUDA 12 plugins and does not need this step.
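
To see whether the CUDA 12 plugins are still present before importing JAX, the installed distributions can be inspected with the standard library. This is a sketch under the assumption that the two package names given in the uninstall command are the only ones that matter; `leftover_cuda12_jax_plugins` is a hypothetical helper.

```python
from importlib import metadata

# Package names taken from the uninstall command in this guide.
CUDA12_JAX_PLUGINS = {"jax-cuda12-pjrt", "jax-cuda12-plugin"}


def leftover_cuda12_jax_plugins(installed=None):
    """Return the CUDA 12 JAX plugin packages that are still installed.

    `installed` may be a list of distribution names (useful for testing);
    by default the current environment is scanned.
    """
    if installed is None:
        installed = [dist.metadata["Name"] for dist in metadata.distributions()]
    # Normalize names: packaging tools treat '-', '_' and '.' alike and ignore case.
    normalized = {
        name.lower().replace("_", "-").replace(".", "-")
        for name in installed
        if name
    }
    return sorted(CUDA12_JAX_PLUGINS & normalized)


if __name__ == "__main__":
    leftovers = leftover_cuda12_jax_plugins()
    if leftovers:
        print("Remove before importing JAX:", " ".join(leftovers))
    else:
        print("No CUDA 12 JAX plugins found.")
```

Because the check reads package metadata and never imports JAX, it does not trigger the `ALREADY_EXISTS` error itself.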

Installation with Conda & Mamba#

The procedure with the conda or mamba environment managers is similar to that for other environment management tools. Starting from a fresh environment:

# create a new environment named nvalchemi if needed
mamba create -n nvalchemi python=3.12 pip
mamba activate nvalchemi
pip install nvalchemi-toolkit-ops

Docker Usage#

Given the modular nature of nvalchemiops, we do not provide a base Docker image. Instead, the snippet below is a suggested base image that follows the requirements of NVIDIA warp-lang, and installs uv for Python management:

# uses a lightweight Ubuntu-based image with CUDA 13
FROM nvidia/cuda:13.0.0-runtime-ubuntu24.04

# grab package updates and other system dependencies here
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*
# copy uv for venv management
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

RUN uv venv --seed --python 3.12 /opt/venv
# this sets the default virtual environment to use
ENV VIRTUAL_ENV=/opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# install ALCHEMI Toolkit-Ops
RUN uv pip install nvalchemi-toolkit-ops

This image can potentially be used as a basis for your application and/or development environment. Your host system should have the NVIDIA Container Toolkit installed, and at runtime, include --gpus all as a flag to container run statements to ensure that GPUs are exposed to the container.

Next Steps#

You should now have a local installation of nvalchemiops ready for whatever your use case might be! To verify, you can always run:

$ python -c "import nvalchemiops; print(nvalchemiops.__version__)"
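
A variant of this check reads the installed distribution's metadata instead of importing the package, which helps tell "not installed" apart from "installed but failing on import". It is a sketch that assumes the distribution name matches the PyPI name used throughout this guide; `installed_version` is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "nvalchemi-toolkit-ops") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("nvalchemi-toolkit-ops is not installed in this environment")
    else:
        print(f"nvalchemi-toolkit-ops {version}")
```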

If the import fails, make sure you’ve activated your virtual environment. Once you’ve verified your installation, you can:

  1. Explore examples & benchmarks: Check the examples/ directory for tutorials

  2. Read Documentation: Browse the user and API documentation to determine how to integrate ALCHEMI Toolkit-Ops into your application.