Container Images#
TensorRT LLM uses a multi-stage Dockerfile that produces three image types:
| Stage | Purpose | NGC Image |
|---|---|---|
| devel | Development environment with all build dependencies pre-installed. No TensorRT LLM source or wheel included. Mount your source checkout and build inside. | nvcr.io/nvidia/tensorrt-llm/devel |
| wheel | Intermediate build stage. Extends devel and compiles the TensorRT LLM wheel from source. | – |
| release | Runtime image. Extends devel and installs the wheel produced by the wheel stage. | nvcr.io/nvidia/tensorrt-llm/release |
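Each stage corresponds to a build target in docker/Dockerfile.multi, so any of them can be built directly with docker build --target. As a hedged illustration (the local tag name here is arbitrary), the intermediate wheel stage can be materialized on its own:
# Build only the intermediate wheel stage (illustrative tag name)
docker build --pull \
--target wheel \
--file docker/Dockerfile.multi \
--tag tensorrt_llm/wheel:latest \
.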
Pre-built Images on NGC#
The devel and release images are published to NGC and can be pulled directly:
# Pull the development image (for building from source)
docker pull nvcr.io/nvidia/tensorrt-llm/devel:x.y.z
# Pull the release image (ready to run)
docker pull nvcr.io/nvidia/tensorrt-llm/release:x.y.z
Replace x.y.z with the desired version, e.g., 1.3.0rc13. Browse the available tags for devel and release on NGC.
Container image tags
In the example shell commands, x.y.z corresponds to the TensorRT-LLM container
version to use. If omitted, IMAGE_TAG will default to tensorrt_llm.__version__
(e.g., this documentation was generated from the 1.3.0rc13 source tree).
If that tag is unavailable, e.g., because a container for the version you are
currently working with has not been published yet, you can fall back to a
container published for a previous
GitHub pre-release or release
(see also NGC Catalog).
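After pulling, a quick way to verify the release image (a minimal sketch; assumes the image's default Python environment) is to print the installed TensorRT LLM version:
# Sanity check: import the wheel shipped in the release image
docker run --rm --gpus=all nvcr.io/nvidia/tensorrt-llm/release:x.y.z \
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"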
Building Images Locally#
All local image builds require the TensorRT LLM source tree and approximately 63 GB of free disk space. Clone the repository first if you have not already:
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
git lfs pull
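The build is disk-hungry; given the roughly 63 GB requirement above, it can be worth checking free space before you start:
# Check free space on the filesystem holding the checkout
df -h .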
Build the devel Image#
On systems with GNU make
Create a Docker image for development. The image is tagged locally as tensorrt_llm/devel:latest.
make -C docker build
Run the container:
make -C docker run
If you prefer to work with your own user account in the container instead of root, add the LOCAL_USER=1 option:
make -C docker run LOCAL_USER=1
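Under the hood, LOCAL_USER=1 runs the container under an account matching your local UID and GID. A simplified, approximate equivalent with plain docker run (a sketch; the Makefile actually builds a small user layer on top of the image) would be:
# Approximate effect of LOCAL_USER=1 (simplified sketch)
docker run --rm -it \
--ipc=host --gpus=all \
--user "$(id -u):$(id -g)" \
--volume ${PWD}:/code/tensorrt_llm \
--workdir /code/tensorrt_llm \
tensorrt_llm/devel:latest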
On systems without GNU make
docker build --pull \
--target devel \
--file docker/Dockerfile.multi \
--tag tensorrt_llm/devel:latest \
.
docker run --rm -it \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all \
--volume ${PWD}:/code/tensorrt_llm \
--workdir /code/tensorrt_llm \
tensorrt_llm/devel:latest
Note: make sure to pass --ipc=host to docker run; omitting it can lead to Bus error (core dumped).
Once inside the container, follow the steps in Building from Source (starting from Step 4) to build TensorRT LLM.
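In essence, those remaining steps build and install the wheel from the mounted source tree (shown here for orientation only; see Building from Source for the full set of options):
# Inside the devel container: compile the TensorRT LLM wheel...
python3 ./scripts/build_wheel.py --clean

# ...then install the result
pip install ./build/tensorrt_llm*.whl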
Build the release Image (One Step)#
This builds TensorRT LLM from source and installs it into a single ready-to-run container image.
make -C docker release_build
You can add the optional CUDA_ARCHS="<list of architectures in CMake format>" argument to restrict the GPU architectures supported by TensorRT LLM, which reduces compilation time:
# Restrict the compilation to Ada and Hopper architectures.
make -C docker release_build CUDA_ARCHS="89-real;90-real"
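If you are unsure which values to pass, recent NVIDIA drivers can report the compute capability of the installed GPUs directly (89 corresponds to Ada, 90 to Hopper):
# Print the compute capability of each visible GPU (needs a recent driver)
nvidia-smi --query-gpu=compute_cap --format=csv,noheader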
After the image is built, run the container:
make -C docker release_run
Like the devel targets, this make command supports the LOCAL_USER=1 argument to use your local user account instead of root inside the container. The TensorRT LLM examples are installed under the /app/tensorrt_llm/examples directory.
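For instance, assuming the example layout matches the source tree (the path below is an assumption; adjust it to your version), the LLM API quickstart can be run directly:
# Inside the release container (example path assumed from the source tree layout)
python3 /app/tensorrt_llm/examples/llm-api/quickstart_example.py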
Using Enroot (Slurm Clusters)#
If you wish to use enroot instead of Docker, you can build a sqsh (squash) file that provides an environment identical to the development image tensorrt_llm/devel:latest.
Allocate a compute node:
salloc --nodes=1
Create a sqsh file with essential TensorRT LLM dependencies installed:
# Using the default sqsh filename (enroot/tensorrt_llm.devel.sqsh)
make -C enroot build_sqsh

# Or specify a custom path (optional)
make -C enroot build_sqsh SQSH_PATH=/path/to/dev_trtllm_image.sqsh
Once this squash file is ready, you can follow the steps under Building from Source by launching an enroot sandbox:
export SQSH_PATH=/path/to/dev_trtllm_image.sqsh

# Start a pseudo terminal for an interactive session
make -C enroot run_sqsh

# Or run commands directly
make -C enroot run_sqsh RUN_CMD="python3 scripts/build_wheel.py"
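If you prefer to drive enroot yourself instead of going through the Makefile, a roughly equivalent sequence (a sketch; the Makefile may add further mounts and environment settings) looks like:
# Unpack the squash file into a named container root filesystem...
enroot create --name tensorrt_llm ${SQSH_PATH}

# ...and start an interactive shell with the source tree mounted read-write
enroot start --rw --mount ${PWD}:/code/tensorrt_llm tensorrt_llm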
Advanced Topics#
For more information on building and running various TensorRT LLM container images, see the NVIDIA/TensorRT-LLM repository on GitHub.