Builder#
-
class LLMBuilder#
Builder class for Large Language Model TensorRT engines. Handles the complete process of building TensorRT engines from ONNX models for various types of LLMs including standard models, Eagle models, and VLMs.
Public Functions
- LLMBuilder(
- std::filesystem::path const &onnxDir,
- std::filesystem::path const &engineDir,
- LLMBuilderConfig const &config
- )
Constructor for LLMBuilder.
- Parameters:
onnxDir – Directory containing the ONNX model and configuration files
engineDir – Directory where the built engine and related files will be saved
config – Configuration object specifying build parameters
-
~LLMBuilder() = default#
Destructor.
-
bool build()#
Build the TensorRT engine from the ONNX model. This method performs the complete build process, including:
- Loading and parsing the ONNX model
- Setting up optimization profiles
- Building the TensorRT engine
- Copying necessary files to the engine directory
- Returns:
true if build was successful, false otherwise
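The constructor and build() flow described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the `sketch` namespace holds local stand-ins that only mirror the documented signatures so the example compiles on its own, the stand-in build() merely checks its inputs instead of building an engine, and `buildLlmEngine` is a hypothetical helper name. In real code the library's own builder header would be included instead; its path is not given in this reference.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <iostream>
#include <system_error>

namespace sketch {

// Stand-in with a subset of the documented fields and their defaults.
struct LLMBuilderConfig {
    int64_t maxInputLen{128};
    int64_t maxBatchSize{4};
    int64_t maxKVCacheCapacity{4096};
};

// Stand-in mirroring the documented constructor and build() signatures.
class LLMBuilder {
public:
    LLMBuilder(std::filesystem::path const& onnxDir,
               std::filesystem::path const& engineDir,
               LLMBuilderConfig const& config)
        : mOnnxDir(onnxDir), mEngineDir(engineDir), mConfig(config) {}

    // ASSUMPTION: the real build() parses ONNX and builds a TensorRT engine.
    // This stub only checks that the inputs look usable.
    bool build() {
        std::error_code ec;
        return std::filesystem::is_directory(mOnnxDir, ec)
            && !mEngineDir.empty()
            && mConfig.maxInputLen <= mConfig.maxKVCacheCapacity;
    }

private:
    std::filesystem::path mOnnxDir;
    std::filesystem::path mEngineDir;
    LLMBuilderConfig mConfig;
};

} // namespace sketch

// Hypothetical helper: configure the builder, run it, and report failure.
inline bool buildLlmEngine(std::filesystem::path const& onnxDir,
                           std::filesystem::path const& engineDir) {
    sketch::LLMBuilderConfig config;
    config.maxInputLen = 1024; // override the documented default of 128
    config.maxBatchSize = 1;   // override the documented default of 4

    sketch::LLMBuilder builder(onnxDir, engineDir, config);
    if (!builder.build()) {
        // build() reports failure through its return value, so check it.
        std::cerr << "Engine build failed for " << onnxDir << '\n';
        return false;
    }
    return true;
}
```

Because build() returns a bool, the caller is responsible for checking the result before using anything in the engine directory.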
-
class VisualBuilder#
Builder class for visual encoder TensorRT engines. Handles the complete process of building TensorRT engines from ONNX models for visual encoders used in Vision-Language Models.
Public Functions
- VisualBuilder(
- std::filesystem::path const &onnxDir,
- std::filesystem::path const &engineDir,
- VisualBuilderConfig const &config
- )
Constructor for VisualBuilder.
- Parameters:
onnxDir – Directory containing the ONNX model and configuration files
engineDir – Directory where the built engine and related files will be saved
config – Configuration object specifying build parameters
-
~VisualBuilder() = default#
Destructor.
-
bool build()#
Build the TensorRT engine from the ONNX model. This method performs the complete build process, including:
- Loading and parsing the ONNX model
- Setting up optimization profiles
- Building the TensorRT engine
- Copying necessary files to the engine directory
- Returns:
true if build was successful, false otherwise
-
struct LLMBuilderConfig#
Configuration structure for LLM model building. Contains all parameters needed to configure the TensorRT engine building process for Large Language Models, including standard LLMs, Eagle models, and Vision-Language Models.
Public Functions
-
inline Json toJson() const#
Convert configuration to JSON format for serialization.
- Returns:
JSON object containing all configuration parameters
-
inline std::string toString() const#
Convert configuration to human-readable string format.
- Returns:
String representation of the configuration for debugging/logging
Public Members
-
int64_t maxInputLen = {128}#
Maximum input sequence length for the model.
-
bool isVlm = {false}#
Whether this is a Vision-Language Model (VLM)
-
int64_t minImageTokens = {4}#
Minimum number of image tokens (VLM only)
-
int64_t maxImageTokens = {1024}#
Maximum number of image tokens (VLM only)
-
bool eagleDraft = {false}#
Whether this is an Eagle draft model.
-
bool eagleBase = {false}#
Whether this is an Eagle base model.
-
int64_t maxBatchSize = {4}#
Maximum batch size for inference.
-
int64_t maxLoraRank = {0}#
Maximum LoRA rank (0 = no LoRA support)
-
int64_t maxKVCacheCapacity = {4096}#
Maximum KV cache capacity (sequence length)
-
int64_t maxVerifyTreeSize = {60}#
Maximum length of input_ids passed into the Eagle base model for tree verification.
-
int64_t maxDraftTreeSize = {60}#
Maximum length of input_ids passed into the Eagle draft model for draft generation.
Public Static Functions
-
static inline LLMBuilderConfig fromJson(Json const &json)#
Create configuration from JSON format.
- Parameters:
json – JSON object containing configuration parameters
- Returns:
LLMBuilderConfig object with parsed parameters
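To make the relationships between the fields above concrete, here is a small pre-build sanity check. It is a sketch under stated assumptions: `LLMBuilderConfigSketch` is a local mirror of the documented members and defaults rather than the library type, `findConfigProblems` is a hypothetical helper, and the rules it applies (for example, that maxInputLen should not exceed maxKVCacheCapacity) are plausible guesses, not behavior documented for the builder.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Local mirror of the documented members and defaults (not the library type).
struct LLMBuilderConfigSketch {
    int64_t maxInputLen{128};
    bool isVlm{false};
    int64_t minImageTokens{4};
    int64_t maxImageTokens{1024};
    bool eagleDraft{false};
    bool eagleBase{false};
    int64_t maxBatchSize{4};
    int64_t maxLoraRank{0};
    int64_t maxKVCacheCapacity{4096};
    int64_t maxVerifyTreeSize{60};
    int64_t maxDraftTreeSize{60};
};

// Hypothetical check. Every rule below is an ASSUMPTION about what a
// sensible configuration looks like, not a documented requirement.
inline std::vector<std::string> findConfigProblems(LLMBuilderConfigSketch const& c) {
    std::vector<std::string> problems;
    if (c.maxBatchSize < 1) {
        problems.push_back("maxBatchSize must be at least 1");
    }
    if (c.maxInputLen < 1) {
        problems.push_back("maxInputLen must be at least 1");
    }
    if (c.maxInputLen > c.maxKVCacheCapacity) {
        problems.push_back("maxInputLen exceeds maxKVCacheCapacity");
    }
    if (c.maxLoraRank < 0) {
        problems.push_back("maxLoraRank must be 0 (no LoRA) or positive");
    }
    // The image token bounds are documented as VLM-only, so they are
    // checked only when isVlm is set.
    if (c.isVlm && c.minImageTokens > c.maxImageTokens) {
        problems.push_back("minImageTokens exceeds maxImageTokens");
    }
    return problems;
}
```

A caller could run such a check before constructing the builder and log the returned messages, for example alongside the output of toString().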
-
struct VisualBuilderConfig#
Configuration structure for visual model building. Contains parameters needed to configure the TensorRT engine building process for visual encoders used in Vision-Language Models.
Public Functions
-
inline Json toJson() const#
Convert configuration to JSON format for serialization.
- Returns:
JSON object containing all configuration parameters
-
inline std::string toString() const#
Convert configuration to human-readable string format.
- Returns:
String representation of the configuration for debugging/logging
Public Members
-
int64_t minImageTokens = {4}#
Minimum number of image tokens in a batch.
-
int64_t maxImageTokens = {1024}#
Maximum number of image tokens in a batch.
-
int64_t maxImageTokensPerImage = {512}#
Maximum number of image tokens per image, used for preprocessing.
Public Static Functions
-
static inline VisualBuilderConfig fromJson(Json const &json)#
Create configuration from JSON format.
- Parameters:
json – JSON object containing configuration parameters
- Returns:
VisualBuilderConfig object with parsed parameters
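The three token limits above interact: maxImageTokens bounds a whole batch, while maxImageTokensPerImage bounds a single image. The sketch below works through that arithmetic and shows one way to carry the batch-level bounds over to the VLM fields of the LLM configuration. Both structs are local stand-ins that mirror the documented fields, the helper names are hypothetical, and the idea that the LLM-side bounds should match the visual encoder's bounds is an assumption rather than a documented requirement.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the documented VisualBuilderConfig members and defaults.
struct VisualBuilderConfigSketch {
    int64_t minImageTokens{4};
    int64_t maxImageTokens{1024};
    int64_t maxImageTokensPerImage{512};
};

// Local mirror of the VLM-related LLMBuilderConfig members only.
struct LlmImageTokenRange {
    bool isVlm{false};
    int64_t minImageTokens{4};
    int64_t maxImageTokens{1024};
};

// Hypothetical helper: how many images of the maximum per-image size fit
// into one batch under the batch-level token limit.
inline int64_t fullSizeImagesPerBatch(VisualBuilderConfigSketch const& c) {
    if (c.maxImageTokensPerImage <= 0) {
        return 0; // avoid dividing by zero on an invalid configuration
    }
    return c.maxImageTokens / c.maxImageTokensPerImage;
}

// Hypothetical helper. ASSUMPTION: the LLM engine of a VLM should accept
// the same image token range that the visual encoder can produce.
inline LlmImageTokenRange matchingLlmRange(VisualBuilderConfigSketch const& v) {
    LlmImageTokenRange r;
    r.isVlm = true;
    r.minImageTokens = v.minImageTokens;
    r.maxImageTokens = v.maxImageTokens;
    return r;
}
```

With the documented defaults, 1024 / 512 gives two full-size images per batch; smaller images leave room for more.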