LLM Builder

class LLMBuilder

Builder class for Large Language Model (LLM) TensorRT engines. It handles the complete process of building TensorRT engines from ONNX models for several kinds of LLMs, including standard models, Eagle models, and vision-language models (VLMs).

Public Functions

LLMBuilder(
std::filesystem::path const &onnxDir,
std::filesystem::path const &engineDir,
LLMBuilderConfig const &config
)

Constructor for LLMBuilder.

Parameters:
  • onnxDir – Directory containing the ONNX model and configuration files

  • engineDir – Directory where the built engine and related files will be saved

  • config – Configuration object specifying build parameters

~LLMBuilder() noexcept = default

Destructor.

bool build()

Build the TensorRT engine from the ONNX model. This method performs the complete build process including:

  • Loading and parsing the ONNX model

  • Setting up optimization profiles

  • Building the TensorRT engine

  • Copying necessary files to the engine directory

Throws:

std::filesystem::filesystem_error – If filesystem operations fail

Returns:

true if build was successful, false otherwise
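Because build() can fail in two ways (a false return value or a thrown std::filesystem::filesystem_error), callers may want to handle both in one place. The following is a minimal sketch, not part of the library: BuildOutcome, BuildReport, and runBuild are hypothetical names introduced here, and the helper is templated so that it accepts any object exposing bool build().

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper types (not part of the documented API).
// They separate the two failure channels documented for build().
enum class BuildOutcome
{
    Succeeded,       // build() returned true
    ReportedFailure, // build() returned false
    FilesystemError  // build() threw std::filesystem::filesystem_error
};

struct BuildReport
{
    BuildOutcome outcome;
    std::string detail; // exception message, empty when nothing was thrown
};

// Accepts any type with a `bool build()` member, for example LLMBuilder.
template <typename Builder>
BuildReport runBuild(Builder& builder)
{
    try
    {
        if (builder.build())
        {
            return {BuildOutcome::Succeeded, {}};
        }
        return {BuildOutcome::ReportedFailure, {}};
    }
    catch (std::filesystem::filesystem_error const& e)
    {
        // Documented to occur when filesystem operations fail.
        return {BuildOutcome::FilesystemError, e.what()};
    }
}
```

Assuming an LLMBuilder instance named builder has been constructed with the three documented arguments, a call such as runBuild(builder) would return a report whose outcome can be switched on for logging or retry decisions.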

struct LLMBuilderConfig

Configuration structure for LLM model building. Contains all parameters needed to configure the TensorRT engine building process for Large Language Models, including standard LLMs and Eagle models.

Public Functions

inline Json toJson() const

Convert configuration to JSON format for serialization.

Returns:

JSON object containing all configuration parameters

inline std::string toString() const

Convert configuration to human-readable string format.

Returns:

String representation of the configuration for debugging/logging

Public Members

int64_t maxInputLen = {1024}

Maximum input sequence length for the model.

bool eagleDraft = {false}

Whether this is an Eagle draft model.

bool eagleBase = {false}

Whether this is an Eagle base model.

int64_t maxBatchSize = {4}

Maximum batch size for inference.

int64_t maxLoraRank = {0}

Maximum LoRA rank (0 = no LoRA support).

int64_t maxKVCacheCapacity = {4096}

Maximum KV cache capacity (sequence length).

int64_t maxVerifyTreeSize = {60}

Maximum length of input_ids passed into Eagle base model for tree verification.

int64_t maxDraftTreeSize = {60}

Maximum length of input_ids passed into Eagle draft model for draft generation.

bool useTrtNativeOps = {false}

Whether to use TensorRT native operations instead of the custom plugin.
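To make the defaults and their relationships easier to see at a glance, here is a small self-contained sketch. ConfigSketch is a stand-in that mirrors the documented members and default values; it is not the library type. findConfigProblems is a hypothetical helper, and every rule it applies (for example, that maxInputLen should not exceed maxKVCacheCapacity) is an assumption made for illustration rather than documented library behavior.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in mirroring the documented members and defaults of LLMBuilderConfig.
// This is NOT the library type; it only keeps the example self-contained.
struct ConfigSketch
{
    int64_t maxInputLen{1024};
    bool eagleDraft{false};
    bool eagleBase{false};
    int64_t maxBatchSize{4};
    int64_t maxLoraRank{0};
    int64_t maxKVCacheCapacity{4096};
    int64_t maxVerifyTreeSize{60};
    int64_t maxDraftTreeSize{60};
    bool useTrtNativeOps{false};
};

// Hypothetical pre-build sanity checks. All rules below are assumptions,
// not constraints confirmed by the library documentation.
inline std::vector<std::string> findConfigProblems(ConfigSketch const& c)
{
    std::vector<std::string> problems;
    if (c.maxInputLen <= 0)
        problems.emplace_back("maxInputLen must be positive");
    if (c.maxBatchSize <= 0)
        problems.emplace_back("maxBatchSize must be positive");
    if (c.maxKVCacheCapacity <= 0)
        problems.emplace_back("maxKVCacheCapacity must be positive");
    if (c.maxLoraRank < 0)
        problems.emplace_back("maxLoraRank must be >= 0 (0 = no LoRA support)");
    // Assumed: the input sequence has to fit in the KV cache.
    if (c.maxInputLen > c.maxKVCacheCapacity)
        problems.emplace_back("maxInputLen exceeds maxKVCacheCapacity");
    // Tree sizes are only checked for the model kind that uses them.
    if (c.eagleBase && c.maxVerifyTreeSize <= 0)
        problems.emplace_back("maxVerifyTreeSize must be positive for an Eagle base model");
    if (c.eagleDraft && c.maxDraftTreeSize <= 0)
        problems.emplace_back("maxDraftTreeSize must be positive for an Eagle draft model");
    return problems;
}
```

A default-constructed ConfigSketch passes every check in this sketch. With the real LLMBuilderConfig, the same pattern applies by setting fields such as eagleBase or maxLoraRank before handing the configuration to the LLMBuilder constructor.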

Public Static Functions

static inline LLMBuilderConfig fromJson(Json const &json)

Create configuration from JSON format.

Parameters:

json – JSON object containing configuration parameters

Throws:

nlohmann::json::type_error – If JSON value types don’t match expected types

Returns:

LLMBuilderConfig object with parsed parameters