LLM Builder
- class LLMBuilder
Builder class for Large Language Model TensorRT engines. Handles the complete process of building TensorRT engines from ONNX models for various types of LLMs including standard, speculative-decoding, and VLM models.
Public Functions
- LLMBuilder(std::filesystem::path const &onnxDir, std::filesystem::path const &engineDir, LLMBuilderConfig const &config)
Constructor for LLMBuilder.
- Parameters:
onnxDir – Directory containing the ONNX model and configuration files
engineDir – Directory where the built engine and related files will be saved
config – Configuration object specifying build parameters
- ~LLMBuilder() noexcept = default
Destructor.
- bool build()
Build the TensorRT engine from the ONNX model. This method performs the complete build process: loading and parsing the ONNX model, setting up optimization profiles, building the TensorRT engine, and copying the necessary files to the engine directory.
- Throws:
std::filesystem::filesystem_error – If filesystem operations fail
- Returns:
true if build was successful, false otherwise
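A minimal sketch of how a caller might handle both failure channels of build(): a false return value and a thrown std::filesystem::filesystem_error. The LLMBuilderConfig and LLMBuilder definitions below are simplified stand-ins written for this example so that it compiles without the library; the stub build() only checks that the ONNX directory exists and creates the engine directory, which is an assumption made for illustration. BuildStatus and runBuild are hypothetical names, not part of the documented API.

```cpp
#include <cstdint>
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

// Stand-in for the documented struct (subset of fields, documented defaults).
struct LLMBuilderConfig {
    int64_t maxInputLen{1024};
    int64_t maxBatchSize{4};
    int64_t maxKVCacheCapacity{4096};
};

// Stand-in for the documented class: same constructor and build() signatures.
class LLMBuilder {
public:
    LLMBuilder(fs::path const& onnxDir, fs::path const& engineDir,
               LLMBuilderConfig const& /*config: ignored by this stub*/)
        : mOnnxDir(onnxDir), mEngineDir(engineDir) {}

    // Stub behavior (assumption for illustration only): report failure when
    // the ONNX directory is missing, otherwise create the engine directory.
    bool build() {
        if (!fs::is_directory(mOnnxDir)) {
            return false;
        }
        fs::create_directories(mEngineDir);  // throws fs::filesystem_error on failure
        return true;
    }

private:
    fs::path mOnnxDir;
    fs::path mEngineDir;
};

// Hypothetical result type that separates the two failure channels.
enum class BuildStatus { Ok, Failed, FilesystemError };

// Hypothetical wrapper: a false return and a thrown filesystem_error are
// reported as distinct outcomes so the caller can react to each.
BuildStatus runBuild(fs::path const& onnxDir, fs::path const& engineDir,
                     LLMBuilderConfig const& config) {
    try {
        LLMBuilder builder(onnxDir, engineDir, config);
        return builder.build() ? BuildStatus::Ok : BuildStatus::Failed;
    } catch (fs::filesystem_error const& e) {
        std::cerr << "Filesystem error during build: " << e.what() << '\n';
        return BuildStatus::FilesystemError;
    }
}
```

With the real library, the same try/catch shape should apply once the stand-in definitions are replaced by the library's own headers.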
- struct LLMBuilderConfig
Configuration structure for LLM model building. Contains all parameters needed to configure the TensorRT engine building process for Large Language Models, including standard and speculative-decoding engines.
Public Functions
- inline Json toJson() const
Convert configuration to JSON format for serialization.
- Returns:
JSON object containing all configuration parameters
- inline std::string toString() const
Convert configuration to human-readable string format.
- Returns:
String representation of the configuration for debugging/logging
Public Members
- int64_t maxInputLen = {1024}
Maximum input sequence length for the model.
- bool specDraft = {false}
Whether this is a speculative-decoding draft model.
- bool specBase = {false}
Whether this is a speculative-decoding base model.
- int64_t maxBatchSize = {4}
Maximum batch size for inference.
- int64_t maxLoraRank = {0}
Maximum LoRA rank (0 = no LoRA support).
- int64_t maxKVCacheCapacity = {4096}
Maximum KV cache capacity (sequence length).
- int64_t maxVerifyTreeSize = {60}
Maximum length of input_ids passed to the speculative-decoding base model for verification.
- int64_t maxDraftTreeSize = {60}
Maximum length of input_ids passed to the speculative-decoding draft model for draft generation.
- bool useTrtNativeOps = {false}
Whether to use TensorRT native operations instead of custom plugins.
- bool profilingDetailed = {false}
Enable detailed profiling verbosity for layer info extraction.
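The members above can be combined into separate configurations for the base and draft engines of a speculative-decoding pair. The sketch below uses a local mirror of the documented members and defaults so that it compiles on its own. The helper names (makeSpecBaseConfig, makeSpecDraftConfig, checkConfig) and the sanity checks themselves are hypothetical; this documentation does not state that the library enforces any of them.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Local mirror of the documented members and defaults (stand-in for the
// library's own definition).
struct LLMBuilderConfig {
    int64_t maxInputLen{1024};
    bool specDraft{false};
    bool specBase{false};
    int64_t maxBatchSize{4};
    int64_t maxLoraRank{0};
    int64_t maxKVCacheCapacity{4096};
    int64_t maxVerifyTreeSize{60};
    int64_t maxDraftTreeSize{60};
    bool useTrtNativeOps{false};
    bool profilingDetailed{false};
};

// Hypothetical helper: configuration for the base (verifying) model.
LLMBuilderConfig makeSpecBaseConfig() {
    LLMBuilderConfig c;
    c.specBase = true;
    return c;
}

// Hypothetical helper: configuration for the draft (proposing) model.
LLMBuilderConfig makeSpecDraftConfig() {
    LLMBuilderConfig c;
    c.specDraft = true;
    return c;
}

// Hypothetical sanity checks (assumptions, not documented library rules).
// Returns one message per problem found; an empty vector means no problems.
std::vector<std::string> checkConfig(LLMBuilderConfig const& c) {
    std::vector<std::string> problems;
    if (c.specDraft && c.specBase) {
        problems.push_back("specDraft and specBase are both set");
    }
    if (c.maxInputLen > c.maxKVCacheCapacity) {
        problems.push_back("maxInputLen exceeds maxKVCacheCapacity");
    }
    if (c.maxBatchSize < 1) {
        problems.push_back("maxBatchSize must be at least 1");
    }
    return problems;
}
```

Running such checks before constructing a builder is one way to surface an inconsistent configuration early, before a potentially long engine build starts.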
Public Static Functions
- static inline LLMBuilderConfig fromJson(Json const &json)
Create configuration from JSON format.
- Parameters:
json – JSON object containing configuration parameters
- Throws:
nlohmann::json::type_error – If JSON value types don’t match expected types
- Returns:
LLMBuilderConfig object with parsed parameters