Audio Builder

class AudioBuilder

Builder class for audio-related TensorRT engines. Handles the complete process of building TensorRT engines from ONNX models for audio encoders and Code2Wav vocoders used in multimodal models (e.g., Qwen3-Omni).

Build type is auto-detected from config.json:

  • If “audio_config” exists → builds audio_encoder.engine

  • If “code2wav_config” exists → builds code2wav.engine
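The detection rule above can be sketched as a small key lookup. This is a minimal illustration, not the library's implementation: `detectEngineName` is a hypothetical helper, the config is represented only by its set of top-level keys, and the order in which the two keys are checked when both are present is an assumption.

```cpp
#include <optional>
#include <set>
#include <string>

// Hypothetical helper (not part of the documented API): maps the top-level
// keys found in config.json to the engine file that would be built.
std::optional<std::string> detectEngineName(std::set<std::string> const& topLevelKeys)
{
    // Assumption: "audio_config" is checked first if both keys are present.
    if (topLevelKeys.count("audio_config") > 0)
    {
        return "audio_encoder.engine";
    }
    if (topLevelKeys.count("code2wav_config") > 0)
    {
        return "code2wav.engine";
    }
    // Neither key is present, so no audio build type can be inferred.
    return std::nullopt;
}
```

A caller would treat an empty result as "this directory does not describe an audio model" and report it before attempting a build.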

Public Functions

AudioBuilder(
    std::filesystem::path const &onnxDir,
    std::filesystem::path const &engineDir,
    AudioBuilderConfig const &config
)

Constructor for AudioBuilder.

Parameters:
  • onnxDir – Directory containing the ONNX model and configuration files

  • engineDir – Directory where the built engine and related files will be saved

  • config – Configuration object specifying build parameters

Throws:

std::filesystem::filesystem_error – if path operations fail

~AudioBuilder() noexcept = default

Destructor.

bool build()

Build the TensorRT engine from the ONNX model. This method performs the complete build process including:

  • Loading and parsing the ONNX model

  • Setting up optimization profiles

  • Building the TensorRT engine

  • Copying necessary files to the engine directory

Throws:
  • std::runtime_error – if critical build errors occur (file I/O, TensorRT API failures)

  • nlohmann::json::exception – if JSON parsing fails

Returns:

true if build was successful, false otherwise
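Because build() reports failure both through its return value and through exceptions, callers need to handle both paths. The sketch below shows one way to do that. `runBuild`, `BuildOutcome`, and the three stand-in builder types are hypothetical names introduced for illustration; the wrapper is a template so it accepts any type with a `bool build()` member, which is assumed to include AudioBuilder.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

// Outcome of one build attempt, as seen by the caller (hypothetical type).
enum class BuildOutcome { kSuccess, kFailed, kThrew };

// Works with any type exposing `bool build()`.
template <typename Builder>
BuildOutcome runBuild(Builder& builder)
{
    try
    {
        // A false return signals a build failure that did not throw.
        return builder.build() ? BuildOutcome::kSuccess : BuildOutcome::kFailed;
    }
    catch (std::exception const& e)
    {
        // std::runtime_error and nlohmann::json::exception both derive
        // from std::exception, so a single handler covers them.
        std::cerr << "Audio engine build threw: " << e.what() << '\n';
        return BuildOutcome::kThrew;
    }
}

// Stand-ins used only to exercise the wrapper; not part of the library.
struct OkBuilder { bool build() { return true; } };
struct FailingBuilder { bool build() { return false; } };
struct ThrowingBuilder
{
    bool build() { throw std::runtime_error("engine serialization failed"); }
};
```

With the real class, the same call would look like `runBuild(builder)` on an AudioBuilder constructed from the ONNX directory, engine directory, and config.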

struct AudioBuilderConfig

Configuration structure for audio model building. Contains the user-specified ranges used to set up optimization profiles. Model-specific parameters (mel_bins, num_quantizers, etc.) are read automatically from config.json; do not specify them here.

Public Functions

Json toJson() const noexcept

Convert configuration to JSON format for serialization.

Returns:

JSON object containing all configuration parameters

std::string toString() const

Convert configuration to human-readable string format.

Returns:

String representation of the configuration for debugging/logging

Public Members

int64_t minTimeSteps = {100}

Minimum audio time steps.

int64_t maxTimeSteps = {6000}

Maximum audio time steps.

int64_t minCodeLen = {1}

Minimum code sequence length in frames.

int64_t optCodeLen = {300}

Optimal code sequence length in frames (the default matches Qwen3-Omni chunked decode).

int64_t maxCodeLen = {2000}

Maximum code sequence length in frames.
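These members describe ranges for TensorRT optimization profiles, which expect the minimum, optimal, and maximum values of a dimension to be ordered. A pre-build sanity check could look like the sketch below. `ProfileRanges` is a stand-in that mirrors the documented members and defaults, and `validateRanges` is a hypothetical helper; whether AudioBuilder performs an equivalent check internally is not stated here, and the requirement that minimums be positive is an assumption.

```cpp
#include <cstdint>
#include <string>

// Stand-in mirroring the documented members and defaults of AudioBuilderConfig.
struct ProfileRanges
{
    std::int64_t minTimeSteps{100};
    std::int64_t maxTimeSteps{6000};
    std::int64_t minCodeLen{1};
    std::int64_t optCodeLen{300};
    std::int64_t maxCodeLen{2000};
};

// Hypothetical check: returns an empty string when the ranges are usable,
// otherwise a message describing the first problem found.
std::string validateRanges(ProfileRanges const& r)
{
    // Assumption: a dimension of zero or less is never meaningful.
    if (r.minTimeSteps < 1 || r.minCodeLen < 1)
    {
        return "minimum lengths must be positive";
    }
    if (r.minTimeSteps > r.maxTimeSteps)
    {
        return "minTimeSteps must not exceed maxTimeSteps";
    }
    if (r.minCodeLen > r.optCodeLen || r.optCodeLen > r.maxCodeLen)
    {
        return "code lengths must satisfy minCodeLen <= optCodeLen <= maxCodeLen";
    }
    return {};
}
```

Running such a check before constructing the builder surfaces a misordered range as a readable message rather than as a failure deep inside engine construction.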

Public Static Functions

static AudioBuilderConfig fromJson(Json const &json)

Create configuration from JSON format.

Parameters:

json – JSON object containing configuration parameters

Returns:

AudioBuilderConfig object with parsed parameters