Audio Builder
class AudioBuilder
Builder class for audio-related TensorRT engines. Handles the complete process of building TensorRT engines from ONNX models for audio encoders and Code2Wav vocoders used in multimodal models (e.g., Qwen3-Omni).
The build type is auto-detected from config.json:
If the “audio_config” key exists → builds audio_encoder.engine
If the “code2wav_config” key exists → builds code2wav.engine
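The auto-detection rule above amounts to a small dispatch on the top-level keys of config.json. The following is a minimal sketch, not the builder's implementation. `detectEngineFile` is a hypothetical helper that receives the key names as a `std::set` rather than parsed JSON. The documentation does not say what happens when both keys are present, so giving "audio_config" precedence is an assumption.

```cpp
#include <optional>
#include <set>
#include <string>

// Hypothetical helper: picks the engine file name from the set of
// top-level keys found in config.json.
// ASSUMPTION: "audio_config" is checked first if both keys are present.
std::optional<std::string> detectEngineFile(std::set<std::string> const& configKeys)
{
    if (configKeys.count("audio_config") > 0)
    {
        return "audio_encoder.engine"; // audio encoder build
    }
    if (configKeys.count("code2wav_config") > 0)
    {
        return "code2wav.engine"; // Code2Wav vocoder build
    }
    return std::nullopt; // neither key present: nothing to build
}
```

Returning `std::nullopt` instead of throwing keeps the sketch neutral about how the real builder reports an unrecognized config.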
Public Functions
AudioBuilder(std::filesystem::path const &onnxDir, std::filesystem::path const &engineDir, AudioBuilderConfig const &config)
Constructor for AudioBuilder.
- Parameters:
onnxDir – Directory containing the ONNX model and configuration files
engineDir – Directory where the built engine and related files will be saved
config – Configuration object specifying build parameters
- Throws:
std::filesystem::filesystem_error – if path operations fail
~AudioBuilder() noexcept = default
Destructor.
bool build()
Build the TensorRT engine from the ONNX model. This method performs the complete build process, including:
Loading and parsing the ONNX model
Setting up optimization profiles
Building the TensorRT engine
Copying necessary files to the engine directory
- Throws:
std::runtime_error – if critical build errors occur (file I/O, TensorRT API failures)
nlohmann::json::exception – if JSON parsing fails
- Returns:
true if the build succeeded, false otherwise
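build() reports failure in two ways: by returning false and by throwing on critical errors. Calling code therefore usually has to check both. The following minimal sketch shows one way to do that. `tryBuild` is a hypothetical helper, not part of the documented API, and is written as a template over any type with a `bool build()` member. nlohmann::json::exception derives from std::exception, so the second handler also covers JSON parsing errors.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

// Hypothetical helper: runs builder.build() and folds both failure
// channels (false return value, thrown exception) into one bool.
// `Builder` is any type with a `bool build()` member, e.g. AudioBuilder.
template <typename Builder>
bool tryBuild(Builder& builder)
{
    try
    {
        if (!builder.build())
        {
            std::cerr << "Engine build reported failure\n";
            return false;
        }
        return true;
    }
    catch (std::runtime_error const& e)
    {
        // Critical errors such as file I/O or TensorRT API failures.
        std::cerr << "Critical build error: " << e.what() << '\n';
    }
    catch (std::exception const& e)
    {
        // Anything else derived from std::exception, including JSON errors.
        std::cerr << "Build error: " << e.what() << '\n';
    }
    return false;
}
```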
struct AudioBuilderConfig
Configuration structure for audio model building. Contains user-specified build parameters for the optimization profiles. Model-specific parameters (mel_bins, num_quantizers, etc.) are read automatically from config.json; do NOT specify them here.
Public Functions
Json toJson() const noexcept
Convert configuration to JSON format for serialization.
- Returns:
JSON object containing all configuration parameters
std::string toString() const
Convert configuration to human-readable string format.
- Returns:
String representation of the configuration for debugging/logging
Public Members
int64_t minTimeSteps = {100}
Minimum audio time steps.
int64_t maxTimeSteps = {6000}
Maximum audio time steps.
int64_t minCodeLen = {1}
Minimum code sequence length in frames.
int64_t optCodeLen = {300}
Optimal code sequence length in frames (the default matches Qwen3-Omni chunked decoding).
int64_t maxCodeLen = {2000}
Maximum code sequence length in frames.
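These members become the bounds of TensorRT optimization profiles, and TensorRT requires min ≤ opt ≤ max for every dynamic dimension. The following sketch checks that ordering on a hypothetical mirror struct, `AudioProfileBoundsSketch`, which copies the documented members and defaults. The `isValid` check is illustrative only. Rejecting non-positive lower bounds is an assumption, not documented behavior.

```cpp
#include <cstdint>

// Hypothetical mirror of the AudioBuilderConfig members listed above,
// using the same documented defaults.
struct AudioProfileBoundsSketch
{
    int64_t minTimeSteps{100};
    int64_t maxTimeSteps{6000};
    int64_t minCodeLen{1};
    int64_t optCodeLen{300};
    int64_t maxCodeLen{2000};
};

// Returns true when each range is ordered as a TensorRT optimization
// profile expects (min <= opt <= max).
// ASSUMPTION: lower bounds must also be strictly positive.
bool isValid(AudioProfileBoundsSketch const& c)
{
    bool const timeOk = c.minTimeSteps > 0 && c.minTimeSteps <= c.maxTimeSteps;
    bool const codeOk = c.minCodeLen > 0 && c.minCodeLen <= c.optCodeLen && c.optCodeLen <= c.maxCodeLen;
    return timeOk && codeOk;
}
```

The documented defaults (100/6000 time steps; 1/300/2000 code frames) pass this check. Raising optCodeLen above maxCodeLen, for example, would not.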
Public Static Functions
static AudioBuilderConfig fromJson(Json const &json)
Create configuration from JSON format.
- Parameters:
json – JSON object containing configuration parameters
- Returns:
AudioBuilderConfig object with parsed parameters
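toJson() and fromJson() are intended as a serialization pair, so a configuration should survive a round trip. The following sketch illustrates that round trip on a hypothetical `AudioBuilderConfigSketch`. It uses a flat `std::map` as a stand-in for the Json type. Three details are assumptions and are not documented: the key names (copied from the member names), integer-only values, and keeping defaults for missing keys.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in for the Json type: a flat map from key to integer value.
using FlatJson = std::map<std::string, int64_t>;

// Hypothetical mirror of AudioBuilderConfig with the documented defaults.
struct AudioBuilderConfigSketch
{
    int64_t minTimeSteps{100};
    int64_t maxTimeSteps{6000};
    int64_t minCodeLen{1};
    int64_t optCodeLen{300};
    int64_t maxCodeLen{2000};

    // ASSUMPTION: keys are named after the members.
    FlatJson toJson() const
    {
        return {{"minTimeSteps", minTimeSteps}, {"maxTimeSteps", maxTimeSteps},
            {"minCodeLen", minCodeLen}, {"optCodeLen", optCodeLen}, {"maxCodeLen", maxCodeLen}};
    }

    // ASSUMPTION: keys absent from the input keep their default values.
    static AudioBuilderConfigSketch fromJson(FlatJson const& json)
    {
        AudioBuilderConfigSketch config; // starts from the defaults
        auto read = [&json](char const* key, int64_t& field) {
            auto it = json.find(key);
            if (it != json.end())
            {
                field = it->second; // override only keys that are present
            }
        };
        read("minTimeSteps", config.minTimeSteps);
        read("maxTimeSteps", config.maxTimeSteps);
        read("minCodeLen", config.minCodeLen);
        read("optCodeLen", config.optCodeLen);
        read("maxCodeLen", config.maxCodeLen);
        return config;
    }
};
```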
Json toJson() const noexcept