Code2Wav Runner

class Code2WavRunner

Runner for Qwen3-Omni Code2Wav vocoder.

This class handles Code2Wav vocoder inference, converting RVQ (residual vector quantization) codec codes into an audio waveform. It follows the same pattern as AudioRunner and QwenViTRunner for consistency.

Public Functions

Code2WavRunner(std::string const &engineDir, cudaStream_t stream)

Constructor for Code2WavRunner.

Parameters:
  • engineDir[in] Directory containing the Code2Wav engine

  • stream[in] CUDA stream for execution

~Code2WavRunner() noexcept = default

bool generateWaveform(
    std::vector<std::vector<int32_t>> const &codes,
    rt::audioUtils::AudioData &outputAudio,
    cudaStream_t stream
)

Generate an audio waveform from RVQ codes (single sample).

Parameters:
  • codes[in] RVQ codec codes [numQuantizers][seqLen]

  • outputAudio[out] Output audio waveform data

  • stream[in] CUDA stream for execution

Returns:

True if generation succeeded, false otherwise
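
The `[numQuantizers][seqLen]` layout can be checked before calling `generateWaveform`. The following is a minimal sketch, not part of the runner API: `validateCodes` is a hypothetical helper, and the assumption that every code lies in `[0, codebookSize)` is inferred from the `Code2WavConfig` field descriptions rather than stated by the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Code2WavRunner): checks that `codes`
// has the documented [numQuantizers][seqLen] layout.
// Assumption: valid code values lie in [0, codebookSize).
bool validateCodes(std::vector<std::vector<int32_t>> const& codes,
                   int32_t numQuantizers, int32_t codebookSize)
{
    // The config fields must have been populated with positive values.
    assert(numQuantizers > 0 && codebookSize > 0);

    // One row per quantizer layer.
    if (codes.size() != static_cast<std::size_t>(numQuantizers))
    {
        return false;
    }

    // Every layer must cover the same, non-empty number of codec frames.
    std::size_t const seqLen = codes.front().size();
    if (seqLen == 0)
    {
        return false;
    }

    for (auto const& layer : codes)
    {
        if (layer.size() != seqLen)
        {
            return false;
        }
        for (int32_t code : layer)
        {
            if (code < 0 || code >= codebookSize)
            {
                return false;
            }
        }
    }
    return true;
}
```

A caller might pass `getConfig().numQuantizers` and `getConfig().codebookSize` as the last two arguments and skip inference when the check fails.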

inline Code2WavConfig const &getConfig() const

Get Code2Wav configuration.

Returns:

Reference to Code2Wav config

inline int64_t getExpectedWaveformLength(int64_t codeLen) const

Get expected waveform length for given code length.

Parameters:
  • codeLen[in] Number of codec frames

Returns:

Expected waveform length in samples
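
The source does not state how the expected length is computed. One plausible reading, assuming each codec frame expands to exactly `upsampleRate` samples, is sketched below. `expectedWaveformLength` and `durationSeconds` are hypothetical free functions, and the rate of 1920 in the usage note is an illustrative value only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Code2WavRunner::getExpectedWaveformLength.
// Assumption: every codec frame decodes to exactly upsampleRate samples.
int64_t expectedWaveformLength(int64_t codeLen, int32_t upsampleRate)
{
    assert(codeLen >= 0 && upsampleRate > 0);
    // Widen before multiplying so long sequences cannot overflow int32_t.
    return codeLen * static_cast<int64_t>(upsampleRate);
}

// Convert a sample count to seconds at the given output sample rate.
double durationSeconds(int64_t numSamples, int32_t sampleRate)
{
    assert(sampleRate > 0);
    return static_cast<double>(numSamples) / static_cast<double>(sampleRate);
}
```

With an illustrative `upsampleRate` of 1920 and the default `sampleRate` of 24000, 100 codec frames would give 192000 samples, or 8 seconds of audio.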

struct Code2WavConfig

Configuration for the Qwen3-Omni Code2Wav vocoder. Model-dependent fields are initialized to 0 and must be read from config.json. Do NOT rely on hardcoded defaults that may not match the actual model. Only sampleRate, chunkSize, and leftContextSize carry non-zero defaults.
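
Because the model-dependent fields start at 0, a loader can detect a config that was never populated. The sketch below illustrates that idea under stated assumptions: `Code2WavConfigSketch` is a local mirror of the documented fields, and `hasModelFields` and `requireModelFields` are hypothetical helpers that are not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the documented fields, for illustration only.
struct Code2WavConfigSketch
{
    int32_t numQuantizers{0};
    int32_t codebookSize{0};
    int32_t hiddenSize{0};
    int32_t decoderDim{0};
    int32_t upsampleRate{0};
    int32_t sampleRate{24000};
    int32_t chunkSize{300};
    int32_t leftContextSize{25};
};

// Hypothetical check: true once every zero-initialized field has been
// overwritten with a positive value (for example, after parsing config.json).
bool hasModelFields(Code2WavConfigSketch const& cfg)
{
    return cfg.numQuantizers > 0 && cfg.codebookSize > 0 && cfg.hiddenSize > 0
        && cfg.decoderDim > 0 && cfg.upsampleRate > 0;
}

// Debug-build guard: stop early if the config was never populated.
void requireModelFields(Code2WavConfigSketch const& cfg)
{
    assert(hasModelFields(cfg));
}
```

Leaving the sentinel at 0 makes a missing key visible immediately instead of silently running with dimensions that do not match the engine.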

Public Members

int32_t numQuantizers = {0}

Number of RVQ quantizer layers.

int32_t codebookSize = {0}

Codebook vocabulary size per layer.

int32_t hiddenSize = {0}

Code2Wav hidden dimension.

int32_t decoderDim = {0}

Decoder base dimension.

int32_t upsampleRate = {0}

Total upsampling rate (codes → samples).

int32_t sampleRate = {24000}

Output audio sample rate (24 kHz for Qwen3-Omni).

int32_t chunkSize = {300}

Chunk size for long sequences (in codec frames).

int32_t leftContextSize = {25}

Overlap size to avoid boundary artifacts.
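
The source does not describe how `chunkSize` and `leftContextSize` interact. A common scheme, sketched here as an assumption and not as the runner's actual implementation, decodes each chunk together with up to `leftContextSize` preceding frames and then discards the audio produced by those context frames. `ChunkPlan`, `planChunks`, and `samplesToTrim` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one decode window, in codec frames.
struct ChunkPlan
{
    int64_t contextStart; // first frame fed to the vocoder (includes left context)
    int64_t start;        // first frame whose audio is kept
    int64_t end;          // one past the last frame of this chunk
};

// Split seqLen frames into chunks of at most chunkSize frames, each
// preceded by up to leftContextSize frames of overlap.
std::vector<ChunkPlan> planChunks(int64_t seqLen, int64_t chunkSize, int64_t leftContextSize)
{
    assert(seqLen >= 0 && chunkSize > 0 && leftContextSize >= 0);
    std::vector<ChunkPlan> plan;
    for (int64_t start = 0; start < seqLen; start += chunkSize)
    {
        int64_t const end = std::min(start + chunkSize, seqLen);
        // The first chunk has no earlier frames, so clamp at zero.
        int64_t const contextStart = std::max<int64_t>(0, start - leftContextSize);
        plan.push_back(ChunkPlan{contextStart, start, end});
    }
    return plan;
}

// Samples to drop from the front of a decoded chunk.
// Assumption: each codec frame decodes to exactly upsampleRate samples.
int64_t samplesToTrim(ChunkPlan const& chunk, int32_t upsampleRate)
{
    return (chunk.start - chunk.contextStart) * static_cast<int64_t>(upsampleRate);
}
```

For 700 frames with the default `chunkSize` of 300 and `leftContextSize` of 25, this plan decodes frames [0, 300), [275, 600), and [575, 700), keeping audio from frames 0, 300, and 600 onward respectively.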