Audio Utils

namespace trt_edgellm
namespace rt
namespace audioUtils
struct AudioData
#include <audioUtils.h>

Audio input container. Holds raw mono FP32 PCM in host memory, decoded from .wav, .mp3, or .flac files by audioLoader. The audio runner reads pcm, extracts mel features internally according to its audio/config.json, and consumes the resulting mel features on the GPU.

TTS-side output fields (waveform / codebookCodes) are documented inline.

Public Members

std::shared_ptr<rt::audio::AudioPCM> pcm

Raw audio waveform: mono FP32, host. Sample rate matches the runner’s MelExtractor expectation (16 kHz for whisper / parakeet).

std::shared_ptr<Tensor> waveform

Waveform samples [1, numSamples], FP16, range [-1, 1], CPU.
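As an illustration of the [-1, 1] range convention, the following minimal sketch converts normalized samples to 16-bit PCM, a common step before writing a .wav file. It assumes the FP16 tensor contents have already been copied into a `std::vector<float>`; the helper name `toPcm16` is hypothetical and not part of audioUtils.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of audioUtils): converts normalized samples
// in [-1, 1] to signed 16-bit PCM. Out-of-range inputs are clamped first.
std::vector<int16_t> toPcm16(std::vector<float> const& samples)
{
    std::vector<int16_t> out;
    out.reserve(samples.size());
    for (float const s : samples)
    {
        // Clamp to the documented range so the scaled value fits in int16_t.
        float const clamped = std::max(-1.0F, std::min(1.0F, s));
        // Scale symmetrically by 32767 and round to the nearest integer.
        out.push_back(static_cast<int16_t>(std::lround(clamped * 32767.0F)));
    }
    return out;
}
```

Scaling by 32767 rather than 32768 keeps the mapping symmetric, so -1.0 maps to -32767 and +1.0 maps to 32767 without overflow.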

int32_t sampleRate = {24000}

Sample rate in Hz.

int32_t numChannels = {1}

Number of audio channels (typically 1 for mono).

std::vector<std::vector<int32_t>> codebookCodes

RVQ codebook codes [numCodebooks][seqLen].

bool hasWaveform = {false}

True if waveform contains valid data.
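A consumer of the TTS-side fields might guard on hasWaveform and check the codebook layout before use. The sketch below is one possible way to do that, using a hypothetical stand-in struct (`AudioDataSketch`) that mirrors only the plain-data members documented above; the real AudioData also carries the pcm and waveform pointers, which are omitted here. The helper names are assumptions, not part of audioUtils.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in mirroring the plain-data members of AudioData.
struct AudioDataSketch
{
    int32_t sampleRate{24000};
    int32_t numChannels{1};
    std::vector<std::vector<int32_t>> codebookCodes; // [numCodebooks][seqLen]
    bool hasWaveform{false};
};

// Returns true when every codebook row has the same sequence length,
// i.e. the nested vectors form a rectangular [numCodebooks][seqLen] array.
bool codesAreRectangular(AudioDataSketch const& audio)
{
    if (audio.codebookCodes.empty())
    {
        return true;
    }
    std::size_t const seqLen = audio.codebookCodes.front().size();
    for (auto const& row : audio.codebookCodes)
    {
        if (row.size() != seqLen)
        {
            return false;
        }
    }
    return true;
}

// Duration in seconds of a waveform with numSamples samples per channel.
// Returns 0 when the waveform is flagged invalid or the sample rate is unusable.
double durationSeconds(AudioDataSketch const& audio, int64_t numSamples)
{
    if (!audio.hasWaveform || audio.sampleRate <= 0)
    {
        return 0.0;
    }
    return static_cast<double>(numSamples) / static_cast<double>(audio.sampleRate);
}
```

With the default sampleRate of 24000, a valid waveform of 48000 samples corresponds to 2 seconds of audio.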