Nemotron Omni Audio Runner

class NemotronOmniAudioRunner : public trt_edgellm::rt::MultimodalRunner

Runner for the Nemotron-Omni Parakeet audio encoder.

Handles audio preprocessing and encoder inference for Nemotron-Omni’s Parakeet-based audio encoder. The encoder takes mel-spectrogram features and an attention mask and produces audio embeddings projected to the LLM hidden size.

Public Functions

NemotronOmniAudioRunner(
std::string const &engineDir,
cudaStream_t stream
)

Constructor for NemotronOmniAudioRunner.

Parameters:
  • engineDir[in] Directory containing the audio encoder engine

  • stream[in] CUDA stream for execution

Throws:

std::runtime_error – if engine loading fails or configuration is invalid
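
Because the constructor reports failure by throwing rather than by returning a status, callers may want to wrap construction in a try/catch. The following is a minimal sketch of that pattern, not the library's own code: `StubAudioRunner`, `StreamHandle`, and `tryCreateRunner` are hypothetical names introduced here so the example compiles without CUDA or the trt_edgellm headers. With the real class, `Runner` would be `NemotronOmniAudioRunner` and `StreamHandle` would be `cudaStream_t`.

```cpp
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for cudaStream_t so the sketch builds without CUDA.
using StreamHandle = void*;

// Hypothetical stand-in for NemotronOmniAudioRunner. It only mimics the
// documented contract that the constructor throws std::runtime_error.
class StubAudioRunner
{
public:
    StubAudioRunner(std::string const& engineDir, StreamHandle /*stream*/)
    {
        if (engineDir.empty())
        {
            throw std::runtime_error("engine loading failed: empty engine directory");
        }
    }
};

// Constructs a runner and converts a construction failure into a null pointer.
template <typename Runner>
std::unique_ptr<Runner> tryCreateRunner(std::string const& engineDir, StreamHandle stream)
{
    try
    {
        return std::make_unique<Runner>(engineDir, stream);
    }
    catch (std::runtime_error const& e)
    {
        std::cerr << "Failed to create audio runner: " << e.what() << '\n';
        return nullptr;
    }
}
```

A caller can then test the returned pointer before using the runner, which keeps the exception from propagating into code that is not prepared to handle it.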

~NemotronOmniAudioRunner() noexcept = default

virtual bool preprocess(
rt::LLMGenerationRequest const &request,
std::vector<std::vector<int32_t>> &batchedInputIds,
tokenizer::Tokenizer const *tokenizer,
rt::Tensor &ropeRotaryCosSinDevice,
cudaStream_t stream,
bool imageOnly = false
) override

Preprocess multimodal input including audio and text.

Parameters:
  • request[in] LLM generation request containing audio and text

  • batchedInputIds[inout] Batched input token IDs after preprocessing

  • tokenizer[in] Tokenizer for text processing

  • ropeRotaryCosSinDevice[inout] RoPE rotary position encoding cache (unused by this model)

  • stream[in] CUDA stream for execution

Returns:

True if preprocessing succeeded, false otherwise

virtual bool infer(cudaStream_t stream) override

Run inference on the audio encoder.

Parameters:

stream[in] CUDA stream for execution

Returns:

True if inference succeeded, false otherwise

virtual bool validateAndFillConfig(
std::string const &engineDir
) override

Validate and load configuration from JSON file.

Parameters:

engineDir[in] Path to engine directory

Returns:

True if configuration is valid and loaded successfully, false otherwise

virtual bool allocateBuffer(cudaStream_t stream) override

Allocate buffers for inference.

Parameters:

stream[in] CUDA stream for execution

Returns:

True if allocation succeeded, false otherwise

virtual rt::Tensor &getOutputEmbedding() override

Get audio embeddings from encoder output.

Returns:

Reference to audio embedding tensor
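
Since `infer` signals failure through its return value and `getOutputEmbedding` returns a reference unconditionally, a caller will probably want to check the former before reading the latter. The sketch below illustrates that guard under stated assumptions: `StubAudioRunner`, `StreamHandle`, and `encodeAudio` are hypothetical, the embedding is modelled as a `std::vector<float>` instead of `rt::Tensor`, and the idea that the embedding is only meaningful after a successful `infer` is an assumption, not something this page states.

```cpp
#include <vector>

// Hypothetical stand-in for cudaStream_t so the sketch builds without CUDA.
using StreamHandle = void*;

// Hypothetical stand-in exposing only the infer/getOutputEmbedding shape.
// The real runner returns rt::Tensor&, modelled here as a float vector.
class StubAudioRunner
{
public:
    explicit StubAudioRunner(bool inferSucceeds)
        : mInferSucceeds(inferSucceeds)
    {
    }

    bool infer(StreamHandle /*stream*/)
    {
        if (mInferSucceeds)
        {
            mEmbedding.assign(4, 0.5F); // placeholder data, not real encoder output
        }
        return mInferSucceeds;
    }

    std::vector<float>& getOutputEmbedding()
    {
        return mEmbedding;
    }

private:
    bool mInferSucceeds;
    std::vector<float> mEmbedding;
};

// Runs the encoder and returns a pointer to its output, or nullptr on failure.
std::vector<float>* encodeAudio(StubAudioRunner& runner, StreamHandle stream)
{
    if (!runner.infer(stream))
    {
        return nullptr; // do not read the embedding after a failed inference
    }
    return &runner.getOutputEmbedding();
}
```

In real use, `preprocess` would presumably run before `infer`; it is omitted here because its argument types come from the library.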

struct NemotronOmniAudioConfig

Configuration for the Nemotron-Omni Parakeet audio encoder.

Public Members

int32_t melBins = {0}

Number of mel-frequency bins.

int32_t audioFeatureDim = {0}

Audio feature dimension (LLM hidden size).

int32_t subsamplingFactor = {0}

Parakeet subsampling factor.

int32_t samplingRate = {0}

Audio sampling rate.

int32_t soundContextTokenId = {0}

Token ID of <so_embedding>.

int32_t vocabSize = {0}

Vocabulary size (audio token ID offset)
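
Every field defaults to zero, so a zero in a dimension or rate field is a plausible sign that the configuration was never loaded. The sketch below shows one way to check for that and to estimate how many audio embeddings a clip yields. It rests on assumptions that are not from this page: `AudioConfigSketch` is a local mirror of the documented fields, `looksPopulated` and `estimateAudioTokens` are hypothetical helpers, and the ceiling division only approximates the encoder's real output length, which depends on its convolution padding.

```cpp
#include <cstdint>

// Local mirror of the documented fields so the sketch compiles on its own.
struct AudioConfigSketch
{
    std::int32_t melBins{0};
    std::int32_t audioFeatureDim{0};
    std::int32_t subsamplingFactor{0};
    std::int32_t samplingRate{0};
    std::int32_t soundContextTokenId{0};
    std::int32_t vocabSize{0};
};

// Assumption: a dimension, factor, or rate left at zero was never loaded.
// soundContextTokenId is skipped because zero can be a valid token ID.
bool looksPopulated(AudioConfigSketch const& cfg)
{
    return cfg.melBins > 0 && cfg.audioFeatureDim > 0 && cfg.subsamplingFactor > 0
        && cfg.samplingRate > 0 && cfg.vocabSize > 0;
}

// Assumption: roughly one embedding per subsamplingFactor mel frames,
// rounded up. Returns 0 for an unset factor or an empty input.
std::int64_t estimateAudioTokens(AudioConfigSketch const& cfg, std::int64_t melFrames)
{
    if (cfg.subsamplingFactor <= 0 || melFrames <= 0)
    {
        return 0;
    }
    return (melFrames + cfg.subsamplingFactor - 1) / cfg.subsamplingFactor;
}
```

As an illustration with made-up values, a `subsamplingFactor` of 8 and 1000 mel frames gives an estimate of 125 embeddings, each of width `audioFeatureDim`.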