Mel Spectrogram
-
class MelExtractor
CPU mel-spectrogram extractor.
Takes mono float32 PCM and produces a host-resident
Tensor whose shape is determined by the config's layout and padding. The pipeline mirrors the HF feature extractors numerically: windowing, a hand-rolled radix-2 FFT with a direct-DFT fallback for non-power-of-two sizes, power spectrum, mel filter mat-mul, log, and an optional post-normalize step.
Public Functions
-
MelExtractor() noexcept
A default-constructed extractor is empty. Runners move-assign one from
makeWhisperExtractor / makeParakeetExtractor in validateAndFillConfig before any extract call.
-
explicit MelExtractor(MelExtractorConfig cfg)
-
~MelExtractor()
-
MelExtractor(MelExtractor const&) = delete
-
MelExtractor &operator=(MelExtractor const&) = delete
-
MelExtractor(MelExtractor&&) noexcept
-
MelExtractor &operator=(MelExtractor&&) noexcept
-
bool extract(AudioPCM const &pcm, Tensor &out)
Extract a mel spectrogram from pcm into out. pcm.sampleRate must equal config.sampleRate. Ensuring this is the caller's responsibility: pass the correct targetSampleRate to loadAudioBytes.
- Parameters:
pcm – Mono float32 PCM.
out – Populated with mel data in host memory (the caller may copy it to the device).
- Returns:
true on success, false on bad input or a config mismatch.
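The ownership model described above can be illustrated with a small pimpl-style sketch. It covers empty default construction, move-only semantics, and the sample-rate precondition on extract. Everything in it is hypothetical. PcmSketch, SketchConfig and SketchExtractor are simplified stand-ins, not the real AudioPCM, MelExtractorConfig and MelExtractor. The Impl is assumed to live in a std::unique_ptr, and the output is a placeholder rather than a real mel spectrogram.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; only the fields needed for the contract are modelled.
struct PcmSketch {
    std::vector<float> samples;  // mono float32
    int32_t sampleRate = 0;
};
struct SketchConfig {
    int32_t sampleRate = 16000;
};

class SketchExtractor {
public:
    SketchExtractor() noexcept = default;  // empty: no Impl allocated
    explicit SketchExtractor(SketchConfig cfg)
        : impl_(std::make_unique<Impl>(Impl{cfg})) {}
    SketchExtractor(SketchExtractor const&) = delete;
    SketchExtractor& operator=(SketchExtractor const&) = delete;
    SketchExtractor(SketchExtractor&&) noexcept = default;
    SketchExtractor& operator=(SketchExtractor&&) noexcept = default;
    ~SketchExtractor() = default;

    // Returns false for an empty extractor, a sample-rate mismatch, or empty input.
    bool extract(PcmSketch const& pcm, std::vector<float>& out) const {
        if (!impl_ || pcm.sampleRate != impl_->cfg.sampleRate || pcm.samples.empty()) {
            return false;
        }
        out.assign(pcm.samples.size(), 0.0f);  // placeholder for the real mel pipeline
        return true;
    }

private:
    struct Impl {
        SketchConfig cfg;
    };
    std::unique_ptr<Impl> impl_;
};
```

In the sketch, a runner would hold a default-constructed SketchExtractor and later move-assign a configured one into it, mirroring the makeWhisperExtractor / makeParakeetExtractor flow.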
-
inline MelExtractorConfig const &config() const noexcept
-
struct Impl
Public Members
-
std::vector<float> windowFn
Window coefficients, length winLength.
-
std::vector<float> melFilterStorage
Used only when config.melFilter is null.
-
float const *melFilterPtr = {nullptr}
-
int32_t nBins = {0}
-
std::vector<float> preempBuf
Scratch buffer for full-waveform pre-emphasis.
-
SinCosTable sinCos
Twiddle factors of size cfg.nFFT, built at initialisation.
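Here is a minimal sketch of how one precomputed twiddle table can serve both a radix-2 FFT and a direct-DFT fallback. The fallback handles non-power-of-two sizes such as the default nFFT = 400. SinCosSketch and fftSketch are hypothetical names, not the real SinCosTable API. For clarity the sketch uses double-precision std::complex, whereas the extractor works in float32.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical twiddle table: cos/sin of 2*pi*k/n for k in [0, n).
struct SinCosSketch {
    std::vector<double> cosT, sinT;
    explicit SinCosSketch(std::size_t n) : cosT(n), sinT(n) {
        const double twoPi = 2.0 * std::acos(-1.0);
        for (std::size_t k = 0; k < n; ++k) {
            cosT[k] = std::cos(twoPi * k / n);
            sinT[k] = std::sin(twoPi * k / n);
        }
    }
};

static bool isPow2(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Forward DFT X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n), using the shared table.
std::vector<cplx> fftSketch(std::vector<cplx> x, SinCosSketch const& tw) {
    const std::size_t n = x.size();
    assert(tw.cosT.size() == n);
    if (!isPow2(n)) {
        // Direct O(n^2) DFT fallback; (k*t) mod n indexes the same table.
        std::vector<cplx> out(n);
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t t = 0; t < n; ++t) {
                const std::size_t idx = (k * t) % n;
                out[k] += x[t] * cplx(tw.cosT[idx], -tw.sinT[idx]);
            }
        return out;
    }
    // Iterative radix-2: bit-reversal permutation first...
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }
    // ...then butterflies; the twiddle for stage length len is table index k * (n / len).
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const std::size_t step = n / len;
        for (std::size_t start = 0; start < n; start += len)
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cplx w(tw.cosT[k * step], -tw.sinT[k * step]);
                const cplx u = x[start + k];
                const cplx v = x[start + k + len / 2] * w;
                x[start + k] = u + v;
                x[start + k + len / 2] = u - v;
            }
    }
    return x;
}
```

Both paths compute the same transform, so the choice between them is purely a cost trade-off. The radix-2 path is O(n log n) but only applies to power-of-two sizes.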
-
struct MelExtractorConfig
Configuration for one mel-spectrogram extractor instance.
Public Members
-
std::string name
Display name used in log messages.
-
int32_t sampleRate = {16000}
-
int32_t nFFT = {400}
-
int32_t hopLength = {160}
-
int32_t winLength = {400}
Window length in samples: typically equal to nFFT for Whisper, and 400 inside a 512-point FFT for Parakeet.
-
int32_t nMel = {128}
-
float minFrequencyHz = {0.0f}
Minimum and maximum frequency of the mel filter bank. The default range, 0 to sampleRate / 2, matches HF Whisper.
-
float maxFrequencyHz = {-1.0f}
A negative value means sampleRate / 2.
-
float preemphCoeff = {0.0f}
Pre-emphasis filter
y[t] = x[t] - coeff * x[t-1]. Disabled when preemphCoeff == 0. When preemphPostScale != 0, the filtered frame is also multiplied by preemphPostScale.
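The pre-emphasis formula above translates directly into code. This sketch filters the whole waveform at once and assumes x[-1] = 0, so y[0] = x[0]. How the real extractor treats the first sample is not stated here. preemphasisSketch is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[t] = x[t] - coeff * x[t-1], then optionally scaled by postScale.
// Assumption (hypothetical helper): x[-1] is treated as 0, so y[0] = x[0].
std::vector<float> preemphasisSketch(std::vector<float> const& x, float coeff, float postScale) {
    if (coeff == 0.0f) return x;  // filter disabled
    std::vector<float> y(x.size());
    float prev = 0.0f;
    for (std::size_t t = 0; t < x.size(); ++t) {
        y[t] = x[t] - coeff * prev;
        prev = x[t];
    }
    if (postScale != 0.0f)
        for (float& v : y) v *= postScale;  // only applied to filtered output
    return y;
}
```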
-
float preemphPostScale = {0.0f}
-
WindowType windowType = {WindowType::kHannPeriodic}
-
bool windowCentredInFft = {true}
Where the window sits inside the nFFT-sized FFT input buffer when
winLength < nFFT. HF Whisper / Parakeet (torch.stft-style) centre the window: source[start+pad, start+pad+winLen) -> buffer[pad, pad+winLen), with pad = (nFFT - winLen) / 2. Left-aligned mode (unfold + rfft(n=nFFT)) maps source[start, start+winLen) -> buffer[0, winLen), followed by trailing zeros. Ignored when winLen == nFFT.
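Both placement modes can be sketched as a single helper that windows one frame into a zero-initialised nFFT buffer. fillFftInput is a hypothetical name. The sketch assumes the caller has already applied any edge padding, so every source index it reads is in range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// centred == true : source[start+pad, start+pad+winLen) -> buffer[pad, pad+winLen)
// centred == false: source[start, start+winLen)         -> buffer[0, winLen)
// Everything outside the window stays zero. When winLen == nFFT, pad is 0 and
// both modes coincide.
std::vector<float> fillFftInput(std::vector<float> const& source, std::size_t start,
                                std::vector<float> const& window, std::size_t nFFT,
                                bool centred) {
    const std::size_t winLen = window.size();
    assert(winLen <= nFFT);
    const std::size_t pad = centred ? (nFFT - winLen) / 2 : 0;
    assert(start + pad + winLen <= source.size());  // caller pre-padded the signal
    std::vector<float> buffer(nFFT, 0.0f);
    for (std::size_t i = 0; i < winLen; ++i)
        buffer[pad + i] = source[start + pad + i] * window[i];
    return buffer;
}
```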
-
MelScale melScale = {MelScale::kHtk}
-
MelNorm melNorm = {MelNorm::kSlaney}
-
bool triangulariseInMelSpace = {false}
When true, build the triangle filters with their slopes linear in mel space rather than in Hz. Used together with MelScale::kKaldi.
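The sketch below makes the Hz-versus-mel slope distinction concrete with a triangular filter bank whose edges are equally spaced in mel. It assumes the HTK mel formula m = 2595 log10(1 + f/700) in both modes and omits the MelNorm normalisation. hzToMel, melToHz and triangleFilters are hypothetical helpers, and the real MelScale variants may differ in constants or detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// HTK-style mel scale (an assumption for this sketch).
double hzToMel(double hz) { return 2595.0 * std::log10(1.0 + hz / 700.0); }
double melToHz(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Row-major [nMel x nBins] triangles, nBins = nFFT/2 + 1, unnormalised.
// inMelSpace == false: slopes linear in Hz; true: slopes linear in mel.
std::vector<float> triangleFilters(int nMel, int nFFT, double sr, double fMin, double fMax,
                                   bool inMelSpace) {
    const int nBins = nFFT / 2 + 1;
    const double melLo = hzToMel(fMin), melHi = hzToMel(fMax);
    std::vector<double> edgesMel(nMel + 2);
    for (int i = 0; i < nMel + 2; ++i)
        edgesMel[i] = melLo + (melHi - melLo) * i / (nMel + 1);
    // Axis on which the triangle slopes are linear.
    auto toAxis = [&](double hz) { return inMelSpace ? hzToMel(hz) : hz; };
    std::vector<float> fb(static_cast<std::size_t>(nMel) * nBins, 0.0f);
    for (int m = 0; m < nMel; ++m) {
        const double lo = toAxis(melToHz(edgesMel[m]));
        const double mid = toAxis(melToHz(edgesMel[m + 1]));
        const double hi = toAxis(melToHz(edgesMel[m + 2]));
        for (int k = 0; k < nBins; ++k) {
            const double f = toAxis(k * sr / nFFT);  // centre frequency of FFT bin k
            const double up = (f - lo) / (mid - lo);
            const double down = (hi - f) / (hi - mid);
            fb[static_cast<std::size_t>(m) * nBins + k] =
                static_cast<float>(std::fmax(0.0, std::fmin(up, down)));
        }
    }
    return fb;
}
```

The edges are identical in both modes. Only the axis along which each triangle rises and falls changes, so the weights differ for bins between the edges.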
-
LogType logType = {LogType::kLog10}
-
LogFloorMode logFloorMode = {LogFloorMode::kMax}
-
float logFloor = {1e-10f}
Per feature extractor: 1e-10 for Whisper, 2^-24 for Parakeet.
-
MelLayout layout = {MelLayout::kMelTime}
-
PostNormalize postNormalize = {PostNormalize::kWhisperClamp}
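The defaults above are logType = kLog10, logFloorMode = kMax and postNormalize = kWhisperClamp. With them, the log and post-normalize steps can be sketched as follows. The sketch assumes kMax means log10(max(x, logFloor)). It implements the Whisper clamp as in the original Whisper reference: clamp to the global maximum minus 8, then map x to (x + 4) / 4, over the whole flattened spectrogram. logAndWhisperClampSketch is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// mel holds the whole spectrogram (any layout), modified in place.
void logAndWhisperClampSketch(std::vector<float>& mel, float logFloor) {
    if (mel.empty()) return;
    // Assumed kMax semantics: floor the power before taking log10.
    for (float& v : mel) v = std::log10(std::max(v, logFloor));
    // Whisper clamp: dynamic range limited to 8 (i.e. 80 dB) below the global max.
    const float maxVal = *std::max_element(mel.begin(), mel.end());
    for (float& v : mel) v = (std::max(v, maxVal - 8.0f) + 4.0f) / 4.0f;
}
```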
-
TimePadding timePadding = {TimePadding::kNone}
-
int32_t staticTimeLength = {0}
-
FramePadding framePadding = {FramePadding::kLeftAlignedZero}
-
bool dropLastStftFrame = {false}
When true, drop the last STFT frame before the mel filter, log, and post-normalize steps, matching HF Whisper / Parakeet's
stft[..., :-1] and the original Whisper reference. Without it, the post-normalize statistics (Whisper max-clamp, Parakeet per-feature mean/std) drift from HF by O(1e-1) at frame boundaries, even though the underlying mel-power values are byte-identical.
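Dropping the last frame reduces the frame count by exactly one. The sketch below assumes torch.stft-style centring, which pads nFFT/2 samples on each side (even nFFT) and yields 1 + numSamples / hopLength frames. The extractor's TimePadding and FramePadding options may change this. For a 30 s, 16 kHz Whisper input (480000 samples, hop 160), the sketch gives 3001 frames, or 3000 with dropLastStftFrame. stftFrameCount is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Assumes centred framing: (n + nFFT - nFFT) / hop + 1 = 1 + n / hop frames.
int64_t stftFrameCount(int64_t numSamples, int64_t hopLength, bool dropLastStftFrame) {
    int64_t frames = 1 + numSamples / hopLength;
    if (dropLastStftFrame) frames -= 1;  // mirrors stft[..., :-1]
    return frames;
}
```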
-
float const *melFilter = {nullptr}
Pointer to a precomputed mel filter bank of shape
[nMel × (nFFT/2 + 1)] in row-major order, generated offline by scripts/gen_mel_filter_bank.py and embedded as a static array. The table must outlive the extractor (typically this is a pointer to a static constexpr table).
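Applying the row-major [nMel × (nFFT/2 + 1)] table is a plain matrix-vector product per frame: mel[m] = Σ_k filter[m·nBins + k] · power[k]. The sketch takes the table as a raw pointer, matching the lifetime contract above, since a static constexpr array outlives any extractor. applyMelFilter is a hypothetical helper, not the extractor's internal routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// filter: row-major [nMel x nBins], nBins = nFFT/2 + 1; power: one frame's power spectrum.
std::vector<float> applyMelFilter(float const* filter, int nMel, int nBins,
                                  std::vector<float> const& power) {
    assert(static_cast<int>(power.size()) == nBins);
    std::vector<float> mel(nMel, 0.0f);
    for (int m = 0; m < nMel; ++m) {
        float const* row = filter + static_cast<std::size_t>(m) * nBins;
        float acc = 0.0f;
        for (int k = 0; k < nBins; ++k) acc += row[k] * power[k];
        mel[m] = acc;
    }
    return mel;
}
```

A caller would typically declare the table as a static constexpr float array (for example, a hypothetical kFilter[nMel * nBins]) and pass it by pointer.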