Alpamayo1 Action Runner

class Alpamayo1ActionRunner

Standalone action / diffusion head for Alpamayo 1 trajectory prediction.

Consumes the VLM KV cache after generation and produces future trajectory waypoints via a flow-matching denoising loop.
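As a rough illustration of what a flow-matching denoising loop does, the sketch below integrates a velocity field from noise (t = 0) to a sample (t = 1) with forward Euler steps. It is not the runner's implementation: `VelocityFn`, `denoiseEuler`, `makeStraightLineVelocity`, the step count, and the Euler integrator are all assumptions made for illustration. In the real runner the velocity would presumably come from the action engine conditioned on the VLM KV cache.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical velocity callback v(x, t); stands in for the action engine.
using VelocityFn = std::function<std::vector<double>(std::vector<double> const&, double)>;

// Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (sample) with forward Euler.
std::vector<double> denoiseEuler(std::vector<double> x, VelocityFn const& velocity, int numSteps)
{
    assert(numSteps > 0);
    double const dt = 1.0 / numSteps;
    for (int step = 0; step < numSteps; ++step)
    {
        double const t = step * dt;
        std::vector<double> const v = velocity(x, t);
        assert(v.size() == x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
        {
            x[i] += dt * v[i];
        }
    }
    return x;
}

// Toy velocity field whose flow carries any start point to `target` along a
// straight line. t is always < 1 inside denoiseEuler, so 1 - t is never zero.
VelocityFn makeStraightLineVelocity(std::vector<double> target)
{
    return [target](std::vector<double> const& x, double t) {
        std::vector<double> v(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
        {
            v[i] = (target[i] - x[i]) / (1.0 - t);
        }
        return v;
    };
}
```

With the toy field, `denoiseEuler(noise, makeStraightLineVelocity(target), 10)` ends at `target` up to floating-point error, which makes the loop structure easy to check in isolation.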

Public Functions

Alpamayo1ActionRunner(std::string const &engineDir, cudaStream_t stream, KVCacheManager::Config const &kvCacheConfig)

Load action engine, config, and allocate tensors.

config.json must include rope_theta and num_hidden_layers (the decoder layer count).

Parameters:

engineDir – Path to directory containing action.engine and config.json
stream – CUDA stream for operations
kvCacheConfig – KV cache layout from the LLM (from KVCacheManager::Config())

Throws:

std::runtime_error – If engine loading, configuration parsing, or allocation fails

~Alpamayo1ActionRunner() noexcept = default

int64_t getRequiredContextMemorySize() const

Get the required context memory size for this engine.

Returns: Required context memory size in bytes

bool setContextMemory(rt::Tensor &sharedContextMemory)

Set shared context memory for the execution context.

Note

The tensor size must be >= getRequiredContextMemorySize(). Must be called before infer().

Parameters: sharedContextMemory – Tensor containing the shared device memory (must be on GPU)
Returns: True on success, false if the tensor is too small
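When one device buffer is shared between several execution contexts, a common approach is to size it to the largest individual requirement. The sketch below shows that sizing rule and the documented size precondition as plain arithmetic. `sharedContextMemoryBytes` and `fitsContextMemory` are hypothetical helpers, not part of this API, and the sketch assumes the contexts never use the shared buffer at the same time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Size (in bytes) of a buffer that can serve every context in turn, given each
// context's requirement, e.g. the values returned by getRequiredContextMemorySize().
// Assumption: contexts never run concurrently on the shared buffer.
int64_t sharedContextMemoryBytes(std::vector<int64_t> const& requiredBytes)
{
    int64_t maxBytes = 0;
    for (int64_t bytes : requiredBytes)
    {
        assert(bytes >= 0);
        maxBytes = std::max(maxBytes, bytes);
    }
    return maxBytes;
}

// Mirrors the documented precondition of setContextMemory(): the provided
// buffer must be at least as large as the runner's requirement.
bool fitsContextMemory(int64_t providedBytes, int64_t requiredBytes)
{
    return providedBytes >= requiredBytes;
}
```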

inline action::ActionModelType getModelType() const noexcept

Get action head model type.

Returns: Action model type

inline void setNoiseSeed(int32_t seed) noexcept

Set the random seed used when initializing the diffusion noise trajectory.

Parameters: seed – Random seed value

inline int32_t getMaxKVCacheCapacity() const noexcept
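The point of a fixed seed is reproducibility: the same seed yields the same initial noise and therefore the same starting point for the denoising loop. The sketch below shows the idea on the host with the standard library. `makeInitialNoise`, the Mersenne Twister generator, and the unit Gaussian are assumptions for illustration; the runner's actual generator and noise layout are not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical host-side stand-in for seeded noise initialization.
// Draws `numElements` samples from N(0, 1) using a generator seeded with `seed`.
std::vector<float> makeInitialNoise(int32_t seed, std::size_t numElements)
{
    assert(numElements > 0 && "noise buffer must not be empty");
    std::mt19937 generator(static_cast<std::mt19937::result_type>(seed));
    std::normal_distribution<float> gaussian(0.0F, 1.0F);
    std::vector<float> noise(numElements);
    for (float& value : noise)
    {
        value = gaussian(generator);
    }
    return noise;
}
```

Calling `makeInitialNoise` twice with the same seed returns identical vectors on a given standard library implementation, while a different seed gives a different draw.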

Get the max KV cache capacity the action engine was built with.

Returns: Maximum KV cache capacity (from builder_config in engine config.json)

std::vector<std::vector<FutureTrajectoryPoint>> sampleTrajectory(cudaStream_t stream, int32_t activeBatchSize, HybridCacheManager &kvcache, std::vector<int64_t> const &vlmOutputsRopeDeltas)

Run one batched diffusion/flow-matching loop and return future trajectory waypoints for every batch item. Call preprocess() once per request before calling this function (prefill path).

Parameters:

stream – CUDA stream for operations
activeBatchSize – Number of active sequences
kvcache – KV cache containing the VLM outputs; used for KV cache lengths and layer tensors
vlmOutputsRopeDeltas – Per-batch VLM RoPE deltas (e.g. from vision runner getMropeRopeDeltasPerBatch); size must match the batch size.

bool preprocess(LLMGenerationRequest const &request, std::vector<std::vector<int32_t>> &batchedInputIds, tokenizer::Tokenizer const *tokenizer)

Preprocess batched token IDs for Alpamayo (e.g. replace <|traj_history|> pads with trajectory tokens), validate batch size against allocated action buffers, and initialize diffusion noise for the request.

Returns: True on success, false if trajectory placeholder fill or noise setup fails

struct ActionConfig
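The placeholder replacement step can be pictured as follows. This is a minimal sketch under stated assumptions, not the runner's code: `fillTrajectoryPlaceholders` is a hypothetical helper, `padTokenId` stands for the vocabulary ID of <|traj_history|>, `trajBins` stands for already-discretized trajectory history values, and mapping a bin to a token as `trajTokenStart + bin` is an assumption suggested by the traj_token_start field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Replace each placeholder token in `inputIds` with a trajectory token ID.
// Returns false, leaving `inputIds` untouched, if the number of placeholders
// does not match the number of trajectory bins or a bin is negative.
bool fillTrajectoryPlaceholders(std::vector<int32_t>& inputIds, int32_t padTokenId,
    int32_t trajTokenStart, std::vector<int32_t> const& trajBins)
{
    assert(trajTokenStart >= 0);
    std::vector<int32_t> filled = inputIds; // work on a copy so failure has no side effects
    std::size_t next = 0;
    for (int32_t& id : filled)
    {
        if (id != padTokenId)
        {
            continue;
        }
        if (next >= trajBins.size())
        {
            return false; // more placeholders than trajectory tokens
        }
        int32_t const bin = trajBins[next++];
        if (bin < 0)
        {
            return false;
        }
        id = trajTokenStart + bin; // assumed mapping from bin index to vocabulary ID
    }
    if (next != trajBins.size())
    {
        return false; // fewer placeholders than trajectory tokens
    }
    inputIds = std::move(filled);
    return true;
}
```

Working on a copy keeps the input unchanged on failure, which matches a boolean-return style where the caller decides how to handle a rejected request.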

Configuration parsed from the action engine’s config.json.

Public Members

float ropeTheta = {0.0F}: RoPE base frequency (from rope_theta)

int32_t mropeSectionH = {0}: MRoPE frequency pairs for height dimension (from rope_parameters.mrope_section[1])

int32_t mropeSectionW = {0}: MRoPE frequency pairs for width dimension (from rope_parameters.mrope_section[2])

int32_t numDecoderLayers = {0}: Number of transformer decoder layers (from num_hidden_layers)

int32_t numTrajTokens = {0}: Number of trajectory history tokens (from num_traj_tokens)

int32_t trajTokenStart = {0}: Vocabulary ID of the first trajectory token (from traj_token_start)

int32_t maxKVCacheCapacity = {0}: Maximum KV cache sequence capacity (from builder_config.max_kv_cache_capacity)

int32_t numKVHeads = {0}: Number of key-value heads (from num_key_value_heads)

int32_t headDim = {0}: Head dimension (from head_dim)
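Several of these fields together determine how large a KV cache the action head has to read. The sketch below gives a back-of-the-envelope estimate for a dense cache: keys and values for every layer, position, KV head, and head element. `ActionConfigSketch` and `kvCacheBytesPerSequence` are hypothetical names that mirror only a subset of the fields above, and the dense layout and caller-supplied element size are assumptions. The actual layout is defined by KVCacheManager::Config and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the ActionConfig fields that affect KV cache size.
struct ActionConfigSketch
{
    int32_t numDecoderLayers{0};
    int32_t maxKVCacheCapacity{0};
    int32_t numKVHeads{0};
    int32_t headDim{0};
};

// Estimated bytes for one sequence's KV cache, assuming a dense layout:
// 2 (K and V) * layers * positions * KV heads * head dim * bytes per element.
int64_t kvCacheBytesPerSequence(ActionConfigSketch const& config, int64_t bytesPerElement)
{
    // Zero-initialized fields mean the config was never populated.
    assert(config.numDecoderLayers > 0 && config.maxKVCacheCapacity > 0);
    assert(config.numKVHeads > 0 && config.headDim > 0 && bytesPerElement > 0);
    int64_t const keysAndValues = 2;
    return keysAndValues * config.numDecoderLayers * config.maxKVCacheCapacity
        * config.numKVHeads * config.headDim * bytesPerElement;
}
```

For example, with 2 layers, a capacity of 16 positions, 4 KV heads, a head dimension of 8, and 2-byte elements, the estimate is 2 * 2 * 16 * 4 * 8 * 2 = 4096 bytes per sequence.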