Alpamayo1 Action Runner

class Alpamayo1ActionRunner

Standalone action / diffusion head for Alpamayo 1 trajectory prediction.

Consumes the VLM KV cache after generation and produces future trajectory waypoints via a flow-matching denoising loop.

Public Functions

Alpamayo1ActionRunner(
std::string const &engineDir,
cudaStream_t stream,
KVCacheManager::Config const &kvCacheConfig
)

Load the action engine and its config, and allocate tensors.

config.json must include rope_theta and num_hidden_layers (the decoder layer count).

Parameters:
  • engineDir – Path to directory containing action.engine and config.json

  • stream – CUDA stream for operations

  • kvCacheConfig – KV cache layout from the LLM (from KVCacheManager::Config())

Throws:

std::runtime_error – If engine loading, configuration parsing, or allocation fails

~Alpamayo1ActionRunner() noexcept = default

int64_t getRequiredContextMemorySize() const

Get the required context memory size for this engine.

Returns:

Required context memory size in bytes

bool setContextMemory(rt::Tensor &sharedContextMemory)

Set shared context memory for the execution context.

Note

The tensor size must be >= getRequiredContextMemorySize(). Must be called before infer().

Parameters:

sharedContextMemory – Tensor containing the shared device memory (must be on GPU)

Returns:

True on success, false if the tensor is too small
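The size contract above suggests one way to share a single scratch buffer between several engines: size it to the largest requirement, then bind it to each runner. The following is a minimal sketch of that idea, not the library's implementation. `MockRunner` and `DeviceBuffer` are hypothetical stand-ins for `Alpamayo1ActionRunner` and `rt::Tensor`, and the sketch assumes the engines sharing the buffer never execute concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for rt::Tensor; only the byte size matters here.
struct DeviceBuffer {
    int64_t sizeBytes{0};
};

// Hypothetical stand-in that mirrors the documented contract of
// getRequiredContextMemorySize() and setContextMemory().
class MockRunner {
public:
    explicit MockRunner(int64_t requiredBytes) : mRequiredBytes(requiredBytes) {
        assert(requiredBytes >= 0);
    }

    int64_t getRequiredContextMemorySize() const { return mRequiredBytes; }

    // Returns false when the buffer is smaller than required, as documented.
    bool setContextMemory(DeviceBuffer& shared) {
        if (shared.sizeBytes < mRequiredBytes) {
            return false;
        }
        mBound = true;
        return true;
    }

    bool isBound() const { return mBound; }

private:
    int64_t mRequiredBytes;
    bool mBound{false};
};

// Size one shared buffer to the largest requirement among all runners.
int64_t sharedContextBytes(std::vector<MockRunner> const& runners) {
    int64_t maxBytes = 0;
    for (auto const& r : runners) {
        maxBytes = std::max(maxBytes, r.getRequiredContextMemorySize());
    }
    return maxBytes;
}
```

In this sketch a caller would query every runner first, allocate once, and treat a `false` return from `setContextMemory()` as a sizing bug rather than a recoverable condition.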

inline action::ActionModelType getModelType() const noexcept

Get action head model type.

Returns:

Action model type

inline void setNoiseSeed(int32_t seed) noexcept

Set the random seed used when initializing the diffusion noise trajectory.

Parameters:

seed – Random seed value
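A fixed seed is what makes the initial noise, and therefore the sampled trajectory, reproducible between runs. The sketch below illustrates that property on the host with the standard library. The function name `makeNoise` and the choice of `std::mt19937` are assumptions for illustration; the random number generator the runner actually uses is not documented here.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw `count` standard-normal samples from a generator seeded with `seed`.
// std::mt19937 is an illustrative choice, not the runner's documented RNG.
std::vector<float> makeNoise(int32_t seed, std::size_t count) {
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::normal_distribution<float> normal(0.0F, 1.0F);
    std::vector<float> noise(count);
    for (auto& v : noise) {
        v = normal(gen);
    }
    return noise;
}
```

Calling `makeNoise` twice with the same seed yields identical vectors, while a different seed yields a different starting point for the denoising loop.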

inline int32_t getMaxKVCacheCapacity() const noexcept

Get the max KV cache capacity the action engine was built with.

Returns:

Maximum KV cache capacity (from builder_config in engine config.json)

std::vector<std::vector<FutureTrajectoryPoint>> sampleTrajectory(
cudaStream_t stream,
int32_t activeBatchSize,
HybridCacheManager &kvcache,
std::vector<int64_t> const &vlmOutputsRopeDeltas
)

Run one batched diffusion/flow-matching loop and return the future trajectory waypoints for every batch item. Call preprocess() once per request (on the prefill path) before calling this method.

Parameters:
  • stream – CUDA stream for operations

  • activeBatchSize – Number of active sequences

  • kvcache – KV cache containing the VLM outputs; used for KV cache lengths and layer tensors

  • vlmOutputsRopeDeltas – Per-batch VLM RoPE deltas (e.g. from the vision runner's getMropeRopeDeltasPerBatch); the vector size must match the batch size.

bool preprocess(
LLMGenerationRequest const &request,
std::vector<std::vector<int32_t>> &batchedInputIds,
tokenizer::Tokenizer const *tokenizer
)

Preprocess batched token IDs for Alpamayo (e.g. replace <|traj_history|> pads with trajectory tokens), validate batch size against allocated action buffers, and initialize diffusion noise for the request.

Returns:

True on success, false if trajectory placeholder fill or noise setup fails
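The placeholder replacement step can be pictured as follows. This is a hedged sketch, not the library's code: the function `fillTrajectoryPlaceholders`, the `padId` parameter, and the mapping "token ID = trajTokenStart + bin index" are assumptions made for illustration. The source only states that the pads are replaced with trajectory tokens and that `trajTokenStart` is the vocabulary ID of the first trajectory token.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Replace each occurrence of `padId` in `ids` with trajTokenStart + bins[k],
// where k counts placeholders from left to right. Returns false and leaves
// `ids` unchanged if the placeholder count differs from bins.size().
bool fillTrajectoryPlaceholders(std::vector<int32_t>& ids, int32_t padId,
                                int32_t trajTokenStart,
                                std::vector<int32_t> const& bins) {
    std::size_t padCount = 0;
    for (int32_t id : ids) {
        if (id == padId) {
            ++padCount;
        }
    }
    if (padCount != bins.size()) {
        return false;
    }
    std::size_t k = 0;
    for (int32_t& id : ids) {
        if (id == padId) {
            id = trajTokenStart + bins[k++];
        }
    }
    return true;
}
```

Counting the placeholders before writing anything keeps the input intact on failure, which matches the documented behavior of reporting a failed fill through the return value.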

struct ActionConfig

Configuration parsed from the action engine’s config.json.

Public Members

float ropeTheta = {0.0F}

RoPE base frequency (from rope_theta)

int32_t mropeSectionH = {0}

MRoPE frequency pairs for height dimension (from rope_parameters.mrope_section[1])

int32_t mropeSectionW = {0}

MRoPE frequency pairs for width dimension (from rope_parameters.mrope_section[2])

int32_t numDecoderLayers = {0}

Number of transformer decoder layers (from num_hidden_layers)

int32_t numTrajTokens = {0}

Number of trajectory history tokens (from num_traj_tokens)

int32_t trajTokenStart = {0}

Vocabulary ID of the first trajectory token (from traj_token_start)

int32_t maxKVCacheCapacity = {0}

Maximum KV cache sequence capacity (from builder_config.max_kv_cache_capacity)

int32_t numKVHeads = {0}

Number of key-value heads (from num_key_value_heads)

int32_t headDim = {0}

Head dimension (from head_dim)
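Several of these fields together determine how much KV cache one sequence can occupy. The sketch below works through that arithmetic under stated assumptions: a dense layout with separate key and value tensors per layer, a caller-supplied element size (2 bytes for FP16), and a hypothetical `KVShape` struct that redeclares a subset of the fields so the example is self-contained. The real cache layout may differ.

```cpp
#include <cassert>
#include <cstdint>

// Subset of the documented ActionConfig fields, redeclared for a standalone sketch.
struct KVShape {
    int32_t numDecoderLayers{0};
    int32_t numKVHeads{0};
    int32_t headDim{0};
    int32_t maxKVCacheCapacity{0};
};

// Bytes needed for the keys plus values of one sequence at full capacity,
// assuming a dense [layers, 2, kvHeads, capacity, headDim] layout.
int64_t kvCacheBytesPerSequence(KVShape const& s, int64_t bytesPerElement) {
    assert(s.numDecoderLayers >= 0 && s.numKVHeads >= 0);
    assert(s.headDim >= 0 && s.maxKVCacheCapacity >= 0 && bytesPerElement > 0);
    int64_t const keysAndValues = 2;
    // The leading int64_t operand promotes every later multiplication to 64 bits.
    return keysAndValues * s.numDecoderLayers * s.numKVHeads * s.headDim
        * s.maxKVCacheCapacity * bytesPerElement;
}
```

For illustrative values of 32 layers, 8 KV heads, a head dimension of 128, and a capacity of 4096 tokens in FP16, the product is 2 × 32 × 8 × 128 × 4096 × 2 = 536,870,912 bytes, or 512 MiB per sequence.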