Alpamayo1 Action Runner
-
class Alpamayo1ActionRunner
Standalone action / diffusion head for Alpamayo 1 trajectory prediction.
Consumes the VLM KV cache after generation and produces future trajectory waypoints via a flow-matching denoising loop.
Public Functions
-
Alpamayo1ActionRunner(std::string const &engineDir, cudaStream_t stream, KVCacheManager::Config const &kvCacheConfig)
Load the action engine and its configuration, and allocate tensors.
config.json must include rope_theta and num_hidden_layers (the decoder layer count).
- Parameters:
engineDir – Path to directory containing action.engine and config.json
stream – CUDA stream for operations
kvCacheConfig – KV cache layout from the LLM (from KVCacheManager::Config())
- Throws:
std::runtime_error – If engine loading, configuration parsing, or allocation fails
-
~Alpamayo1ActionRunner() noexcept = default
-
int64_t getRequiredContextMemorySize() const
Get the required context memory size for this engine.
- Returns:
Required context memory size in bytes
-
bool setContextMemory(rt::Tensor &sharedContextMemory)
Set shared context memory for the execution context.
Note
The tensor size must be >= getRequiredContextMemorySize(). Must be called before infer().
- Parameters:
sharedContextMemory – Tensor containing the shared device memory (must be on GPU)
- Returns:
True on success, false if the tensor is too small
-
inline action::ActionModelType getModelType() const noexcept
Get action head model type.
- Returns:
Action model type
-
inline void setNoiseSeed(int32_t seed) noexcept
Set the random seed used when initializing the diffusion noise trajectory.
- Parameters:
seed – Random seed value
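To show why a fixed seed matters for reproducibility, here is a minimal host-side sketch of seeded noise initialization. The helper name `makeNoise`, the use of `std::mt19937`, and the standard-normal distribution are assumptions made for illustration; the runner's actual generator and distribution are not described here and may well run on the GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: fill a noise trajectory with standard-normal samples
// drawn from a generator seeded with `seed`.
std::vector<float> makeNoise(int32_t seed, std::size_t numValues)
{
    std::mt19937 rng(static_cast<std::mt19937::result_type>(seed));
    std::normal_distribution<float> gauss(0.0F, 1.0F);
    std::vector<float> noise(numValues);
    for (float& value : noise)
    {
        value = gauss(rng);
    }
    return noise;
}
```

Two calls with the same seed and length return identical vectors within one build, so a denoising loop started from them follows the same path. That is useful when comparing runs or debugging.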
-
inline int32_t getMaxKVCacheCapacity() const noexcept
Get the max KV cache capacity the action engine was built with.
- Returns:
Maximum KV cache capacity (from builder_config in engine config.json)
-
std::vector<std::vector<FutureTrajectoryPoint>> sampleTrajectory(cudaStream_t stream, int32_t activeBatchSize, HybridCacheManager &kvcache, std::vector<int64_t> const &vlmOutputsRopeDeltas)
Run one batched diffusion/flow-matching loop and return future trajectory waypoints for all batch items. Call preprocess() once per request before calling this function (prefill path).
- Parameters:
stream – CUDA stream for operations
activeBatchSize – Number of active sequences
kvcache – KV cache containing the VLM outputs; used for KV cache lengths and layer tensors
vlmOutputsRopeDeltas – Per-batch VLM RoPE deltas (e.g. from the vision runner's getMropeRopeDeltasPerBatch); the size must match the batch size.
-
bool preprocess(LLMGenerationRequest const &request, std::vector<std::vector<int32_t>> &batchedInputIds, tokenizer::Tokenizer const *tokenizer)
Preprocess batched token IDs for Alpamayo (e.g. replace <|traj_history|> pads with trajectory tokens), validate batch size against allocated action buffers, and initialize diffusion noise for the request.
- Returns:
True on success, false if trajectory placeholder fill or noise setup fails
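The placeholder replacement described above can be pictured with the following sketch for a single sequence. Everything here is hypothetical: the function name `fillTrajectoryPlaceholders`, the explicit `padId` argument, and the assumption that each history token is an offset added to `trajTokenStart`. How the library actually discretizes the trajectory history is not stated in this reference.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: overwrite each placeholder (pad) ID in `inputIds` with
// the next trajectory token, expressed as trajTokenStart + offset.
// Returns false, leaving inputIds untouched, if the number of placeholders
// differs from the number of trajectory tokens.
bool fillTrajectoryPlaceholders(std::vector<int32_t>& inputIds, int32_t padId,
    std::vector<int32_t> const& trajTokens, int32_t trajTokenStart)
{
    auto const numPads = std::count(inputIds.begin(), inputIds.end(), padId);
    if (static_cast<std::size_t>(numPads) != trajTokens.size())
    {
        return false;
    }
    std::size_t next = 0;
    for (int32_t& id : inputIds)
    {
        if (id == padId)
        {
            id = trajTokenStart + trajTokens[next++];
        }
    }
    return true;
}
```

Counting the placeholders before writing keeps the input unchanged on failure, which mirrors the documented behavior of reporting a failed placeholder fill through the return value.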
-
struct ActionConfig
Configuration parsed from the action engine’s config.json.
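Going by the key names cited in this struct's member descriptions, a config.json fragment might be laid out roughly as follows. Every value is an illustrative placeholder rather than a value from a real engine, the nesting is inferred from the dotted paths (rope_parameters.mrope_section, builder_config.max_kv_cache_capacity), and a real file likely contains further keys.

```json
{
  "rope_theta": 1000000.0,
  "num_hidden_layers": 24,
  "num_key_value_heads": 8,
  "head_dim": 128,
  "num_traj_tokens": 48,
  "traj_token_start": 150000,
  "rope_parameters": {
    "mrope_section": [24, 20, 20]
  },
  "builder_config": {
    "max_kv_cache_capacity": 4096
  }
}
```

In this layout, mrope_section[1] and mrope_section[2] would populate mropeSectionH and mropeSectionW respectively.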
Public Members
-
float ropeTheta = {0.0F}
RoPE base frequency (from rope_theta)
-
int32_t mropeSectionH = {0}
MRoPE frequency pairs for height dimension (from rope_parameters.mrope_section[1])
-
int32_t mropeSectionW = {0}
MRoPE frequency pairs for width dimension (from rope_parameters.mrope_section[2])
-
int32_t numDecoderLayers = {0}
Number of transformer decoder layers (from num_hidden_layers)
-
int32_t numTrajTokens = {0}
Number of trajectory history tokens (from num_traj_tokens)
-
int32_t trajTokenStart = {0}
Vocabulary ID of the first trajectory token (from traj_token_start)
-
int32_t maxKVCacheCapacity = {0}
Maximum KV cache sequence capacity (from builder_config.max_kv_cache_capacity)
-
int32_t numKVHeads = {0}
Number of key-value heads (from num_key_value_heads)
-
int32_t headDim = {0}
Head dimension (from head_dim)
-
float ropeTheta = {0.0F}