LoRA Manager

class LoRAManager

Manages LoRA adapter weights and switching.

Maintains a map of named adapters, each holding a set of weight tensors keyed by binding name. At most one adapter is “active” at any time. Switching adapters is O(1): no GPU copies are required because the TensorRT engine binds directly to the stored tensor pointers.

Usage:

  1. loadWeights() or addWeights() to register adapter(s).

  2. switchWeights(name) to activate an adapter.

  3. getActiveWeight(bindingName) to retrieve the tensor for engine binding.

  4. resetWeights() to deactivate (bind dummy zero-tensors).
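The four usage steps can be sketched with a small host-only stand-in. Everything in this block is hypothetical: `FakeTensor` replaces rt::Tensor, `MiniLoRAManager` imitates only the documented method names, and `runLifecycleDemo` is an illustrative helper. It is a sketch of the described behaviour under those assumptions, not the library's implementation.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for rt::Tensor: host memory only, no CUDA.
struct FakeTensor {
    std::vector<float> data;
};

// Hypothetical miniature of the documented interface, not the real LoRAManager.
class MiniLoRAManager {
public:
    using WeightMap = std::map<std::string, FakeTensor>;

    // Register an adapter; the map is taken by value and moved into storage.
    void addWeights(std::string const& name, WeightMap weights) {
        mAdapters[name] = std::move(weights);
    }

    // Activation only re-points mActive; no tensor data is copied.
    void switchWeights(std::string const& name) {
        auto it = mAdapters.find(name);
        if (it == mAdapters.end()) {
            throw std::runtime_error("unknown adapter: " + name);
        }
        mActive = &it->second;
        mActiveName = name;
    }

    // Deactivate: later lookups return the zero-filled dummy tensor.
    void resetWeights() {
        mActive = nullptr;
        mActiveName.clear();
    }

    FakeTensor& getActiveWeight(std::string const& bindingName) {
        if (mActive == nullptr) {
            return mDummy;
        }
        auto it = mActive->find(bindingName);
        if (it == mActive->end()) {
            throw std::runtime_error("unknown binding: " + bindingName);
        }
        return it->second;
    }

    std::string const& getActiveAdapterName() const noexcept { return mActiveName; }

private:
    std::map<std::string, WeightMap> mAdapters;
    WeightMap* mActive{nullptr};
    std::string mActiveName;
    FakeTensor mDummy{{0.0F}};  // one zero element, mirroring shape [1]
};

// Walks the four usage steps and records the first element seen at each stage.
inline std::vector<float> runLifecycleDemo() {
    MiniLoRAManager manager;
    manager.addWeights("style_a", {{"lora_A_layer_0", FakeTensor{{1.0F}}}});
    manager.addWeights("style_b", {{"lora_A_layer_0", FakeTensor{{2.0F}}}});

    std::vector<float> seen;
    manager.switchWeights("style_a");
    seen.push_back(manager.getActiveWeight("lora_A_layer_0").data[0]);
    manager.switchWeights("style_b");
    seen.push_back(manager.getActiveWeight("lora_A_layer_0").data[0]);
    manager.resetWeights();
    seen.push_back(manager.getActiveWeight("lora_A_layer_0").data[0]);
    return seen;
}
```

In this sketch the same binding name resolves to a different tensor after each switch, and to the dummy after the reset, without any weight data being copied.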

Public Functions

LoRAManager() = default

Default constructor.

LoRAManager(LoRAManager const&) = delete

Deleted copy to prevent accidental duplication of GPU resources.

LoRAManager &operator=(LoRAManager const&) = delete

LoRAManager(LoRAManager&&) noexcept = default

Allow move.

LoRAManager &operator=(LoRAManager&&) noexcept = default

void loadWeights(
    std::string const &name,
    std::filesystem::path const &path,
    cudaStream_t stream
)

Load LoRA adapter weights from a safetensors file.

Each tensor in the safetensors file is registered under its tensor name, which serves as the binding name.

Parameters:
  • name – Adapter name (user-facing identifier).

  • path – Path to the .safetensors file.

  • stream – CUDA stream for async loading.

Throws:

std::runtime_error – if the file cannot be read or its format is invalid.

void addWeights(
    std::string const &name,
    std::map<std::string, rt::Tensor> weights
)

Register adapter weights directly (useful for unit testing without I/O).

Parameters:
  • name – Adapter name.

  • weights – Map from binding name to tensor. The map is taken by value and its tensors are moved from.

void switchWeights(std::string const &name)

Activate an adapter by name.

Parameters:

name – Adapter name (must have been loaded/added previously).

Throws:

std::runtime_error – if the adapter name is not found.

void resetWeights()

Deactivate the active adapter, if any. After this call, getActiveWeight() returns a reference to a zero-filled dummy tensor of shape [1].

rt::Tensor &getActiveWeight(std::string const &bindingName)

Retrieve the currently active tensor for a given binding name in O(1) time.

Parameters:

bindingName – The engine binding name (e.g. “lora_A_layer_0”).

Throws:

std::runtime_error – if bindingName is not found in the active adapter.

Returns:

Reference to the weight tensor (dummy tensor if no adapter is active).

std::string const &getActiveAdapterName() const noexcept

Return the name of the currently active adapter, or an empty string if no adapter is active.

std::vector<std::string> getBindingNames() const

Return all binding names across all loaded adapters. Useful for initialising a TensorMap with the correct keys.

std::vector<std::string> getAdapterNames() const

Return all loaded adapter names.

bool hasActiveAdapter() const noexcept

Check whether any adapter is currently active.

bool hasWeightFor(std::string const &bindingName) const noexcept

Check whether the active adapter contains a weight under bindingName.

Fused engines and non-fused adapters sometimes use different naming conventions (e.g. qkv_proj.* vs separate q_proj.* / k_proj.* / v_proj.*). refreshTensorMap() uses this predicate to decide whether to bind the adapter’s weight or fall back to a dummy tensor, which avoids the cost of a try/catch around getActiveWeight().

Returns false when no adapter is active.

void initializeEngineBindings(EngineExecutor const &runner)

Register the engine’s LoRA I/O bindings and create dummy tensors of LoRA rank 1 with the correct engine shapes. Must be called once after the EngineExecutor is constructed so that refreshTensorMap() knows which names to populate.

Encapsulates the LoRA binding-shape convention:

  • lora_A_* weights have shape [k, rank]; dummy sets last dim to 1.

  • lora_B_* weights have shape [rank, n]; dummy sets first dim to 1.

Parameters:

runner – Source of the engine I/O list and per-binding max shapes.
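The binding-shape convention described for initializeEngineBindings() can be illustrated with a small shape helper. `makeDummyShape` is a hypothetical name, and the assumptions that shapes are plain integer vectors and that LoRA bindings are recognised by the prefixes `lora_A_` and `lora_B_` are made for this sketch only.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: derive the rank-1 dummy shape from a binding's engine shape.
// Assumes the binding name starts with "lora_A_" or "lora_B_".
inline std::vector<std::int64_t> makeDummyShape(
    std::string const& bindingName, std::vector<std::int64_t> shape)
{
    if (shape.empty()) {
        throw std::invalid_argument("empty shape for binding: " + bindingName);
    }
    if (bindingName.rfind("lora_A_", 0) == 0) {
        shape.back() = 1;   // [k, rank] -> [k, 1]
    } else if (bindingName.rfind("lora_B_", 0) == 0) {
        shape.front() = 1;  // [rank, n] -> [1, n]
    } else {
        throw std::invalid_argument("not a LoRA binding: " + bindingName);
    }
    return shape;
}
```

For example, under these assumptions a `lora_A_layer_0` binding with engine shape [4096, 64] yields a dummy of shape [4096, 1], and a `lora_B_layer_0` binding with shape [64, 4096] yields [1, 4096].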

void refreshTensorMap(TensorMap &map)

Refresh all LoRA entries in the given TensorMap.

For each registered engine binding name, either the active adapter’s weight tensor or the per-binding dummy tensor is written into map. Must be called after every switchWeights() / resetWeights().

Parameters:

map – TensorMap to update with the current LoRA bindings.
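The refresh step, including the fallback to a per-binding dummy, might look roughly like the following. The types here are hypothetical stand-ins (`FakeTensor` for rt::Tensor, `FakeTensorMap` for TensorMap, which is assumed to hold non-owning pointers), and `refreshSketch` takes the active adapter as an explicit argument purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct FakeTensor { std::vector<float> data; };             // hypothetical stand-in for rt::Tensor
using WeightMap = std::map<std::string, FakeTensor>;        // binding name -> tensor
using FakeTensorMap = std::map<std::string, FakeTensor*>;   // hypothetical stand-in for TensorMap

// For every registered binding, point the map at the active adapter's weight
// when it has one under that name, otherwise at the per-binding dummy.
inline void refreshSketch(FakeTensorMap& map, WeightMap& dummies, WeightMap* active) {
    for (auto& entry : dummies) {
        FakeTensor* target = &entry.second;
        if (active != nullptr) {
            auto it = active->find(entry.first);
            if (it != active->end()) {
                target = &it->second;
            }
        }
        map[entry.first] = target;
    }
}

// Records what the map points at after a switch and after a reset.
inline std::vector<float> runRefreshDemo() {
    WeightMap dummies;
    dummies["lora_A_layer_0"] = FakeTensor{{0.0F}};
    dummies["lora_B_layer_0"] = FakeTensor{{0.0F}};

    WeightMap adapter;
    adapter["lora_A_layer_0"] = FakeTensor{{5.0F}};  // no lora_B entry on purpose

    FakeTensorMap map;
    std::vector<float> seen;

    refreshSketch(map, dummies, &adapter);             // as after switchWeights()
    seen.push_back(map["lora_A_layer_0"]->data[0]);    // adapter weight
    seen.push_back(map["lora_B_layer_0"]->data[0]);    // dummy fallback

    refreshSketch(map, dummies, nullptr);              // as after resetWeights()
    seen.push_back(map["lora_A_layer_0"]->data[0]);    // dummy again
    return seen;
}
```

The sketch also shows why the refresh has to follow every switch or reset: the map holds pointers, so it keeps referring to the previous adapter's tensors until it is rewritten.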