Additional Registered Transforms

These transforms are registered in the AutoDeploy transform library but are not part of the standard graph-mode or transformers-mode pipelines. They are useful for specialized experiments, explicit opt-in configurations, or development workflows.

Fuse Finegrained FP8 Gemms

Transform key: fuse_finegrained_fp8_gemms

Source module: tensorrt_llm._torch.auto_deploy.transform.library.fusion

class tensorrt_llm._torch.auto_deploy.transform.library.fusion.FuseFineGrainedFP8Gemms(config: TransformConfig)

Bases: QuantizationFusionMixin, BaseTransform

Fuse FineGrained (block-wise) FP8 GEMMs sharing the same input activation.

FineGrained FP8 uses per-block weight scales (weight_scale_inv) and dynamic input quantization, so fusion simply concatenates weights and their block scales along the output dimension.
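The concatenation described above can be sketched in plain Python. This is a toy illustration, not the library's implementation: nested lists stand in for tensors, the names `dequantize`, `linear`, and `BLOCK` are hypothetical, the block size of 2 is chosen only to keep the example small, and dynamic input quantization is left out. It assumes each weight's output dimension is a multiple of the block size, so that the stacked scale rows still line up with the stacked weight rows.

```python
from typing import List

Matrix = List[List[float]]

BLOCK = 2  # toy block size (assumption); chosen small for readability


def dequantize(weight: Matrix, scale: Matrix, block: int = BLOCK) -> Matrix:
    """Multiply each weight element by the scale of the block it falls in."""
    return [
        [w * scale[i // block][j // block] for j, w in enumerate(row)]
        for i, row in enumerate(weight)
    ]


def linear(x: List[float], weight: Matrix) -> List[float]:
    """Compute y = W @ x for a weight of shape [out_features, in_features]."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Two projections that read the same activation x (in_features = 4).
w_a = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]   # shape [2, 4]
s_a = [[0.5, 0.25]]                                   # shape [1, 2]
w_b = [
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, -2.0],
    [3.0, 3.0, 3.0, 3.0],
    [4.0, -4.0, 4.0, -4.0],
]                                                     # shape [4, 4]
s_b = [[2.0, 1.0], [0.5, 4.0]]                        # shape [2, 2]

# Fusion: stack weights and their block scales along the output (row) dimension.
w_fused = w_a + w_b   # shape [6, 4]
s_fused = s_a + s_b   # shape [3, 2]

x = [1.0, -1.0, 2.0, 0.5]
y_separate = linear(x, dequantize(w_a, s_a)) + linear(x, dequantize(w_b, s_b))
y_fused = linear(x, dequantize(w_fused, s_fused))
```

In this sketch `y_fused` matches the concatenation of the two separate outputs, because every weight row keeps the same block scales after stacking. No scale has to be recomputed, which is consistent with the description of fusion as a plain concatenation.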

build_custom_args_for_linear(scale_getattrs: Dict[str, Node]) -> Tuple[object, ...]

Return the positional arguments that follow bias in the fused call.

YAML configuration

Uses the common TransformConfig fields documented in Core Transform APIs.

Fuse Mamba A Log

Transform key: fuse_mamba_a_log

Source module: tensorrt_llm._torch.auto_deploy.transform.library.fuse_mamba_a_log

class tensorrt_llm._torch.auto_deploy.transform.library.fuse_mamba_a_log.FuseMambaALog(config: TransformConfig)

Bases: BaseTransform

Fuse the A_log parameter into a precomputed A constant/parameter.

Replaces:

A = -torch.exp(self.A_log.float())

With:

A = self.A_fused
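The effect of this replacement can be illustrated with a small sketch. It is not the transform itself: the two classes below are hypothetical stand-ins, plain Python lists replace tensors, and the `.float()` cast is omitted. The sketch assumes `A_log` is not updated after fusion, which is the usual situation at inference time.

```python
import math
from typing import List


class MambaMixerBefore:
    """Toy stand-in: recomputes A from A_log on every call."""

    def __init__(self, a_log: List[float]) -> None:
        self.A_log = a_log

    def get_A(self) -> List[float]:
        # Mirrors: A = -torch.exp(self.A_log.float())
        return [-math.exp(v) for v in self.A_log]


class MambaMixerAfter:
    """Toy stand-in: A is computed once and stored as A_fused."""

    def __init__(self, a_log: List[float]) -> None:
        # The negated exponential is evaluated a single time, ahead of use.
        self.A_fused = [-math.exp(v) for v in a_log]

    def get_A(self) -> List[float]:
        # Mirrors: A = self.A_fused
        return self.A_fused


a_log = [0.0, math.log(2.0), math.log(4.0)]
before = MambaMixerBefore(a_log)
after = MambaMixerAfter(a_log)
```

Both objects return the same values from `get_A()`. The second one avoids repeating the exponential and negation on each call.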

YAML configuration

Uses the common TransformConfig fields documented in Core Transform APIs.

Match FP8 MoE Pattern

Transform key: match_fp8_moe_pattern

Source module: tensorrt_llm._torch.auto_deploy.transform.library.fused_moe

class tensorrt_llm._torch.auto_deploy.transform.library.fused_moe.MatchFP8MoePattern(config: TransformConfig)

Bases: MatchMoePattern

Match and fuse an FP8-quantized MoE subgraph.

scale_arg_indices() -> Dict[str, int]

Map each scale name to its argument index in the matched linear op.

scale_keys() -> List[str]

Order of scale keys to emit into the fused MoE op (e.g., ['input_scale', 'weight_scale', …]).
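One way to picture how the two methods could work together is sketched below. Everything in it is hypothetical: `collect_scales` is not a library function, and the argument layout and index values are made up for illustration rather than taken from the actual FP8 linear op. The idea is only that a name-to-index map locates each scale in the matched call, while the key list fixes the order in which the scales are passed on.

```python
from typing import Any, Dict, List, Tuple


def collect_scales(
    linear_args: Tuple[Any, ...],
    scale_arg_indices: Dict[str, int],
    scale_keys: List[str],
) -> List[Any]:
    """Pick scale arguments out of a matched linear call, in emission order."""
    return [linear_args[scale_arg_indices[key]] for key in scale_keys]


# Hypothetical argument layout of a matched quantized linear op.
# These positions are invented for this example, not read from the library.
linear_args = ("x", "weight", "bias", "in_scale_node", "w_scale_node")
indices = {"input_scale": 3, "weight_scale": 4}
keys = ["input_scale", "weight_scale"]

scales = collect_scales(linear_args, indices, keys)
```

Keeping the index map separate from the ordering lets a subclass for another quantization format change either one independently, which may be why the two are exposed as separate methods.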

YAML configuration

Uses the common TransformConfig fields documented in Core Transform APIs.

Match NVFP4 MoE Pattern

Transform key: match_nvfp4_moe_pattern

Source module: tensorrt_llm._torch.auto_deploy.transform.library.fused_moe

class tensorrt_llm._torch.auto_deploy.transform.library.fused_moe.MatchNVFP4MoePattern(config: TransformConfig)

Bases: MatchMoePattern

Match and fuse an NVFP4-quantized MoE subgraph.

scale_arg_indices() -> Dict[str, int]

Map each scale name to its argument index in the matched linear op.

scale_keys() -> List[str]

Order of scale keys to emit into the fused MoE op (e.g., ['input_scale', 'weight_scale', …]).

YAML configuration

Uses the common TransformConfig fields documented in Core Transform APIs.