Additional Registered Transforms

These transforms are registered in the AutoDeploy transform library but are not part of the standard graph-mode or transformers-mode pipelines. They are useful for specialized experiments, explicit opt-in configurations, or development workflows.

Fuse Finegrained FP8 Gemms

Transform key: fuse_finegrained_fp8_gemms

Source module: tensorrt_llm._torch.auto_deploy.transform.library.fusion

class tensorrt_llm._torch.auto_deploy.transform.library.fusion.FuseFineGrainedFP8Gemms(config: TransformConfig)

Bases: QuantizationFusionMixin, BaseTransform

Fuse FineGrained (block-wise) FP8 GEMMs sharing the same input activation.

FineGrained FP8 uses per-block weight scales (weight_scale_inv) and dynamic input quantization, so fusion simply concatenates weights and their block scales along the output dimension.
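The concatenation described above can be illustrated with a minimal sketch. Plain Python lists stand in for tensors, and the block size, the helper names (dequantize, linear, fuse), and the sample values are all hypothetical. The sketch assumes the block scale is multiplied with the weight to dequantize it and that each output dimension is a multiple of the block size, so the scale rows line up after concatenation. It is not the transform's actual implementation.

```python
BLOCK = 2  # hypothetical block size, kept tiny for readability


def dequantize(weight, block_scales, block=BLOCK):
    """Scale each weight element by the scale of the block it falls in."""
    return [
        [w * block_scales[i // block][j // block] for j, w in enumerate(row)]
        for i, row in enumerate(weight)
    ]


def linear(x, weight):
    """Compute y = weight @ x for a single input vector x."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in weight]


def fuse(weights, scales):
    """Concatenate weights and block scales along the output (row) dimension."""
    fused_w = [row for w in weights for row in w]
    fused_s = [row for s in scales for row in s]
    return fused_w, fused_s


# Two projections that share the same input x (output dims 2 and 4, input dim 4).
w_q = [[1, 2, 3, 4], [5, 6, 7, 8]]
s_q = [[0.5, 2.0]]  # 1 row block x 2 column blocks
w_k = [[1, 0, 0, 1], [0, 1, 1, 0], [2, 2, 2, 2], [1, 1, 1, 1]]
s_k = [[1.0, 0.5], [2.0, 1.0]]  # 2 row blocks x 2 column blocks
x = [1.0, 2.0, 3.0, 4.0]

# One fused GEMM, then split the output back into the two projections.
fused_w, fused_s = fuse([w_q, w_k], [s_q, s_k])
fused_out = linear(x, dequantize(fused_w, fused_s))
split_q, split_k = fused_out[:2], fused_out[2:]

# Reference: two separate GEMMs.
separate_q = linear(x, dequantize(w_q, s_q))
separate_k = linear(x, dequantize(w_k, s_k))
```

Because each output row keeps its own block scales, the fused result splits back into exactly the values the separate GEMMs produce.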

build_custom_args_for_linear(scale_getattrs: Dict[str, Node]) → Tuple[object, ...]: Return the positional arguments that follow the bias argument in the fused call.

YAML configuration

Uses the common TransformConfig fields documented in Core Transform APIs.

Fuse Mamba A Log

Transform key: fuse_mamba_a_log

Source module: tensorrt_llm._torch.auto_deploy.transform.library.fuse_mamba_a_log

class tensorrt_llm._torch.auto_deploy.transform.library.fuse_mamba_a_log.FuseMambaALog(config: TransformConfig)

Bases: BaseTransform

Fuse the A_log parameter into a precomputed A constant or parameter.

Replaces: A = -torch.exp(self.A_log.float())
With: A = self.A_fused
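The effect of this rewrite can be sketched in plain Python. TinyMixer and fuse_a_log are hypothetical names used only for illustration, and lists with math.exp stand in for tensors and torch.exp. The point is that A is computed once up front instead of on every forward pass.

```python
import math


class TinyMixer:
    """Hypothetical stand-in for a Mamba mixer that holds an A_log parameter."""

    def __init__(self, a_log):
        self.A_log = list(a_log)

    def a_unfused(self):
        # Original pattern: A is recomputed from A_log on every call.
        return [-math.exp(v) for v in self.A_log]


def fuse_a_log(module):
    """Precompute A once and store it as A_fused (hypothetical helper)."""
    module.A_fused = [-math.exp(v) for v in module.A_log]
    return module


mixer = fuse_a_log(TinyMixer([0.0, 1.0, -0.5]))
# After fusion, code that needs A can read mixer.A_fused directly.
```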

YAML configuration

Uses the common TransformConfig fields documented in Core Transform APIs.

Match FP8 MoE Pattern

Transform key: match_fp8_moe_pattern

Source module: tensorrt_llm._torch.auto_deploy.transform.library.fused_moe

class tensorrt_llm._torch.auto_deploy.transform.library.fused_moe.MatchFP8MoePattern(config: TransformConfig)

Bases: MatchMoePattern

Match and fuse an FP8-quantized MoE subgraph.

scale_arg_indices() → Dict[str, int]: Map each scale name to its argument index in the matched linear op.

scale_keys() → List[str]: Order of scale keys to emit into the fused MoE op (e.g., ['input_scale', 'weight_scale', …]).
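How these two hooks might work together can be sketched as follows. The argument layout of the linear op, the index values, and the collect_scales helper are all assumptions made for illustration. The actual indices and keys are defined by each subclass and are not documented here.

```python
from typing import Dict, List, Sequence


def collect_scales(
    linear_args: Sequence[object],
    scale_arg_indices: Dict[str, int],
    scale_keys: List[str],
) -> List[object]:
    """Pick scale arguments out of a matched linear op in the emit order."""
    return [linear_args[scale_arg_indices[key]] for key in scale_keys]


# Hypothetical positional args of one matched quantized linear op:
# (input, weight, bias, input_scale, weight_scale)
linear_args = ("x", "weight", None, "in_scale_0", "w_scale_0")

# Hypothetical return values of scale_arg_indices() and scale_keys().
indices = {"input_scale": 3, "weight_scale": 4}
keys = ["input_scale", "weight_scale"]

ordered = collect_scales(linear_args, indices, keys)
```

Keeping the name-to-index map separate from the emit order lets each quantization format declare its own scale layout while sharing the same matching logic.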

YAML configuration

Uses the common TransformConfig fields documented in Core Transform APIs.

Match NVFP4 MoE Pattern

Transform key: match_nvfp4_moe_pattern

Source module: tensorrt_llm._torch.auto_deploy.transform.library.fused_moe

class tensorrt_llm._torch.auto_deploy.transform.library.fused_moe.MatchNVFP4MoePattern(config: TransformConfig)

Bases: MatchMoePattern

Match and fuse an NVFP4-quantized MoE subgraph.

scale_arg_indices() → Dict[str, int]: Map each scale name to its argument index in the matched linear op.

scale_keys() → List[str]: Order of scale keys to emit into the fused MoE op (e.g., ['input_scale', 'weight_scale', …]).

YAML configuration

Uses the common TransformConfig fields documented in Core Transform APIs.