Additional Registered Transforms
These transforms are registered in the AutoDeploy transform library but are not part of the standard graph-mode or transformers-mode pipelines. They are useful for specialized experiments, explicit opt-in configurations, or development workflows.
Fuse Finegrained FP8 Gemms
Transform key: fuse_finegrained_fp8_gemms
Source module: tensorrt_llm._torch.auto_deploy.transform.library.fusion
class tensorrt_llm._torch.auto_deploy.transform.library.fusion.FuseFineGrainedFP8Gemms(config: TransformConfig)
Bases: QuantizationFusionMixin, BaseTransform
Fuse FineGrained (block-wise) FP8 GEMMs sharing the same input activation.
FineGrained FP8 uses per-block weight scales (weight_scale_inv) and dynamic input quantization, so fusion simply concatenates weights and their block scales along the output dimension.
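The concatenation described above can be illustrated with a minimal plain-Python sketch. It is not the library's implementation. It assumes that dequantization multiplies each weight block by its scale, uses a hypothetical block size of 2, and stands in ordinary floats and nested lists for FP8 tensors. The names `dequantize`, `linear`, `w_q`, and `w_k` are illustrative only.

```python
import math

BLOCK = 2  # hypothetical block size, kept tiny for readability


def dequantize(weight, block_scale, block=BLOCK):
    """Multiply every weight element by the scale of the block it falls in."""
    return [
        [w * block_scale[i // block][j // block] for j, w in enumerate(row)]
        for i, row in enumerate(weight)
    ]


def linear(x, weight, block_scale):
    """Compute x @ dequantize(weight).T using nested lists."""
    w = dequantize(weight, block_scale)
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


def all_close(a, b):
    """Element-wise comparison of two equally shaped nested lists."""
    return all(math.isclose(p, q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# One shared input activation with 4 input features.
x = [[1.0, -2.0, 0.5, 3.0], [0.0, 1.5, -1.0, 2.0]]

# Two projections reading the same input.
# Weights are [out_features, in_features]; scales are one value per BLOCK x BLOCK tile.
w_q = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0],
       [2.0, 2.0, -3.0, 1.0], [1.0, 0.0, 0.0, -1.0]]
s_q = [[0.5, 2.0], [1.5, 0.25]]
w_k = [[-1.0, 1.0, 4.0, 2.0], [3.0, 0.5, -2.0, 1.0]]
s_k = [[4.0, 0.125]]

# Fusion: concatenate weights and block scales along the output (row) dimension.
w_fused = w_q + w_k
s_fused = s_q + s_k

# A single GEMM over the fused weight replaces the two separate GEMMs.
y_fused = linear(x, w_fused, s_fused)

# Split the fused output back into the per-projection slices.
n_q = len(w_q)
y_q = [row[:n_q] for row in y_fused]
y_k = [row[n_q:] for row in y_fused]

fusion_matches = all_close(y_q, linear(x, w_q, s_q)) and all_close(y_k, linear(x, w_k, s_k))
```

In this sketch the scale rows line up after concatenation only because each projection's output size is a multiple of the block size. Whether the real transform imposes the same constraint is not stated here.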
YAML configuration
Uses the common TransformConfig fields documented in Core Transform APIs.
Fuse Mamba A Log
Transform key: fuse_mamba_a_log
Source module: tensorrt_llm._torch.auto_deploy.transform.library.fuse_mamba_a_log
class tensorrt_llm._torch.auto_deploy.transform.library.fuse_mamba_a_log.FuseMambaALog(config: TransformConfig)
Bases: BaseTransform
Fuse A_log parameter into A constant/parameter.
Replaces: A = -torch.exp(self.A_log.float())
With: A = self.A_fused
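The effect of this rewrite can be sketched in plain Python: the negated exponential is evaluated once when the module is built rather than on every forward call. This is an illustration, not the transform's code. `MixerBefore`, `MixerAfter`, and the toy element-wise `forward` are hypothetical, and `math.exp` over a list stands in for `torch.exp` over a tensor.

```python
import math


class MixerBefore:
    """Recomputes A from A_log on every forward call."""

    def __init__(self, a_log):
        self.A_log = a_log

    def forward(self, x):
        # Mirrors: A = -torch.exp(self.A_log.float())
        A = [-math.exp(v) for v in self.A_log]
        return [a * xi for a, xi in zip(A, x)]  # toy use of A


class MixerAfter:
    """Holds the precomputed value, as the fused graph would."""

    def __init__(self, a_log):
        # Computed once, up front; forward only reads it.
        self.A_fused = [-math.exp(v) for v in a_log]

    def forward(self, x):
        # Mirrors: A = self.A_fused
        A = self.A_fused
        return [a * xi for a, xi in zip(A, x)]  # same toy use of A


a_log = [0.0, 0.5, -1.0]
x = [2.0, -1.0, 4.0]

before = MixerBefore(a_log)
after = MixerAfter(a_log)

out_before = before.forward(x)
out_after = after.forward(x)
```

Both variants produce the same output. The second one drops the `exp` and negation from the per-call path.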
YAML configuration
Uses the common TransformConfig fields documented in Core Transform APIs.
Match FP8 MoE Pattern
Transform key: match_fp8_moe_pattern
Source module: tensorrt_llm._torch.auto_deploy.transform.library.fused_moe
class tensorrt_llm._torch.auto_deploy.transform.library.fused_moe.MatchFP8MoePattern(config: TransformConfig)
Bases: MatchMoePattern
Match and fuse FP8-quantized MoE subgraph.
YAML configuration
Uses the common TransformConfig fields documented in Core Transform APIs.
Match NVFP4 MoE Pattern
Transform key: match_nvfp4_moe_pattern
Source module: tensorrt_llm._torch.auto_deploy.transform.library.fused_moe
class tensorrt_llm._torch.auto_deploy.transform.library.fused_moe.MatchNVFP4MoePattern(config: TransformConfig)
Bases: MatchMoePattern
Match and fuse NVFP4-quantized MoE subgraph.
YAML configuration
Uses the common TransformConfig fields documented in Core Transform APIs.