Sharding Stage#
Sharding determines and applies the distributed execution layout. These transforms identify tensor, expert, and batch-matmul sharding choices, then apply the graph rewrites and communication hints needed for multi-rank execution.
Detect Sharding#
Transform key: detect_sharding
Source module: tensorrt_llm._torch.auto_deploy.transform.library.sharding
Configured modes: graph
- class tensorrt_llm._torch.auto_deploy.transform.library.sharding.Sharding(
- config: TransformConfig,
Bases: BaseTransform
A transformation to apply sharding to the model following tensor parallelism.
The transformation is based on the following steps:
1. Identify boundary nodes between residual nodes to identify enable_sharding regions.
2. Identify the GEMM nodes that can be sharded.
3. Trace through the subgraph using DFS/BFS between each pair of boundary nodes.
4. Account for each node in the trace to ensure the op is correct even after sharding. Every node in the subgraph must be accounted for, where the subgraph is the region from the first linear node to the last linear node of an identified sharding region.
5. Shard the GEMM nodes or skip accordingly.
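The reason a region between two linear nodes can be sharded as a unit can be illustrated with a small, library-independent sketch. It assumes the common tensor-parallel pattern of splitting the first GEMM by columns and the last GEMM by rows, then summing the per-rank partial outputs (the role an all-reduce plays). All helper names here are hypothetical and are not part of tensorrt_llm.

```python
# Minimal sketch (not the library's code): column-split the first GEMM,
# row-split the second, and sum the partial outputs across "ranks".

def matmul(a, b):
    """Multiply two row-major matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(w, parts):
    """Split matrix w into `parts` equal column blocks."""
    step = len(w[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]

def split_rows(w, parts):
    """Split matrix w into `parts` equal row blocks."""
    step = len(w) // parts
    return [w[i * step:(i + 1) * step] for i in range(parts)]

def all_reduce_sum(partials):
    """Elementwise sum over the per-rank partial output matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

def sharded_two_gemm(x, w1, w2, tp_size):
    """Run x @ w1 @ w2 with w1 column-sharded and w2 row-sharded."""
    w1_shards = split_columns(w1, tp_size)
    w2_shards = split_rows(w2, tp_size)
    # Each "rank" only touches its own shard of both weights.
    partials = [matmul(matmul(x, a), b) for a, b in zip(w1_shards, w2_shards)]
    return all_reduce_sum(partials)

x = [[1, 2], [3, 4]]
w1 = [[1, 0, 2, 1], [0, 1, 1, 3]]          # shape 2x4
w2 = [[1, 2], [0, 1], [2, 0], [1, 1]]      # shape 4x2
reference = matmul(matmul(x, w1), w2)
sharded = sharded_two_gemm(x, w1, w2, tp_size=2)
```

With integer inputs the sharded and unsharded results match exactly. Step 4 in the list above exists because any op between the two GEMMs must also remain valid on the column-sharded intermediate.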
min_local_shape is the minimum size of the local tensor shard. It prevents tensor parallelism from splitting units such as individual heads into smaller shards.
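One way to picture the min_local_shape constraint is as a guard on the per-rank size of the sharded dimension. The sketch below is an assumption about how such a guard could look, not the library's implementation: the function name is hypothetical, and the rule that the local size must also be a multiple of min_local_shape is an added assumption.

```python
def local_shard_size(dim_size, tp_size, min_local_shape=1):
    """Return the per-rank size of a sharded dimension, or None if the
    split is not allowed (hypothetical helper, for illustration only)."""
    if dim_size % tp_size != 0:
        return None  # uneven split
    local = dim_size // tp_size
    # Assumed rule: a shard may not be smaller than min_local_shape and
    # must contain whole units of that size (e.g. whole heads).
    if local < min_local_shape or local % min_local_shape != 0:
        return None
    return local

# Example: 8 heads of size 64 give a projection dimension of 512.
ok = local_shard_size(512, tp_size=4, min_local_shape=64)
too_fine = local_shard_size(512, tp_size=16, min_local_shape=64)
```

Here the 4-way split keeps two whole heads per rank, while the 16-way split would cut each head in half and is rejected.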
- classmethod get_config_class() → Type[TransformConfig]
Get the configuration class for the transform.
This is used to validate the configuration of the transform.
YAML configuration
The fields below can be set under this transform’s entry in the AutoDeploy config YAML.
- pydantic model tensorrt_llm._torch.auto_deploy.transform.library.sharding.ShardingTransformConfig
Bases: TransformConfig
Configuration for sharding the model.
JSON schema:
{ "title": "ShardingTransformConfig", "description": "Configuration for sharding the model.", "type": "object", "properties": { "stage": { "$ref": "#/$defs/Stages", "description": "The stage of the transformation pipeline where this transform should run." }, "run_per_gm": { "default": true, "description": "Whether to run the transform per graph (sub)module or on whole module.", "title": "Run Per Gm", "type": "boolean" }, "enabled": { "default": true, "description": "Whether to enable this transform.", "title": "Enabled", "type": "boolean" }, "skip_on_error": { "default": false, "description": "Whether to skip the transform if an error occurs.", "title": "Skip On Error", "type": "boolean" }, "run_graph_cleanup": { "default": true, "description": "Whether to run graph cleanup/canonicalization after this transform.", "title": "Run Graph Cleanup", "type": "boolean" }, "run_shape_prop": { "default": false, "description": "Whether to run shape propagation after this transform.", "title": "Run Shape Prop", "type": "boolean" }, "requires_clean_graph": { "default": true, "description": "Whether this transform requires the graph to be clean before it is applied.", "title": "Requires Clean Graph", "type": "boolean" }, "requires_shape_prop": { "default": false, "description": "Whether this transform requires shape propagation before it is applied.", "title": "Requires Shape Prop", "type": "boolean" }, "debug_visualize_dir": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Debug visualization directory. 
None to disable visualization, or a path string to specify the output directory.", "title": "Debug Visualize Dir" }, "expect_mem_change": { "default": false, "description": "Whether this transform is expected to cause changes in CUDA memory stats.", "title": "Expect Mem Change", "type": "boolean" }, "factory_source": { "$ref": "#/$defs/ShardingConfigSource", "default": "unknown" }, "factory_config": { "additionalProperties": true, "title": "Factory Config", "type": "object" }, "manual_config": { "additionalProperties": true, "title": "Manual Config", "type": "object" }, "simple_shard_only": { "default": false, "title": "Simple Shard Only", "type": "boolean" }, "support_partial_config": { "default": true, "title": "Support Partial Config", "type": "boolean" }, "sharding_source": { "items": { "$ref": "#/$defs/ShardingSource" }, "title": "Sharding Source", "type": "array" }, "sharding_dims": { "items": { "$ref": "#/$defs/ShardingDim" }, "title": "Sharding Dims", "type": "array" }, "shard_all_unprocessed": { "default": false, "description": "When True, apply simple shard (column split + all_gather) to 'leftover' linear nodes that are not part of any layer subgraph.", "title": "Shard All Unprocessed", "type": "boolean" }, "simple_shard_filter": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Comma-separated list of substrings to filter which unprocessed linear nodes are simple-sharded. A node is included if its name contains ANY of the listed keywords. Example: 'lm_head,shared_expert'. Only effective when shard_all_unprocessed is True. When None, all unprocessed linear nodes are sharded.", "title": "Simple Shard Filter" }, "allreduce_strategy": { "$ref": "#/$defs/AllReduceStrategy", "default": 3, "description": "AllReduce strategy for distributed operations. 
Options: AUTO (automatic selection), NCCL, ONESHOT, TWOSHOT, MIN_LATENCY, LOWPRECISION, UB, MNNVL, NCCL_SYMMETRIC" }, "allgather_strategy": { "$ref": "#/$defs/AllGatherStrategy", "default": "AUTO", "description": "AllGather strategy for distributed operations. Options: AUTO (NCCL AllGather), SYMM_MEM (symmetric memory with MULTIMEM, falls back to NCCL for unsupported cases)." }, "dist_backend": { "$ref": "#/$defs/DistBackend", "default": "auto" }, "enable_attention_dp": { "default": false, "description": "When True, skip TP sharding as attention data parallelism is enabled.", "title": "Enable Attention Dp", "type": "boolean" }, "shard_layers": { "anyOf": [ { "items": { "type": "string" }, "type": "array" }, { "type": "null" } ], "default": null, "description": "When set, only shard nodes whose layer_type hint is in this list. Nodes with layer_type='unknown' or missing are NOT sharded. When None (default), all enable_sharding nodes are processed regardless of layer_type.", "title": "Shard Layers" }, "dist_mapping": { "additionalProperties": { "type": "integer" }, "title": "Dist Mapping", "type": "object" }, "mapping": { "default": null, "title": "Mapping" }, "dist_config": { "$ref": "#/$defs/DistConfig" } }, "$defs": { "AllGatherStrategy": { "description": "Enum for AllGather strategy.\n\nAUTO: Use NCCL AllGather (default).\nSYMM_MEM: Use PyTorch symmetric memory with MULTIMEM hardware instructions.\n Falls back to NCCL for unsupported cases (variable sizes, dim!=0, large tensors).", "enum": [ "AUTO", "SYMM_MEM" ], "title": "AllGatherStrategy", "type": "string" }, "AllReduceStrategy": { "enum": [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ], "title": "AllReduceStrategy", "type": "integer" }, "DistBackend": { "description": "Enum for distributed backend.", "enum": [ "auto", "trtllm", "torch" ], "title": "DistBackend", "type": "string" }, "DistConfig": { "additionalProperties": true, "description": "Distributed parallelism configuration for AutoDeploy.", "properties": { 
"world_size": { "default": 1, "minimum": 1, "title": "World Size", "type": "integer" }, "rank": { "default": 0, "minimum": 0, "title": "Rank", "type": "integer" }, "tp_size": { "default": 1, "minimum": 1, "title": "Tp Size", "type": "integer" }, "pp_size": { "default": 1, "minimum": 1, "title": "Pp Size", "type": "integer" }, "moe_tp_size": { "default": 1, "minimum": 1, "title": "Moe Tp Size", "type": "integer" }, "moe_ep_size": { "default": 1, "minimum": 1, "title": "Moe Ep Size", "type": "integer" }, "moe_cluster_size": { "default": 1, "minimum": 1, "title": "Moe Cluster Size", "type": "integer" }, "enable_attention_dp": { "default": false, "title": "Enable Attention Dp", "type": "boolean" }, "allreduce_strategy": { "default": "NCCL", "title": "Allreduce Strategy", "type": "string" } }, "title": "DistConfig", "type": "object" }, "ShardingConfigSource": { "description": "Enum for factory source.", "enum": [ "huggingface", "unknown" ], "title": "ShardingConfigSource", "type": "string" }, "ShardingDim": { "description": "Enum for sharding dimension.", "enum": [ "tp", "ep", "bmm" ], "title": "ShardingDim", "type": "string" }, "ShardingSource": { "description": "Enum for sharding source.", "enum": [ "heuristic", "factory", "manual" ], "title": "ShardingSource", "type": "string" }, "Stages": { "description": "Enumerated (ordered!) stages of the transformation pipeline.\n\nThis is used to classify and pre-order transforms.", "enum": [ "factory", "export", "post_export", "pattern_matcher", "sharding", "weight_load", "post_load_fusion", "cache_init", "visualize", "compile" ], "title": "Stages", "type": "string" } }, "additionalProperties": true, "required": [ "stage" ] }
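The Stages enum in the schema above is described as ordered and used to pre-order transforms. A minimal sketch of that idea follows. The stage names are copied from the schema, while the function, the hypothetical_* transform names, and the assumption that the sharding transforms on this page run in the sharding stage are illustrative only.

```python
# Stage names in the order listed by the schema's Stages enum.
STAGES = [
    "factory", "export", "post_export", "pattern_matcher", "sharding",
    "weight_load", "post_load_fusion", "cache_init", "visualize", "compile",
]

def order_by_stage(transforms):
    """Stable-sort (transform_key, stage) pairs by stage position.

    Hypothetical helper: transforms in the same stage keep their
    original relative order.
    """
    rank = {name: i for i, name in enumerate(STAGES)}
    return sorted(transforms, key=lambda item: rank[item[1]])

pipeline = [
    ("hypothetical_compile_pass", "compile"),
    ("detect_sharding", "sharding"),              # stage assumed from the page title
    ("hypothetical_export_pass", "export"),
    ("sharding_transform_executor", "sharding"),  # stage assumed from the page title
]
ordered = order_by_stage(pipeline)
```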
- Config:
extra: str = allow
arbitrary_types_allowed: bool = True
- Fields:
allgather_strategy (tensorrt_llm._torch.auto_deploy.transform.library.sharding.AllGatherStrategy)
allreduce_strategy (tensorrt_llm.functional.AllReduceStrategy)
dist_backend (tensorrt_llm._torch.auto_deploy.transform.library.sharding.DistBackend)
dist_config (tensorrt_llm._torch.auto_deploy.utils.dist_config.DistConfig)
dist_mapping (dict[str, int])
enable_attention_dp (bool)
factory_config (Dict[str, Any])
factory_source (tensorrt_llm._torch.auto_deploy.models.factory.ShardingConfigSource)
manual_config (Dict[str, Any])
mapping (Any)
shard_all_unprocessed (bool)
shard_layers (List[str] | None)
sharding_dims (List[tensorrt_llm._torch.auto_deploy.transform.library.sharding.ShardingDim])
sharding_source (List[tensorrt_llm._torch.auto_deploy.transform.library.sharding.ShardingSource])
simple_shard_filter (str | None)
simple_shard_only (bool)
support_partial_config (bool)
- Validators:
_validate_allgather_strategy » allgather_strategy
_validate_allreduce_strategy » allreduce_strategy
- field allgather_strategy: AllGatherStrategy = AllGatherStrategy.AUTO
AllGather strategy for distributed operations. Options: AUTO (NCCL AllGather), SYMM_MEM (symmetric memory with MULTIMEM, falls back to NCCL for unsupported cases).
- field allreduce_strategy: AllReduceStrategy = AllReduceStrategy.AUTO
AllReduce strategy for distributed operations. Options: AUTO (automatic selection), NCCL, ONESHOT, TWOSHOT, MIN_LATENCY, LOWPRECISION, UB, MNNVL, NCCL_SYMMETRIC
- field dist_backend: DistBackend = DistBackend.AUTO
- field dist_config: DistConfig [Optional]
- field dist_mapping: dict[str, int] [Optional]
- field enable_attention_dp: bool = False
When True, skip TP sharding as attention data parallelism is enabled.
- field factory_config: Dict[str, Any] [Optional]
- field factory_source: ShardingConfigSource = ShardingConfigSource.UNKNOWN
- field manual_config: Dict[str, Any] [Optional]
- field mapping: Any = None
- field shard_all_unprocessed: bool = False
When True, apply simple shard (column split + all_gather) to ‘leftover’ linear nodes that are not part of any layer subgraph.
- field shard_layers: List[str] | None = None
When set, only shard nodes whose layer_type hint is in this list. Nodes with layer_type='unknown' or missing are NOT sharded. When None (default), all enable_sharding nodes are processed regardless of layer_type.
- field sharding_dims: List[ShardingDim] [Optional]
- field sharding_source: List[ShardingSource] [Optional]
- field simple_shard_filter: str | None = None
Comma-separated list of substrings to filter which unprocessed linear nodes are simple-sharded. A node is included if its name contains ANY of the listed keywords. Example: ‘lm_head,shared_expert’. Only effective when shard_all_unprocessed is True. When None, all unprocessed linear nodes are sharded.
- field simple_shard_only: bool = False
- field support_partial_config: bool = True
- validate_config(
- sources: ShardingSource | List[ShardingSource] = None,
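To make the YAML-settable fields more concrete, the sketch below writes a transform entry as a Python dict mirroring the YAML structure and checks it against the enum values listed in the JSON schema. The top-level "transforms" key and the check_entry helper are assumptions for illustration. The allowed values and the required stage field come from the schema.

```python
# Allowed values copied from the schema's $defs section.
ALLOWED = {
    "stage": {
        "factory", "export", "post_export", "pattern_matcher", "sharding",
        "weight_load", "post_load_fusion", "cache_init", "visualize", "compile",
    },
    "sharding_source": {"heuristic", "factory", "manual"},
    "sharding_dims": {"tp", "ep", "bmm"},
    "dist_backend": {"auto", "trtllm", "torch"},
    "allgather_strategy": {"AUTO", "SYMM_MEM"},
}

def check_entry(entry):
    """Return a list of problems found in one transform entry
    (hypothetical helper, not the pydantic validation itself)."""
    problems = []
    if "stage" not in entry:
        problems.append("stage is required")
    for key, allowed in ALLOWED.items():
        if key not in entry:
            continue
        values = entry[key] if isinstance(entry[key], list) else [entry[key]]
        for value in values:
            if value not in allowed:
                problems.append(f"{key}: {value!r} not in {sorted(allowed)}")
    return problems

config = {
    "transforms": {  # assumed top-level key of the AutoDeploy config
        "detect_sharding": {
            "stage": "sharding",
            "sharding_source": ["heuristic"],
            "sharding_dims": ["tp", "ep", "bmm"],
            "shard_all_unprocessed": True,
            "simple_shard_filter": "lm_head,shared_expert",
        }
    }
}
problems = check_entry(config["transforms"]["detect_sharding"])
bad_problems = check_entry({"sharding_dims": ["pp"]})
```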
Sharding Transform Executor#
Transform key: sharding_transform_executor
Source module: tensorrt_llm._torch.auto_deploy.transform.library.sharding
Configured modes: graph
- class tensorrt_llm._torch.auto_deploy.transform.library.sharding.ShardingTransformExecutor(
- config: TransformConfig,
Bases: BaseTransform
Apply transformations to the graph module.
- Parameters:
gm – Graph module to apply transformations to
sharding_config – Transformation configuration containing list of transformations to apply
- classmethod get_config_class() → Type[TransformConfig]
Get the configuration class for the transform.
This is used to validate the configuration of the transform.
YAML configuration
The fields below can be set under this transform’s entry in the AutoDeploy config YAML.
- pydantic model tensorrt_llm._torch.auto_deploy.transform.library.sharding.ShardingTransformConfig
Bases: TransformConfig
Configuration for sharding the model.
- Config:
extra: str = allow
arbitrary_types_allowed: bool = True
- Fields:
allgather_strategy (tensorrt_llm._torch.auto_deploy.transform.library.sharding.AllGatherStrategy)
allreduce_strategy (tensorrt_llm.functional.AllReduceStrategy)
debug_visualize_dir (Optional[str])
dist_backend (tensorrt_llm._torch.auto_deploy.transform.library.sharding.DistBackend)
dist_config (tensorrt_llm._torch.auto_deploy.utils.dist_config.DistConfig)
dist_mapping (dict[str, int])
enable_attention_dp (bool)
enabled (bool)
expect_mem_change (bool)
factory_config (Dict[str, Any])
factory_source (tensorrt_llm._torch.auto_deploy.models.factory.ShardingConfigSource)
manual_config (Dict[str, Any])
mapping (Any)
requires_clean_graph (bool)
requires_shape_prop (bool)
run_graph_cleanup (bool)
run_per_gm (bool)
run_shape_prop (bool)
shard_all_unprocessed (bool)
shard_layers (List[str] | None)
sharding_dims (List[tensorrt_llm._torch.auto_deploy.transform.library.sharding.ShardingDim])
sharding_source (List[tensorrt_llm._torch.auto_deploy.transform.library.sharding.ShardingSource])
simple_shard_filter (str | None)
simple_shard_only (bool)
skip_on_error (bool)
stage (Stages)
support_partial_config (bool)
- Validators:
_validate_allgather_strategy » allgather_strategy
_validate_allreduce_strategy » allreduce_strategy
- field allgather_strategy: AllGatherStrategy = AllGatherStrategy.AUTO
AllGather strategy for distributed operations. Options: AUTO (NCCL AllGather), SYMM_MEM (symmetric memory with MULTIMEM, falls back to NCCL for unsupported cases).
- field allreduce_strategy: AllReduceStrategy = AllReduceStrategy.AUTO
AllReduce strategy for distributed operations. Options: AUTO (automatic selection), NCCL, ONESHOT, TWOSHOT, MIN_LATENCY, LOWPRECISION, UB, MNNVL, NCCL_SYMMETRIC
- field debug_visualize_dir: str | None = None
Debug visualization directory. None to disable visualization, or a path string to specify the output directory.
- field dist_backend: DistBackend = DistBackend.AUTO
- field dist_config: DistConfig [Optional]
- field dist_mapping: dict[str, int] [Optional]
- field enable_attention_dp: bool = False
When True, skip TP sharding as attention data parallelism is enabled.
- field enabled: bool = True
Whether to enable this transform.
- field expect_mem_change: bool = False
Whether this transform is expected to cause changes in CUDA memory stats.
- field factory_config: Dict[str, Any] [Optional]
- field factory_source: ShardingConfigSource = ShardingConfigSource.UNKNOWN
- field manual_config: Dict[str, Any] [Optional]
- field mapping: Any = None
- field requires_clean_graph: bool = True
Whether this transform requires the graph to be clean before it is applied.
- field requires_shape_prop: bool = False
Whether this transform requires shape propagation before it is applied.
- field run_graph_cleanup: bool = True
Whether to run graph cleanup/canonicalization after this transform.
- field run_per_gm: bool = True
Whether to run the transform per graph (sub)module or on whole module.
- field run_shape_prop: bool = False
Whether to run shape propagation after this transform.
- field shard_all_unprocessed: bool = False
When True, apply simple shard (column split + all_gather) to ‘leftover’ linear nodes that are not part of any layer subgraph.
- field shard_layers: List[str] | None = None
When set, only shard nodes whose layer_type hint is in this list. Nodes with layer_type='unknown' or missing are NOT sharded. When None (default), all enable_sharding nodes are processed regardless of layer_type.
- field sharding_dims: List[ShardingDim] [Optional]
- field sharding_source: List[ShardingSource] [Optional]
- field simple_shard_filter: str | None = None
Comma-separated list of substrings to filter which unprocessed linear nodes are simple-sharded. A node is included if its name contains ANY of the listed keywords. Example: ‘lm_head,shared_expert’. Only effective when shard_all_unprocessed is True. When None, all unprocessed linear nodes are sharded.
- field simple_shard_only: bool = False
- field skip_on_error: bool = False
Whether to skip the transform if an error occurs.
- field stage: Stages [Required]
The stage of the transformation pipeline where this transform should run.
- field support_partial_config: bool = True
- validate_config(
- sources: ShardingSource | List[ShardingSource] = None,
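The interaction between shard_all_unprocessed and simple_shard_filter described in the fields above can be sketched as a small selection function. This is an illustrative reading of the field descriptions, not the library's code: the function names are hypothetical, and stripping whitespace around keywords is an added assumption.

```python
def parse_filter(simple_shard_filter):
    """Split the comma-separated filter into keywords; None means no filter."""
    if simple_shard_filter is None:
        return None
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [k.strip() for k in simple_shard_filter.split(",") if k.strip()]

def select_simple_shard(node_names, shard_all_unprocessed, simple_shard_filter=None):
    """Pick the leftover linear nodes that would be simple-sharded."""
    if not shard_all_unprocessed:
        return []  # the filter only takes effect when this flag is True
    keywords = parse_filter(simple_shard_filter)
    if keywords is None:
        return list(node_names)  # no filter: every unprocessed node qualifies
    # A node qualifies if its name contains ANY keyword.
    return [name for name in node_names if any(k in name for k in keywords)]

leftover = ["lm_head", "model.shared_expert.up_proj", "model.router.gate"]
filtered = select_simple_shard(leftover, True, "lm_head,shared_expert")
unfiltered = select_simple_shard(leftover, True)
disabled = select_simple_shard(leftover, False, "lm_head")
```

The node names in `leftover` are made up for the example.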
Apply Sharding Hints#
Transform key: apply_sharding_hints
Source module: tensorrt_llm._torch.auto_deploy.transform.library.sharding_ir
Configured modes: graph
- class tensorrt_llm._torch.auto_deploy.transform.library.sharding_ir.ApplyShardingHints(
- config: TransformConfig,
Bases: BaseTransform
Deterministic, node-local sharding transform driven by hint kwargs.
Iterates graph nodes and applies sharding based on explicit hint arguments (tp_mode, tp_scaled_dim, tp_scale_sizes, etc.) together with the runtime DistConfig. No cross-node propagation, no topology inference.
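The node-local character of this transform can be sketched as a loop in which each node's outcome depends only on its own hints and the tensor-parallel size. Everything below is a hypothetical model for illustration: the Node class, the "colwise" and "rowwise" values for tp_mode, and the (out_features, in_features) weight layout are assumptions, and the other hints (tp_scaled_dim, tp_scale_sizes) are not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical stand-in for a graph node carrying hint kwargs."""
    name: str
    weight_shape: tuple  # assumed layout: (out_features, in_features)
    hints: dict = field(default_factory=dict)

def apply_hints(nodes, tp_size):
    """Return {node name: local weight shape}, using only each node's own hints."""
    result = {}
    for node in nodes:
        out_f, in_f = node.weight_shape
        mode = node.hints.get("tp_mode", "none")  # hint values are assumed
        if mode == "colwise":
            result[node.name] = (out_f // tp_size, in_f)   # split output features
        elif mode == "rowwise":
            result[node.name] = (out_f, in_f // tp_size)   # split input features
        else:
            result[node.name] = (out_f, in_f)              # no hint: left unsharded
    return result

nodes = [
    Node("up", (8, 4), {"tp_mode": "colwise"}),
    Node("down", (4, 8), {"tp_mode": "rowwise"}),
    Node("head", (16, 4)),
]
# tp_size would come from the runtime DistConfig (field tp_size in the schema).
local_shapes = apply_hints(nodes, tp_size=2)
```

Because no node inspects its neighbors, the result for a given node does not change when other nodes are added or removed, which is what "no cross-node propagation" implies.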
- classmethod get_config_class() → Type[TransformConfig]
Get the configuration class for the transform.
This is used to validate the configuration of the transform.
YAML configuration
The fields below can be set under this transform’s entry in the AutoDeploy config YAML.
- pydantic model tensorrt_llm._torch.auto_deploy.transform.library.sharding_ir.IRShardingConfig
Bases: TransformConfig
Minimal configuration for the hint-driven IR sharding transform.
This replaces the legacy ShardingTransformConfig for ApplyShardingHints, carrying only the fields that the IR path actually reads. When the legacy sharding path is removed, this is the only sharding config class.
JSON schema:
{ "title": "IRShardingConfig", "description": "Minimal configuration for the hint-driven IR sharding transform.\n\nThis replaces the legacy ``ShardingTransformConfig`` for\n``ApplyShardingHints``, carrying only the fields that the IR path actually\nreads. When the legacy sharding path is removed, this is the only sharding\nconfig class.", "type": "object", "properties": { "stage": { "$ref": "#/$defs/Stages", "description": "The stage of the transformation pipeline where this transform should run." }, "run_per_gm": { "default": true, "description": "Whether to run the transform per graph (sub)module or on whole module.", "title": "Run Per Gm", "type": "boolean" }, "enabled": { "default": true, "description": "Whether to enable this transform.", "title": "Enabled", "type": "boolean" }, "skip_on_error": { "default": false, "description": "Whether to skip the transform if an error occurs.", "title": "Skip On Error", "type": "boolean" }, "run_graph_cleanup": { "default": true, "description": "Whether to run graph cleanup/canonicalization after this transform.", "title": "Run Graph Cleanup", "type": "boolean" }, "run_shape_prop": { "default": false, "description": "Whether to run shape propagation after this transform.", "title": "Run Shape Prop", "type": "boolean" }, "requires_clean_graph": { "default": true, "description": "Whether this transform requires the graph to be clean before it is applied.", "title": "Requires Clean Graph", "type": "boolean" }, "requires_shape_prop": { "default": false, "description": "Whether this transform requires shape propagation before it is applied.", "title": "Requires Shape Prop", "type": "boolean" }, "debug_visualize_dir": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Debug visualization directory. 
None to disable visualization, or a path string to specify the output directory.", "title": "Debug Visualize Dir" }, "expect_mem_change": { "default": false, "description": "Whether this transform is expected to cause changes in CUDA memory stats.", "title": "Expect Mem Change", "type": "boolean" }, "allreduce_strategy": { "$ref": "#/$defs/AllReduceStrategy", "default": 3, "description": "AllReduce strategy for distributed operations." }, "simple_shard_only": { "default": false, "title": "Simple Shard Only", "type": "boolean" }, "shard_layers": { "anyOf": [ { "items": { "type": "string" }, "type": "array" }, { "type": "null" } ], "default": null, "description": "When set, only shard nodes whose layer_type hint is in this list.", "title": "Shard Layers" }, "simple_shard_filter": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Comma-separated weight-name keywords (e.g. 'lm_head'). Matching linears are gather-sharded (column split + all_gather) regardless of shard_layers -- used for the lm_head vocab projection, which the hint-driven sharder would otherwise replicate.", "title": "Simple Shard Filter" }, "enable_attention_dp": { "default": false, "title": "Enable Attention Dp", "type": "boolean" }, "dist_mapping": { "additionalProperties": { "type": "integer" }, "title": "Dist Mapping", "type": "object" }, "dist_config": { "$ref": "#/$defs/DistConfig" } }, "$defs": { "AllReduceStrategy": { "enum": [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ], "title": "AllReduceStrategy", "type": "integer" }, "DistConfig": { "additionalProperties": true, "description": "Distributed parallelism configuration for AutoDeploy.", "properties": { "world_size": { "default": 1, "minimum": 1, "title": "World Size", "type": "integer" }, "rank": { "default": 0, "minimum": 0, "title": "Rank", "type": "integer" }, "tp_size": { "default": 1, "minimum": 1, "title": "Tp Size", "type": "integer" }, "pp_size": { "default": 1, "minimum": 1, "title": "Pp Size", "type": 
"integer" }, "moe_tp_size": { "default": 1, "minimum": 1, "title": "Moe Tp Size", "type": "integer" }, "moe_ep_size": { "default": 1, "minimum": 1, "title": "Moe Ep Size", "type": "integer" }, "moe_cluster_size": { "default": 1, "minimum": 1, "title": "Moe Cluster Size", "type": "integer" }, "enable_attention_dp": { "default": false, "title": "Enable Attention Dp", "type": "boolean" }, "allreduce_strategy": { "default": "NCCL", "title": "Allreduce Strategy", "type": "string" } }, "title": "DistConfig", "type": "object" }, "Stages": { "description": "Enumerated (ordered!) stages of the transformation pipeline.\n\nThis is used to classify and pre-order transforms.", "enum": [ "factory", "export", "post_export", "pattern_matcher", "sharding", "weight_load", "post_load_fusion", "cache_init", "visualize", "compile" ], "title": "Stages", "type": "string" } }, "additionalProperties": true, "required": [ "stage" ] }
- Config:
extra: str = allow
- Fields:
allreduce_strategy (tensorrt_llm.functional.AllReduceStrategy)
dist_config (tensorrt_llm._torch.auto_deploy.utils.dist_config.DistConfig)
dist_mapping (dict[str, int])
enable_attention_dp (bool)
shard_layers (List[str] | None)
simple_shard_filter (str | None)
simple_shard_only (bool)
- Validators:
_validate_allreduce_strategy » allreduce_strategy
- field allreduce_strategy: AllReduceStrategy = AllReduceStrategy.AUTO
AllReduce strategy for distributed operations.
- field dist_config: DistConfig [Optional]
- field dist_mapping: dict[str, int] [Optional]
- field enable_attention_dp: bool = False
- field shard_layers: List[str] | None = None
When set, only shard nodes whose layer_type hint is in this list.
- field simple_shard_filter: str | None = None
Comma-separated weight-name keywords (e.g. ‘lm_head’). Matching linears are gather-sharded (column split + all_gather) regardless of shard_layers – used for the lm_head vocab projection, which the hint-driven sharder would otherwise replicate.
- field simple_shard_only: bool = False
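The precedence between simple_shard_filter and shard_layers in this config can be sketched as a per-node decision. This is one reading of the field descriptions above, not the library's code: the function name, the returned labels, and the layer_type values "mlp" and "attention" are hypothetical.

```python
def plan_node(name, layer_type, shard_layers=None, simple_shard_filter=None):
    """Decide how a linear node would be treated (illustrative labels only)."""
    keywords = [k.strip() for k in (simple_shard_filter or "").split(",") if k.strip()]
    # The filter is described as applying regardless of shard_layers,
    # so it is checked first.
    if any(k in name for k in keywords):
        return "gather_shard"  # column split + all_gather
    if shard_layers is None or layer_type in shard_layers:
        return "hint_shard"    # handled by the node's own sharding hints
    return "leave_unsharded"

lm_head_plan = plan_node("lm_head", "unknown", ["mlp"], "lm_head")
mlp_plan = plan_node("layers.0.mlp.up_proj", "mlp", ["mlp"])
attn_plan = plan_node("layers.0.attn.q_proj", "attention", ["mlp"])
attn_plan_no_list = plan_node("layers.0.attn.q_proj", "attention")
```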
Pipeline Cache#
Transform key: pipeline_cache
Source module: tensorrt_llm._torch.auto_deploy.transform.pipeline_cache.pipeline_cache
Configured modes: graph
- class tensorrt_llm._torch.auto_deploy.transform.pipeline_cache.pipeline_cache.PipelineCache(
- config: TransformConfig,
Bases: BaseTransform
Transform that snapshots/restores the model at its configured pipeline position.
- classmethod get_config_class() → type[TransformConfig]
Get the configuration class for the transform.
This is used to validate the configuration of the transform.
- maybe_restore(
- _cm: CachedSequenceInterface,
- factory: ModelFactory,
- shared_config: SharedConfig,
- transform_index: int,
Return a cached module for this transform point, or None on a miss.
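The restore-or-None contract can be illustrated with a small stand-alone sketch. It uses pickle and a file name derived from the transform index purely for illustration. The real transform is described as a torch-save cache, and its key scheme and file layout are not documented here, so every name below is an assumption.

```python
import pickle
import tempfile
from pathlib import Path

def _entry_path(root, transform_index):
    """Hypothetical cache layout: one file per transform position."""
    return Path(root) / f"transform_{transform_index}.pkl"

def snapshot(root, transform_index, obj):
    """Store obj for this transform position and return the file path."""
    path = _entry_path(root, transform_index)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return path

def restore_or_none(root, transform_index):
    """Return the cached object for this position, or None on a miss."""
    path = _entry_path(root, transform_index)
    if not path.exists():
        return None
    with path.open("rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as root:
    miss = restore_or_none(root, 3)           # nothing stored yet
    snapshot(root, 3, {"weights": [1, 2, 3]})
    hit = restore_or_none(root, 3)            # same position: restored
    other = restore_or_none(root, 4)          # different position: miss
```

Returning None instead of raising lets the caller fall through to running the pipeline normally when no snapshot exists.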
YAML configuration
The fields below can be set under this transform’s entry in the AutoDeploy config YAML.
- pydantic model tensorrt_llm._torch.auto_deploy.transform.pipeline_cache.pipeline_cache.PipelineCacheConfig
Bases: TransformConfig
Configuration for the torch-save pipeline cache transform.
JSON schema:
{ "title": "PipelineCacheConfig", "description": "Configuration for the torch-save pipeline cache transform.", "type": "object", "properties": { "stage": { "$ref": "#/$defs/Stages", "description": "The stage of the transformation pipeline where this transform should run." }, "enabled": { "default": false, "description": "Whether to enable the torch-save pipeline cache transform.", "title": "Enabled", "type": "boolean" }, "root": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Cache root. Defaults to ~/.cache/tensorrt_llm/auto_deploy/pipeline_cache when the transform is enabled.", "title": "Root" } }, "$defs": { "Stages": { "description": "Enumerated (ordered!) stages of the transformation pipeline.\n\nThis is used to classify and pre-order transforms.", "enum": [ "factory", "export", "post_export", "pattern_matcher", "sharding", "weight_load", "post_load_fusion", "cache_init", "visualize", "compile" ], "title": "Stages", "type": "string" } }, "additionalProperties": false, "required": [ "stage" ] }
- Config:
extra: str = forbid
- Fields:
enabled (bool)
root (str | None)
- Validators:
validate_enabled_cache » all fields
- field enabled: bool = False
Whether to enable the torch-save pipeline cache transform.
- field root: str | None = None
Cache root. Defaults to ~/.cache/tensorrt_llm/auto_deploy/pipeline_cache when the transform is enabled.
- validator validate_enabled_cache » all fields
- debug_visualize_dir: ClassVar[str | None] = None
- expect_mem_change: ClassVar[bool] = False
- requires_clean_graph: ClassVar[bool] = False
- requires_shape_prop: ClassVar[bool] = False
- run_graph_cleanup: ClassVar[bool] = False
- run_per_gm: ClassVar[bool] = False
- run_shape_prop: ClassVar[bool] = False
- skip_on_error: ClassVar[bool] = True
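The documented behavior of this config (unknown keys forbidden, stage required, disabled by default, root falling back to a default path when enabled) can be sketched as a plain function. This is an illustration of the schema, not the pydantic model or its validate_enabled_cache validator: the function name is hypothetical and the "sharding" stage value in the example is an assumption.

```python
import os

DEFAULT_ROOT = "~/.cache/tensorrt_llm/auto_deploy/pipeline_cache"
KNOWN_KEYS = {"stage", "enabled", "root"}

def resolve_pipeline_cache_root(entry):
    """Return the cache root to use, or None when the cache is disabled."""
    unknown = set(entry) - KNOWN_KEYS
    if unknown:
        # Mirrors extra = forbid: unexpected keys are rejected.
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    if "stage" not in entry:
        raise ValueError("stage is required")
    if not entry.get("enabled", False):
        return None
    return os.path.expanduser(entry.get("root") or DEFAULT_ROOT)

disabled_root = resolve_pipeline_cache_root({"stage": "sharding"})
default_root = resolve_pipeline_cache_root({"stage": "sharding", "enabled": True})
custom_root = resolve_pipeline_cache_root(
    {"stage": "sharding", "enabled": True, "root": "/tmp/ad_cache"}
)
```

Unlike the sharding configs on this page, which allow extra keys, a misspelled key here would be reported rather than silently accepted.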