Weight Loading Stage
Weight loading materializes model weights and moves required state to the target device after graph structure and sharding decisions have been made. This stage bridges graph preparation and weight-dependent fusion.
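The MoveDeviceConfig schema in this section lists the pipeline stages as an ordered enum, and that order is what places weight loading after sharding and before post-load fusion. The sketch below copies that enum into plain Python to show the ordering. The `Stages` class here is an illustrative copy and `stage_index` and `neighbors` are hypothetical helpers, not the library's own definitions.

```python
from enum import Enum
from typing import Optional, Tuple


class Stages(Enum):
    # Illustrative copy of the ordered "Stages" enum from the config schema;
    # it is not imported from tensorrt_llm.
    FACTORY = "factory"
    EXPORT = "export"
    POST_EXPORT = "post_export"
    PATTERN_MATCHER = "pattern_matcher"
    SHARDING = "sharding"
    WEIGHT_LOAD = "weight_load"
    POST_LOAD_FUSION = "post_load_fusion"
    CACHE_INIT = "cache_init"
    VISUALIZE = "visualize"
    COMPILE = "compile"


def stage_index(stage: Stages) -> int:
    """Position of a stage in the pipeline (0 runs first)."""
    return list(Stages).index(stage)


def neighbors(stage: Stages) -> Tuple[Optional[str], Optional[str]]:
    """Return the (previous, next) stage names around `stage`."""
    members = list(Stages)
    i = members.index(stage)
    prev_name = members[i - 1].value if i > 0 else None
    next_name = members[i + 1].value if i + 1 < len(members) else None
    return prev_name, next_name


print(neighbors(Stages.WEIGHT_LOAD))  # ('sharding', 'post_load_fusion')
```

Python's `Enum` iterates members in definition order, so list position doubles as pipeline position.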
Strip Sharding Hints
Transform key: strip_sharding_hints
Source module: tensorrt_llm._torch.auto_deploy.transform.library.sharding_ir
Configured modes: graph
- class tensorrt_llm._torch.auto_deploy.transform.library.sharding_ir.StripShardingHints(config: TransformConfig)
Bases: BaseTransform
Strip sharding hints and lower placeholder ops to zero-copy aten equivalents.
Placeholder ops (auto_deploy.view, split_with_sizes, all_reduce) are replaced with native aten ops to eliminate the .clone() overhead required by PyTorch's custom op framework, which does not allow a custom op's output to alias its inputs. Other enable_sharding ops that have no aten equivalent have their hint kwargs stripped so that downstream transforms see canonical op signatures.
- classmethod get_config_class() → Type[TransformConfig]
Get the configuration class for the transform.
This is used to validate the configuration of the transform.
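The get_config_class hook lets the pipeline look up each transform's config type before constructing the transform, so a raw config entry can be validated against the right model. A minimal sketch of that dispatch pattern follows, using dataclasses instead of pydantic. `ToyConfig`, `ToyMoveConfig`, `ToyTransform`, `ToyLoadWeights`, and `build_transform` are hypothetical names, and the real BaseTransform construction and validation logic may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Type


@dataclass
class ToyConfig:
    # Hypothetical stand-in for TransformConfig, with two of its common fields.
    stage: str
    enabled: bool = True


@dataclass
class ToyMoveConfig(ToyConfig):
    # Hypothetical subclass adding a transform-specific field.
    checkpoint_device: Optional[str] = None


class ToyTransform:
    # Hypothetical stand-in for BaseTransform.
    def __init__(self, config: ToyConfig):
        self.config = config

    @classmethod
    def get_config_class(cls) -> Type[ToyConfig]:
        return ToyConfig


class ToyLoadWeights(ToyTransform):
    # Subclasses override the hook to return their own config type.
    @classmethod
    def get_config_class(cls) -> Type[ToyConfig]:
        return ToyMoveConfig


def build_transform(transform_cls: Type[ToyTransform], raw: Dict[str, Any]) -> ToyTransform:
    """Look up the config class via the hook, validate `raw` with it, then construct."""
    config_cls = transform_cls.get_config_class()
    # Dataclass construction raises TypeError on missing or unknown keys; the
    # pydantic configs documented here are more lenient (extra = allow).
    return transform_cls(config_cls(**raw))


t = build_transform(ToyLoadWeights, {"stage": "weight_load", "checkpoint_device": "cpu"})
print(t.config)  # ToyMoveConfig(stage='weight_load', enabled=True, checkpoint_device='cpu')
```

Keeping the config type on the transform class means the caller never needs a separate registry from transform to config model.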
YAML configuration
Uses the common TransformConfig fields documented in Core Transform APIs.
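The rewrite StripShardingHints performs can be pictured as one pass over graph nodes: placeholder targets that have a native equivalent are swapped for it, and the hint kwargs are dropped so the remaining ops carry canonical signatures. The sketch below uses a toy node list rather than a torch.fx graph. The `Node` class, the op names in `LOWERING`, and the kwarg names in `HINT_KWARGS` are hypothetical and are not the library's actual identifiers.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical names: the real placeholder ops, their aten targets, and the
# sharding-hint kwargs used by AutoDeploy are not reproduced here.
LOWERING = {
    "auto_deploy.view": "aten.view",
    "auto_deploy.split_with_sizes": "aten.split_with_sizes",
}
HINT_KWARGS = {"shard_dim", "shard_group"}


@dataclass
class Node:
    target: str
    args: Tuple[Any, ...] = ()
    kwargs: Dict[str, Any] = field(default_factory=dict)


def strip_sharding_hints(nodes: List[Node]) -> List[Node]:
    """Return new nodes with placeholders lowered and hint kwargs removed."""
    out = []
    for node in nodes:
        # Drop hint kwargs so downstream passes see a canonical signature.
        kwargs = {k: v for k, v in node.kwargs.items() if k not in HINT_KWARGS}
        # Swap a placeholder op for its native equivalent when one exists.
        target = LOWERING.get(node.target, node.target)
        out.append(Node(target, node.args, kwargs))
    return out


graph = [
    Node("auto_deploy.view", ("x", (4, -1)), {"shard_dim": 0}),
    Node("auto_deploy.custom_sharded_op", ("y",), {"shard_group": 1, "eps": 1e-5}),
]
for n in strip_sharding_hints(graph):
    print(n.target, n.kwargs)
# aten.view {}
# auto_deploy.custom_sharded_op {'eps': 1e-05}
```

The second node has no entry in `LOWERING`, so it keeps its target and only loses its hint kwarg, matching the two cases the class description distinguishes.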
Load Weights
Transform key: load_weights
Source module: tensorrt_llm._torch.auto_deploy.transform.library.load_weights
Configured modes: graph
- class tensorrt_llm._torch.auto_deploy.transform.library.load_weights.LoadWeightsToDevice(config: TransformConfig)
Bases: BaseTransform
A simple wrapper transform to load weights into a model.
- classmethod get_config_class() → Type[TransformConfig]
Get the configuration class for the transform.
This is used to validate the configuration of the transform.
YAML configuration
The fields below can be set under this transform’s entry in the AutoDeploy config YAML.
- pydantic model tensorrt_llm._torch.auto_deploy.transform.library.load_weights.MoveDeviceConfig
Bases: TransformConfig
Configuration for the transform that moves inputs/arguments to the device.
{ "title": "MoveDeviceConfig", "description": "Configuration for the moving inputs/arguments to the device transform.", "type": "object", "properties": { "stage": { "$ref": "#/$defs/Stages", "description": "The stage of the transformation pipeline where this transform should run." }, "run_per_gm": { "default": true, "description": "Whether to run the transform per graph (sub)module or on whole module.", "title": "Run Per Gm", "type": "boolean" }, "enabled": { "default": true, "description": "Whether to enable this transform.", "title": "Enabled", "type": "boolean" }, "skip_on_error": { "default": false, "description": "Whether to skip the transform if an error occurs.", "title": "Skip On Error", "type": "boolean" }, "run_graph_cleanup": { "default": true, "description": "Whether to run graph cleanup/canonicalization after this transform.", "title": "Run Graph Cleanup", "type": "boolean" }, "run_shape_prop": { "default": false, "description": "Whether to run shape propagation after this transform.", "title": "Run Shape Prop", "type": "boolean" }, "requires_clean_graph": { "default": true, "description": "Whether this transform requires the graph to be clean before it is applied.", "title": "Requires Clean Graph", "type": "boolean" }, "requires_shape_prop": { "default": false, "description": "Whether this transform requires shape propagation before it is applied.", "title": "Requires Shape Prop", "type": "boolean" }, "debug_visualize_dir": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Debug visualization directory. None to disable visualization, or a path string to specify the output directory.", "title": "Debug Visualize Dir" }, "expect_mem_change": { "default": false, "description": "Whether this transform is expected to cause changes in CUDA memory stats.", "title": "Expect Mem Change", "type": "boolean" }, "checkpoint_device": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "description": "Optional device to init checkpoint before move to shared_config.local_device.", "title": "Checkpoint Device" }, "disable_preload": { "default": false, "description": "If True, disable preloading weights.", "title": "Disable Preload", "type": "boolean" } }, "$defs": { "Stages": { "description": "Enumerated (ordered!) stages of the transformation pipeline.\n\nThis is used to classify and pre-order transforms.", "enum": [ "factory", "export", "post_export", "pattern_matcher", "sharding", "weight_load", "post_load_fusion", "cache_init", "visualize", "compile" ], "title": "Stages", "type": "string" } }, "additionalProperties": true, "required": [ "stage" ] }
- Config:
extra: str = allow
- Fields:
checkpoint_device (str | None), disable_preload (bool)
- field checkpoint_device: str | None = None
Optional device on which to initialize the checkpoint before it is moved to shared_config.local_device.
- field disable_preload: bool = False
If True, disable preloading weights.
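Read together with the field description, checkpoint_device suggests a two-step flow: materialize the checkpoint on checkpoint_device when it is set, then move the weights to shared_config.local_device. The sketch below encodes that reading as a list of steps. `plan_weight_load` is a hypothetical helper, and the assumption that an unset checkpoint_device means initializing directly on the local device is a guess rather than documented behavior; disable_preload is left out because its exact effect is not described here.

```python
from typing import List, Optional


def plan_weight_load(checkpoint_device: Optional[str], local_device: str) -> List[str]:
    """Describe the assumed load flow as a list of steps (hypothetical helper)."""
    # Assumption: with checkpoint_device unset, weights are initialized directly
    # on the local device, so no extra move is needed.
    init_device = checkpoint_device or local_device
    steps = [f"init checkpoint on {init_device}"]
    if init_device != local_device:
        steps.append(f"move weights {init_device} -> {local_device}")
    return steps


print(plan_weight_load(None, "cuda:0"))
# ['init checkpoint on cuda:0']
print(plan_weight_load("cpu", "cuda:0"))
# ['init checkpoint on cpu', 'move weights cpu -> cuda:0']
```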
Move Inputs To Device
Transform key: move_inputs_to_device
Source module: tensorrt_llm._torch.auto_deploy.transform.library.load_weights
Configured modes: graph, transformers
- class tensorrt_llm._torch.auto_deploy.transform.library.load_weights.LoadFactoryModelWeights(config: TransformConfig)
Bases: BaseTransform
Wrapper transform to move all inputs/arguments to the device.
- classmethod get_config_class() → Type[TransformConfig]
Get the configuration class for the transform.
This is used to validate the configuration of the transform.
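Moving "all inputs/arguments" to a device typically means walking nested containers and relocating every tensor-like leaf while passing other values through. The sketch below shows that recursion with a stand-in `FakeTensor` class so it runs without PyTorch. `FakeTensor` and `move_inputs` are hypothetical, and the actual transform may rely on different helpers.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FakeTensor:
    # Stand-in for torch.Tensor that only tracks the device it lives on.
    name: str
    device: str = "cpu"

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(self.name, device)


def move_inputs(obj: Any, device: str) -> Any:
    """Recursively move tensor-like leaves in dicts, lists, and tuples to `device`."""
    if isinstance(obj, FakeTensor):
        return obj.to(device)
    if isinstance(obj, dict):
        return {k: move_inputs(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Rebuilds plain lists/tuples; namedtuples would need special handling.
        return type(obj)(move_inputs(v, device) for v in obj)
    return obj  # Non-tensor leaves (ints, strings, None) pass through unchanged.


inputs = {"input_ids": FakeTensor("ids"), "extra": [FakeTensor("pos"), 7]}
moved = move_inputs(inputs, "cuda:0")
print(moved["extra"])  # [FakeTensor(name='pos', device='cuda:0'), 7]
```

Returning new containers instead of mutating in place keeps the caller's original inputs untouched.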
YAML configuration
The fields below can be set under this transform’s entry in the AutoDeploy config YAML.
- pydantic model tensorrt_llm._torch.auto_deploy.transform.library.load_weights.MoveDeviceConfig
Bases: TransformConfig
Configuration for the transform that moves inputs/arguments to the device.
- Config:
extra: str = allow
- Fields:
checkpoint_device (str | None), debug_visualize_dir (Optional[str]), disable_preload (bool), enabled (bool), expect_mem_change (bool), requires_clean_graph (bool), requires_shape_prop (bool), run_graph_cleanup (bool), run_per_gm (bool), run_shape_prop (bool), skip_on_error (bool), stage (Stages)
- field checkpoint_device: str | None = None
Optional device on which to initialize the checkpoint before it is moved to shared_config.local_device.
- field debug_visualize_dir: str | None = None
Debug visualization directory. None to disable visualization, or a path string to specify the output directory.
- field disable_preload: bool = False
If True, disable preloading weights.
- field enabled: bool = True
Whether to enable this transform.
- field expect_mem_change: bool = False
Whether this transform is expected to cause changes in CUDA memory stats.
- field requires_clean_graph: bool = True
Whether this transform requires the graph to be clean before it is applied.
- field requires_shape_prop: bool = False
Whether this transform requires shape propagation before it is applied.
- field run_graph_cleanup: bool = True
Whether to run graph cleanup/canonicalization after this transform.
- field run_per_gm: bool = True
Whether to run the transform per graph (sub)module or on whole module.
- field run_shape_prop: bool = False
Whether to run shape propagation after this transform.
- field skip_on_error: bool = False
Whether to skip the transform if an error occurs.
- field stage: Stages [Required]
The stage of the transformation pipeline where this transform should run.
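The MoveDeviceConfig schema imposes a few rules a config entry must satisfy before it reaches the transform: stage is required, it must be one of the enumerated stage names, and unknown keys are kept (additionalProperties is true, matching extra = allow). A minimal stdlib-only checker for those rules follows, with defaults copied from the field list above. `check_entry` is a hypothetical helper; MoveDeviceConfig itself is a pydantic model and performs its own validation.

```python
from typing import Any, Dict

STAGES = (
    "factory", "export", "post_export", "pattern_matcher", "sharding",
    "weight_load", "post_load_fusion", "cache_init", "visualize", "compile",
)
# Defaults copied from the MoveDeviceConfig field list.
DEFAULTS: Dict[str, Any] = {
    "checkpoint_device": None,
    "debug_visualize_dir": None,
    "disable_preload": False,
    "enabled": True,
    "expect_mem_change": False,
    "requires_clean_graph": True,
    "requires_shape_prop": False,
    "run_graph_cleanup": True,
    "run_per_gm": True,
    "run_shape_prop": False,
    "skip_on_error": False,
}


def check_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the schema rules to a raw entry and fill in defaults (hypothetical helper)."""
    if "stage" not in entry:
        raise ValueError("'stage' is required")
    if entry["stage"] not in STAGES:
        raise ValueError(f"unknown stage: {entry['stage']!r}")
    for key, default in DEFAULTS.items():
        # Boolean fields must stay booleans when overridden.
        if isinstance(default, bool) and key in entry and not isinstance(entry[key], bool):
            raise TypeError(f"{key} must be a bool")
    # Extra keys are kept, mirroring additionalProperties: true / extra = allow.
    return {**DEFAULTS, **entry}


resolved = check_entry({"stage": "weight_load", "checkpoint_device": "cpu", "note": "x"})
print(resolved["checkpoint_device"], resolved["run_per_gm"], resolved["note"])
# cpu True x
```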