Weight Loading Stage

Weight loading materializes model weights and moves the required state to the target device once graph structure and sharding decisions are fixed. In the pipeline stage order it runs after sharding and before post-load fusion, bridging graph preparation and weight-dependent fusion.
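For orientation, the stage ordering can be sketched as a Python enum. The member values are copied from the `Stages` enum in the configuration schema on this page. The `StageSketch` class and `runs_before` helper are hypothetical and only illustrate that `weight_load` follows `sharding` and precedes `post_load_fusion`.

```python
from enum import Enum


class StageSketch(Enum):
    """Hypothetical mirror of the ordered pipeline stages listed in the config schema."""

    FACTORY = "factory"
    EXPORT = "export"
    POST_EXPORT = "post_export"
    PATTERN_MATCHER = "pattern_matcher"
    SHARDING = "sharding"
    WEIGHT_LOAD = "weight_load"
    POST_LOAD_FUSION = "post_load_fusion"
    CACHE_INIT = "cache_init"
    VISUALIZE = "visualize"
    COMPILE = "compile"


# Enum members iterate in definition order, so list position encodes pipeline order.
ORDER = list(StageSketch)


def runs_before(a: StageSketch, b: StageSketch) -> bool:
    """Return True if stage `a` is scheduled earlier in the pipeline than stage `b`."""
    return ORDER.index(a) < ORDER.index(b)


if __name__ == "__main__":
    print(runs_before(StageSketch.SHARDING, StageSketch.WEIGHT_LOAD))  # True
    print(runs_before(StageSketch.WEIGHT_LOAD, StageSketch.POST_LOAD_FUSION))  # True
```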

Strip Sharding Hints

Transform key: strip_sharding_hints

Source module: tensorrt_llm._torch.auto_deploy.transform.library.sharding_ir

Configured modes: graph

class tensorrt_llm._torch.auto_deploy.transform.library.sharding_ir.StripShardingHints(config: TransformConfig)

Bases: BaseTransform

Strip sharding hints and lower placeholder ops to zero-copy aten equivalents.

Placeholder ops (auto_deploy.view, split_with_sizes, all_reduce) are replaced with native aten ops. This eliminates the .clone() overhead imposed by PyTorch’s custom op framework, which does not allow a custom op to return an output that aliases its inputs. Other enable_sharding ops that have no aten equivalent instead have their hint kwargs stripped, so downstream transforms see canonical op signatures.
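The two rewrites described above (swap a placeholder for its native equivalent; otherwise drop the hint kwargs) can be modeled on a toy node list. This is a sketch under stated assumptions, not the transform's implementation: the real pass works on torch.fx nodes whose targets are op overloads, and every name here (`native.*` targets, `auto_deploy.hypothetical_sharded_op`, the `shard_dim` and `shard_group` hint kwargs) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """Toy stand-in for a torch.fx node: an op name plus keyword arguments."""

    target: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


# All names below are hypothetical. Real targets are torch op overloads, and the
# real hint kwargs are whatever the enable_sharding ops define.
PLACEHOLDER_TO_NATIVE = {
    "auto_deploy.view": "native.view",
    "auto_deploy.split_with_sizes": "native.split_with_sizes",
    "auto_deploy.all_reduce": "native.all_reduce",
}
SHARDING_AWARE_OPS = {"auto_deploy.hypothetical_sharded_op"}
HINT_KWARGS = {"shard_dim", "shard_group"}


def _without_hints(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with the (hypothetical) hint keys removed."""
    return {k: v for k, v in kwargs.items() if k not in HINT_KWARGS}


def strip_sharding_hints(nodes: List[Node]) -> List[Node]:
    """Lower placeholder ops to native targets and strip hint kwargs elsewhere."""
    out: List[Node] = []
    for node in nodes:
        if node.target in PLACEHOLDER_TO_NATIVE:
            # Native equivalent exists: swap the target and drop the hints.
            out.append(Node(PLACEHOLDER_TO_NATIVE[node.target], _without_hints(node.kwargs)))
        elif node.target in SHARDING_AWARE_OPS:
            # No native equivalent: keep the op but give it a canonical signature.
            out.append(Node(node.target, _without_hints(node.kwargs)))
        else:
            out.append(node)  # unrelated ops pass through unchanged
    return out
```

Building new `Node` objects rather than mutating in place keeps the input list intact, which makes the before/after comparison easy to check.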

classmethod get_config_class() → Type[TransformConfig]

Get the configuration class for the transform.

This is used to validate the configuration of the transform.

YAML configuration

Uses the common TransformConfig fields documented in Core Transform APIs.

Load Weights

Transform key: load_weights

Source module: tensorrt_llm._torch.auto_deploy.transform.library.load_weights

Configured modes: graph

class tensorrt_llm._torch.auto_deploy.transform.library.load_weights.LoadWeightsToDevice(config: TransformConfig)

Bases: BaseTransform

A simple wrapper transform to load weights into a model.
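Loading weights into a model generally means matching checkpoint entries to parameters by name, in the spirit of PyTorch's `Module.load_state_dict`. The pure-Python sketch below models parameters as flat lists of floats; `load_by_name` is a hypothetical helper illustrating the name matching and size checks, not the AutoDeploy loader.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def load_by_name(params: Params, checkpoint: Params, strict: bool = True) -> Tuple[List[str], List[str]]:
    """Copy checkpoint entries into `params` by name; return (missing, unexpected) keys."""
    missing = sorted(k for k in params if k not in checkpoint)
    unexpected = sorted(k for k in checkpoint if k not in params)
    if strict and (missing or unexpected):
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    for name in params:
        if name not in checkpoint:
            continue  # only reachable when strict=False
        src = checkpoint[name]
        # A flat length check stands in for a real tensor shape check.
        if len(src) != len(params[name]):
            raise ValueError(f"size mismatch for {name}: {len(src)} vs {len(params[name])}")
        params[name] = list(src)  # copy so later checkpoint edits do not leak in
    return missing, unexpected
```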

classmethod get_config_class() → Type[TransformConfig]

Get the configuration class for the transform.

This is used to validate the configuration of the transform.

YAML configuration

The fields below can be set under this transform’s entry in the AutoDeploy config YAML.

pydantic model tensorrt_llm._torch.auto_deploy.transform.library.load_weights.MoveDeviceConfig

Bases: TransformConfig

Configuration for the transform that moves inputs/arguments to the device.


{
   "title": "MoveDeviceConfig",
   "description": "Configuration for the moving inputs/arguments to the device transform.",
   "type": "object",
   "properties": {
      "stage": {
         "$ref": "#/$defs/Stages",
         "description": "The stage of the transformation pipeline where this transform should run."
      },
      "run_per_gm": {
         "default": true,
         "description": "Whether to run the transform per graph (sub)module or on whole module.",
         "title": "Run Per Gm",
         "type": "boolean"
      },
      "enabled": {
         "default": true,
         "description": "Whether to enable this transform.",
         "title": "Enabled",
         "type": "boolean"
      },
      "skip_on_error": {
         "default": false,
         "description": "Whether to skip the transform if an error occurs.",
         "title": "Skip On Error",
         "type": "boolean"
      },
      "run_graph_cleanup": {
         "default": true,
         "description": "Whether to run graph cleanup/canonicalization after this transform.",
         "title": "Run Graph Cleanup",
         "type": "boolean"
      },
      "run_shape_prop": {
         "default": false,
         "description": "Whether to run shape propagation after this transform.",
         "title": "Run Shape Prop",
         "type": "boolean"
      },
      "requires_clean_graph": {
         "default": true,
         "description": "Whether this transform requires the graph to be clean before it is applied.",
         "title": "Requires Clean Graph",
         "type": "boolean"
      },
      "requires_shape_prop": {
         "default": false,
         "description": "Whether this transform requires shape propagation before it is applied.",
         "title": "Requires Shape Prop",
         "type": "boolean"
      },
      "debug_visualize_dir": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Debug visualization directory. None to disable visualization, or a path string to specify the output directory.",
         "title": "Debug Visualize Dir"
      },
      "expect_mem_change": {
         "default": false,
         "description": "Whether this transform is expected to cause changes in CUDA memory stats.",
         "title": "Expect Mem Change",
         "type": "boolean"
      },
      "checkpoint_device": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Optional device to init checkpoint before move to shared_config.local_device.",
         "title": "Checkpoint Device"
      },
      "disable_preload": {
         "default": false,
         "description": "If True, disable preloading weights.",
         "title": "Disable Preload",
         "type": "boolean"
      }
   },
   "$defs": {
      "Stages": {
         "description": "Enumerated (ordered!) stages of the transformation pipeline.\n\nThis is used to classify and pre-order transforms.",
         "enum": [
            "factory",
            "export",
            "post_export",
            "pattern_matcher",
            "sharding",
            "weight_load",
            "post_load_fusion",
            "cache_init",
            "visualize",
            "compile"
         ],
         "title": "Stages",
         "type": "string"
      }
   },
   "additionalProperties": true,
   "required": [
      "stage"
   ]
}

Config:

extra: str = allow

Fields:

checkpoint_device (str | None)
disable_preload (bool)

field checkpoint_device: str | None = None: Optional device on which to initialize the checkpoint before it is moved to shared_config.local_device.

field disable_preload: bool = False: If True, disable preloading weights.
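One reading of the `checkpoint_device` field is a two-step load: initialize the checkpoint on `checkpoint_device`, then move it to `shared_config.local_device`. The hypothetical `plan_weight_load` helper below only spells out that reading as a list of steps; the behavior when the field is unset (loading directly on the local device) is an assumption, not something this page states.

```python
from typing import List, Optional


def plan_weight_load(checkpoint_device: Optional[str], local_device: str) -> List[str]:
    """Hypothetical reading of `checkpoint_device`: describe the load steps it implies."""
    # Assumption: with no checkpoint_device (or one equal to the local device),
    # weights are initialized directly on the local device.
    if checkpoint_device is None or checkpoint_device == local_device:
        return [f"init checkpoint on {local_device}"]
    # Otherwise: initialize on checkpoint_device first, then move to the local device.
    return [f"init checkpoint on {checkpoint_device}", f"move to {local_device}"]
```

For example, `plan_weight_load("cpu", "cuda:0")` describes staging on CPU and then moving to `cuda:0`. Staging on CPU is a common way to limit peak accelerator memory while a checkpoint is read, although this page does not state the motivation for the field.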

Move Inputs To Device

Transform key: move_inputs_to_device

Source module: tensorrt_llm._torch.auto_deploy.transform.library.load_weights

Configured modes: graph, transformers

class tensorrt_llm._torch.auto_deploy.transform.library.load_weights.LoadFactoryModelWeights(config: TransformConfig)

Bases: BaseTransform

Wrapper transform to move all inputs/arguments to the device.
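Moving "all inputs/arguments" to a device typically means walking nested containers and calling `.to(device)` on every tensor leaf. The sketch below shows that recursion with a hypothetical `FakeTensor` class standing in for `torch.Tensor` so it runs without PyTorch; `move_to_device` is likewise hypothetical and is not the transform's code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FakeTensor:
    """Toy tensor that only records its device; stands in for torch.Tensor."""

    data: tuple
    device: str = "cpu"

    def to(self, device: str) -> "FakeTensor":
        # Like torch.Tensor.to, return self when already on the target device.
        return self if device == self.device else FakeTensor(self.data, device)


def move_to_device(obj: Any, device: str) -> Any:
    """Recursively move every tensor in nested dicts, lists, and tuples to `device`."""
    if isinstance(obj, FakeTensor):
        return obj.to(device)
    if isinstance(obj, dict):
        return {k: move_to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, list):
        return [move_to_device(v, device) for v in obj]
    if isinstance(obj, tuple):
        # Named tuples come back as plain tuples in this simplified sketch.
        return tuple(move_to_device(v, device) for v in obj)
    return obj  # non-tensor leaves (ints, strings, None) pass through unchanged
```

With real tensors, `torch.Tensor.to(device)` would replace `FakeTensor.to`; the recursion itself stays the same.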

classmethod get_config_class() → Type[TransformConfig]

Get the configuration class for the transform.

This is used to validate the configuration of the transform.

YAML configuration

The fields below can be set under this transform’s entry in the AutoDeploy config YAML.

pydantic model tensorrt_llm._torch.auto_deploy.transform.library.load_weights.MoveDeviceConfig

Bases: TransformConfig

Configuration for the transform that moves inputs/arguments to the device.


Config:

extra: str = allow

Fields:

checkpoint_device (str | None)
debug_visualize_dir (str | None)
disable_preload (bool)
enabled (bool)
expect_mem_change (bool)
requires_clean_graph (bool)
requires_shape_prop (bool)
run_graph_cleanup (bool)
run_per_gm (bool)
run_shape_prop (bool)
skip_on_error (bool)
stage (Stages)

field checkpoint_device: str | None = None: Optional device on which to initialize the checkpoint before it is moved to shared_config.local_device.

field debug_visualize_dir: str | None = None: Debug visualization directory. None to disable visualization, or a path string to specify the output directory.

field disable_preload: bool = False: If True, disable preloading weights.

field enabled: bool = True: Whether to enable this transform.

field expect_mem_change: bool = False: Whether this transform is expected to cause changes in CUDA memory stats.

field requires_clean_graph: bool = True: Whether this transform requires the graph to be clean before it is applied.

field requires_shape_prop: bool = False: Whether this transform requires shape propagation before it is applied.

field run_graph_cleanup: bool = True: Whether to run graph cleanup/canonicalization after this transform.

field run_per_gm: bool = True: Whether to run the transform per graph (sub)module or on the whole module.

field run_shape_prop: bool = False: Whether to run shape propagation after this transform.

field skip_on_error: bool = False: Whether to skip the transform if an error occurs.

field stage: Stages [Required]: The stage of the transformation pipeline where this transform should run.
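Since `extra` is set to `allow` and only `stage` is required, a YAML entry for this transform can be as small as a `stage` value, with every other field taking the defaults listed above and unknown keys kept. The pydantic model performs the real validation; the hypothetical `resolve_move_device_config` helper below is a pure-Python illustration of how an already-parsed entry resolves against those defaults.

```python
from typing import Any, Dict

# Defaults copied from the MoveDeviceConfig field list above; "stage" has no
# default because the schema marks it as required.
MOVE_DEVICE_DEFAULTS: Dict[str, Any] = {
    "checkpoint_device": None,
    "debug_visualize_dir": None,
    "disable_preload": False,
    "enabled": True,
    "expect_mem_change": False,
    "requires_clean_graph": True,
    "requires_shape_prop": False,
    "run_graph_cleanup": True,
    "run_per_gm": True,
    "run_shape_prop": False,
    "skip_on_error": False,
}
STAGES = ("factory", "export", "post_export", "pattern_matcher", "sharding",
          "weight_load", "post_load_fusion", "cache_init", "visualize", "compile")


def resolve_move_device_config(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: fill defaults and check the required `stage` field."""
    if "stage" not in entry:
        raise ValueError("'stage' is required")
    if entry["stage"] not in STAGES:
        raise ValueError(f"unknown stage: {entry['stage']!r}")
    # Unknown keys are kept, mirroring `extra = allow` / additionalProperties: true.
    return {**MOVE_DEVICE_DEFAULTS, **entry}
```

For instance, an entry of `{"stage": "weight_load", "checkpoint_device": "cpu"}` resolves to the defaults above with `checkpoint_device` overridden to `"cpu"`.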