Factory Stage

Factory transforms create or wrap the starting model object for AutoDeploy. This stage establishes the module that later graph, weight-loading, cache, and runtime transforms will optimize.

Build Model

Transform key: build_model

Source module: tensorrt_llm._torch.auto_deploy.transform.library.build_model

Configured modes: graph

class tensorrt_llm._torch.auto_deploy.transform.library.build_model.BuildModel(config: TransformConfig)

Bases: BaseTransform

A simple wrapper transform that builds a model via the model factory's build_model method.

This transform builds the model on the meta device (or on the device set in the config) and does not load the weights.

classmethod get_config_class() → Type[TransformConfig]

Get the configuration class for the transform. This class is used to validate the transform's configuration.

YAML configuration

The fields below can be set under this transform’s entry in the AutoDeploy config YAML.

pydantic model tensorrt_llm._torch.auto_deploy.transform.library.build_model.BuildModelConfig

Bases: TransformConfig

Configuration for the build model transform.

JSON schema:

{
   "title": "BuildModelConfig",
   "description": "Configuration for the build model transform.",
   "type": "object",
   "properties": {
      "stage": {
         "$ref": "#/$defs/Stages",
         "description": "The stage of the transformation pipeline where this transform should run."
      },
      "run_per_gm": {
         "default": true,
         "description": "Whether to run the transform per graph (sub)module or on whole module.",
         "title": "Run Per Gm",
         "type": "boolean"
      },
      "enabled": {
         "default": true,
         "description": "Whether to enable this transform.",
         "title": "Enabled",
         "type": "boolean"
      },
      "skip_on_error": {
         "default": false,
         "description": "Whether to skip the transform if an error occurs.",
         "title": "Skip On Error",
         "type": "boolean"
      },
      "run_graph_cleanup": {
         "default": true,
         "description": "Whether to run graph cleanup/canonicalization after this transform.",
         "title": "Run Graph Cleanup",
         "type": "boolean"
      },
      "run_shape_prop": {
         "default": false,
         "description": "Whether to run shape propagation after this transform.",
         "title": "Run Shape Prop",
         "type": "boolean"
      },
      "requires_clean_graph": {
         "default": true,
         "description": "Whether this transform requires the graph to be clean before it is applied.",
         "title": "Requires Clean Graph",
         "type": "boolean"
      },
      "requires_shape_prop": {
         "default": false,
         "description": "Whether this transform requires shape propagation before it is applied.",
         "title": "Requires Shape Prop",
         "type": "boolean"
      },
      "debug_visualize_dir": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Debug visualization directory. None to disable visualization, or a path string to specify the output directory.",
         "title": "Debug Visualize Dir"
      },
      "expect_mem_change": {
         "default": false,
         "description": "Whether this transform is expected to cause changes in CUDA memory stats.",
         "title": "Expect Mem Change",
         "type": "boolean"
      },
      "device": {
         "default": "meta",
         "description": "The device to build the model on.",
         "title": "Device",
         "type": "string"
      }
   },
   "$defs": {
      "Stages": {
         "description": "Enumerated (ordered!) stages of the transformation pipeline.\n\nThis is used to classify and pre-order transforms.",
         "enum": [
            "factory",
            "export",
            "post_export",
            "pattern_matcher",
            "sharding",
            "weight_load",
            "post_load_fusion",
            "cache_init",
            "visualize",
            "compile"
         ],
         "title": "Stages",
         "type": "string"
      }
   },
   "additionalProperties": true,
   "required": [
      "stage"
   ]
}
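The Stages enum in the schema is described as ordered and as being used to pre-order transforms. The sketch below illustrates that idea with plain Python, assuming each transform entry is a dict with a "stage" key. The helper order_by_stage and the transform keys other than build_model are hypothetical; this is not the AutoDeploy implementation.

```python
# Stage names copied from the "Stages" enum in the schema, in pipeline order.
STAGE_ORDER = [
    "factory", "export", "post_export", "pattern_matcher", "sharding",
    "weight_load", "post_load_fusion", "cache_init", "visualize", "compile",
]


def order_by_stage(transforms):
    """Sort (transform_key, config) pairs by stage position (hypothetical helper).

    sorted() is stable, so transforms in the same stage keep their listed order.
    """
    rank = {name: i for i, name in enumerate(STAGE_ORDER)}
    return sorted(transforms.items(), key=lambda item: rank[item[1]["stage"]])


# Hypothetical entries; only build_model is a transform key taken from this page.
example = {
    "some_compile_step": {"stage": "compile"},
    "some_export_step": {"stage": "export"},
    "build_model": {"stage": "factory", "device": "meta"},
}

ordered_keys = [key for key, _ in order_by_stage(example)]
print(ordered_keys)
```

Because build_model belongs to the factory stage, it sorts ahead of every later-stage transform regardless of where it appears in the input.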

Config:

extra: str = allow

Fields:

device (str)

field device: str = 'meta': The device to build the model on.
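To make the schema concrete, the following standard-library sketch resolves one build_model config entry the way the schema describes it: stage is required and must be a known stage, every other field falls back to its listed default, and unknown keys are kept because extra fields are allowed. The function resolve_build_model_entry and the key my_extra_option are hypothetical; the real validation is done by the pydantic model.

```python
# Stage names from the "Stages" enum in the schema.
STAGES = (
    "factory", "export", "post_export", "pattern_matcher", "sharding",
    "weight_load", "post_load_fusion", "cache_init", "visualize", "compile",
)

# Defaults copied from the BuildModelConfig JSON schema.
DEFAULTS = {
    "run_per_gm": True,
    "enabled": True,
    "skip_on_error": False,
    "run_graph_cleanup": True,
    "run_shape_prop": False,
    "requires_clean_graph": True,
    "requires_shape_prop": False,
    "debug_visualize_dir": None,
    "expect_mem_change": False,
    "device": "meta",
}


def resolve_build_model_entry(entry):
    """Apply schema defaults to a user-supplied entry (hypothetical helper)."""
    if "stage" not in entry:
        raise ValueError("'stage' is required")
    if entry["stage"] not in STAGES:
        raise ValueError(f"unknown stage: {entry['stage']!r}")
    resolved = dict(DEFAULTS)
    # User values override defaults; extra keys are kept (extra = allow).
    resolved.update(entry)
    return resolved


minimal = resolve_build_model_entry({"stage": "factory"})
custom = resolve_build_model_entry(
    {"stage": "factory", "device": "cuda", "my_extra_option": 1}
)
print(minimal["device"], custom["device"])
```

An entry that sets only stage therefore builds on the meta device, which matches the default documented for the device field.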

Build And Load Factory Model

Transform key: build_and_load_factory_model

Source module: tensorrt_llm._torch.auto_deploy.transform.library.build_model

Configured modes: transformers

class tensorrt_llm._torch.auto_deploy.transform.library.build_model.BuildAndLoadFactoryModel(config: TransformConfig)

Bases: BuildModel

A simple wrapper transform to build AND load a model via the factory’s build_and_load API.

Under the hood, the factory may build and load the model in a single step instead of building it first and loading the weights later. For example, the HF factory uses the .from_pretrained API to build and load the model at the same time.

This transform also assumes that the build_and_load_model method auto-shards the model appropriately.
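The difference between the two factory transforms can be sketched with stand-in classes. Everything below is hypothetical except the method names build_model and build_and_load_model, which come from the descriptions on this page; the real transforms take a TransformConfig and operate on actual model objects, and their method signatures may differ.

```python
class StubFactory:
    """Hypothetical stand-in for a model factory; not the real AutoDeploy class."""

    def build_model(self, device):
        # Builds the model structure only; no weights are loaded.
        return {"device": device, "weights_loaded": False}

    def build_and_load_model(self, device):
        # Builds and loads in one step, as the HF factory is described
        # to do via .from_pretrained.
        return {"device": device, "weights_loaded": True}


class BuildModelSketch:
    """Stand-in for BuildModel: delegates to factory.build_model."""

    def __init__(self, device):
        self.device = device

    def apply(self, factory):
        return factory.build_model(self.device)


class BuildAndLoadSketch(BuildModelSketch):
    """Stand-in for BuildAndLoadFactoryModel: same shape, different factory call."""

    def apply(self, factory):
        return factory.build_and_load_model(self.device)


factory = StubFactory()
built = BuildModelSketch(device="meta").apply(factory)
loaded = BuildAndLoadSketch(device="cuda").apply(factory)
print(built, loaded)
```

The subclass overrides only the factory call, which mirrors BuildAndLoadFactoryModel deriving from BuildModel.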