Adding a Model
This document describes how to add a typical decoder-only model in TensorRT-LLM.
Step 1. Write Modeling Part
TensorRT-LLM provides different levels of APIs:
- Low-level functions, for example, concat, add, and sum.
- Basic layers, such as Linear and LayerNorm.
- High-level layers, such as MLP and Attention.
- Base class for typical decoder-only models, such as DecoderModelForCausalLM.
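To make the layering concrete, here is a minimal, framework-free sketch of how each level builds on the one below it. It does not use TensorRT-LLM at all: `ToyLinear`, `ToyMLP`, `matvec`, and `relu` are hypothetical names invented for this illustration, and plain Python lists stand in for tensors.

```python
from typing import List

Vector = List[float]


# Level 1: low-level functions operate directly on values.
def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def matvec(weight: List[Vector], x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


# Level 2: a basic layer owns parameters and calls low-level functions.
class ToyLinear:
    def __init__(self, weight: List[Vector], bias: Vector):
        self.weight = weight
        self.bias = bias

    def __call__(self, x: Vector) -> Vector:
        return add(matvec(self.weight, x), self.bias)


# Level 3: a high-level layer composes basic layers.
class ToyMLP:
    def __init__(self, fc: ToyLinear, proj: ToyLinear):
        self.fc = fc
        self.proj = proj

    def __call__(self, x: Vector) -> Vector:
        return self.proj(relu(self.fc(x)))


mlp = ToyMLP(
    fc=ToyLinear([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]),
    proj=ToyLinear([[1.0, 1.0]], [0.5]),
)
print(mlp([2.0, 3.0]))  # prints [2.5]
```

A new model is usually written mostly against the higher levels, dropping to lower ones only where the architecture needs something the existing layers do not provide.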
1. Create a model directory in tensorrt_llm/models, for example my_model.
2. Write a model.py with TensorRT-LLM’s APIs:
from tensorrt_llm.layers import MLP, Attention, ColumnLinear, Embedding, LayerNorm
from tensorrt_llm.models.modeling_utils import (DecoderLayerList,
                                                DecoderModelForCausalLM,
                                                PretrainedConfig)
from tensorrt_llm.module import Module


class MyDecoderLayer(Module):

    def __init__(self, config: PretrainedConfig, layer_idx: int):
        super().__init__()
        self.layer_idx = layer_idx
        self.config = config
        self.input_layernorm = LayerNorm(...)
        self.attention = Attention(...)
        self.post_layernorm = LayerNorm(...)
        self.mlp = MLP(...)

    def forward(self, hidden_states, ...):
        # decoder layer forward
        return hidden_states


class MyModel(Module):

    def __init__(self, config: PretrainedConfig):
        super().__init__()
        self.config = config
        self.vocab_embedding = Embedding(...)
        self.layers = DecoderLayerList(MyDecoderLayer, config)
        self.ln_f = LayerNorm(...)

    def forward(self, input_ids, ...):
        # model forward
        return hidden_states


class MyModelForCausalLM(DecoderModelForCausalLM):

    def __init__(self, config: PretrainedConfig):
        transformer = MyModel(config)
        lm_head = ColumnLinear(...)
        super().__init__(config, transformer, lm_head)
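The skeleton above leaves the body of the decoder layer's forward pass open. As a rough sketch, assuming the pre-norm residual arrangement that many decoder-only models use (the source does not say which arrangement your model needs), the data flow could look like the following. This is plain Python with lists standing in for tensors; `ToyDecoderLayer` and the stand-in callables are hypothetical and are not TensorRT-LLM layers.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


class ToyDecoderLayer:
    """Mirrors the attribute names of MyDecoderLayer, with callables as layers."""

    def __init__(self, input_layernorm: Layer, attention: Layer,
                 post_layernorm: Layer, mlp: Layer):
        self.input_layernorm = input_layernorm
        self.attention = attention
        self.post_layernorm = post_layernorm
        self.mlp = mlp

    def forward(self, hidden_states: Vector) -> Vector:
        # Attention sub-block: normalize, attend, then add the residual.
        residual = hidden_states
        hidden_states = self.input_layernorm(hidden_states)
        hidden_states = self.attention(hidden_states)
        hidden_states = add(residual, hidden_states)

        # MLP sub-block: normalize, transform, then add the residual.
        residual = hidden_states
        hidden_states = self.post_layernorm(hidden_states)
        hidden_states = self.mlp(hidden_states)
        return add(residual, hidden_states)


# Stand-in callables so the data flow can be traced by hand.
identity = lambda x: list(x)
double = lambda x: [2.0 * v for v in x]
plus_one = lambda x: [v + 1.0 for v in x]

layer = ToyDecoderLayer(identity, double, identity, plus_one)
print(layer.forward([1.0, 2.0]))  # prints [7.0, 13.0]
```

In a real model.py the same ordering would be expressed with the TensorRT-LLM layers created in `__init__`, and the forward signature would carry the additional arguments that the attention layer requires.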
Step 2. Implement Weight Conversion
The weights from the source framework must be converted and bound to the newly added TensorRT-LLM model. Here is an example of converting HuggingFace weights:
from typing import Optional

from tensorrt_llm.mapping import Mapping


class MyModelForCausalLM(DecoderModelForCausalLM):

    @classmethod
    def from_hugging_face(
            cls,
            hf_model_dir,
            dtype='float16',
            mapping: Optional[Mapping] = None) -> 'MyModelForCausalLM':
        # create a TensorRT-LLM MyModelForCausalLM model object
        # convert HuggingFace checkpoint to TensorRT-LLM expected weights dict
        # load the weights to MyModelForCausalLM object
        ...
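The middle step, converting the checkpoint into the weights dict the new model expects, is largely a matter of renaming. The sketch below shows one way to do that with regular expressions. Every name in it is an assumption made for illustration: the HuggingFace-side names (`model.embed_tokens`, `post_attention_layernorm`, and so on) belong to a hypothetical source model, and the target names are derived from the attribute names in the model.py skeleton under an assumed `transformer.` prefix. Real conversions typically also handle steps omitted here, such as fusing Q/K/V projections, casting dtypes, and splitting tensors for parallelism.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical (source pattern, target replacement) rules; adapt to your model.
RENAME_RULES: List[Tuple[str, str]] = [
    (r"^model\.embed_tokens\.", "transformer.vocab_embedding."),
    (r"^model\.layers\.(\d+)\.input_layernorm\.",
     r"transformer.layers.\1.input_layernorm."),
    (r"^model\.layers\.(\d+)\.post_attention_layernorm\.",
     r"transformer.layers.\1.post_layernorm."),
    (r"^model\.norm\.", "transformer.ln_f."),
    (r"^lm_head\.", "lm_head."),
]


def rename_weights(hf_weights: Dict[str, object]) -> Dict[str, object]:
    """Return a new dict whose keys follow the target naming scheme."""
    converted: Dict[str, object] = {}
    for name, tensor in hf_weights.items():
        for pattern, replacement in RENAME_RULES:
            new_name, count = re.subn(pattern, replacement, name)
            if count:
                converted[new_name] = tensor
                break
        else:
            # Fail loudly so an unmapped weight is never silently dropped.
            raise KeyError(f"no conversion rule for weight {name!r}")
    return converted


example = rename_weights({
    "model.embed_tokens.weight": "E",
    "model.layers.3.input_layernorm.weight": "G",
    "lm_head.weight": "H",
})
print(sorted(example))
```

Raising on an unmatched name is a deliberate choice in this sketch: a missing rule then surfaces at conversion time rather than as a shape or accuracy problem later.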
Optionally, add a convert_checkpoint.py script in the examples/my_model/ directory to make offline weight conversion more convenient.
Step 3. Register New Model
Register the new model class MyModelForCausalLM in tensorrt_llm/models/__init__.py.
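Conceptually, registration makes the class discoverable by name so that tooling can map an architecture string from a checkpoint to the right Python class. The toy registry below illustrates that idea only. It assumes registration amounts to a name-to-class mapping; the helper names `register_model` and `resolve_model` and the layout shown are hypothetical, so check the actual contents of tensorrt_llm/models/__init__.py for the exact form it expects.

```python
from typing import Dict


class MyModelForCausalLM:
    """Stand-in for the class defined in Step 1."""


# Assumed shape of a registry: architecture name -> model class.
MODEL_MAP: Dict[str, type] = {}


def register_model(architecture: str, model_cls: type) -> None:
    if architecture in MODEL_MAP:
        raise ValueError(f"{architecture!r} is already registered")
    MODEL_MAP[architecture] = model_cls


def resolve_model(architecture: str) -> type:
    try:
        return MODEL_MAP[architecture]
    except KeyError:
        raise ValueError(f"unknown architecture {architecture!r}") from None


register_model("MyModelForCausalLM", MyModelForCausalLM)
print(resolve_model("MyModelForCausalLM").__name__)  # prints MyModelForCausalLM
```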
Step 4. Verify New Model
Finally, verify the new model. Typical commands are as follows:
cd examples/my_model/
python convert_checkpoint.py --model_dir hf_model_dir --output_dir tllm_ckpt_dir
trtllm-build --checkpoint_dir tllm_ckpt_dir --output_dir tllm_engine_dir
# try the model with a single prompt
python ../run.py --engine_dir tllm_engine_dir --tokenizer_dir hf_model_dir --input_text "Born in north-east France, Soyer trained as a"
# run summarization task
python ../summarize.py --engine_dir tllm_engine_dir --hf_model_dir hf_model_dir --test_trt_llm
Reference
For more details, read the workflow (./workflow.md) and checkpoint (./checkpoint.md) documents.