LLM API Change Guide#
This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.
Overview#
TensorRT LLM provides multiple API levels:
LLM API - The highest-level API (e.g., the `LLM` class)
PyExecutor API - The mid-level API (e.g., the `PyExecutor` class)
This guide focuses on the LLM API, which is the primary interface for most users.
API Types and Stability Guarantees#
TensorRT LLM classifies APIs into two categories:
1. Committed APIs#
Stable and guaranteed to remain consistent across releases
No breaking changes without major version updates
Schema stored in: `tests/unittest/api_stability/references_committed/`
2. Non-committed APIs#
Under active development and may change between releases
Marked with a `status` field in the docstring:
`prototype` - Early experimental stage
`beta` - More stable but still subject to change
`deprecated` - Scheduled for removal
Schema stored in: `tests/unittest/api_stability/references/`
See the API status documentation for complete details.
API Schema Management#
All API schemas are:
Stored as YAML files in the codebase
Protected by unit tests in `tests/unittest/api_stability/`
Automatically validated to ensure consistency
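A minimal sketch of what such a stability check does (standalone illustration using `inspect`; the actual tests in `tests/unittest/api_stability/` are more thorough and compare against the YAML references):

```python
import inspect
from typing import Iterator, List

def generate_with_streaming(prompts: List[str], **kwargs) -> Iterator[str]:
    """Toy API method whose signature the schema pins down."""
    yield from prompts

# Reference schema, as it would be loaded from the YAML file.
reference = {"parameters": ["prompts", "kwargs"]}

sig = inspect.signature(generate_with_streaming)
actual_params = list(sig.parameters)

# The stability test fails if the signature drifts from the schema.
assert actual_params == reference["parameters"], "API signature changed!"
print("signature matches schema")
```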
Modifying LLM Constructor Arguments#
The LLM class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic model called `LlmArgs`.
Architecture#
The LLM’s `__init__` method parameters map directly to `LlmArgs` fields
`LlmArgs` is an alias for `TorchLlmArgs` (defined in `tensorrt_llm/llmapi/llm_args.py`)
All arguments are validated and type-checked through Pydantic
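A minimal sketch of how Pydantic enforces this pattern (a standalone stand-in class; the real `TorchLlmArgs` has many more fields):

```python
from pydantic import BaseModel, Field, ValidationError

class ToyLlmArgs(BaseModel):
    """Minimal stand-in for TorchLlmArgs showing Pydantic validation."""
    # Every constructor argument is a typed, documented field.
    max_batch_size: int = Field(default=8, description="Maximum batch size.")
    model: str = Field(..., description="Model name or checkpoint path.")

# Valid arguments are type-checked and defaults are filled in.
args = ToyLlmArgs(model="llama-3-8b")
print(args.max_batch_size)  # 8

# Wrong types are rejected at construction time.
try:
    ToyLlmArgs(model="llama-3-8b", max_batch_size="many")
except ValidationError as e:
    print("rejected:", len(e.errors()), "error(s)")
```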
Adding a New Argument#
Follow these steps to add a new constructor argument:
1. Add the field to TorchLlmArgs#
garbage_collection_gen0_threshold: int = Field(
    default=20000,
    description=(
        "Threshold for Python garbage collection of generation 0 objects. "
        "Lower values trigger more frequent garbage collection."
    ),
    status="beta",  # Required for non-committed arguments
)
Field requirements:
Type annotation: Required for all fields
Default value: Recommended unless the field is mandatory
Description: Clear explanation of the parameter’s purpose
Status: Required for non-committed arguments (`prototype`, `beta`, etc.)
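Under the hood, a setting like this typically feeds Python's `gc` module. The sketch below assumes the field maps onto `gc.set_threshold`, which is how generation-0 thresholds are configured in CPython; the actual wiring inside TensorRT LLM may differ:

```python
import gc

gen0_threshold = 20000  # value from garbage_collection_gen0_threshold

# Raise the generation-0 threshold so collections run less often;
# keep the gen-1 and gen-2 thresholds at their current values.
_, gen1, gen2 = gc.get_threshold()
gc.set_threshold(gen0_threshold, gen1, gen2)

print(gc.get_threshold()[0])  # 20000
```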
2. Update the API schema#
Add the field to the appropriate schema file:
Non-committed arguments: `tests/unittest/api_stability/references/llm_args.yaml`

garbage_collection_gen0_threshold:
  type: int
  default: 20000
  status: beta  # Must match the status in code
Committed arguments: `tests/unittest/api_stability/references_committed/llm_args.yaml`

garbage_collection_gen0_threshold:
  type: int
  default: 20000  # No status field for committed arguments
3. Run validation tests#
python -m pytest tests/unittest/api_stability/test_llm_api.py
Modifying LLM Class Methods#
Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.
Implementation Details#
The actual implementation is in the `_TorchLLM` class (`llm.py`)
Public methods (not starting with `_`) are automatically exposed as APIs
Adding a New Method#
Follow these steps to add a new API method:
1. Implement the method in _TorchLLM#
For non-committed APIs, use the @set_api_status decorator:
@set_api_status("beta")
def generate_with_streaming(
    self,
    prompts: List[str],
    **kwargs,
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation.
        **kwargs: Additional generation parameters.

    Returns:
        Iterator of generation outputs.
    """
    # Implementation here
    pass
For committed APIs, no decorator is needed:
def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Generate text from prompts."""
    # Implementation here
    pass
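For reference, `set_api_status` can be pictured as a decorator that attaches metadata for the stability tests to read. This is a hypothetical sketch (including the `__api_status__` attribute name), not the actual TensorRT LLM implementation:

```python
from typing import Callable, List

def set_api_status(status: str) -> Callable:
    """Attach an API-status label that stability tests can inspect."""
    def decorator(func: Callable) -> Callable:
        func.__api_status__ = status  # hypothetical attribute name
        return func
    return decorator

@set_api_status("beta")
def generate_with_streaming(prompts: List[str]) -> List[str]:
    """Toy stand-in for an LLM method."""
    return list(prompts)

print(generate_with_streaming.__api_status__)  # beta
```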
2. Update the API schema#
Add the method to the appropriate llm.yaml file:
Non-committed API (tests/unittest/api_stability/references/llm.yaml):
generate_with_streaming:
  status: beta  # Must match @set_api_status
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]
Committed API (tests/unittest/api_stability/references_committed/llm.yaml):
generate:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: GenerationOutput
Modifying Existing Methods#
When modifying existing methods:
Non-breaking changes (adding optional parameters):
Update the method signature
Update the schema file
No status change needed
Breaking changes (changing required parameters, return types):
Only allowed for non-committed APIs
Consider deprecation path for beta APIs
Update documentation with migration guide
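The non-breaking case above can be sketched as follows: the new parameter is optional with a default, so existing call sites keep working unchanged (hypothetical toy method, for illustration only):

```python
from typing import List, Optional

# Before: generate(prompts). After: an optional parameter is appended.
def generate(prompts: List[str], max_tokens: Optional[int] = None) -> List[str]:
    """Toy generate(); max_tokens was added as an optional parameter."""
    limit = max_tokens if max_tokens is not None else 16
    return [p[:limit] for p in prompts]

# Old call sites are unaffected...
print(generate(["hello world"]))     # ['hello world']
# ...and new ones can opt in.
print(generate(["hello world"], 5))  # ['hello']
```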
Best Practices#
Documentation: Always include comprehensive docstrings
Type hints: Use proper type annotations for all parameters and returns
Testing: Add unit tests for new methods
Examples: Provide usage examples in the docstring
Validation: Run API stability tests before submitting changes
Running Tests#
Validate your changes:
# Run API stability tests
python -m pytest tests/unittest/api_stability/
# Run specific test for LLM API
python -m pytest tests/unittest/api_stability/test_llm_api.py -v
Common Workflows#
Promoting an API from Beta to Committed#
1. Remove the `@set_api_status("beta")` decorator from the method
2. Move the schema entry from `tests/unittest/api_stability/references/` to `tests/unittest/api_stability/references_committed/`
3. Remove the `status` field from the schema
4. Update any documentation referring to the API’s beta status
Deprecating an API#
1. Add `@set_api_status("deprecated")` to the method
2. Update the schema with `status: deprecated`
3. Add a deprecation warning in the method:

import warnings

warnings.warn(
    "This method is deprecated and will be removed in v2.0. "
    "Use new_method() instead.",
    DeprecationWarning,
    stacklevel=2,
)

4. Document the migration path
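The deprecation pattern can be exercised end to end in a unit test. A self-contained sketch (hypothetical `old_method`/`new_method` names), including how a test verifies the warning actually fires:

```python
import warnings

def new_method() -> str:
    """Replacement API."""
    return "result"

def old_method() -> str:
    """Deprecated API kept for one release cycle."""
    warnings.warn(
        "old_method() is deprecated and will be removed in v2.0. "
        "Use new_method() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_method()

# Verify the warning is emitted, as a unit test would.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_method()

print(result)                       # result
print(caught[0].category.__name__)  # DeprecationWarning
```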