# LLM API Change Guide
This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.
## Overview

TensorRT LLM provides multiple API levels:

- **LLM API** - The highest-level API (e.g., the `LLM` class)
- **PyExecutor API** - The mid-level API (e.g., the `PyExecutor` class)
This guide focuses on the LLM API, which is the primary interface for most users.
## API Types and Stability Guarantees
TensorRT LLM classifies APIs into two categories:
### 1. Committed APIs

- Stable and guaranteed to remain consistent across releases
- No breaking changes without major version updates
- Schema stored in `tests/unittest/api_stability/references_committed/`
### 2. Non-committed APIs

- Under active development and may change between releases
- Marked with a `status` field in the docstring:
  - `prototype` - Early experimental stage
  - `beta` - More stable but still subject to change
  - `deprecated` - Scheduled for removal
- Schema stored in `tests/unittest/api_stability/references/`

See the API status documentation for complete details.
## API Schema Management

All API schemas are:

- Stored as YAML files in the codebase
- Protected by unit tests in `tests/unittest/api_stability/`
- Automatically validated to ensure consistency
## Modifying LLM Constructor Arguments

The LLM class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic model called `LlmArgs`.
### Architecture

- The LLM's `__init__` method parameters map directly to `LlmArgs` fields
- `LlmArgs` is an alias for `TorchLlmArgs` (defined in `tensorrt_llm/llmapi/llm_args.py`)
- All arguments are validated and type-checked through Pydantic (see the sketch below)
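For illustration, here is a minimal sketch of constructing an `LLM`, using the `garbage_collection_gen0_threshold` argument that serves as the running example in the next section (the model path is a placeholder):

```python
from tensorrt_llm import LLM

# Keyword arguments are validated against the corresponding TorchLlmArgs
# fields; an unknown name or a wrong type raises a Pydantic validation
# error at construction time instead of failing later at runtime.
llm = LLM(
    model="/path/to/model",  # placeholder path
    garbage_collection_gen0_threshold=20000,
)
```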
### Adding a New Argument
Follow these steps to add a new constructor argument:
#### 1. Add the field to `TorchLlmArgs`
```python
garbage_collection_gen0_threshold: int = Field(
    default=20000,
    description=(
        "Threshold for Python garbage collection of generation 0 objects. "
        "Lower values trigger more frequent garbage collection."
    ),
    status="beta",  # Required for non-committed arguments
)
```
Field requirements:

- **Type annotation**: Required for all fields
- **Default value**: Recommended unless the field is mandatory
- **Description**: Clear explanation of the parameter's purpose
- **Status**: Required for non-committed arguments (`prototype`, `beta`, etc.)
#### 2. Update the API schema
Add the field to the appropriate schema file:
Non-committed arguments: `tests/unittest/api_stability/references/llm_args.yaml`

```yaml
garbage_collection_gen0_threshold:
  type: int
  default: 20000
  status: beta  # Must match the status in code
```
Committed arguments: `tests/unittest/api_stability/references_committed/llm_args.yaml`

```yaml
garbage_collection_gen0_threshold:
  type: int
  default: 20000
  # No status field for committed arguments
```
#### 3. Run validation tests

```bash
python -m pytest tests/unittest/api_stability/test_llm_api.py
```
## Modifying LLM Class Methods
Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.
### Implementation Details

- The actual implementation is in the `_TorchLLM` class (`llm.py`)
- Public methods (not starting with `_`) are automatically exposed as APIs (see the sketch below)
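A simplified sketch of that convention (the method names and bodies are illustrative, not the real class contents):

```python
class _TorchLLM:
    def generate(self, prompts, **kwargs):
        """Public: no leading underscore, so this method is part of the
        tracked API surface and needs a matching llm.yaml schema entry."""
        ...

    def _warmup(self):
        """Private: the leading underscore keeps this out of the tracked
        API, so it can change freely between releases."""
        ...
```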
### Adding a New Method
Follow these steps to add a new API method:
#### 1. Implement the method in `_TorchLLM`

For non-committed APIs, use the `@set_api_status` decorator:
```python
@set_api_status("beta")
def generate_with_streaming(
    self,
    prompts: List[str],
    **kwargs
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation
        **kwargs: Additional generation parameters

    Returns:
        Iterator of generation outputs
    """
    # Implementation here
    pass
```
For committed APIs, no decorator is needed:
```python
def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Generate text from prompts."""
    # Implementation here
    pass
```
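From the caller's side, the two methods above would be used roughly like this (a sketch that follows the illustrative signatures in this guide):

```python
# Committed API: one call returns the completed generation.
output = llm.generate(["What is the capital of France?"])

# Beta API: results arrive incrementally through an iterator.
for chunk in llm.generate_with_streaming(["Tell me a story"]):
    print(chunk)
```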
#### 2. Update the API schema

Add the method to the appropriate `llm.yaml` file:
Non-committed API (`tests/unittest/api_stability/references/llm.yaml`):

```yaml
generate_with_streaming:
  status: beta  # Must match @set_api_status
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]
```
Committed API (`tests/unittest/api_stability/references_committed/llm.yaml`):

```yaml
generate:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: GenerationOutput
```
### Modifying Existing Methods
When modifying existing methods:
**Non-breaking changes** (adding optional parameters; see the sketch after this list):

- Update the method signature
- Update the schema file
- No status change needed

**Breaking changes** (changing required parameters or return types):

- Only allowed for non-committed APIs
- Consider a deprecation path for beta APIs
- Update documentation with a migration guide
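As a sketch of a non-breaking change, an optional keyword with a default can be appended to the earlier beta method; `timeout_s` here is a hypothetical parameter:

```python
@set_api_status("beta")
def generate_with_streaming(
    self,
    prompts: List[str],
    timeout_s: Optional[float] = None,  # new optional parameter (hypothetical)
    **kwargs
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Existing call sites keep working because the new parameter has a default.
    """
    ...
```

The matching `llm.yaml` entry would gain a corresponding parameter record with its type and default.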
## Best Practices

- **Documentation**: Always include comprehensive docstrings
- **Type hints**: Use proper type annotations for all parameters and returns
- **Testing**: Add unit tests for new methods
- **Examples**: Provide usage examples in the docstring (see the sketch below)
- **Validation**: Run API stability tests before submitting changes
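Taken together, these practices look roughly like the following sketch; the `Example` block uses conventional doctest style rather than a format this guide mandates:

```python
@set_api_status("beta")
def generate_with_streaming(
    self, prompts: List[str], **kwargs
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation.
        **kwargs: Additional generation parameters.

    Returns:
        Iterator of generation outputs.

    Example:
        >>> for chunk in llm.generate_with_streaming(["Hello"]):
        ...     print(chunk)
    """
```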
## Running Tests

Validate your changes:

```bash
# Run API stability tests
python -m pytest tests/unittest/api_stability/

# Run specific test for LLM API
python -m pytest tests/unittest/api_stability/test_llm_api.py -v
```
## Common Workflows

### Promoting an API from Beta to Committed
1. Remove the `@set_api_status("beta")` decorator from the method
2. Move the schema entry from `tests/unittest/api_stability/references/` to `tests/unittest/api_stability/references_committed/`
3. Remove the `status` field from the schema (see the sketch after this list)
4. Update any documentation referring to the API's beta status
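After promotion, the earlier beta entry would sit in `references_committed/llm.yaml` with its `status` line removed:

```yaml
generate_with_streaming:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]
```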
### Deprecating an API

1. Add `@set_api_status("deprecated")` to the method
2. Update the schema with `status: deprecated`
3. Add a deprecation warning in the method:

   ```python
   import warnings

   warnings.warn(
       "This method is deprecated and will be removed in v2.0. "
       "Use new_method() instead.",
       DeprecationWarning,
       stacklevel=2,
   )
   ```

4. Document the migration path (see the combined sketch below)
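Putting those steps together, a deprecated method might look like this sketch (`old_generate` and the `v2.0` target are illustrative):

```python
import warnings

@set_api_status("deprecated")
def old_generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Deprecated: use generate() instead."""
    warnings.warn(
        "old_generate() is deprecated and will be removed in v2.0. "
        "Use generate() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate to the replacement so existing callers keep working.
    return self.generate(prompts, **kwargs)
```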