# LLM API Change Guide
This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.
## Overview

TensorRT LLM provides multiple API levels:

- **LLM API** - The highest-level API (e.g., the `LLM` class)
- **PyExecutor API** - The mid-level API (e.g., the `PyExecutor` class)

This guide focuses on the LLM API, which is the primary interface for most users.
## API Types and Stability Guarantees
TensorRT LLM classifies APIs into two categories:
### 1. Committed APIs

- Stable and guaranteed to remain consistent across releases
- No breaking changes without major version updates
- Schema stored in `tests/unittest/api_stability/references_committed/`
### 2. Non-committed APIs

- Under active development and may change between releases
- Marked with a `status` field in the docstring:
  - `prototype` - Early experimental stage
  - `beta` - More stable but still subject to change
  - `deprecated` - Scheduled for removal
- Schema stored in `tests/unittest/api_stability/references/`

See the API status documentation for complete details.
## API Schema Management

All API schemas are:

- Stored as YAML files in the codebase
- Protected by unit tests in `tests/unittest/api_stability/`
- Automatically validated to ensure consistency
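The idea behind these schema tests can be sketched in a few lines: record the expected signature of each public method, then fail if the live class drifts from it. This is an illustration only; the real tests read their references from the YAML files above, while here a plain dict stands in for them.

```python
import inspect

# Reference mapping of public method names to their expected signatures.
# In the real test suite this reference lives in YAML files; a dict is
# used here purely for illustration.
reference = {
    "generate": "(self, prompts, **kwargs)",
}

class LLM:
    def generate(self, prompts, **kwargs):
        """Generate text from prompts."""

def check_api_stability(cls, reference):
    """Raise if any public method's signature drifts from the reference."""
    for name, expected in reference.items():
        actual = str(inspect.signature(getattr(cls, name)))
        if actual != expected:
            raise AssertionError(f"{name}: {actual} != {expected}")

check_api_stability(LLM, reference)  # passes while signatures match
```

Renaming a parameter or dropping `**kwargs` in `LLM.generate` would make the check raise, which is exactly the protection the schema tests provide.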
## Modifying LLM Constructor Arguments

The `LLM` class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic model called `LlmArgs`.
### Architecture

- The LLM's `__init__` method parameters map directly to `LlmArgs` fields
- `LlmArgs` is an alias for `TorchLlmArgs` (defined in `tensorrt_llm/llmapi/llm_args.py`)
- All arguments are validated and type-checked through Pydantic
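The flow above can be sketched without pulling in the library: keyword arguments passed to the constructor are forwarded into a validated args object. This sketch uses a plain dataclass in place of the real Pydantic `TorchLlmArgs`, and the `Mini*` class names are illustrative stand-ins, not the library's actual classes.

```python
from dataclasses import dataclass

# Stand-in for TorchLlmArgs: one field with a default, plus a manual
# type check that Pydantic would perform automatically.
@dataclass
class MiniLlmArgs:
    garbage_collection_gen0_threshold: int = 20000

    def __post_init__(self):
        if not isinstance(self.garbage_collection_gen0_threshold, int):
            raise TypeError("garbage_collection_gen0_threshold must be an int")

class MiniLLM:
    def __init__(self, **kwargs):
        # __init__ keyword arguments map directly onto the args object.
        self.args = MiniLlmArgs(**kwargs)

llm = MiniLLM(garbage_collection_gen0_threshold=50000)
```

Passing an unknown keyword or a value of the wrong type fails at construction time, which is the behavior the Pydantic model gives the real `LLM` class.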
### Adding a New Argument
Follow these steps to add a new constructor argument:
#### 1. Add the field to `TorchLlmArgs`

```python
garbage_collection_gen0_threshold: int = Field(
    default=20000,
    description=(
        "Threshold for Python garbage collection of generation 0 objects. "
        "Lower values trigger more frequent garbage collection."
    ),
    status="beta"  # Required for non-committed arguments
)
```
Field requirements:

- **Type annotation**: Required for all fields
- **Default value**: Recommended unless the field is mandatory
- **Description**: A clear explanation of the parameter's purpose
- **Status**: Required for non-committed arguments (`prototype`, `beta`, etc.)
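The first two requirements can themselves be checked mechanically: every annotated field should also carry a default on the class. A minimal sketch, with illustrative field names:

```python
# Illustrative check mirroring the field requirements above: every
# annotated field on the class should also define a default value.
class ExampleArgs:
    garbage_collection_gen0_threshold: int = 20000
    enable_streaming: bool = False

def fields_have_defaults(cls):
    # __annotations__ lists the annotated fields; hasattr detects defaults.
    return all(hasattr(cls, name) for name in cls.__annotations__)

assert fields_have_defaults(ExampleArgs)
```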
#### 2. Update the API schema
Add the field to the appropriate schema file:
Non-committed arguments (`tests/unittest/api_stability/references/llm_args.yaml`):

```yaml
garbage_collection_gen0_threshold:
  type: int
  default: 20000
  status: beta  # Must match the status in code
```

Committed arguments (`tests/unittest/api_stability/references_committed/llm_args.yaml`):

```yaml
garbage_collection_gen0_threshold:
  type: int
  default: 20000
  # No status field for committed arguments
```
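The "must match" comment is the crux: the status and default declared in code have to agree with the schema entry. A hypothetical consistency check, modeled on plain dicts rather than loading the actual YAML files:

```python
# Metadata as declared in code vs. the entry recorded in the schema file.
code_metadata = {
    "garbage_collection_gen0_threshold": {"default": 20000, "status": "beta"},
}
schema_entries = {
    "garbage_collection_gen0_threshold": {
        "type": "int", "default": 20000, "status": "beta",
    },
}

def statuses_match(code, schema):
    """True when every field's status and default agree with the schema."""
    return all(
        schema[name].get("status") == meta.get("status")
        and schema[name].get("default") == meta.get("default")
        for name, meta in code.items()
    )

assert statuses_match(code_metadata, schema_entries)
```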
#### 3. Run validation tests

```bash
python -m pytest tests/unittest/api_stability/test_llm_api.py
```
## Modifying LLM Class Methods
Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.
### Implementation Details

- The actual implementation is in the `_TorchLLM` class (`llm.py`)
- Public methods (those not starting with `_`) are automatically exposed as APIs
### Adding a New Method
Follow these steps to add a new API method:
#### 1. Implement the method in `_TorchLLM`

For non-committed APIs, use the `@set_api_status` decorator:

```python
@set_api_status("beta")
def generate_with_streaming(
    self,
    prompts: List[str],
    **kwargs
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation
        **kwargs: Additional generation parameters

    Returns:
        Iterator of generation outputs
    """
    # Implementation here
    pass
```
For committed APIs, no decorator is needed:

```python
def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Generate text from prompts."""
    # Implementation here
    pass
```
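To make the decorator's role concrete, here is a minimal sketch of what a decorator like `set_api_status` could do: attach the status as metadata so the stability tests can read it back. This is an illustration of the pattern, not the library's actual implementation, and the streaming body is a toy stand-in.

```python
# Illustrative decorator: records the API status on the function object.
def set_api_status(status):
    def decorator(func):
        func.__api_status__ = status
        return func
    return decorator

@set_api_status("beta")
def generate_with_streaming(prompts, **kwargs):
    """Generate text with streaming output (toy body for illustration)."""
    for prompt in prompts:
        yield f"output for {prompt}"
```

A test can then assert `generate_with_streaming.__api_status__ == "beta"` and compare it against the `status` field in the schema file.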
#### 2. Update the API schema

Add the method to the appropriate `llm.yaml` file:
Non-committed API (`tests/unittest/api_stability/references/llm.yaml`):

```yaml
generate_with_streaming:
  status: beta  # Must match @set_api_status
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]
```
Committed API (`tests/unittest/api_stability/references_committed/llm.yaml`):

```yaml
generate:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: GenerationOutput
```
### Modifying Existing Methods

When modifying existing methods:

**Non-breaking changes** (adding optional parameters):

- Update the method signature
- Update the schema file
- No status change needed

**Breaking changes** (changing required parameters or return types):

- Only allowed for non-committed APIs
- Consider a deprecation path for beta APIs
- Update the documentation with a migration guide
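A non-breaking change in practice: give the new parameter a default so every existing call site keeps working. The class, method body, and `max_tokens` parameter below are illustrative, not the library's real signature.

```python
class LLM:
    # Before: def generate(self, prompts)
    # After: max_tokens is optional, so old call sites are unaffected.
    def generate(self, prompts, max_tokens=None):
        limit = max_tokens if max_tokens is not None else 128
        return [f"{p} (up to {limit} tokens)" for p in prompts]

llm = LLM()
old_style = llm.generate(["hello"])                 # unchanged call site
new_style = llm.generate(["hello"], max_tokens=32)  # opts into the new knob
```

Making `max_tokens` a required positional parameter instead would break every existing caller, which is why that kind of change is reserved for non-committed APIs.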
## Best Practices

- **Documentation**: Always include comprehensive docstrings
- **Type hints**: Use proper type annotations for all parameters and returns
- **Testing**: Add unit tests for new methods
- **Examples**: Provide usage examples in the docstring
- **Validation**: Run API stability tests before submitting changes
## Running Tests

Validate your changes:

```bash
# Run API stability tests
python -m pytest tests/unittest/api_stability/

# Run the specific test for the LLM API
python -m pytest tests/unittest/api_stability/test_llm_api.py -v
```
## Common Workflows

### Promoting an API from Beta to Committed
1. Remove the `@set_api_status("beta")` decorator from the method
2. Move the schema entry from `tests/unittest/api_stability/references/` to `tests/unittest/api_stability/references_committed/`
3. Remove the `status` field from the schema entry
4. Update any documentation referring to the API's beta status
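The schema side of the promotion can be pictured as moving an entry between the two reference sets and dropping its status. Plain dicts stand in for the YAML files here, purely for illustration:

```python
# Non-committed and committed references, modeled as dicts.
references = {
    "generate_with_streaming": {
        "status": "beta",
        "returns": "Iterator[GenerationOutput]",
    },
}
references_committed = {}

# Move the entry and strip its status, as in steps 2-3 above.
entry = references.pop("generate_with_streaming")
entry.pop("status")  # committed entries carry no status field
references_committed["generate_with_streaming"] = entry
```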
### Deprecating an API
1. Add `@set_api_status("deprecated")` to the method
2. Update the schema with `status: deprecated`
3. Add a deprecation warning in the method:

   ```python
   import warnings

   warnings.warn(
       "This method is deprecated and will be removed in v2.0. "
       "Use new_method() instead.",
       DeprecationWarning,
       stacklevel=2,
   )
   ```

4. Document the migration path
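A caller, or a unit test, can observe the warning emitted in step 3 with the standard `warnings` machinery. `old_method` and `new_method` below are hypothetical names used only for the sketch:

```python
import warnings

def old_method():
    # Deprecated method: warn, then still return its usual result.
    warnings.warn(
        "old_method is deprecated; use new_method() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "result"

# Record warnings so the deprecation can be asserted on.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = old_method()

assert value == "result"
assert issubclass(caught[0].category, DeprecationWarning)
```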