LLM API Change Guide#
This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.
Overview#
TensorRT LLM provides multiple API levels:
LLM API - The highest-level API (e.g., the LLM class)
PyExecutor API - The mid-level API (e.g., the PyExecutor class)
This guide focuses on the LLM API, which is the primary interface for most users.
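For orientation, here is a minimal sketch of typical LLM API usage (the model name below is a placeholder, and exact output fields may vary between releases):
from tensorrt_llm import LLM, SamplingParams

# Build an LLM instance from a Hugging Face model name or local checkpoint path.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Generate completions for a batch of prompts.
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
for output in outputs:
    print(output.outputs[0].text)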
API Types and Stability Guarantees#
TensorRT LLM classifies APIs into two categories:
1. Committed APIs#
Stable and guaranteed to remain consistent across releases
No breaking changes without major version updates
Schema stored in: tests/unittest/api_stability/references_committed/
2. Non-committed APIs#
Under active development and may change between releases
Marked with a status field in the docstring:
prototype - Early experimental stage
beta - More stable but still subject to change
deprecated - Scheduled for removal
Schema stored in: tests/unittest/api_stability/references/
See the API status documentation for complete details.
API Schema Management#
All API schemas are:
Stored as YAML files in the codebase
Protected by unit tests in tests/unittest/api_stability/
Automatically validated to ensure consistency
API Change Principles#
1. Knob Naming#
Use Semantic Clarity
Argument names should describe what the argument represents, not how it is used internally.
✅ Good: max_new_tokens (clear meaning)
❌ Bad: num (ambiguous)
Reflect Argument Type and Granularity
For boolean knobs, prefix with a verb such as enable_. Examples: enable_cache, enable_flash_attention
For numerical threshold knobs, suffix with _limit, _size, _count, _len, or _ratio. Examples: max_seq_len, prefill_batch_size
Avoid Redundant Prefixes
Example (in MoeConfig):
✅ Good: backend
❌ Bad: moe_backend (redundant since it’s already in MoeConfig)
Use Specific Names for Narrow Scenarios
When adding knobs for specific use cases, make the name convey the restriction clearly via a prefix. It’s acceptable to rename later when the knob becomes more generic or is moved into a dedicated config.
Example (argument to the LLM class):
✅ Good: rope_scaling_factor → clearly indicates it’s for RoPE
❌ Bad: scaling_factor → too generic and prone to misuse
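Putting these conventions together, a hypothetical config might look like the sketch below (all knob names here are illustrative, not actual TensorRT LLM arguments):
from pydantic import BaseModel, Field

class AttentionConfig(BaseModel):
    # Boolean knob: verb prefix makes the on/off intent obvious.
    enable_flash_attention: bool = Field(default=True, description="Enable flash attention kernels.")
    # Numerical knob: the _len suffix conveys type and granularity.
    max_seq_len: int = Field(default=4096, description="Maximum sequence length per request.")
    # Scoped inside AttentionConfig, so no redundant attention_ prefix.
    backend: str = Field(default="auto", description="Attention backend to use.")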
2. Hierarchical Configuration#
Organize complex or hierarchical arguments into dedicated configuration dataclasses with intuitive and consistent naming.
Guidelines
Use the XxxConfig suffix consistently
Examples: ModelConfig, ParallelConfig, MoeConfig
Reflect conceptual hierarchy
The dataclass name should represent a coherent functional unit, not an arbitrary grouping
Avoid over-nesting
Use only one level of configuration hierarchy whenever possible (e.g., LlmArgs → ParallelConfig) to balance readability and modularity
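A minimal sketch of one level of nesting, using hypothetical field names:
from pydantic import BaseModel, Field

class ParallelConfig(BaseModel):
    # A coherent functional unit: all parallelism knobs live here.
    tp_size: int = Field(default=1, description="Tensor-parallel size.")
    pp_size: int = Field(default=1, description="Pipeline-parallel size.")

class LlmArgs(BaseModel):
    # One level of hierarchy: LlmArgs -> ParallelConfig, no deeper nesting.
    parallel_config: ParallelConfig = Field(default_factory=ParallelConfig)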
3. Prefer LlmArgs Over Environment Variables#
LlmArgs is the central place for all configuration knobs. It integrates with our infrastructure to ensure:
API Stability
Protects committed (stable) APIs
A GitHub reviewer committee oversees API stability
API Status Registration
Uncommitted (unstable) APIs must be marked as "prototype" or "beta"
API statuses are displayed in the documentation
API Documentation
Each knob uses a Field with a description
Automatically rendered in public documentation
Managing knobs in LlmArgs remains scalable and maintainable thanks to our existing infrastructure and review processes.
Drawbacks of Environment Variables:
Dispersed across the codebase
Lack documentation and discoverability
Pose challenges for testing and validation
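As a contrast, consider the same hypothetical knob implemented both ways (TLLM_ENABLE_FOO and enable_foo are illustrative names):
import os

# Discouraged: an environment variable is invisible to the schema tests,
# undocumented, and easy to typo at deployment time.
enable_foo = os.environ.get("TLLM_ENABLE_FOO", "0") == "1"

# Preferred: a documented, validated field on LlmArgs (full workflow below):
# enable_foo: bool = Field(default=False, description="...", status="prototype")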
Guidelines for Adding Knobs:
✅ Add clear, descriptive documentation for each field
✅ It’s fine to add temporary knobs and refine them later
⚠️ Always mark temporary knobs as "prototype" if they are not yet stable
✅ Refactor prototype knobs as they mature; promote them to "beta" or "stable"
Modifying LLM Constructor Arguments#
The LLM class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic model called LlmArgs.
Architecture#
The LLM's __init__ method parameters map directly to LlmArgs fields
LlmArgs is an alias for TorchLlmArgs (defined in tensorrt_llm/llmapi/llm_args.py)
All arguments are validated and type-checked through Pydantic
Adding a New Argument#
Follow these steps to add a new constructor argument:
1. Add the field to TorchLlmArgs#
garbage_collection_gen0_threshold: int = Field(
    default=20000,
    description=(
        "Threshold for Python garbage collection of generation 0 objects. "
        "Lower values trigger more frequent garbage collection."
    ),
    status="beta",  # Required for non-committed arguments
)
Field requirements:
Type annotation: Required for all fields
Default value: Recommended unless the field is mandatory
Description: Clear explanation of the parameter’s purpose
Status: Required for non-committed arguments (prototype, beta, etc.)
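Once the field exists on TorchLlmArgs, it can be passed straight to the LLM constructor; a quick sketch (the model path is a placeholder):
from tensorrt_llm import LLM

# The new field is accepted as a keyword argument and validated by Pydantic.
llm = LLM(
    model="/path/to/model",
    garbage_collection_gen0_threshold=30000,
)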
2. Update the API schema#
Add the field to the appropriate schema file:
Non-committed arguments: tests/unittest/api_stability/references/llm_args.yaml
garbage_collection_gen0_threshold:
  type: int
  default: 20000
  status: beta  # Must match the status in code
Committed arguments: tests/unittest/api_stability/references_committed/llm_args.yaml
garbage_collection_gen0_threshold:
  type: int
  default: 20000
  # No status field for committed arguments
3. Run validation tests#
python -m pytest tests/unittest/api_stability/test_llm_api.py
Modifying LLM Class Methods#
Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.
Implementation Details#
The actual implementation is in the _TorchLLM class (llm.py)
Public methods (not starting with _) are automatically exposed as APIs
Adding a New Method#
Follow these steps to add a new API method:
1. Implement the method in _TorchLLM#
For non-committed APIs, use the @set_api_status decorator:
@set_api_status("beta")
def generate_with_streaming(
    self,
    prompts: List[str],
    **kwargs,
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation
        **kwargs: Additional generation parameters

    Returns:
        Iterator of generation outputs
    """
    # Implementation here
    pass
For committed APIs, no decorator is needed:
def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Generate text from prompts."""
    # Implementation here
    pass
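Callers then use the method like any other public API; a sketch assuming the hypothetical generate_with_streaming above (the model path is a placeholder):
from tensorrt_llm import LLM

llm = LLM(model="/path/to/model")

# Streaming: consume outputs incrementally as they are produced.
for chunk in llm.generate_with_streaming(["Tell me a story"]):
    print(chunk)

# Committed API: a blocking call returning the final output.
result = llm.generate(["Tell me a story"])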
2. Update the API schema#
Add the method to the appropriate llm.yaml file:
Non-committed API (tests/unittest/api_stability/references/llm.yaml):
generate_with_streaming:
  status: beta  # Must match @set_api_status
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]
Committed API (tests/unittest/api_stability/references_committed/llm.yaml):
generate:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: GenerationOutput
Modifying Existing Methods#
When modifying existing methods:
Non-breaking changes (adding optional parameters):
Update the method signature
Update the schema file
No status change needed
Breaking changes (changing required parameters, return types):
Only allowed for non-committed APIs
Consider deprecation path for beta APIs
Update documentation with migration guide
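For example, a non-breaking change could add an optional parameter with a default, leaving existing callers unaffected (timeout_s is a hypothetical parameter):
from typing import List, Optional

def generate(
    self,
    prompts: List[str],
    timeout_s: Optional[float] = None,  # New optional knob; the default preserves old behavior.
    **kwargs,
) -> GenerationOutput:
    ...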
Best Practices#
Documentation: Always include comprehensive docstrings
Type hints: Use proper type annotations for all parameters and returns
Testing: Add unit tests for new methods
Examples: Provide usage examples in the docstring
Validation: Run API stability tests before submitting changes
Running Tests#
Validate your changes:
# Run API stability tests
python -m pytest tests/unittest/api_stability/
# Run specific test for LLM API
python -m pytest tests/unittest/api_stability/test_llm_api.py -v
Common Workflows#
Promoting an API from Beta to Committed#
1. Remove the @set_api_status("beta") decorator from the method
2. Move the schema entry from tests/unittest/api_stability/references/ to tests/unittest/api_stability/references_committed/
3. Remove the status field from the schema
4. Update any documentation referring to the API's beta status
Deprecating an API#
1. Add @set_api_status("deprecated") to the method
2. Update the schema with status: deprecated
3. Add a deprecation warning in the method:
import warnings

warnings.warn(
    "This method is deprecated and will be removed in v2.0. "
    "Use new_method() instead.",
    DeprecationWarning,
    stacklevel=2,
)
4. Document the migration path
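Putting the steps together, a deprecated method might look like the sketch below (old_generate and its replacement are illustrative names):
import warnings

@set_api_status("deprecated")
def old_generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Deprecated: use generate() instead."""
    warnings.warn(
        "old_generate() is deprecated and will be removed in a future release. "
        "Use generate() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate to the replacement so behavior stays consistent during the
    # deprecation window.
    return self.generate(prompts, **kwargs)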