LLM API Change Guide#

This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.

Overview#

TensorRT LLM provides multiple API levels:

  1. LLM API - The highest-level API (e.g., the LLM class)

  2. PyExecutor API - The mid-level API (e.g., the PyExecutor class)

This guide focuses on the LLM API, which is the primary interface for most users.

API Types and Stability Guarantees#

TensorRT LLM classifies APIs into two categories:

1. Committed APIs#

  • Stable and guaranteed to remain consistent across releases

  • No breaking changes without major version updates

  • Schema stored in: tests/unittest/api_stability/references_committed/

2. Non-committed APIs#

  • Under active development and may change between releases

  • Marked with a status field in the docstring:

    • prototype - Early experimental stage

    • beta - More stable but still subject to change

    • deprecated - Scheduled for removal

  • Schema stored in: tests/unittest/api_stability/references/

  • See the API status documentation for complete details

API Schema Management#

All API schemas are:

  • Stored as YAML files in the codebase

  • Protected by unit tests in tests/unittest/api_stability/

  • Automatically validated against the code, so the schema and the implementation stay consistent
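As a rough illustration of what such a consistency check involves, the sketch below compares the fields of a stand-in argument class against a reference schema. The names ToyArgs, REFERENCE, extract_schema, and diff_schema are hypothetical, and the reference is a plain dict rather than a YAML file so the example needs no third-party packages. The actual tests in tests/unittest/api_stability/ may be organized differently.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class ToyArgs:  # hypothetical stand-in for the real argument class
    model: str
    max_batch_size: int = 8


# Hypothetical reference schema; in the repository this would be a YAML file.
REFERENCE = {
    "model": {"type": "str"},
    "max_batch_size": {"type": "int", "default": 8},
}


def extract_schema(cls) -> dict:
    """Build a {name: {type, default}} mapping from a dataclass."""
    schema = {}
    for f in fields(cls):
        type_name = f.type.__name__ if isinstance(f.type, type) else str(f.type)
        entry = {"type": type_name}
        if f.default is not MISSING:
            entry["default"] = f.default
        schema[f.name] = entry
    return schema


def diff_schema(actual: dict, reference: dict) -> list:
    """Return human-readable mismatches between the code and the reference."""
    problems = []
    for name in sorted(actual.keys() - reference.keys()):
        problems.append(f"{name}: in code but missing from reference")
    for name in sorted(reference.keys() - actual.keys()):
        problems.append(f"{name}: in reference but missing from code")
    for name in sorted(actual.keys() & reference.keys()):
        if actual[name] != reference[name]:
            problems.append(f"{name}: {actual[name]} != {reference[name]}")
    return problems
```

An empty list from diff_schema means the code and the reference agree; any other result is what a stability test would report as a failure.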

Modifying LLM Constructor Arguments#

The LLM class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic model called LlmArgs.

Architecture#

  • The parameters of the LLM class’s __init__ method map directly to LlmArgs fields

  • LlmArgs is an alias for TorchLlmArgs (defined in tensorrt_llm/llmapi/llm_args.py)

  • All arguments are validated and type-checked through Pydantic
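The pattern described above, where constructor keywords are forwarded to a validated argument object, can be sketched with the standard library alone. ToyArgs and ToyLLM are hypothetical stand-ins, and the hand-written type check only approximates what Pydantic provides; this is not the library's implementation.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyArgs:  # hypothetical stand-in for LlmArgs
    model: str
    max_batch_size: int = 8

    def __post_init__(self):
        # Minimal type check; Pydantic does this (and much more) in practice.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


class ToyLLM:  # hypothetical stand-in for the LLM class
    def __init__(self, model: str, **kwargs):
        # Every keyword must correspond to a field on ToyArgs; unknown
        # names raise TypeError from the dataclass constructor.
        self.args = ToyArgs(model=model, **kwargs)
```

Because the constructor only forwards keywords, adding a field to the argument class is enough to make it available to callers in this sketch.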

Adding a New Argument#

Follow these steps to add a new constructor argument:

1. Add the field to TorchLlmArgs#

garbage_collection_gen0_threshold: int = Field(
    default=20000,
    description=(
        "Threshold for Python garbage collection of generation 0 objects. "
        "Lower values trigger more frequent garbage collection."
    ),
    status="beta"  # Required for non-committed arguments
)

Field requirements:

  • Type annotation: Required for all fields

  • Default value: Recommended unless the field is mandatory

  • Description: Clear explanation of the parameter’s purpose

  • Status: Required for non-committed arguments (prototype, beta, etc.)
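The requirements above lend themselves to a mechanical check. The following sketch is one way to express them, assuming the description and status are stored in dataclass field metadata; ToyArgs, ALLOWED_STATUSES, and lint_fields are hypothetical names, and the real code passes these values to Field directly.

```python
from dataclasses import dataclass, field, fields

ALLOWED_STATUSES = {"prototype", "beta", "deprecated"}


@dataclass
class ToyArgs:  # hypothetical stand-in for TorchLlmArgs
    garbage_collection_gen0_threshold: int = field(
        default=20000,
        metadata={
            "description": "Threshold for Python garbage collection "
                           "of generation 0 objects.",
            "status": "beta",
        },
    )
    # Deliberately incomplete: no description and no status.
    undocumented_flag: bool = field(default=False)


def lint_fields(cls, committed: set) -> list:
    """Report fields that lack a description or a valid status.

    Names listed in `committed` are exempt from the status requirement.
    """
    problems = []
    for f in fields(cls):
        if not f.metadata.get("description"):
            problems.append(f"{f.name}: missing description")
        status = f.metadata.get("status")
        if f.name not in committed and status not in ALLOWED_STATUSES:
            problems.append(f"{f.name}: missing or invalid status")
    return problems
```

Calling lint_fields(ToyArgs, committed=set()) flags undocumented_flag twice, once for the description and once for the status.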

2. Update the API schema#

Add the field to the appropriate schema file:

  • Non-committed arguments: tests/unittest/api_stability/references/llm_args.yaml

    garbage_collection_gen0_threshold:
      type: int
      default: 20000
      status: beta  # Must match the status in code
    
  • Committed arguments: tests/unittest/api_stability/references_committed/llm_args.yaml

    garbage_collection_gen0_threshold:
      type: int
      default: 20000
      # No status field for committed arguments
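
The choice of schema file and the shape of an entry follow directly from the argument's status. A minimal sketch of that rule, using the two directories named above and hypothetical helper names (schema_file, render_entry):

```python
from typing import Any, Optional

REFERENCES_DIR = "tests/unittest/api_stability/references"
COMMITTED_DIR = "tests/unittest/api_stability/references_committed"


def schema_file(status: Optional[str]) -> str:
    """Committed arguments have no status; everything else is non-committed."""
    directory = COMMITTED_DIR if status is None else REFERENCES_DIR
    return f"{directory}/llm_args.yaml"


def render_entry(name: str, type_name: str, default: Any,
                 status: Optional[str] = None) -> str:
    """Render one argument entry in the layout shown above."""
    lines = [f"{name}:", f"  type: {type_name}", f"  default: {default}"]
    if status is not None:
        lines.append(f"  status: {status}")
    return "\n".join(lines)
```

The helpers only format text; they do not read or write the YAML files, which are still edited by hand.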
    

3. Run validation tests#

python -m pytest tests/unittest/api_stability/test_llm_api.py

Modifying LLM Class Methods#

Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.

Implementation Details#

  • The actual implementation is in the _TorchLLM class (llm.py)

  • Public methods (not starting with _) are automatically exposed as APIs
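The underscore rule can be demonstrated with the standard inspect module. ToyLLM and public_methods are hypothetical; the sketch shows only the naming convention, not how the stability tests actually enumerate the API.

```python
import inspect


class ToyLLM:  # hypothetical stand-in for _TorchLLM
    def generate(self, prompts):
        return [p.upper() for p in prompts]

    def shutdown(self):
        pass

    def _build_engine(self):  # leading underscore: not part of the API
        pass


def public_methods(cls) -> list:
    """Names of functions reachable on cls that do not start with '_'."""
    return sorted(
        name
        for name, _member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    )
```

Here public_methods(ToyLLM) lists generate and shutdown but omits _build_engine, so renaming a helper with a leading underscore is what keeps it out of the API surface.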

Adding a New Method#

Follow these steps to add a new API method:

1. Implement the method in _TorchLLM#

For non-committed APIs, use the @set_api_status decorator:

from typing import Iterator, List  # needed by the annotations below

@set_api_status("beta")
def generate_with_streaming(
    self,
    prompts: List[str],
    **kwargs
) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation
        **kwargs: Additional generation parameters

    Returns:
        Iterator of generation outputs
    """
    # Implementation here
    pass

For committed APIs, no decorator is needed:

def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Generate text from prompts."""
    # Implementation here
    pass
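
To make the role of the decorator more concrete, here is a minimal sketch of what a status decorator could do: record the status on the function and surface it in the docstring. toy_set_api_status and the __api_status__ attribute are hypothetical; the real @set_api_status may behave differently.

```python
from typing import Callable

VALID_STATUSES = ("prototype", "beta", "deprecated")


def toy_set_api_status(status: str) -> Callable:
    """Hypothetical stand-in for @set_api_status; not the library's code."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown API status: {status!r}")

    def decorator(func: Callable) -> Callable:
        # Record the status on the function so tooling can read it back.
        func.__api_status__ = status
        # Surface the status at the start of the docstring.
        func.__doc__ = f"[status: {status}] {func.__doc__ or ''}".rstrip()
        return func

    return decorator


class ToyLLM:  # hypothetical stand-in for _TorchLLM
    @toy_set_api_status("beta")
    def generate_with_streaming(self, prompts):
        """Generate text with streaming output."""
        yield from prompts
```

Storing the status on the function object keeps the method body unchanged, which is why promoting an API later only requires removing the decorator.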

2. Update the API schema#

Add the method to the appropriate llm.yaml file:

Non-committed API (tests/unittest/api_stability/references/llm.yaml):

generate_with_streaming:
  status: beta  # Must match @set_api_status
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]

Committed API (tests/unittest/api_stability/references_committed/llm.yaml):

generate:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: GenerationOutput
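
A schema entry of this shape can be derived from a method signature with inspect.signature. The sketch below is an assumption about how such a comparison might be built, with hypothetical names (ToyLLM, method_schema) and str standing in for GenerationOutput so the example is self-contained.

```python
import inspect
from typing import Iterator, List


class ToyLLM:  # hypothetical stand-in for _TorchLLM
    def generate_with_streaming(
        self, prompts: List[str], **kwargs
    ) -> Iterator[str]:
        yield from prompts


def _type_name(annotation) -> str:
    if annotation is inspect.Signature.empty:
        return "Any"
    if isinstance(annotation, type):
        return annotation.__name__
    # typing generics print as 'typing.List[str]'; drop the module prefix.
    return str(annotation).replace("typing.", "")


def method_schema(func) -> dict:
    """Describe a method's parameters and return type as plain data."""
    sig = inspect.signature(func)
    params = []
    for name, p in sig.parameters.items():
        if name == "self":
            continue
        if p.kind is inspect.Parameter.VAR_KEYWORD:
            type_name = "dict"  # **kwargs is recorded as a dict
        else:
            type_name = _type_name(p.annotation)
        params.append({"name": name, "type": type_name})
    return {"parameters": params, "returns": _type_name(sig.return_annotation)}
```

Comparing the result of method_schema with the parsed YAML entry is one way a test could detect that a signature and its schema have drifted apart.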

Modifying Existing Methods#

When modifying existing methods:

  1. Non-breaking changes (adding optional parameters):

    • Update the method signature

    • Update the schema file

    • No status change needed

  2. Breaking changes (changing required parameters or return types):

    • Only allowed for non-committed APIs

    • Consider a deprecation path for beta APIs

    • Update the documentation with a migration guide
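
The distinction between the two kinds of change can be approximated by comparing signatures. The following sketch uses a hypothetical is_breaking helper and deliberately simplified rules: it ignores parameter reordering and semantic changes, so it is a starting point for review rather than a complete check.

```python
import inspect

_EMPTY = inspect.Parameter.empty
_VARIADIC = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def is_breaking(old, new) -> bool:
    """Return True if a caller written against `old` could fail against `new`."""
    old_sig, new_sig = inspect.signature(old), inspect.signature(new)
    if old_sig.return_annotation != new_sig.return_annotation:
        return True
    old_params, new_params = old_sig.parameters, new_sig.parameters
    for name, p in old_params.items():
        q = new_params.get(name)
        # Removed or re-typed parameters break existing callers.
        if q is None or q.annotation != p.annotation or q.kind != p.kind:
            return True
        # A parameter that lost its default has become required.
        if p.default is not _EMPTY and q.default is _EMPTY:
            return True
    for name, q in new_params.items():
        if name in old_params:
            continue
        if q.default is _EMPTY and q.kind not in _VARIADIC:
            return True  # a new required parameter
    return False


def v1(prompts: list) -> list: ...
def v2(prompts: list, temperature: float = 1.0) -> list: ...  # optional added
def v3(prompts: list, temperature: float) -> list: ...         # required added
def v4(prompts: list) -> str: ...                              # return changed
```

Under these rules, moving from v1 to v2 is non-breaking, while moving from v1 to v3 or v4 is breaking, as is removing a parameter.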

Best Practices#

  1. Documentation: Always include comprehensive docstrings

  2. Type hints: Use proper type annotations for all parameters and returns

  3. Testing: Add unit tests for new methods

  4. Examples: Provide usage examples in the docstring

  5. Validation: Run API stability tests before submitting changes

Running Tests#

Validate your changes:

# Run API stability tests
python -m pytest tests/unittest/api_stability/

# Run specific test for LLM API
python -m pytest tests/unittest/api_stability/test_llm_api.py -v

Common Workflows#

Promoting an API from Beta to Committed#

  1. Remove the @set_api_status("beta") decorator from the method

  2. Move the schema entry from tests/unittest/api_stability/references/ to tests/unittest/api_stability/references_committed/

  3. Remove the status field from the schema

  4. Update any documentation referring to the API’s beta status
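
Steps 2 and 3 amount to moving one entry between two mappings and dropping its status. A small sketch of that operation on already-parsed schema data, with a hypothetical promote helper (loading and saving the YAML files is left out):

```python
import copy


def promote(name: str, references: dict, committed: dict) -> None:
    """Move `name` from the non-committed schema to the committed one."""
    if name not in references:
        raise KeyError(f"{name} is not in the non-committed schema")
    if name in committed:
        raise ValueError(f"{name} is already committed")
    entry = copy.deepcopy(references.pop(name))
    entry.pop("status", None)  # committed entries carry no status field
    committed[name] = entry
```

Both dictionaries are modified in place, mirroring the fact that a promotion touches both schema files in the same change.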

Deprecating an API#

  1. Add @set_api_status("deprecated") to the method

  2. Update the schema with status: deprecated

  3. Add a deprecation warning to the method:

    import warnings
    warnings.warn(
        "This method is deprecated and will be removed in v2.0. "
        "Use new_method() instead.",
        DeprecationWarning,
        stacklevel=2
    )
    
  4. Document the migration path
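
When several methods are deprecated at once, the warning from step 3 can be factored into a reusable decorator. The sketch below uses only the standard library; the deprecated helper, old_method, and the new_method name in the message are hypothetical and are not part of TensorRT LLM.

```python
import functools
import warnings


def deprecated(replacement: str, removed_in: str):
    """Hypothetical helper that wraps a function with a DeprecationWarning."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__}() is deprecated and will be removed in "
                f"{removed_in}. Use {replacement}() instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated(replacement="new_method", removed_in="v2.0")
def old_method(x: int) -> int:
    return x * 2
```

The stacklevel=2 argument makes the warning point at the calling code rather than at the wrapper, which helps users find the call sites they need to migrate.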