API Reference
- class tensorrt_llm.hlapi.LLM(model: str, tokenizer: str | Path | PreTrainedTokenizerBase | TokenizerBase | None = None, skip_tokenizer_init: bool = False, tensor_parallel_size: int = 1, dtype: str = 'auto', trust_remote_code: bool = False, revision: str | None = None, tokenizer_revision: str | None = None, **kwargs: Any)[source]
Bases:
object
The LLM class is the main entry point for running an LLM. A minimal construction sketch follows the parameter list below.
- Parameters:
model (str) – The model name or a local path to the model directory. It can be a Hugging Face (HF) model name, a local path to an HF model, or a local path to a TRT-LLM engine or checkpoint.
tokenizer (Optional[Union[str, Path, TokenizerBase, PreTrainedTokenizerBase]]) – The tokenizer name or a local path to the tokenizer directory.
skip_tokenizer_init (bool) – If True, skip initialization of the tokenizer and detokenizer; generate and generate_async will then accept prompt token ids as their only input.
tensor_parallel_size (int) – The number of processes for tensor parallelism.
dtype (str) – The data type for the model weights and activations.
trust_remote_code (bool, default=False) – Whether to trust remote code when downloading the model and tokenizer from Hugging Face.
revision (Optional[str]) – The revision of the model.
tokenizer_revision (Optional[str]) – The revision of the tokenizer.
auto_parallel (bool, default=False) – Enable auto parallel mode.
pipeline_parallel_size (int, default=1) – The pipeline parallel size.
enable_lora (bool, default=False) – Enable LoRA adapters.
max_lora_rank (int, default=None) – Maximum LoRA rank. If specified, it overrides build_config.lora_config.max_lora_rank.
max_loras (int, default=4) – Maximum number of LoRA adapters to be stored in GPU memory.
max_cpu_loras (int, default=4) – Maximum number of LoRA adapters to be stored in CPU memory.
build_config (BuildConfig, default=BuildConfig()) – The build configuration for the model. Default is an empty BuildConfig instance.
quant_config (QuantConfig, default=QuantConfig()) – The quantization configuration for the model. Default is an empty QuantConfig instance.
calib_config (CalibConfig, default=CalibConfig()) – The calibration configuration for the model.
embedding_parallel_mode (str, default="SHARDING_ALONG_VOCAB") – The parallel mode for embeddings.
share_embedding_table (bool, default=False) – Whether to share the embedding table.
kv_cache_config (KvCacheConfig, optional) – The key-value cache configuration for the model. Default is None.
peft_cache_config (PeftCacheConfig, optional) – The PEFT cache configuration for the model. Default is None.
decoding_config (DecodingConfig, optional) – The decoding configuration for the model. Default is None.
logits_post_processor_map (Dict[str, Callable], optional) – A map of logit post-processing functions. Default is None.
scheduler_config (SchedulerConfig, default=SchedulerConfig()) – The scheduler configuration for the model. Default is an empty SchedulerConfig instance.
normalize_log_probs (bool, default=False) – Whether to normalize log probabilities for the model.
iter_stats_max_iterations (int, optional) – The maximum number of iterations for iteration statistics. Default is None.
request_stats_max_iterations (int, optional) – The maximum number of iterations for request statistics. Default is None.
batching_type (BatchingType, optional) – The batching type for the model. Default is None.
enable_build_cache (bool or BuildCacheConfig, optional) – Whether to enable build caching for the model. Default is None.
enable_tqdm (bool, default=False) – Whether to display a progress bar during model building.
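A minimal construction sketch; the model name below is a placeholder, and any of the keyword arguments above can be passed the same way:

```python
from tensorrt_llm.hlapi import LLM

# "TinyLlama/TinyLlama-1.1B-Chat-v1.0" is a placeholder; any HF model name,
# local HF checkpoint path, or TRT-LLM engine/checkpoint path is accepted.
llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    tensor_parallel_size=1,  # single GPU; raise for tensor parallelism
    dtype="auto",            # infer weight/activation dtype from the checkpoint
)
```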
- __init__(model: str, tokenizer: str | Path | PreTrainedTokenizerBase | TokenizerBase | None = None, skip_tokenizer_init: bool = False, tensor_parallel_size: int = 1, dtype: str = 'auto', trust_remote_code: bool = False, revision: str | None = None, tokenizer_revision: str | None = None, **kwargs: Any)[source]
- generate(inputs: str | List[int] | Sequence[str | List[int]], sampling_params: SamplingParams | List[SamplingParams] | None = None, use_tqdm: bool = True, lora_request: LoRARequest | Sequence[LoRARequest] | None = None) RequestOutput | List[RequestOutput] [source]
Generate output for the given prompts in synchronous mode. Synchronous generation accepts either a single prompt or a batch of prompts.
- Parameters:
inputs (Union[PromptInputs, Sequence[PromptInputs]]) – The prompt text or token ids; either a single prompt or a batch of prompts.
sampling_params (Optional[Union[SamplingParams, List[SamplingParams]]]) – The sampling params for generation; a default is used if not provided.
use_tqdm (bool) – Whether to use tqdm to display the progress bar.
lora_request (Optional[Union[LoRARequest, Sequence[LoRARequest]]]) – LoRA request to use for generation, if any.
- Returns:
The output data of the completion request to the LLM.
- Return type:
Union[RequestOutput, List[RequestOutput]]
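A short synchronous-generation sketch; the model name is a placeholder, and it assumes CompletionOutput exposes a text attribute, as in the TRT-LLM examples:

```python
from tensorrt_llm.hlapi import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
params = SamplingParams(max_tokens=64, temperature=0.8)

# Batched prompts: one RequestOutput is returned per prompt, in order.
for output in llm.generate(["Hello, my name is", "The capital of France is"], params):
    print(output.outputs[0].text)  # assumes CompletionOutput exposes .text
```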
- generate_async(inputs: str | List[int], sampling_params: SamplingParams | None = None, lora_request: LoRARequest | None = None, streaming: bool = False) RequestOutput [source]
Generate output for the given prompt in asynchronous mode. Asynchronous generation accepts a single prompt only.
- Parameters:
inputs (PromptInputs) – The prompt text or token ids; must be a single prompt.
sampling_params (Optional[SamplingParams]) – The sampling params for generation; a default is used if not provided.
lora_request (Optional[LoRARequest]) – LoRA request to use for generation, if any.
streaming (bool) – Whether to use the streaming mode for the generation.
- Returns:
The output data of the completion request to the LLM.
- Return type:
RequestOutput
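An asynchronous streaming sketch; the model name is a placeholder, and it assumes the returned RequestOutput supports async iteration when streaming=True, as in the TRT-LLM example scripts:

```python
import asyncio

from tensorrt_llm.hlapi import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model

async def main() -> None:
    # With streaming=True, partial RequestOutputs arrive as tokens are produced;
    # `async for` iteration is assumed here, following the TRT-LLM examples.
    async for output in llm.generate_async(
            "Hello, my name is", SamplingParams(max_tokens=32), streaming=True):
        print(output.outputs[0].text)  # assumes CompletionOutput exposes .text

asyncio.run(main())
```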
- save(engine_dir: str)[source]
Save the built engine to the given path.
- Parameters:
engine_dir (str) – The path to save the engine.
- Returns:
None
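A sketch of persisting a built engine and reloading it later; the directory path is arbitrary, and the model name is a placeholder:

```python
from tensorrt_llm.hlapi import LLM

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
llm.save("./my_engine")  # write the built engine to disk

# Later runs can skip the build by pointing `model` at the saved engine.
llm_reloaded = LLM(model="./my_engine")
```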
- property tokenizer: TokenizerBase | None
- property workspace: Path
- class tensorrt_llm.hlapi.RequestOutput(generation_result: GenerationResult, prompt: str | None = None, tokenizer: TokenizerBase | None = None)[source]
Bases:
GenerationResult
The output data of a completion request to the LLM.
- Fields:
request_id (int): The unique ID of the request.
prompt (str): The prompt string of the request.
prompt_token_ids (List[int]): The token ids of the prompt.
outputs (List[CompletionOutput]): The output sequences of the request.
context_logits (torch.Tensor): The logits on the prompt token ids.
finished (bool): Whether the whole request is finished.
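A sketch of inspecting the fields above on a finished request; the model name is a placeholder, and the .text attribute on CompletionOutput is assumed, as in the TRT-LLM examples:

```python
from tensorrt_llm.hlapi import LLM

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
result = llm.generate(["Hello, my name is"])[0]

print(result.request_id)        # unique ID of the request
print(result.prompt)            # the original prompt string
print(result.prompt_token_ids)  # token ids of the prompt
for seq in result.outputs:      # one CompletionOutput per generated sequence
    print(seq.text)             # assumes CompletionOutput exposes .text
print(result.finished)          # True once the whole request has completed
```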
- class tensorrt_llm.hlapi.SamplingParams(*, end_id: int | None = None, pad_id: int | None = None, max_tokens: int = 32, max_new_tokens: int | None = None, bad: List[str] | str | None = None, bad_token_ids: List[int] | None = None, stop: List[str] | str | None = None, stop_token_ids: List[int] | None = None, include_stop_str_in_output: bool = False, embedding_bias: Tensor | None = None, external_draft_tokens_config: ExternalDraftTokensConfig | None = None, prompt_tuning_config: PromptTuningConfig | None = None, logits_post_processor_name: str | None = None, beam_width: int = 1, top_k: int | None = None, top_p: float | None = None, top_p_min: float | None = None, top_p_reset_ids: int | None = None, top_p_decay: float | None = None, seed: int | None = None, random_seed: int | None = None, temperature: float | None = None, min_tokens: int | None = None, min_length: int | None = None, beam_search_diversity_rate: float | None = None, repetition_penalty: float | None = None, presence_penalty: float | None = None, frequency_penalty: float | None = None, length_penalty: float | None = None, early_stopping: int | None = None, no_repeat_ngram_size: int | None = None, return_log_probs: bool = False, return_context_logits: bool = False, return_generation_logits: bool = False, exclude_input_from_output: bool = True, return_encoder_output: bool = False, add_special_tokens: bool = True)[source]
Bases:
object
Sampling parameters for text generation.
- Parameters:
end_id (int) – The end token id.
pad_id (int) – The pad token id.
max_tokens (int) – The maximum number of tokens to generate.
max_new_tokens (int) – The maximum number of tokens to generate. This argument is being deprecated; please use max_tokens instead.
bad (Union[str, List[str]]) – A string or a list of strings that redirect the generation when they are generated, so that the bad strings are excluded from the returned output.
bad_token_ids (List[int]) – A list of token ids that redirect the generation when they are generated, so that the bad ids are excluded from the returned output.
stop (Union[str, List[str]]) – A string or a list of strings that stop the generation when they are generated. The returned output will not contain the stop strings unless include_stop_str_in_output is True.
stop_token_ids (List[int]) – A list of token ids that stop the generation when they are generated.
include_stop_str_in_output (bool) – Whether to include the stop strings in output text. Defaults to False.
embedding_bias (torch.Tensor) – The embedding bias tensor. Expected type is float32 and shape is [vocab_size].
external_draft_tokens_config (ExternalDraftTokensConfig) – The speculative decoding configuration.
prompt_tuning_config (PromptTuningConfig) – The prompt tuning configuration.
logits_post_processor_name (str) – The logits postprocessor name. Must correspond to one of the logits postprocessor names provided to the ExecutorConfig.
beam_width (int) – The beam width. Default is 1, which disables beam search.
top_k (int) – Controls the number of logits to sample from. Default is 0 (sample from all logits).
top_p (float) – Controls the top-P probability to sample from. Default is 0.0.
top_p_min (float) – Controls decay in the top-P algorithm; top_p_min is the lower bound. Default is 1e-6.
top_p_reset_ids (int) – Controls decay in the top-P algorithm. Indicates where to reset the decay. Default is 1.
top_p_decay (float) – The decay value for the top-P algorithm. Default is 1.0.
seed (int) – Controls the random seed used by the random number generator in sampling.
random_seed (int) – Controls the random seed used by the random number generator in sampling. This argument is being deprecated; please use seed instead.
temperature (float) – Controls the modulation of logits when sampling new tokens. It can have values > 0.0. Default is 1.0.
min_tokens (int) – Lower bound on the number of tokens to generate. Values < 1 have no effect. Default is 1.
min_length (int) – Lower bound on the number of tokens to generate. Values < 1 have no effect. Default is 1. This argument is being deprecated; please use min_tokens instead.
beam_search_diversity_rate (float) – Controls the diversity in beam search.
repetition_penalty (float) – Used to penalize tokens based on how often they appear in the sequence. It can have any value > 0.0. Values < 1.0 encourage repetition, values > 1.0 discourage it. Default is 1.0.
presence_penalty (float) – Used to penalize tokens already present in the sequence (irrespective of the number of appearances). It can take any value. Values < 0.0 encourage repetition, values > 0.0 discourage it. Default is 0.0.
frequency_penalty (float) – Used to penalize tokens already present in the sequence (dependent on the number of appearances). It can take any value. Values < 0.0 encourage repetition, values > 0.0 discourage it. Default is 0.0.
length_penalty (float) – Controls how to penalize longer sequences in beam search. Default is 0.0.
early_stopping (int) – Controls whether the generation process finishes once beam_width sentences are generated (each ending with an end token).
no_repeat_ngram_size (int) – Controls the size of n-grams that are not allowed to repeat in the output. Default is 1 << 30 (effectively disabled).
return_log_probs (bool) – Controls if Result should contain log probabilities. Default is False.
return_context_logits (bool) – Controls if Result should contain the context logits. Default is False.
return_generation_logits (bool) – Controls if Result should contain the generation logits. Default is False.
exclude_input_from_output (bool) – Controls whether the output tokens in Result omit the input tokens. Default is True.
return_encoder_output (bool) – Controls if Result should contain encoder output hidden states (for encoder-only and encoder-decoder models). Default is False.
add_special_tokens (bool) – Whether to add special tokens to the prompt.
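A construction sketch; the values shown are illustrative, and every field left unset keeps the default documented above:

```python
from tensorrt_llm.hlapi import SamplingParams

params = SamplingParams(
    max_tokens=128,
    temperature=0.7,
    top_k=40,
    top_p=0.95,
    stop=["\n\n"],  # generation halts when this string is produced
    seed=1234,      # fixed seed for reproducible sampling
)
```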
- __init__(*, end_id: int | None = None, pad_id: int | None = None, max_tokens: int = 32, max_new_tokens: int | None = None, bad: List[str] | str | None = None, bad_token_ids: List[int] | None = None, stop: List[str] | str | None = None, stop_token_ids: List[int] | None = None, include_stop_str_in_output: bool = False, embedding_bias: Tensor | None = None, external_draft_tokens_config: ExternalDraftTokensConfig | None = None, prompt_tuning_config: PromptTuningConfig | None = None, logits_post_processor_name: str | None = None, beam_width: int = 1, top_k: int | None = None, top_p: float | None = None, top_p_min: float | None = None, top_p_reset_ids: int | None = None, top_p_decay: float | None = None, seed: int | None = None, random_seed: int | None = None, temperature: float | None = None, min_tokens: int | None = None, min_length: int | None = None, beam_search_diversity_rate: float | None = None, repetition_penalty: float | None = None, presence_penalty: float | None = None, frequency_penalty: float | None = None, length_penalty: float | None = None, early_stopping: int | None = None, no_repeat_ngram_size: int | None = None, return_log_probs: bool = False, return_context_logits: bool = False, return_generation_logits: bool = False, exclude_input_from_output: bool = True, return_encoder_output: bool = False, add_special_tokens: bool = True) None
- add_special_tokens: bool
- bad: List[str] | str | None
- bad_token_ids: List[int] | None
- beam_search_diversity_rate: float | None
- beam_width: int
- early_stopping: int | None
- embedding_bias: Tensor | None
- end_id: int | None
- exclude_input_from_output: bool
- external_draft_tokens_config: ExternalDraftTokensConfig | None
- frequency_penalty: float | None
- include_stop_str_in_output: bool
- length_penalty: float | None
- logits_post_processor_name: str | None
- max_new_tokens: int | None
- max_tokens: int
- min_length: int | None
- min_tokens: int | None
- no_repeat_ngram_size: int | None
- pad_id: int | None
- presence_penalty: float | None
- prompt_tuning_config: PromptTuningConfig | None
- random_seed: int | None
- repetition_penalty: float | None
- return_context_logits: bool
- return_encoder_output: bool
- return_generation_logits: bool
- return_log_probs: bool
- seed: int | None
- setup(tokenizer, add_special_tokens: bool = False) SamplingParams [source]
- stop: List[str] | str | None
- stop_token_ids: List[int] | None
- temperature: float | None
- top_k: int | None
- top_p: float | None
- top_p_decay: float | None
- top_p_min: float | None
- top_p_reset_ids: int | None
- class tensorrt_llm.hlapi.KvCacheConfig
Bases:
pybind11_object
- __init__(self: tensorrt_llm.bindings.executor.KvCacheConfig, enable_block_reuse: bool = False, max_tokens: int | None = None, max_attention_window: list[int] | None = None, sink_token_length: int | None = None, free_gpu_memory_fraction: float | None = None, host_cache_size: int | None = None, onboard_blocks: bool = True, cross_kv_cache_fraction: float | None = None) None
- property cross_kv_cache_fraction
- property enable_block_reuse
- property free_gpu_memory_fraction
- property host_cache_size
- property max_attention_window
- property max_tokens
- property onboard_blocks
- property sink_token_length
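A sketch of tuning the KV cache and passing it through the LLM constructor's kv_cache_config keyword (documented in the LLM parameter list above); the model name is a placeholder:

```python
from tensorrt_llm.hlapi import LLM, KvCacheConfig

kv_cache_config = KvCacheConfig(
    enable_block_reuse=True,       # reuse cached blocks across requests
    free_gpu_memory_fraction=0.8,  # cap the KV cache at 80% of free GPU memory
)
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
          kv_cache_config=kv_cache_config)
```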
- class tensorrt_llm.hlapi.SchedulerConfig
Bases:
pybind11_object
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: tensorrt_llm.bindings.executor.SchedulerConfig, capacity_scheduler_policy: tensorrt_llm.bindings.executor.CapacitySchedulerPolicy = CapacitySchedulerPolicy.GUARANTEED_NO_EVICT) -> None
__init__(self: tensorrt_llm.bindings.executor.SchedulerConfig, capacity_scheduler_policy: tensorrt_llm.bindings.executor.CapacitySchedulerPolicy, context_chunking_policy: Optional[tensorrt_llm.bindings.executor.ContextChunkingPolicy]) -> None
- property capacity_scheduler_policy
- property context_chunking_policy
- class tensorrt_llm.hlapi.CapacitySchedulerPolicy
Bases:
pybind11_object
Members:
MAX_UTILIZATION
GUARANTEED_NO_EVICT
STATIC_BATCH
- GUARANTEED_NO_EVICT = <CapacitySchedulerPolicy.GUARANTEED_NO_EVICT: 1>
- MAX_UTILIZATION = <CapacitySchedulerPolicy.MAX_UTILIZATION: 0>
- STATIC_BATCH = <CapacitySchedulerPolicy.STATIC_BATCH: 2>
- __init__(self: tensorrt_llm.bindings.executor.CapacitySchedulerPolicy, value: int) None
- property name
- property value
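A sketch of selecting a capacity scheduling policy and passing it through the LLM constructor's scheduler_config keyword; the model name is a placeholder:

```python
from tensorrt_llm.hlapi import LLM, SchedulerConfig, CapacitySchedulerPolicy

scheduler_config = SchedulerConfig(
    capacity_scheduler_policy=CapacitySchedulerPolicy.MAX_UTILIZATION)
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
          scheduler_config=scheduler_config)
```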
- class tensorrt_llm.hlapi.BuildConfig(max_input_len: int = 256, max_seq_len: int = 512, opt_batch_size: int = 8, max_batch_size: int = 8, max_beam_width: int = 1, max_num_tokens: Optional[int] = None, opt_num_tokens: Optional[int] = None, max_prompt_embedding_table_size: int = 0, kv_cache_type: tensorrt_llm.bindings.KVCacheType = None, gather_context_logits: int = False, gather_generation_logits: int = False, strongly_typed: bool = True, force_num_profiles: Optional[int] = None, profiling_verbosity: str = 'layer_names_only', enable_debug_output: bool = False, max_draft_len: int = 0, speculative_decoding_mode: tensorrt_llm.models.modeling_utils.SpeculativeDecodingMode = <SpeculativeDecodingMode.NONE: 1>, use_refit: bool = False, input_timing_cache: str = None, output_timing_cache: str = None, lora_config: tensorrt_llm.lora_manager.LoraConfig = <factory>, auto_parallel_config: tensorrt_llm.auto_parallel.config.AutoParallelConfig = <factory>, weight_sparsity: bool = False, weight_streaming: bool = False, plugin_config: tensorrt_llm.plugin.plugin.PluginConfig = <factory>, use_strip_plan: bool = False, max_encoder_input_len: int = 1, use_fused_mlp: bool = False, dry_run: bool = False, visualize_network: bool = False)[source]
Bases:
object
- __init__(max_input_len: int = 256, max_seq_len: int = 512, opt_batch_size: int = 8, max_batch_size: int = 8, max_beam_width: int = 1, max_num_tokens: int | None = None, opt_num_tokens: int | None = None, max_prompt_embedding_table_size: int = 0, kv_cache_type: ~tensorrt_llm.bindings.KVCacheType | None = None, gather_context_logits: int = False, gather_generation_logits: int = False, strongly_typed: bool = True, force_num_profiles: int | None = None, profiling_verbosity: str = 'layer_names_only', enable_debug_output: bool = False, max_draft_len: int = 0, speculative_decoding_mode: ~tensorrt_llm.models.modeling_utils.SpeculativeDecodingMode = <SpeculativeDecodingMode.NONE: 1>, use_refit: bool = False, input_timing_cache: str | None = None, output_timing_cache: str | None = None, lora_config: ~tensorrt_llm.lora_manager.LoraConfig = <factory>, auto_parallel_config: ~tensorrt_llm.auto_parallel.config.AutoParallelConfig = <factory>, weight_sparsity: bool = False, weight_streaming: bool = False, plugin_config: ~tensorrt_llm.plugin.plugin.PluginConfig = <factory>, use_strip_plan: bool = False, max_encoder_input_len: int = 1, use_fused_mlp: bool = False, dry_run: bool = False, visualize_network: bool = False) None
- auto_parallel_config: AutoParallelConfig
- dry_run: bool = False
- enable_debug_output: bool = False
- force_num_profiles: int | None = None
- gather_context_logits: int = False
- gather_generation_logits: int = False
- input_timing_cache: str | None = None
- kv_cache_type: KVCacheType | None = None
- lora_config: LoraConfig
- max_batch_size: int = 8
- max_beam_width: int = 1
- max_draft_len: int = 0
- max_encoder_input_len: int = 1
- max_input_len: int = 256
- max_num_tokens: int | None = None
- max_prompt_embedding_table_size: int = 0
- max_seq_len: int = 512
- opt_batch_size: int = 8
- opt_num_tokens: int | None = None
- output_timing_cache: str | None = None
- plugin_config: PluginConfig
- profiling_verbosity: str = 'layer_names_only'
- speculative_decoding_mode: SpeculativeDecodingMode = 1
- strongly_typed: bool = True
- use_fused_mlp: bool = False
- use_refit: bool = False
- use_strip_plan: bool = False
- visualize_network: bool = False
- weight_sparsity: bool = False
- weight_streaming: bool = False
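A sketch of overriding build-time limits through the LLM constructor's build_config keyword; the sizes shown are illustrative, and the model name is a placeholder:

```python
from tensorrt_llm.hlapi import LLM, BuildConfig

build_config = BuildConfig(
    max_input_len=1024,  # longest prompt the engine will accept
    max_seq_len=2048,    # prompt plus generated tokens
    max_batch_size=16,
)
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
          build_config=build_config)
```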
- class tensorrt_llm.hlapi.QuantConfig(quant_algo: QuantAlgo | None = None, kv_cache_quant_algo: QuantAlgo | None = None, group_size: int | None = 128, smoothquant_val: float = 0.5, clamp_val: List[float] | None = None, has_zero_point: bool | None = False, pre_quant_scale: bool | None = False, exclude_modules: List[str] | None = None)[source]
Bases:
object
Serializable quantization configuration class, part of the PretrainedConfig.
- __init__(quant_algo: QuantAlgo | None = None, kv_cache_quant_algo: QuantAlgo | None = None, group_size: int | None = 128, smoothquant_val: float = 0.5, clamp_val: List[float] | None = None, has_zero_point: bool | None = False, pre_quant_scale: bool | None = False, exclude_modules: List[str] | None = None) None
- clamp_val: List[float] | None = None
- exclude_modules: List[str] | None = None
- group_size: int | None = 128
- has_zero_point: bool | None = False
- pre_quant_scale: bool | None = False
- property quant_mode: QuantModeWrapper
- property requires_calibration
- property requires_modelopt_quantization
- smoothquant_val: float = 0.5
- property use_plugin_sq
- class tensorrt_llm.hlapi.QuantAlgo(value)[source]
Bases:
StrEnum
Enumeration of the supported quantization algorithms.
- FP8 = 'FP8'
- FP8_PER_CHANNEL_PER_TOKEN = 'FP8_PER_CHANNEL_PER_TOKEN'
- INT8 = 'INT8'
- MIXED_PRECISION = 'MIXED_PRECISION'
- NO_QUANT = 'NO_QUANT'
- W4A16 = 'W4A16'
- W4A16_AWQ = 'W4A16_AWQ'
- W4A16_GPTQ = 'W4A16_GPTQ'
- W4A8_AWQ = 'W4A8_AWQ'
- W8A16 = 'W8A16'
- W8A8_SQ_PER_CHANNEL = 'W8A8_SQ_PER_CHANNEL'
- W8A8_SQ_PER_CHANNEL_PER_TENSOR_PLUGIN = 'W8A8_SQ_PER_CHANNEL_PER_TENSOR_PLUGIN'
- W8A8_SQ_PER_CHANNEL_PER_TOKEN_PLUGIN = 'W8A8_SQ_PER_CHANNEL_PER_TOKEN_PLUGIN'
- W8A8_SQ_PER_TENSOR_PER_TOKEN_PLUGIN = 'W8A8_SQ_PER_TENSOR_PER_TOKEN_PLUGIN'
- W8A8_SQ_PER_TENSOR_PLUGIN = 'W8A8_SQ_PER_TENSOR_PLUGIN'
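A sketch of requesting weight-only AWQ quantization through the LLM constructor's quant_config keyword; the algorithm choices are illustrative, and the model name is a placeholder:

```python
from tensorrt_llm.hlapi import LLM, QuantConfig, QuantAlgo

quant_config = QuantConfig(
    quant_algo=QuantAlgo.W4A16_AWQ,      # 4-bit AWQ weights, 16-bit activations
    kv_cache_quant_algo=QuantAlgo.INT8,  # optionally quantize the KV cache too
)
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
          quant_config=quant_config)
```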
- class tensorrt_llm.hlapi.CalibConfig(device: Literal['cuda', 'cpu'] = 'cuda', calib_dataset: str = 'cnn_dailymail', calib_batches: int = 512, calib_batch_size: int = 1, calib_max_seq_length: int = 512, random_seed: int = 1234, tokenizer_max_seq_length: int = 2048)[source]
Bases:
object
Calibration configuration.
- Parameters:
device (Literal['cuda', 'cpu'], default='cuda') – The device on which calibration runs.
calib_dataset (str, default='cnn_dailymail') – The name or local path of the calibration dataset.
calib_batches (int, default=512) – The number of batches the calibration runs.
calib_batch_size (int, default=1) – The batch size the calibration runs with.
calib_max_seq_length (int, default=512) – The maximum sequence length used during calibration.
random_seed (int, default=1234) – The random seed used for calibration.
tokenizer_max_seq_length (int, default=2048) – The maximum sequence length used to initialize the tokenizer for calibration.
- __init__(device: Literal['cuda', 'cpu'] = 'cuda', calib_dataset: str = 'cnn_dailymail', calib_batches: int = 512, calib_batch_size: int = 1, calib_max_seq_length: int = 512, random_seed: int = 1234, tokenizer_max_seq_length: int = 2048) None
- calib_batch_size: int
- calib_batches: int
- calib_dataset: str
- calib_max_seq_length: int
- device: Literal['cuda', 'cpu']
- random_seed: int
- tokenizer_max_seq_length: int
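Calibration only matters for quantization algorithms that require it (see QuantConfig.requires_calibration). A sketch pairing an FP8 quantization request with a shortened calibration run; the values are illustrative, and the model name is a placeholder:

```python
from tensorrt_llm.hlapi import LLM, CalibConfig, QuantConfig, QuantAlgo

calib_config = CalibConfig(
    calib_batches=256,  # fewer batches for a faster, rougher calibration
    calib_max_seq_length=1024,
)
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
          quant_config=QuantConfig(quant_algo=QuantAlgo.FP8),
          calib_config=calib_config)
```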
- class tensorrt_llm.hlapi.BuildCacheConfig(cache_root: Path | None = None, max_records: int = 10, max_cache_storage_gb: float = 256)[source]
Bases:
object
Configuration for the build cache.
- cache_root
The root directory for the build cache.
- Type:
str
- max_records
The maximum number of records to store in the cache.
- Type:
int
- max_cache_storage_gb
The maximum amount of storage (in GB) to use for the cache.
- Type:
float
Note
The build cache assumes the model weights do not change during execution. If the weights change, remove the caches manually.
- __init__(cache_root: Path | None = None, max_records: int = 10, max_cache_storage_gb: float = 256)[source]
- property cache_root: Path
- property max_cache_storage_gb: float
- property max_records: int
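A sketch of enabling the build cache through the LLM constructor's enable_build_cache keyword (documented in the LLM parameter list above); the cache root is arbitrary, and the model name is a placeholder:

```python
from pathlib import Path

from tensorrt_llm.hlapi import LLM, BuildCacheConfig

cache_config = BuildCacheConfig(
    cache_root=Path("/tmp/trtllm_build_cache"),  # arbitrary cache location
    max_records=10,
    max_cache_storage_gb=64,
)
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
          enable_build_cache=cache_config)
```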