Plugin

class tensorrt_llm.plugin.PluginConfig(bert_attention_plugin: str = 'float16', gpt_attention_plugin: str = 'float16', gemm_plugin: str = None, smooth_quant_gemm_plugin: str = None, identity_plugin: str = None, layernorm_quantization_plugin: str = None, rmsnorm_quantization_plugin: str = None, nccl_plugin: str = 'float16', lookup_plugin: str = None, lora_plugin: str = None, weight_only_groupwise_quant_matmul_plugin: str = None, weight_only_quant_matmul_plugin: str = None, quantize_per_token_plugin: bool = False, quantize_tensor_plugin: bool = False, moe_plugin: str = 'float16', mamba_conv1d_plugin: str = 'float16', context_fmha: bool = True, context_fmha_fp32_acc: bool = False, paged_kv_cache: bool = True, remove_input_padding: bool = True, use_custom_all_reduce: bool = True, multi_block_mode: bool = False, enable_xqa: bool = True, attention_qk_half_accumulation: bool = False, tokens_per_block: int = 128, use_paged_context_fmha: bool = False, use_fp8_context_fmha: bool = False, use_context_fmha_for_generation: bool = False, multiple_profiles: bool = False, paged_state: bool = True, streamingllm: bool = False)

Bases: object
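
A minimal construction sketch (not part of the original documentation), using only keyword arguments that appear in the signature above; how the resulting config is wired into the build flow, for example through a BuildConfig or the trtllm-build command, depends on the TensorRT-LLM version and is not shown here.

    from tensorrt_llm.plugin import PluginConfig

    # Override a few of the defaults listed in the signature above; every
    # keyword used here is taken directly from that signature.
    plugin_config = PluginConfig(
        gpt_attention_plugin="float16",   # precision of the fused GPT attention plugin
        gemm_plugin="float16",            # enable the GEMM plugin in float16
        paged_kv_cache=True,              # manage the KV cache in fixed-size blocks
        remove_input_padding=True,        # pack sequences without padding tokens
        tokens_per_block=128,             # KV-cache block size in tokens
    )

    print(plugin_config.gemm_plugin)  # -> 'float16'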

to_legacy_setting()

Legacy setting means that all of the plugins and features are disabled; this is needed for the legacy build.py script, which will be migrated to the centralized build script tensorrt_llm/commands/build.py.

Once the migration is complete, this function may be removed.
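
A brief usage sketch, assuming the method resets the instance in place (its return value is not documented here); the reset config can then be handed to the legacy build.py flow.

    from tensorrt_llm.plugin import PluginConfig

    plugin_config = PluginConfig()
    # Disable all plugins and features so the config matches what the
    # legacy build.py script expects.
    plugin_config.to_legacy_setting()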