Model Recipes

Model-Specific Deployment Guides

The deployment guides below provide more detailed instructions for serving specific models with TensorRT LLM.

Comprehensive Configuration Database

The table below lists all available pre-configured model scenarios in the TensorRT LLM configuration database. Each row represents a specific model, GPU, and performance profile combination with recommended request settings.
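The config paths on this page appear to follow one naming pattern: the ISL and OSL are abbreviated in units of 1024 tokens, followed by the tensor-parallel size and the concurrency, under a per-model, per-GPU directory. The sketch below rebuilds a path and its serve command from those parts. The pattern is inferred from the rows listed here rather than from a documented guarantee, and `config_path` and `serve_command` are hypothetical helper names, not part of TensorRT LLM.

```shell
# Hypothetical helpers (not part of TensorRT LLM). They only format strings
# based on the naming pattern visible in the rows of this page.

# Usage: config_path MODEL GPU_DIR ISL OSL TP CONCURRENCY
#   MODEL    e.g. deepseek-ai/DeepSeek-R1-0528
#   GPU_DIR  e.g. B200 or H200 (the directory name, not the GPU column label)
#   ISL/OSL  sequence lengths in tokens, e.g. 1024 or 8192
config_path() {
  # File names abbreviate ISL and OSL in units of 1024 tokens (1024 -> 1k).
  printf '%s/examples/configs/database/%s/%s/%dk%dk_tp%d_conc%d.yaml\n' \
    "${TRTLLM_DIR}" "$1" "$2" "$(($3 / 1024))" "$(($4 / 1024))" "$5" "$6"
}

# Usage: serve_command MODEL GPU_DIR ISL OSL TP CONCURRENCY
# Prints (does not run) the trtllm-serve command for that scenario.
serve_command() {
  printf 'trtllm-serve %s --extra_llm_api_options %s\n' \
    "$1" "$(config_path "$@")"
}
```

With `TRTLLM_DIR` set, `serve_command deepseek-ai/DeepSeek-R1-0528 B200 1024 1024 8 4` prints the command for the 8xB200_NVL, 1024 / 1024, concurrency 4 scenario with the directory already expanded. Check that the resulting file exists before relying on it, since not every combination of parameters has a config.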

Note

Traffic Patterns: The ISL (Input Sequence Length) and OSL (Output Sequence Length) values in each configuration represent the maximum supported values for that config. Requests exceeding these limits may result in errors.

To handle requests with input sequences longer than the configured ISL, add the following to your config file:

enable_chunked_prefill: true

This enables chunked prefill, which processes long input sequences in chunks rather than requiring them to fit within a single prefill operation. Enabling chunked prefill does not guarantee optimal performance, because these configs are tuned for the specified ISL/OSL.
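One way to apply this setting without editing the shipped files is to write a modified copy of a config and pass that copy to `trtllm-serve`. The sketch below assumes the key sits at the top level of the YAML file, as the snippet above suggests; `with_chunked_prefill` is a hypothetical helper name, not a TensorRT LLM tool.

```shell
# Hypothetical helper (not part of TensorRT LLM).
# Usage: with_chunked_prefill SOURCE_CONFIG DEST_CONFIG
# Writes a copy of SOURCE_CONFIG to DEST_CONFIG with a top-level
# "enable_chunked_prefill: true" entry, leaving the source untouched.
with_chunked_prefill() {
  # If the key is already present at the top level, force its value to true.
  sed 's/^enable_chunked_prefill:.*/enable_chunked_prefill: true/' "$1" > "$2" || return 1
  # Otherwise append it. The leading newline guards against a source file
  # that does not end with a newline.
  grep -q '^enable_chunked_prefill:' "$2" ||
    printf '\nenable_chunked_prefill: true\n' >> "$2"
}
```

For example, `with_chunked_prefill "${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/1k1k_tp8_conc4.yaml" ./1k1k_tp8_conc4_chunked.yaml` writes the copy, which can then be served with `trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ./1k1k_tp8_conc4_chunked.yaml`.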

DeepSeek-R1

| GPU | Performance Profile | ISL / OSL | Concurrency | Config | Command |
| --- | --- | --- | --- | --- | --- |
| 8xB200_NVL | Min Latency | 1024 / 1024 | 4 | 1k1k_tp8_conc4.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/1k1k_tp8_conc4.yaml |
| 8xB200_NVL | Low Latency | 1024 / 1024 | 8 | 1k1k_tp8_conc8.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/1k1k_tp8_conc8.yaml |
| 8xB200_NVL | Balanced | 1024 / 1024 | 16 | 1k1k_tp8_conc16.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/1k1k_tp8_conc16.yaml |
| 8xB200_NVL | High Throughput | 1024 / 1024 | 32 | 1k1k_tp8_conc32.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/1k1k_tp8_conc32.yaml |
| 8xB200_NVL | Max Throughput | 1024 / 1024 | 64 | 1k1k_tp8_conc64.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/1k1k_tp8_conc64.yaml |
| 8xB200_NVL | Min Latency | 8192 / 1024 | 4 | 8k1k_tp8_conc4.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/8k1k_tp8_conc4.yaml |
| 8xB200_NVL | Low Latency | 8192 / 1024 | 8 | 8k1k_tp8_conc8.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/8k1k_tp8_conc8.yaml |
| 8xB200_NVL | Balanced | 8192 / 1024 | 16 | 8k1k_tp8_conc16.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/8k1k_tp8_conc16.yaml |
| 8xB200_NVL | High Throughput | 8192 / 1024 | 32 | 8k1k_tp8_conc32.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/8k1k_tp8_conc32.yaml |
| 8xB200_NVL | Max Throughput | 8192 / 1024 | 64 | 8k1k_tp8_conc64.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/B200/8k1k_tp8_conc64.yaml |
| 8xH200_SXM | Min Latency | 1024 / 1024 | 4 | 1k1k_tp8_conc4.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/H200/1k1k_tp8_conc4.yaml |
| 8xH200_SXM | Low Latency | 1024 / 1024 | 8 | 1k1k_tp8_conc8.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/H200/1k1k_tp8_conc8.yaml |
| 8xH200_SXM | Balanced | 1024 / 1024 | 16 | 1k1k_tp8_conc16.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/H200/1k1k_tp8_conc16.yaml |
| 8xH200_SXM | High Throughput | 1024 / 1024 | 32 | 1k1k_tp8_conc32.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/H200/1k1k_tp8_conc32.yaml |
| 8xH200_SXM | Max Throughput | 1024 / 1024 | 64 | 1k1k_tp8_conc64.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/H200/1k1k_tp8_conc64.yaml |
| 8xH200_SXM | Min Latency | 8192 / 1024 | 4 | 8k1k_tp8_conc4.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/H200/8k1k_tp8_conc4.yaml |
| 8xH200_SXM | Low Latency | 8192 / 1024 | 8 | 8k1k_tp8_conc8.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/H200/8k1k_tp8_conc8.yaml |
| 8xH200_SXM | Balanced | 8192 / 1024 | 16 | 8k1k_tp8_conc16.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/H200/8k1k_tp8_conc16.yaml |
| 8xH200_SXM | High Throughput | 8192 / 1024 | 32 | 8k1k_tp8_conc32.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/H200/8k1k_tp8_conc32.yaml |
| 8xH200_SXM | Max Throughput | 8192 / 1024 | 64 | 8k1k_tp8_conc64.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/deepseek-ai/DeepSeek-R1-0528/H200/8k1k_tp8_conc64.yaml |

DeepSeek-R1 (NVFP4)

| GPU | Performance Profile | ISL / OSL | Concurrency | Config | Command |
| --- | --- | --- | --- | --- | --- |
| 4xB200_NVL | Min Latency | 1024 / 1024 | 4 | 1k1k_tp4_conc4.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp4_conc4.yaml |
| 8xB200_NVL | Low Latency | 1024 / 1024 | 4 | 1k1k_tp8_conc4.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp8_conc4.yaml |
| 4xB200_NVL | Low Latency | 1024 / 1024 | 8 | 1k1k_tp4_conc8.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp4_conc8.yaml |
| 8xB200_NVL | Low Latency | 1024 / 1024 | 8 | 1k1k_tp8_conc8.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp8_conc8.yaml |
| 4xB200_NVL | Low Latency | 1024 / 1024 | 16 | 1k1k_tp4_conc16.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp4_conc16.yaml |
| 8xB200_NVL | Low Latency | 1024 / 1024 | 16 | 1k1k_tp8_conc16.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp8_conc16.yaml |
| 4xB200_NVL | Low Latency | 1024 / 1024 | 32 | 1k1k_tp4_conc32.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp4_conc32.yaml |
| 8xB200_NVL | High Throughput | 1024 / 1024 | 32 | 1k1k_tp8_conc32.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp8_conc32.yaml |
| 4xB200_NVL | High Throughput | 1024 / 1024 | 64 | 1k1k_tp4_conc64.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp4_conc64.yaml |
| 8xB200_NVL | High Throughput | 1024 / 1024 | 64 | 1k1k_tp8_conc64.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp8_conc64.yaml |
| 4xB200_NVL | High Throughput | 1024 / 1024 | 128 | 1k1k_tp4_conc128.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp4_conc128.yaml |
| 8xB200_NVL | High Throughput | 1024 / 1024 | 128 | 1k1k_tp8_conc128.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp8_conc128.yaml |
| 4xB200_NVL | High Throughput | 1024 / 1024 | 256 | 1k1k_tp4_conc256.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp4_conc256.yaml |
| 8xB200_NVL | Max Throughput | 1024 / 1024 | 256 | 1k1k_tp8_conc256.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/1k1k_tp8_conc256.yaml |
| 4xB200_NVL | Min Latency | 8192 / 1024 | 4 | 8k1k_tp4_conc4.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp4_conc4.yaml |
| 8xB200_NVL | Low Latency | 8192 / 1024 | 4 | 8k1k_tp8_conc4.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp8_conc4.yaml |
| 4xB200_NVL | Low Latency | 8192 / 1024 | 8 | 8k1k_tp4_conc8.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp4_conc8.yaml |
| 8xB200_NVL | Low Latency | 8192 / 1024 | 8 | 8k1k_tp8_conc8.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp8_conc8.yaml |
| 4xB200_NVL | Low Latency | 8192 / 1024 | 16 | 8k1k_tp4_conc16.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp4_conc16.yaml |
| 8xB200_NVL | Low Latency | 8192 / 1024 | 16 | 8k1k_tp8_conc16.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp8_conc16.yaml |
| 4xB200_NVL | Low Latency | 8192 / 1024 | 32 | 8k1k_tp4_conc32.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp4_conc32.yaml |
| 8xB200_NVL | High Throughput | 8192 / 1024 | 32 | 8k1k_tp8_conc32.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp8_conc32.yaml |
| 4xB200_NVL | High Throughput | 8192 / 1024 | 64 | 8k1k_tp4_conc64.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp4_conc64.yaml |
| 8xB200_NVL | High Throughput | 8192 / 1024 | 64 | 8k1k_tp8_conc64.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp8_conc64.yaml |
| 4xB200_NVL | High Throughput | 8192 / 1024 | 128 | 8k1k_tp4_conc128.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp4_conc128.yaml |
| 8xB200_NVL | High Throughput | 8192 / 1024 | 128 | 8k1k_tp8_conc128.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp8_conc128.yaml |
| 4xB200_NVL | High Throughput | 8192 / 1024 | 256 | 8k1k_tp4_conc256.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp4_conc256.yaml |
| 8xB200_NVL | Max Throughput | 8192 / 1024 | 256 | 8k1k_tp8_conc256.yaml | trtllm-serve nvidia/DeepSeek-R1-0528-FP4-v2 --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/nvidia/DeepSeek-R1-0528-FP4-v2/B200/8k1k_tp8_conc256.yaml |

gpt-oss-120b

| GPU | Performance Profile | ISL / OSL | Concurrency | Config | Command |
| --- | --- | --- | --- | --- | --- |
| B200_NVL | Min Latency | 1024 / 1024 | 4 | 1k1k_tp1_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp1_conc4.yaml |
| 2xB200_NVL | Low Latency | 1024 / 1024 | 4 | 1k1k_tp2_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp2_conc4.yaml |
| 4xB200_NVL | Low Latency | 1024 / 1024 | 4 | 1k1k_tp4_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp4_conc4.yaml |
| 8xB200_NVL | Low Latency | 1024 / 1024 | 4 | 1k1k_tp8_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp8_conc4.yaml |
| B200_NVL | Low Latency | 1024 / 1024 | 8 | 1k1k_tp1_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp1_conc8.yaml |
| 2xB200_NVL | Low Latency | 1024 / 1024 | 8 | 1k1k_tp2_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp2_conc8.yaml |
| 4xB200_NVL | Low Latency | 1024 / 1024 | 8 | 1k1k_tp4_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp4_conc8.yaml |
| 8xB200_NVL | Low Latency | 1024 / 1024 | 8 | 1k1k_tp8_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp8_conc8.yaml |
| B200_NVL | Low Latency | 1024 / 1024 | 16 | 1k1k_tp1_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp1_conc16.yaml |
| 2xB200_NVL | Low Latency | 1024 / 1024 | 16 | 1k1k_tp2_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp2_conc16.yaml |
| 4xB200_NVL | High Throughput | 1024 / 1024 | 16 | 1k1k_tp4_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp4_conc16.yaml |
| 8xB200_NVL | High Throughput | 1024 / 1024 | 16 | 1k1k_tp8_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp8_conc16.yaml |
| B200_NVL | High Throughput | 1024 / 1024 | 32 | 1k1k_tp1_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp1_conc32.yaml |
| 2xB200_NVL | High Throughput | 1024 / 1024 | 32 | 1k1k_tp2_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp2_conc32.yaml |
| 4xB200_NVL | High Throughput | 1024 / 1024 | 32 | 1k1k_tp4_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp4_conc32.yaml |
| 8xB200_NVL | High Throughput | 1024 / 1024 | 32 | 1k1k_tp8_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp8_conc32.yaml |
| B200_NVL | High Throughput | 1024 / 1024 | 64 | 1k1k_tp1_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp1_conc64.yaml |
| 2xB200_NVL | High Throughput | 1024 / 1024 | 64 | 1k1k_tp2_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp2_conc64.yaml |
| 4xB200_NVL | High Throughput | 1024 / 1024 | 64 | 1k1k_tp4_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp4_conc64.yaml |
| 8xB200_NVL | Max Throughput | 1024 / 1024 | 64 | 1k1k_tp8_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k1k_tp8_conc64.yaml |
| B200_NVL | Min Latency | 1024 / 8192 | 4 | 1k8k_tp1_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp1_conc4.yaml |
| 2xB200_NVL | Low Latency | 1024 / 8192 | 4 | 1k8k_tp2_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp2_conc4.yaml |
| 4xB200_NVL | Low Latency | 1024 / 8192 | 4 | 1k8k_tp4_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp4_conc4.yaml |
| 8xB200_NVL | Low Latency | 1024 / 8192 | 4 | 1k8k_tp8_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp8_conc4.yaml |
| B200_NVL | Low Latency | 1024 / 8192 | 8 | 1k8k_tp1_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp1_conc8.yaml |
| 2xB200_NVL | Low Latency | 1024 / 8192 | 8 | 1k8k_tp2_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp2_conc8.yaml |
| 4xB200_NVL | Low Latency | 1024 / 8192 | 8 | 1k8k_tp4_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp4_conc8.yaml |
| 8xB200_NVL | Low Latency | 1024 / 8192 | 8 | 1k8k_tp8_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp8_conc8.yaml |
| B200_NVL | Low Latency | 1024 / 8192 | 16 | 1k8k_tp1_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp1_conc16.yaml |
| 2xB200_NVL | Low Latency | 1024 / 8192 | 16 | 1k8k_tp2_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp2_conc16.yaml |
| 4xB200_NVL | High Throughput | 1024 / 8192 | 16 | 1k8k_tp4_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp4_conc16.yaml |
| 8xB200_NVL | High Throughput | 1024 / 8192 | 16 | 1k8k_tp8_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp8_conc16.yaml |
| B200_NVL | High Throughput | 1024 / 8192 | 32 | 1k8k_tp1_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp1_conc32.yaml |
| 2xB200_NVL | High Throughput | 1024 / 8192 | 32 | 1k8k_tp2_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp2_conc32.yaml |
| 4xB200_NVL | High Throughput | 1024 / 8192 | 32 | 1k8k_tp4_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp4_conc32.yaml |
| 8xB200_NVL | High Throughput | 1024 / 8192 | 32 | 1k8k_tp8_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp8_conc32.yaml |
| B200_NVL | High Throughput | 1024 / 8192 | 64 | 1k8k_tp1_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp1_conc64.yaml |
| 2xB200_NVL | High Throughput | 1024 / 8192 | 64 | 1k8k_tp2_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp2_conc64.yaml |
| 4xB200_NVL | High Throughput | 1024 / 8192 | 64 | 1k8k_tp4_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp4_conc64.yaml |
| 8xB200_NVL | Max Throughput | 1024 / 8192 | 64 | 1k8k_tp8_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/1k8k_tp8_conc64.yaml |
| B200_NVL | Min Latency | 8192 / 1024 | 4 | 8k1k_tp1_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp1_conc4.yaml |
| 2xB200_NVL | Low Latency | 8192 / 1024 | 4 | 8k1k_tp2_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp2_conc4.yaml |
| 4xB200_NVL | Low Latency | 8192 / 1024 | 4 | 8k1k_tp4_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp4_conc4.yaml |
| 8xB200_NVL | Low Latency | 8192 / 1024 | 4 | 8k1k_tp8_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp8_conc4.yaml |
| B200_NVL | Low Latency | 8192 / 1024 | 8 | 8k1k_tp1_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp1_conc8.yaml |
| 2xB200_NVL | Low Latency | 8192 / 1024 | 8 | 8k1k_tp2_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp2_conc8.yaml |
| 4xB200_NVL | Low Latency | 8192 / 1024 | 8 | 8k1k_tp4_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp4_conc8.yaml |
| 8xB200_NVL | Low Latency | 8192 / 1024 | 8 | 8k1k_tp8_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp8_conc8.yaml |
| B200_NVL | Low Latency | 8192 / 1024 | 16 | 8k1k_tp1_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp1_conc16.yaml |
| 2xB200_NVL | Low Latency | 8192 / 1024 | 16 | 8k1k_tp2_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp2_conc16.yaml |
| 4xB200_NVL | High Throughput | 8192 / 1024 | 16 | 8k1k_tp4_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp4_conc16.yaml |
| 8xB200_NVL | High Throughput | 8192 / 1024 | 16 | 8k1k_tp8_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp8_conc16.yaml |
| B200_NVL | High Throughput | 8192 / 1024 | 32 | 8k1k_tp1_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp1_conc32.yaml |
| 2xB200_NVL | High Throughput | 8192 / 1024 | 32 | 8k1k_tp2_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp2_conc32.yaml |
| 4xB200_NVL | High Throughput | 8192 / 1024 | 32 | 8k1k_tp4_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp4_conc32.yaml |
| 8xB200_NVL | High Throughput | 8192 / 1024 | 32 | 8k1k_tp8_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp8_conc32.yaml |
| B200_NVL | High Throughput | 8192 / 1024 | 64 | 8k1k_tp1_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp1_conc64.yaml |
| 2xB200_NVL | High Throughput | 8192 / 1024 | 64 | 8k1k_tp2_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp2_conc64.yaml |
| 4xB200_NVL | High Throughput | 8192 / 1024 | 64 | 8k1k_tp4_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp4_conc64.yaml |
| 8xB200_NVL | Max Throughput | 8192 / 1024 | 64 | 8k1k_tp8_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/B200/8k1k_tp8_conc64.yaml |
| H200_SXM | Min Latency | 1024 / 1024 | 4 | 1k1k_tp1_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp1_conc4.yaml |
| 2xH200_SXM | Low Latency | 1024 / 1024 | 4 | 1k1k_tp2_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp2_conc4.yaml |
| 4xH200_SXM | Low Latency | 1024 / 1024 | 4 | 1k1k_tp4_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp4_conc4.yaml |
| 8xH200_SXM | Low Latency | 1024 / 1024 | 4 | 1k1k_tp8_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp8_conc4.yaml |
| H200_SXM | Low Latency | 1024 / 1024 | 8 | 1k1k_tp1_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp1_conc8.yaml |
| 2xH200_SXM | Low Latency | 1024 / 1024 | 8 | 1k1k_tp2_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp2_conc8.yaml |
| 4xH200_SXM | Low Latency | 1024 / 1024 | 8 | 1k1k_tp4_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp4_conc8.yaml |
| 8xH200_SXM | Low Latency | 1024 / 1024 | 8 | 1k1k_tp8_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp8_conc8.yaml |
| H200_SXM | Low Latency | 1024 / 1024 | 16 | 1k1k_tp1_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp1_conc16.yaml |
| 2xH200_SXM | Low Latency | 1024 / 1024 | 16 | 1k1k_tp2_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp2_conc16.yaml |
| 4xH200_SXM | High Throughput | 1024 / 1024 | 16 | 1k1k_tp4_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp4_conc16.yaml |
| 8xH200_SXM | High Throughput | 1024 / 1024 | 16 | 1k1k_tp8_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp8_conc16.yaml |
| H200_SXM | High Throughput | 1024 / 1024 | 32 | 1k1k_tp1_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp1_conc32.yaml |
| 2xH200_SXM | High Throughput | 1024 / 1024 | 32 | 1k1k_tp2_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp2_conc32.yaml |
| 4xH200_SXM | High Throughput | 1024 / 1024 | 32 | 1k1k_tp4_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp4_conc32.yaml |
| 8xH200_SXM | High Throughput | 1024 / 1024 | 32 | 1k1k_tp8_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp8_conc32.yaml |
| H200_SXM | High Throughput | 1024 / 1024 | 64 | 1k1k_tp1_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp1_conc64.yaml |
| 2xH200_SXM | High Throughput | 1024 / 1024 | 64 | 1k1k_tp2_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp2_conc64.yaml |
| 4xH200_SXM | High Throughput | 1024 / 1024 | 64 | 1k1k_tp4_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp4_conc64.yaml |
| 8xH200_SXM | Max Throughput | 1024 / 1024 | 64 | 1k1k_tp8_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k1k_tp8_conc64.yaml |
| H200_SXM | Min Latency | 1024 / 8192 | 4 | 1k8k_tp1_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp1_conc4.yaml |
| 2xH200_SXM | Low Latency | 1024 / 8192 | 4 | 1k8k_tp2_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp2_conc4.yaml |
| 4xH200_SXM | Low Latency | 1024 / 8192 | 4 | 1k8k_tp4_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp4_conc4.yaml |
| 8xH200_SXM | Low Latency | 1024 / 8192 | 4 | 1k8k_tp8_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp8_conc4.yaml |
| H200_SXM | Low Latency | 1024 / 8192 | 8 | 1k8k_tp1_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp1_conc8.yaml |
| 2xH200_SXM | Low Latency | 1024 / 8192 | 8 | 1k8k_tp2_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp2_conc8.yaml |
| 4xH200_SXM | Low Latency | 1024 / 8192 | 8 | 1k8k_tp4_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp4_conc8.yaml |
| 8xH200_SXM | Low Latency | 1024 / 8192 | 8 | 1k8k_tp8_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp8_conc8.yaml |
| H200_SXM | Low Latency | 1024 / 8192 | 16 | 1k8k_tp1_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp1_conc16.yaml |
| 2xH200_SXM | Low Latency | 1024 / 8192 | 16 | 1k8k_tp2_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp2_conc16.yaml |
| 4xH200_SXM | High Throughput | 1024 / 8192 | 16 | 1k8k_tp4_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp4_conc16.yaml |
| 8xH200_SXM | High Throughput | 1024 / 8192 | 16 | 1k8k_tp8_conc16.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp8_conc16.yaml |
| H200_SXM | High Throughput | 1024 / 8192 | 32 | 1k8k_tp1_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp1_conc32.yaml |
| 2xH200_SXM | High Throughput | 1024 / 8192 | 32 | 1k8k_tp2_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp2_conc32.yaml |
| 4xH200_SXM | High Throughput | 1024 / 8192 | 32 | 1k8k_tp4_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp4_conc32.yaml |
| 8xH200_SXM | High Throughput | 1024 / 8192 | 32 | 1k8k_tp8_conc32.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp8_conc32.yaml |
| H200_SXM | High Throughput | 1024 / 8192 | 64 | 1k8k_tp1_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp1_conc64.yaml |
| 2xH200_SXM | High Throughput | 1024 / 8192 | 64 | 1k8k_tp2_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp2_conc64.yaml |
| 4xH200_SXM | High Throughput | 1024 / 8192 | 64 | 1k8k_tp4_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp4_conc64.yaml |
| 8xH200_SXM | Max Throughput | 1024 / 8192 | 64 | 1k8k_tp8_conc64.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/1k8k_tp8_conc64.yaml |
| H200_SXM | Min Latency | 8192 / 1024 | 4 | 8k1k_tp1_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp1_conc4.yaml |
| 2xH200_SXM | Low Latency | 8192 / 1024 | 4 | 8k1k_tp2_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp2_conc4.yaml |
| 4xH200_SXM | Low Latency | 8192 / 1024 | 4 | 8k1k_tp4_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp4_conc4.yaml |
| 8xH200_SXM | Low Latency | 8192 / 1024 | 4 | 8k1k_tp8_conc4.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp8_conc4.yaml |
| H200_SXM | Low Latency | 8192 / 1024 | 8 | 8k1k_tp1_conc8.yaml | trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp1_conc8.yaml |

2xH200_SXM

Low Latency

8192 / 1024

8

8k1k_tp2_conc8.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp2_conc8.yaml

4xH200_SXM

Low Latency

8192 / 1024

8

8k1k_tp4_conc8.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp4_conc8.yaml

8xH200_SXM

Low Latency

8192 / 1024

8

8k1k_tp8_conc8.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp8_conc8.yaml

H200_SXM

Low Latency

8192 / 1024

16

8k1k_tp1_conc16.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp1_conc16.yaml

2xH200_SXM

Low Latency

8192 / 1024

16

8k1k_tp2_conc16.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp2_conc16.yaml

4xH200_SXM

High Throughput

8192 / 1024

16

8k1k_tp4_conc16.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp4_conc16.yaml

8xH200_SXM

High Throughput

8192 / 1024

16

8k1k_tp8_conc16.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp8_conc16.yaml

H200_SXM

High Throughput

8192 / 1024

32

8k1k_tp1_conc32.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp1_conc32.yaml

2xH200_SXM

High Throughput

8192 / 1024

32

8k1k_tp2_conc32.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp2_conc32.yaml

4xH200_SXM

High Throughput

8192 / 1024

32

8k1k_tp4_conc32.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp4_conc32.yaml

8xH200_SXM

High Throughput

8192 / 1024

32

8k1k_tp8_conc32.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp8_conc32.yaml

H200_SXM

High Throughput

8192 / 1024

64

8k1k_tp1_conc64.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp1_conc64.yaml

2xH200_SXM

High Throughput

8192 / 1024

64

8k1k_tp2_conc64.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp2_conc64.yaml

4xH200_SXM

High Throughput

8192 / 1024

64

8k1k_tp4_conc64.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp4_conc64.yaml

8xH200_SXM

Max Throughput

8192 / 1024

64

8k1k_tp8_conc64.yaml

trtllm-serve openai/gpt-oss-120b --extra_llm_api_options ${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/8k1k_tp8_conc64.yaml
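
The gpt-oss-120b H200 rows above appear to follow a regular naming pattern: ISL and OSL in units of 1024 tokens, then the tensor-parallel size (matching the GPU count in the GPU column), then the concurrency. The sketch below rebuilds a row's config file name and command from those four values. The helper names `config_name` and `serve_command` are hypothetical, and the pattern is inferred from the rows listed here rather than taken from a documented naming contract, so confirm that the file exists in the config database before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a config file name from ISL, OSL,
# tensor-parallel size, and concurrency, following the pattern
# seen in the table rows (e.g. 8k1k_tp4_conc16.yaml).
config_name() {
  local isl=$1 osl=$2 tp=$3 conc=$4
  # 1024 -> 1k, 8192 -> 8k; assumes ISL/OSL are multiples of 1024.
  echo "$((isl / 1024))k$((osl / 1024))k_tp${tp}_conc${conc}.yaml"
}

# Hypothetical helper: print (not run) the serve command for one row.
# ${TRTLLM_DIR} is left unexpanded so the output matches the table.
serve_command() {
  local cfg
  cfg=$(config_name "$@")
  echo "trtllm-serve openai/gpt-oss-120b --extra_llm_api_options \${TRTLLM_DIR}/examples/configs/database/openai/gpt-oss-120b/H200/${cfg}"
}

# Example: the 4xH200_SXM row with ISL 8192, OSL 1024, concurrency 16.
serve_command 8192 1024 4 16
```

Printing the command instead of executing it makes it easy to compare against the table row before launching a server.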