TensorRT Model Optimizer
modelopt.torch.quantization.backends.nvfp4_gemm
nvfp4_gemm(quant_module, input_tensor, bias=None)
GEMM function for NVFP4 (4-bit floating-point) quantization.
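To illustrate what an FP4-quantized GEMM computes, the sketch below fake-quantizes a weight matrix to the E2M1 (NVFP4) value set with a per-tensor scale, then runs a plain matrix multiply. This is an illustrative NumPy sketch only, not the library's implementation: the real `nvfp4_gemm` takes a `quant_module` (which carries the quantized weight and scales) and dispatches to an optimized kernel; the `fake_quant_nvfp4` and `nvfp4_gemm_sketch` helpers here are hypothetical names.

```python
import numpy as np

# The 8 non-negative magnitudes representable in E2M1 (NVFP4);
# with a sign bit this gives the 16 representable values.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_nvfp4(w):
    """Round each weight to the nearest E2M1 value after scaling
    so the largest magnitude maps to 6 (the FP4 max)."""
    max_abs = np.abs(w).max()
    scale = max_abs / 6.0 if max_abs > 0 else 1.0
    mags = np.abs(w / scale)[..., None]            # (..., 1) vs (8,) values
    nearest = FP4_VALUES[np.argmin(np.abs(mags - FP4_VALUES), axis=-1)]
    return np.sign(w) * nearest * scale

def nvfp4_gemm_sketch(weight, input_tensor, bias=None):
    """Conceptual GEMM with FP4 fake-quantized weights:
    out = input @ Wq.T (+ bias)."""
    wq = fake_quant_nvfp4(weight)
    out = input_tensor @ wq.T
    return out + bias if bias is not None else out
```

Weights whose scaled magnitudes already fall on E2M1 grid points pass through unchanged; everything else is rounded to the nearest representable value, which is the source of FP4 quantization error.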