Python Export Pipeline (Deprecated)#

The historical FX-tracing export package remains available in 0.7.1 for compatibility and for components that are not yet fully covered by llm_loader. For new enablement, use:

experimental/quantization to create unified quantized HuggingFace-style checkpoints when starting from FP16/BF16 source checkpoints.
experimental/llm_loader to export ONNX and runtime sidecars from FP16/BF16, pre-quantized, or locally quantized checkpoints.
llm_build, visual_build, audio_build, action_build, and the runtime examples to build engines and run inference.

The tensorrt_edgellm/ folder will be removed in 0.8.0 after the experimental/quantization -> experimental/llm_loader workflow reaches full feature parity for all models and features.

Replacement Map#

Legacy responsibility	Current workflow
ModelOpt quantization inside the export package	`python -m experimental.quantization ...`
LLM ONNX export	`python -m llm_loader.export_all_cli <checkpoint> <output_dir>`
VLM/audio/Omni/VLA component export	`llm_loader.export_all_cli` component orchestration
EAGLE base export	`llm_loader.export_all_cli --eagle-base`
MTP export	`llm_loader.export_all_cli --mtp`
LoRA graph insertion and adapter processing	`python -m llm_loader.lora.insert_lora_cli` and `python -m llm_loader.lora.process_lora_weights_cli`
Static LoRA merge	`python -m llm_loader.lora.merge_lora_cli` before quantization/export
FP8 KV cache	`experimental.quantization --kv_cache_quantization fp8`; `llm_loader` detects metadata during export
FP8 embedding sidecar	`llm_loader.export_all_cli --fp8-embedding`
Vocabulary reduction	`python -m llm_loader.vocab_reduction`, then `llm_loader.export_all_cli --reduced-vocab-dir`

Where To Go#