modelopt.torch.quantization.compress
- compress(model, config=None)
- Compress the weights of a quantized model.
- This function compresses weights in layers that have an enabled weight_quantizer with a supported quantization format. The compression is controlled by a pattern-based configuration.
- Parameters:
- model – The quantized model to compress. 
- config (dict[str, bool] | None | CompressConfig) – Dictionary mapping layer patterns to boolean compression flags. If None, defaults to {"default": True}, which compresses all supported layers.

  Example configuration:

  ```python
  {
      "*.mlp.fc1*": False,  # Skip compression for fc1 layers
      "default": True,      # Compress all other layers
  }
  ```

  Note: Each configuration entry except "default" is applied sequentially; later entries therefore override earlier ones if the same layer matches more than one pattern.
- Note: This function modifies the input model in-place.
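A minimal usage sketch follows. It assumes the model has already been quantized with modelopt.torch.quantization.quantize so that its layers carry enabled weight quantizers; the quantization config name (FP8_DEFAULT_CFG) and the calibration loop are illustrative assumptions, not taken from this page.

```python
# Usage sketch (hedged): `model` is an existing torch.nn.Module and
# `calibrate` is a user-supplied forward loop for calibration; both are
# assumptions for illustration.
import modelopt.torch.quantization as mtq

# 1. Quantize the model so its layers have enabled weight_quantizers.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=calibrate)

# 2. Compress weights in-place, skipping layers matching "*.mlp.fc1*"
#    and compressing all other supported layers (the "default" entry).
mtq.compress(model, config={"*.mlp.fc1*": False, "default": True})
```

Because compression happens in-place, the same `model` object can be used for export or inference immediately after the call.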