Quick Start: Sparsity

Sparsity

ModelOpt’s sparsity feature is an effective technique for reducing the memory footprint of deep learning models and accelerating inference. ModelOpt provides an easy-to-use API, mts.sparsify(), to apply weight sparsity to a given model. mts.sparsify() supports the NVIDIA 2:4 sparsity pattern and various sparsification methods, such as NVIDIA ASP and SparseGPT.
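
In the 2:4 pattern, every contiguous group of four weights contains at most two nonzero values, which yields 50% sparsity. The sketch below is a plain-Python illustration of that rule, not part of ModelOpt. It assumes a weight row is given as a flat list whose length is a multiple of 4, and the helper name is_2_4_sparse is hypothetical.

```python
def is_2_4_sparse(row):
    """Return True if every group of 4 consecutive values has at most 2 nonzeros.

    Hypothetical helper for illustration only; not a ModelOpt API.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Count the nonzero weights in this group of four.
        if sum(1 for w in group if w != 0) > 2:
            return False
    return True


dense = [0.3, -1.2, 0.8, 0.05, 2.0, -0.1, 0.4, 0.7]
sparse = [0.0, -1.2, 0.8, 0.0, 2.0, 0.0, 0.0, 0.7]

print(is_2_4_sparse(dense))   # False
print(is_2_4_sparse(sparse))  # True
```

Real weight tensors are multi-dimensional, and which dimension is grouped depends on the hardware and library; the flat-list view here only shows the counting rule.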

This guide provides a quick start to apply weight sparsity to a PyTorch model using ModelOpt.

Post-Training Sparsification (PTS) for PyTorch models

mts.sparsify() requires the model, the appropriate sparsity configuration, and a forward loop as inputs. Here is a quick example of sparsifying a model to the 2:4 sparsity pattern with the SparseGPT method using mts.sparsify().

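import modelopt.torch.sparsity as mts

# Set up the model (get_model() stands in for your own model-loading code)
model = get_model()

# Set up the calibration data loader.
# get_train_dataloader() and calib_size are placeholders for your own data pipeline.
data_loader = get_train_dataloader(num_samples=calib_size)

# Define the sparsity configuration; collect_func maps each batch to the model input
sparsity_config = {"data_loader": data_loader, "collect_func": lambda x: x}

# Sparsify the model to the 2:4 pattern with SparseGPT and perform calibration (PTS)
model = mts.sparsify(model, mode="sparsegpt", config=sparsity_config)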

Note

data_loader is required only for data-driven sparsity methods such as SparseGPT, which use it for calibration. sparse_magnitude does not require a data_loader because it relies solely on the model weights.
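
To see why a magnitude-based method needs no calibration data, consider a minimal sketch of 2:4 magnitude pruning: in each group of four weights, keep the two with the largest absolute value and zero the rest. This illustrates the general idea behind sparse_magnitude under that assumption; it is not ModelOpt's implementation, and the helper name magnitude_prune_2_4 is hypothetical.

```python
def magnitude_prune_2_4(row):
    """Zero the 2 smallest-magnitude weights in every group of 4.

    Hypothetical helper for illustration only; it uses the weights alone, no data.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.3, -1.2, 0.8, 0.05, 2.0, -0.1, 0.4, 0.7]
print(magnitude_prune_2_4(weights))
# [0.0, -1.2, 0.8, 0.0, 2.0, 0.0, 0.0, 0.7]
```

Data-driven methods such as SparseGPT instead use activations collected from calibration batches to decide which weights to prune and how to adjust the remaining ones, which is why they need a data_loader or forward loop.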

Note

Instead of data_loader and collect_func, you can provide a forward_loop that runs the model over the calibration dataset.
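
A forward loop is simply a callable that takes the model and feeds it every calibration batch. The sketch below builds one from a data loader and a collect function, using a toy recording "model" and a list of dict batches so that it runs without ModelOpt or PyTorch. The helper make_forward_loop and the toy objects are hypothetical, and calling model(collect_func(batch)) is a simplifying assumption; how ModelOpt actually passes collected inputs to the model may differ, so check the mts.sparsify() API reference.

```python
def make_forward_loop(data_loader, collect_func):
    """Build a forward_loop equivalent to a (data_loader, collect_func) pair.

    Hypothetical helper for illustration only.
    """
    def forward_loop(model):
        for batch in data_loader:
            # Convert the raw batch to model input, then run a forward pass.
            model(collect_func(batch))
    return forward_loop


# Toy stand-ins (hypothetical): a "model" that records its inputs.
seen = []

def toy_model(inputs):
    seen.append(inputs)

calib_batches = [{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5, 6]}]
forward_loop = make_forward_loop(calib_batches, lambda batch: batch["input_ids"])
forward_loop(toy_model)
print(seen)  # [[1, 2, 3], [4, 5, 6]]
```

Passing a forward loop gives full control over how batches reach the model (device placement, keyword arguments, truncation), whereas collect_func only reshapes each batch.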


Next Steps
  • Learn more about sparsity and advanced usage of ModelOpt sparsity in the Sparsity guide.

  • Check out the end-to-end examples for PTQ and QAT on GitHub.