Quick Start: Sparsity
ModelOpt’s sparsity feature is an effective technique to reduce the memory footprint of deep learning models and accelerate inference. ModelOpt provides the easy-to-use API mts.sparsify() to apply weight sparsity to a given model. mts.sparsify() supports the NVIDIA 2:4 sparsity pattern and various sparsification methods, such as NVIDIA ASP and SparseGPT.
This guide provides a quick start for applying weight sparsity to a PyTorch model using ModelOpt.
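To make the 2:4 pattern concrete: in every contiguous group of four weights, at least two are zero. Below is a minimal illustrative sketch (not part of the ModelOpt API) that checks this property on a weight tensor; mts.sparsify() produces such weights for you.
import torch

def is_24_sparse(weight: torch.Tensor) -> bool:
    # Illustrative helper only; assumes the number of elements is divisible by 4.
    # Group the elements into blocks of 4 and count non-zeros per block.
    groups = weight.reshape(-1, 4)
    nonzeros_per_group = (groups != 0).sum(dim=-1)
    return bool((nonzeros_per_group <= 2).all())

# Example: each group of four elements contains at most two non-zeros.
w = torch.tensor([[0.5, 0.0, -1.2, 0.0, 0.0, 0.3, 0.0, 0.7]])
assert is_24_sparse(w)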
Post-Training Sparsification (PTS) for PyTorch models
mts.sparsify() requires the model, the appropriate sparsity configuration, and a forward loop as inputs. Here is a quick example of sparsifying a model to the 2:4 sparsity pattern with the SparseGPT method using mts.sparsify().
import modelopt.torch.sparsity as mts
# Setup the model
model = get_model()
# Setup the data loaders. An example usage:
data_loader = get_train_dataloader(num_samples=calib_size)
# Define the sparsity configuration
sparsity_config = {"data_loader": data_loader, "collect_func": lambda x: x}
# Sparsify the model and perform calibration (PTS)
model = mts.sparsify(model, mode="sparsegpt", config=sparsity_config)
Note
data_loader is only required for data-driven sparsity methods, e.g., SparseGPT, which uses it for calibration. The sparse_magnitude mode does not require a data_loader, as it is based purely on the weights of the model.
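For example, a weight-only run with sparse_magnitude needs no calibration data. A minimal sketch, assuming the config argument may be omitted for weight-only methods:
# Magnitude-based sparsification; no data_loader or forward_loop needed.
model = mts.sparsify(model, mode="sparse_magnitude")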
Note
data_loader and collect_func can be substituted with a forward_loop that runs the model over the calibration dataset.
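A minimal sketch of the forward_loop variant, assuming forward_loop receives the model and runs the calibration batches through it (data_loader as defined above):
def forward_loop(model):
    # Run calibration batches through the model; outputs are discarded.
    for batch in data_loader:
        model(batch)

model = mts.sparsify(model, mode="sparsegpt", config={"forward_loop": forward_loop})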
Sparsity-aware Training (SAT) for PyTorch models
After sparsifying the model, you can save a checkpoint of the sparsified model and use it for fine-tuning. Check out the GitHub end-to-end example to learn more about SAT.
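A minimal sketch of the save/restore flow for SAT, using ModelOpt's modelopt.torch.opt save/restore utilities; train() is a placeholder for your own fine-tuning loop:
import modelopt.torch.opt as mto

# Save the sparsified model (weights plus ModelOpt sparsity state).
mto.save(model, "sparsified_model.pth")

# Later: restore onto a freshly built model and fine-tune it.
model = mto.restore(get_model(), "sparsified_model.pth")
train(model)  # placeholder fine-tuning loop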
Next Steps
- Learn more about sparsity and advanced usage of ModelOpt sparsity in the Sparsity guide.
- Check out the end-to-end example on GitHub for PTS and SAT.