NAS
Introduction
ModelOpt provides a NAS method (aka mode) - AutoNAS - via the modelopt.torch.nas module. Given a model, this method finds the subnet which meets the given deployment constraints (e.g. FLOPs, parameters) from your provided base model with little to no accuracy degradation (depending on how aggressively the model size is reduced).
More details on this NAS mode are as follows:
autonas: A NAS method suitable for Computer Vision models that searches over layerwise parameters such as number of channels, kernel size, and network depth.
Follow the steps described below to obtain the optimal model meeting your unique requirements using modelopt.torch.nas:
1. Convert your model via mtn.convert: Natively generate a neural architecture search space from your PyTorch base model using a simple set of configurations. Conveniently save and restore the model architecture and weights during the process.
2. NAS training: Seamlessly train the resulting search space within your existing training pipeline.
3. Subnet architecture search via mtn.search: Search for the best neural architecture (subnet) satisfying your deployment constraints, e.g., FLOPs / parameters.
4. Fine-tuning: Optionally, fine-tune the resulting subnet to achieve even higher accuracy.
To find out more about NAS and related concepts, please refer to the NAS Concepts section below.
Convert and save
You can convert your model and generate a search space from it using mtn.convert(). The resulting search space should be saved using mto.save(). It can be loaded back using mto.restore() to perform the subsequent steps of architecture search.
Example usage:
import modelopt.torch.nas as mtn
import modelopt.torch.opt as mto
from torchvision.models import resnet50
# User-defined model
model = resnet50()
# Generate the search space for AutoNAS
model = mtn.convert(model, mode="autonas")
# Save the search space for future use
mto.save(model, "modelopt_model.pth")
Note
The NAS APIs are a superset of the pruning APIs. You can use the pruning modes (e.g. "fastnas", "gradnas", etc.) here as well.
Note
In the above example, we have used the default AutoNAS config for mtn.convert(). You can see it using mtn.config.AutoNASConfig(). You can also specify custom configurations to have a different search space. See the mtn.convert() documentation for more information.
An example config is shown below:
import modelopt.torch.nas as mtn
config = mtn.config.AutoNASConfig()
config["nn.Conv2d"]["*"]["out_channels_ratio"] += (0.1,) # include more channel choices
model = mtn.convert(model_or_model_factory, mode=[("autonas", config)])
Note
If you want to learn more about the conversion process and the prerequisites for your model, you can take a look at NAS Model Prerequisites.
Note
Please see saving and restoring of ModelOpt-modified models to learn about all the available options for saving and restoring.
Profiling a search space
The search space can be used to perform architecture search according to your desired deployment constraints.
To better understand the performance and the range of the resulting search space, you can profile the search space together with your deployment constraints using mtn.profile():
import torch
# Looking for a subnet with at most 2 GFLOPs
constraints = {"flops": 2.0e9}
# Measure FLOPs against dummy_input
# Can be provided as a single tensor or tuple of input args to the model.
dummy_input = torch.randn(1, 3, 224, 224)
is_sat, search_space_stats = mtn.profile(model, dummy_input, constraints=constraints)
The following info will be printed:
Profiling the following subnets from the given model: ('min', 'centroid', 'max').
--------------------------------------------------------------------------------
Profiling Results
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Constraint ┃ min ┃ centroid ┃ max ┃ max/min ratio ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flops │ 487.92M │ 1.84G │ 4.59G │ 9.40 │
│ params │ 4.84M │ 12.33M │ 25.50M │ 5.27 │
└──────────────┴──────────────┴──────────────┴──────────────┴───────────────┘
Constraints Evaluation
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ ┃ ┃ Satisfiable ┃
┃ Constraint ┃ Upper Bound ┃ Upper Bound ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ flops │ 2.00G │ True │
└──────────────┴──────────────┴──────────────┘
Search Space Summary:
----------------------------------------------------------------------------------------------------
* conv1.out_channels [32, 64]
conv1.in_channels [3]
bn1.num_features [32, 64]
* layer1.depth [1, 2, 3]
* layer1.0.conv1.out_channels [32, 64]
layer1.0.conv1.in_channels [32, 64]
layer1.0.bn1.num_features [32, 64]
* layer1.0.conv2.out_channels [32, 64]
...
...
...
* layer4.2.conv1.out_channels [256, 352, 512]
layer4.2.conv1.in_channels [2048]
layer4.2.bn1.num_features [256, 352, 512]
* layer4.2.conv2.out_channels [256, 352, 512]
layer4.2.conv2.in_channels [256, 352, 512]
layer4.2.bn2.num_features [256, 352, 512]
layer4.2.conv3.out_channels [2048]
layer4.2.conv3.in_channels [256, 352, 512]
----------------------------------------------------------------------------------------------------
Number of configurable hparams: 40
Total size of the search space: 1.90e+18
Note: all constraints can be satisfied within the search space!
You can also skip the constraints parameter to simply print the range of achievable constraint values without checking whether they satisfy your requirements. The profiling results will help you understand the search space and come up with a potential search constraint that you can iterate on.
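For example, a minimal call without constraints (reusing model and dummy_input from the example above) could look as follows:
# Only print the min / centroid / max statistics of the search space;
# no satisfiability check is performed since no constraints are provided.
mtn.profile(model, dummy_input)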
NAS training
Prerequisites
During NAS training, you can use your existing training infrastructure. However, we recommend you make the following modifications to your training hyperparameters:
Increase the training time (epochs) by 2-3x.
Make sure that the learning rate schedule is adjusted for the longer training time.
We recommend using a continuously decaying learning rate schedule such as the cosine annealing schedule (see PyTorch documentation).
Restore the search space
Please restore the search space from the saved checkpoint to continue with the rest of the steps, as shown below:
# Provide the model before conversion to mto.restore
model = mto.restore(model_or_model_factory, "modelopt_model.pth")
Training
You can now proceed with your existing training pipeline with the changes in training time and learning rate.
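As a reference, a minimal sketch of such a training loop is shown below. The dataloader, loss, and hyperparameter values are placeholders that you should replace with your existing pipeline; the epoch count reflects the recommended 2-3x increase in training time, paired with a cosine annealing learning rate schedule:
import torch

# Illustrative hyperparameters: 2-3x the original training time (e.g., 90 -> 270 epochs).
num_epochs = 270
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# Cosine annealing schedule stretched over the full (extended) training time.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    model.train()
    for images, targets in train_loader:  # your existing training dataloader
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()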
Subnet architecture search
The next step in NAS is to perform architecture search on the resulting search space to find the best subnet satisfying your deployment constraints.
Prerequisites
To perform the search (mtn.search()) on a trained model, a score function, a dummy input (to measure your deployment constraints), the training dataloader (to calibrate the normalization layers), and the constraints are required. Please see the mtn.search() API for more details.
Depending on the algorithm, you may be able to provide multiple search constraints, such as flops or params, by specifying an upper bound for each.
Performing search
Below is an example of running search on an AutoNAS-converted and trained model.
# Wrap your original validation function to only take the model as input.
# This function acts as the score function to rank models.
def score_func(model):
return validate(model, val_loader, ...)
# Specify the sample input including target data shape for FLOPs calculation.
dummy_input = torch.randn(1, 3, 224, 224)
# Looking for a subnet with at most 2 GFLOPs
search_constraints = {"flops": 2.0e9}
# search_res (dict) contains state_dict / stats of the searcher
searched_model, search_res = mtn.search(
model=model,
constraints=search_constraints,
dummy_input=dummy_input,
config={
"data_loader": train_loader, # training data is used for calibrating BN layers
"score_func": score_func, # validation score is used to rank the subnets
# checkpoint to store the search state and resume or re-run the search with different constraint
"checkpoint": "modelopt_search_checkpoint.pth",
},
)
# Save the searched model for further fine-tuning
mto.save(searched_model, "modelopt_searched_model.pth")
Tip
If the runtime of the score function is longer than a few minutes, consider subsampling the dataset used in the score function. A PyTorch dataset can be subsampled using torch.utils.data.Subset as follows:
subset_dataset = torch.utils.data.Subset(dataset, indices)
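For instance, to score on a random subset of the validation set (the subset size and batch size below are illustrative):
import torch

# Randomly pick ~1000 sample indices from the full validation dataset.
indices = torch.randperm(len(dataset))[:1000].tolist()
subset_dataset = torch.utils.data.Subset(dataset, indices)
val_loader = torch.utils.data.DataLoader(subset_dataset, batch_size=64, shuffle=False)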
Note
NAS will modify the model in-place.
Note
mtn.search() supports distributed data parallelism via DistributedDataParallel in PyTorch.
Fine-tuning
After search, the accuracy drop may be less significant compared with pruning; however, we still recommend running fine-tuning to recover the best accuracy. A good fine-tuning schedule for AutoNAS is usually to repeat the pre-training schedule (1x epochs) with 0.5x-1x the initial learning rate, as done in FastNAS. Please refer to the Pruning fine-tuning section for more details.
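A minimal sketch of this fine-tuning step is shown below; the checkpoint names, epoch count, and learning rate are illustrative and assume the ResNet50 example from above:
import modelopt.torch.opt as mto
import torch
from torchvision.models import resnet50

# Restore the searched subnet (provide the model before conversion to mto.restore).
model = mto.restore(resnet50(), "modelopt_searched_model.pth")

# Repeat roughly the original training schedule (1x epochs) with 0.5x-1x of the
# original initial learning rate, e.g., 0.05 instead of 0.1 for ResNet50.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=90)

# ... run your existing training loop, then save the fine-tuned model:
mto.save(model, "modelopt_finetuned_model.pth")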
NAS Model Prerequisites
In this guide, we will go through the steps to set up your model to work with NAS and pruning. At
the end of this guide, you will be able to convert
your own model to generate a search space that can be used for NAS and pruning.
Convert your model
Most PyTorch models, including custom models, are natively compatible with ModelOpt (depending on how the forward method is implemented). To quickly test whether your model is compatible, you can simply try to convert it:
import modelopt.torch.nas as mtn
from torchvision.models import resnet50
# User-defined model
model = resnet50()
# Convert the model into a search space
model = mtn.convert(model, mode="fastnas")
If you encounter problems or would like to understand more about the conversion process, please continue reading. Otherwise, you can skip the rest of this guide.
The conversion process
ModelOpt will automatically generate a search space for you from your custom PyTorch model. This is a one-time process performed during pruning and NAS. Once a model is converted, you can save and restore it for downstream tasks like training, inference, and fine-tuning.
To help you better understand how the search space is derived from your model, we go through the process in more detail below.
Layer support
You can make the most of ModelOpt with model architectures consisting of layers that ModelOpt can automatically convert into searchable units.
Specifically, the following native PyTorch layers can be converted into searchable units:
import torch.nn as nn
# We convert native PyTorch convolutional layers to automatically search over the number of
# channels and optionally over the kernel size.
nn.Conv1d
nn.Conv2d
nn.Conv3d
nn.ConvTranspose1d
nn.ConvTranspose2d
nn.ConvTranspose3d
# We convert native PyTorch linear layers to automatically search over the number of features
nn.Linear
# We convert native PyTorch sequential layers that contain residual blocks to automatically
# search over the number of layers (depth) in the sequential layer.
nn.Sequential
# We convert Megatron-core / NeMo GPT-style models (e.g. Llama3.1, NeMo Mistral, etc.)
# to automatically search over the MLP hidden size, number of attention heads, number of GQA groups,
# and depth of the model.
megatron.core.transformer.module.MegatronModule
nemo.collections.nlp.models.language_modeling.megatron_gpt_model.MegatronGPTModel
# We convert Hugging Face Attention layers to automatically search over the number of heads
# and MLP hidden size.
# Make sure `config.use_cache` is set to False during pruning.
transformers.models.bert.modeling_bert.BertAttention
transformers.models.gptj.modeling_gptj.GPTJAttention
Generating a search space
To generate a search space from your desired model, a simple call to mtn.convert() suffices:
import modelopt.torch.nas as mtn
from torchvision.models import resnet50
# User-defined model
model = resnet50()
# Convert the model for NAS/pruning
model = mtn.convert(model, mode="fastnas")
Your generated model represents a search space consisting of a collection of subnets. Note that you can use the converted model like any other regular PyTorch model. It will behave according to the currently activated subnet.
Roughly, the convert process can be broken down into the following steps:
Trace through the model to resolve layer dependencies and record how layers are connected.
Convert supported layers into searchable units, i.e., dynamic layers and connect them according to the recorded dependencies.
Generate a consistent search space from the converted model.
Note
During pruning, the conversion is performed implicitly when mtp.prune is called.
Prerequisites
In order to correctly generate a search space, your original model should satisfy the following prerequisites.
Traceability
The model needs to be traceable with ModelOpt’s torch.fx-like tracer. If it is not, you will see errors or warnings when you run mtn.convert(). Note that some of these warnings may not affect the search space and hence can be ignored.
In some cases, certain layers cannot be traced; if possible, you should adjust their definition and forward method to be traceable. Otherwise, such layers and all affected layers will be ignored in the conversion process.
DistributedDataParallel
Wrapping the model with DistributedDataParallel should occur after the conversion process, and find_unused_parameters=True needs to be set during wrapping:
from torch.nn.parallel import DistributedDataParallel

model = mtn.convert(model, ...)
model = DistributedDataParallel(model, find_unused_parameters=True)
Auxiliary modules
If your model contains auxiliary modules, e.g., branches that are active only during the training, ensure that you convert the full model such that all modules are active during the conversion process.
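For illustration, consider the following hypothetical model with a training-only auxiliary head; the auxiliary branch is kept enabled so that it is traced and included in the search space during conversion:
import torch.nn as nn
import modelopt.torch.nas as mtn

class ModelWithAuxHead(nn.Module):  # hypothetical model with an auxiliary branch
    def __init__(self, use_aux: bool = True):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, 10, 1)
        self.aux_head = nn.Conv2d(32, 10, 1) if use_aux else None

    def forward(self, x):
        feats = self.backbone(x)
        out = self.head(feats)
        # Keep the auxiliary branch active so it is seen during conversion.
        if self.aux_head is not None:
            return out, self.aux_head(feats)
        return out

# Convert the full model with the auxiliary branch enabled.
model = mtn.convert(ModelWithAuxHead(use_aux=True), mode="fastnas")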
Known limitations
Please be aware of other potential limitations as mentioned in the NAS FAQs!
NAS Concepts
Below, we will provide an overview of ModelOpt’s neural architecture search (NAS) and pruning algorithms as well as the basic concepts and terminology.
Overview
| Term | Description |
| --- | --- |
| Neural Architecture Search (NAS) | The process of finding the best neural network architecture for a given task. |
| Search space | The set of possible candidate architectures that are searched during pruning or NAS. |
| Architecture hyperparameters | The set of hyperparameters, e.g., number of layers, describing the search space. |
| Subnet | A candidate architecture in the search space. |
| Search space training | The process of training the collection of subnets in the search space. |
| Subnet search | The process of finding an optimal subnet within a trained search space. |
| Subnet fine-tuning | The process of training the selected subnet in isolation for improved final accuracy. |
| Pruning | The process of removing redundant components from a neural network for a given task. |
Concepts
Below, we provide an introduction to the concepts and terminology of neural architecture search. During regular neural network training, only the neural network weights are trained. However, during NAS, both the weights and the architecture of the model are trained.
Neural Architecture Search (NAS)
Neural architecture search is the process of finding the best neural network architecture from a set of candidate architectures. NAS is usually performed before, during, or in-between training. During NAS different performance metrics, such as accuracy, on-device latency, or size of the model, are used to evaluate the candidate architectures.
Search space
The search space is defined as the (discrete) set of all possible neural architectures that are trained. Search spaces are derived from a (user-specified) base architecture (e.g., ResNet50) and a set of configs that describe how to parameterize the base architecture; see NAS Model Prerequisites for more info.
Architecture hyperparameters
The search space is parameterized via a set of discrete architecture hyperparameters that describe individual “modifications” to the base architecture, e.g., the number of channels in a convolutional layer, the number of repeated building blocks, number of attention heads in a transformer layer, etc. Each possible architecture in the search space can be described as a distinct configuration of the set of architecture hyperparameters.
Subnet
The search space consists of a collection of subnets, where each subnet represents a neural architecture. Each subnet constitutes a neural architecture with different layers and operators or different parameterization (e.g. channel number) of each layer.
To better characterize a given search space, we usually consider a few distinct subnets:
Minimum subnet (min): The smallest subnet within the search space.
Centroid subnet (centroid): The subnet for which each architecture hyperparameter is set to the value closest to its centroid (mean).
Maximum subnet (max): The largest subnet within the search space.
ModelOpt-converted model
After the conversion, the user-provided neural network will represent the search space. It can be obtained via mtn.convert(); see Convert and save.
During the conversion process, the search space is automatically derived from a given base architecture and the relevant architecture hyperparameters are automatically identified.
The next step is to train the converted model (instead of the original architecture) to find the optimal subnet for your deployment constraints.
NAS-based training
During training of a search space, we simultaneously train both the model’s weights and architecture:
Using modelopt.torch.nas, you can re-use your existing training loop to train the search space.
During search space training, the entire collection of subnets is automatically trained together with its weights.
Given that we train both the architecture (all subnets) and the weights, the training time and hyperparameters may vary compared to regular training, as described in the NAS training section above.
Architecture search & selection
At the end of the search space training process, the next step is to search for and select a subnet from the search space:
The search procedure is a discrete optimization problem to determine the optimal subnet configuration from the search space.
The search procedure takes your deployment constraints, e.g., FLOPs, parameters or latency and inference device, into account to determine the optimal (most accurate) subnet configuration while satisfying the constraints.
The resulting subnet can be used for further downstream tasks, e.g., fine-tuning and deployment.
Subnet fine-tuning
To further boost the accuracy of the selected subnet, the subnet is usually fine-tuned on the original task:
To fine-tune the subnet, you can simply repeat the training pipeline of the original model with the adjusted training schedule as described in the Fine-tuning section above.
The fine-tuned model constitutes the deployable model with the optimal trade-off between accuracy and your provided constraints.
NAS vs. Pruning
The difference between NAS and pruning is summarized below.
|  | NAS | Pruning |
| --- | --- | --- |
| Search space | More flexible search space with additional searchable dimensions such as network depth, kernel size, or selection of activation function. | Less flexible search space with searchable dimensions constrained to fewer options such as number of channels and features or attention heads. |
| Training time | Usually requires training a model for additional time before a subnet can be searched. | No training is required when a pre-trained checkpoint is available. If not, regular training can be used to pre-train a checkpoint. |
| Performance | Can provide improved accuracy-latency trade-off due to more flexible search space and the increased training time. | May provide similar performance to NAS in particular applications, however, usually exhibits worse performance due to the limited search space and training time. |