gradnas
Module implementing the gradnas pruning algorithm for search.
Summary:
The gradnas algorithm produces better scores for ranking pruning choices than the L1 norm (fastnas) for language models.
Details:
Moreover, gradnas can compute scores even for hparams that are only implemented abstractly. For example, the algorithm can rank the heads in a multi-head attention layer, even though attention heads do not have a unique tensor parameter associated with them.
The prunable choices for a particular hparam are ranked by Sum((gradient of loss w.r.t. pruning mask)^2). The pruning mask of an hparam is a binary mask indicating which choices of the hparam are pruned (0 means pruned, 1 means not pruned).
While computing the backward gradient of the loss, the masks are set to 1 for all tensors, so the forward pass is unchanged. See more about using masks to measure sensitivity in this paper: https://arxiv.org/pdf/1905.10650.pdf
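The scoring idea above can be sketched in plain PyTorch (this is an illustration, not the modelopt implementation): attach a mask of ones to a layer's output, backpropagate a loss, and accumulate the squared mask gradients as per-choice scores.

```python
import torch
import torch.nn as nn

# The mask is all ones, so the forward pass is unchanged, but
# d(loss)/d(mask) measures how much each output "choice"
# (neuron, attention head, ...) contributes to the loss.
torch.manual_seed(0)
layer = nn.Linear(4, 3)
mask = torch.ones(3, requires_grad=True)  # one entry per output neuron

score = torch.zeros(3)
for _ in range(8):  # a few batches of stand-in data
    x = torch.randn(16, 4)
    out = layer(x) * mask
    loss = out.pow(2).mean()  # stand-in for the real training loss
    (grad,) = torch.autograd.grad(loss, mask)
    score += grad.pow(2)  # accumulate Sum((dLoss/dmask)^2) per choice

ranking = score.argsort(descending=True)  # most sensitive choices first
print(score, ranking)
```

Because the mask multiplies activations rather than a specific weight tensor, the same recipe works for abstract hparams such as attention heads.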
Classes

GradNASModeDescriptor: Class to describe the "gradnas" mode.
GradientBinarySearcher: Binary searcher for gradient algorithm.
GradientDataManager: Class for managing gradient data for an hparam.
- ModeloptConfig GradNASConfig
Bases: ModeloptBaseRuleConfig
Configuration for the "gradnas" mode.
- Default config (JSON):
{
"nn.Conv1d": { "*": { "channel_divisor": 32, "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [] } },
"nn.Conv2d": { "*": { "channel_divisor": 32, "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [] } },
"nn.Conv3d": { "*": { "channel_divisor": 32, "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [] } },
"nn.ConvTranspose1d": { "*": { "channel_divisor": 32, "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [] } },
"nn.ConvTranspose2d": { "*": { "channel_divisor": 32, "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [] } },
"nn.ConvTranspose3d": { "*": { "channel_divisor": 32, "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [] } },
"nn.Linear": { "*": { "feature_divisor": 32, "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0] } },
"nn.BatchNorm1d": { "*": { "feature_divisor": 32, "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0] } },
"nn.BatchNorm2d": { "*": { "feature_divisor": 32, "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0] } },
"nn.BatchNorm3d": { "*": { "feature_divisor": 32, "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0] } },
"nn.SyncBatchNorm": { "*": { "feature_divisor": 32, "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0] } },
"nn.InstanceNorm1d": { "*": { "feature_divisor": 32, "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0] } },
"nn.InstanceNorm2d": { "*": { "feature_divisor": 32, "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0] } },
"nn.InstanceNorm3d": { "*": { "feature_divisor": 32, "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0] } },
"nn.LayerNorm": { "*": { "feature_divisor": 32, "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0] } },
"nn.GroupNorm": { "*": { "channel_divisor": 32, "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0] } }
}
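The ratio and divisor fields interact: each ratio proposes a fraction of the original channel (or feature) count, and the divisor constrains the searchable counts to its multiples. A minimal sketch of one plausible interpretation (the helper `channel_choices` is hypothetical; modelopt's actual rounding may differ):

```python
def channel_choices(num_channels, ratios, divisor):
    """Candidate channel counts: each ratio of the original count,
    rounded to a positive multiple of the divisor."""
    choices = set()
    for r in ratios:
        c = round(num_channels * r / divisor) * divisor
        if 0 < c <= num_channels:
            choices.add(c)
    return sorted(choices)

ratios = [0.05 * i for i in range(1, 21)]  # 0.05, 0.10, ..., 1.0
print(channel_choices(768, ratios, 32))
```

Divisor-aligned counts (multiples of 32 here) tend to map better onto GPU hardware tiles, which is why the defaults enforce them.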
- field nn.BatchNorm1d: DynamicBatchNorm1dConfig | None | dict[str, DynamicBatchNorm1dConfig | None]
Configuration for dynamic nn.BatchNorm1d module.
If the "nn.BatchNorm1d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "feature_divisor": 32 } }
To deactivate any dynamic nn.BatchNorm1d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.BatchNorm1d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
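The glob precedence described above can be illustrated with Python's fnmatch; this `resolve_rule` helper is a hypothetical stand-in for modelopt's internal matching logic, not its actual API:

```python
import fnmatch

def resolve_rule(submodule_name, rules):
    """Apply glob-keyed rules in insertion order; later matching keys
    overwrite earlier ones. None deactivates the matching submodule."""
    resolved = None
    for pattern, cfg in rules.items():
        if fnmatch.fnmatch(submodule_name, pattern):
            resolved = cfg
    return resolved

rules = {"*": {"feature_divisor": 32}, "*lm_head*": None}
print(resolve_rule("encoder.layer.0.norm", rules))  # -> {'feature_divisor': 32}
print(resolve_rule("lm_head.norm", rules))          # -> None
```

Note that swapping the key order would make "*" win for every submodule, which is why later, more specific patterns must come after the catch-all.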
- field nn.BatchNorm2d: DynamicBatchNorm2dConfig | None | dict[str, DynamicBatchNorm2dConfig | None]
Configuration for dynamic nn.BatchNorm2d module.
If the "nn.BatchNorm2d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "feature_divisor": 32 } }
To deactivate any dynamic nn.BatchNorm2d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.BatchNorm2d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.BatchNorm3d: DynamicBatchNorm3dConfig | None | dict[str, DynamicBatchNorm3dConfig | None]
Configuration for dynamic nn.BatchNorm3d module.
If the "nn.BatchNorm3d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "feature_divisor": 32 } }
To deactivate any dynamic nn.BatchNorm3d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.BatchNorm3d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.Conv1d: DynamicConv1dConfig | None | dict[str, DynamicConv1dConfig | None]
Configuration for dynamic nn.Conv1d module.
If the "nn.Conv1d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [], "channel_divisor": 32 } }
To deactivate any dynamic nn.Conv1d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.Conv1d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.Conv2d: DynamicConv2dConfig | None | dict[str, DynamicConv2dConfig | None]
Configuration for dynamic nn.Conv2d module.
If the "nn.Conv2d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [], "channel_divisor": 32 } }
To deactivate any dynamic nn.Conv2d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.Conv2d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.Conv3d: DynamicConv3dConfig | None | dict[str, DynamicConv3dConfig | None]
Configuration for dynamic nn.Conv3d module.
If the "nn.Conv3d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [], "channel_divisor": 32 } }
To deactivate any dynamic nn.Conv3d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.Conv3d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.ConvTranspose1d: DynamicConvTranspose1dConfig | None | dict[str, DynamicConvTranspose1dConfig | None]
Configuration for dynamic nn.ConvTranspose1d module.
If the "nn.ConvTranspose1d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [], "channel_divisor": 32 } }
To deactivate any dynamic nn.ConvTranspose1d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.ConvTranspose1d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.ConvTranspose2d: DynamicConvTranspose2dConfig | None | dict[str, DynamicConvTranspose2dConfig | None]
Configuration for dynamic nn.ConvTranspose2d module.
If the "nn.ConvTranspose2d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [], "channel_divisor": 32 } }
To deactivate any dynamic nn.ConvTranspose2d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.ConvTranspose2d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.ConvTranspose3d: DynamicConvTranspose3dConfig | None | dict[str, DynamicConvTranspose3dConfig | None]
Configuration for dynamic nn.ConvTranspose3d module.
If the "nn.ConvTranspose3d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "kernel_size": [], "channel_divisor": 32 } }
To deactivate any dynamic nn.ConvTranspose3d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.ConvTranspose3d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.GroupNorm: DynamicGroupNormConfig | None | dict[str, DynamicGroupNormConfig | None]
Configuration for dynamic nn.GroupNorm module.
If the "nn.GroupNorm" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "channels_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "channel_divisor": 32 } }
To deactivate any dynamic nn.GroupNorm module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.GroupNorm layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.InstanceNorm1d: DynamicInstanceNorm1dConfig | None | dict[str, DynamicInstanceNorm1dConfig | None]
Configuration for dynamic nn.InstanceNorm1d module.
If the "nn.InstanceNorm1d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "feature_divisor": 32 } }
To deactivate any dynamic nn.InstanceNorm1d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.InstanceNorm1d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.InstanceNorm2d: DynamicInstanceNorm2dConfig | None | dict[str, DynamicInstanceNorm2dConfig | None]
Configuration for dynamic nn.InstanceNorm2d module.
If the "nn.InstanceNorm2d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "feature_divisor": 32 } }
To deactivate any dynamic nn.InstanceNorm2d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.InstanceNorm2d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.InstanceNorm3d: DynamicInstanceNorm3dConfig | None | dict[str, DynamicInstanceNorm3dConfig | None]
Configuration for dynamic nn.InstanceNorm3d module.
If the "nn.InstanceNorm3d" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "feature_divisor": 32 } }
To deactivate any dynamic nn.InstanceNorm3d module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.InstanceNorm3d layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.LayerNorm: DynamicLayerNormConfig | None | dict[str, DynamicLayerNormConfig | None]
Configuration for dynamic nn.LayerNorm module.
If the "nn.LayerNorm" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "feature_divisor": 32 } }
To deactivate any dynamic nn.LayerNorm module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.LayerNorm layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.Linear: DynamicLinearConfig | None | dict[str, DynamicLinearConfig | None]
Configuration for dynamic nn.Linear module.
If the "nn.Linear" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "feature_divisor": 32 } }
To deactivate any dynamic nn.Linear module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.Linear layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- field nn.SyncBatchNorm: DynamicSyncBatchNormConfig | None | dict[str, DynamicSyncBatchNormConfig | None]
Configuration for dynamic nn.SyncBatchNorm module.
If the "nn.SyncBatchNorm" key is not specified, the default configuration (shown in JSON) will be used:
{ "*": { "features_ratio": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], "feature_divisor": 32 } }
To deactivate any dynamic nn.SyncBatchNorm module, use None instead of providing a dictionary {}.
To specify layer-specific configurations, provide a config for each submodule, where each key is a glob pattern that is matched against the submodule name. For example, to convert all nn.SyncBatchNorm layers except those in the "lm_head" submodule to dynamic modules, use:
{ "*": {...}, "*lm_head*": None }
Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config overwrite earlier keys if they match the same submodule name.
To apply the same configuration to all submodules, you can also provide an unnested dictionary {...}, which is short for { "*": {...} }.
- class GradNASModeDescriptor
Bases: FastNASModeDescriptor
Class to describe the "gradnas" mode. The properties of this mode can be inspected via the source code.
- property config_class: type[ModeloptBaseConfig]
Specifies the config class for the mode.
- property name: str
Returns the value (str representation) of the mode.
- property search_algorithm: type[BaseSearcher]
Specifies the search algorithm to use for this mode (if any).
- class GradientBinarySearcher
Bases: BinarySearcher
Binary searcher for gradient algorithm.
- SETUP_GRADIENT_FUNC: dict[type[DynamicModule], Callable[[DynamicModule], tuple[GradientDataManager, RemovableHandle]]]
- before_search()
Setup search with gradient-based score.
- Return type:
None
- property default_search_config: dict[str, Any]
Get the default config for the searcher.
- static gradnas_score_func(model)
Score function for gradnas algorithm.
If we prune N neurons from layer L, the total degradation is modeled as the sum of the degradation values of the N pruned neurons. In the fastnas algorithm, the degradation due to pruning is instead estimated directly from validation_score(model after pruning). The rest of the algorithm is exactly the same as fastnas.
- Parameters:
model (Module)
- Return type:
float
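The additivity assumption above can be sketched as follows; `degradation_when_pruned_to` is a hypothetical helper (not modelopt's API), assuming the least-sensitive neurons are pruned first:

```python
def degradation_when_pruned_to(scores, keep):
    """Total degradation when keeping `keep` choices: prune the
    least-sensitive ones and sum their scores (additivity assumption)."""
    pruned = sorted(scores)[: len(scores) - keep]
    return sum(pruned)

scores = [1.0, 0.125, 0.5, 0.0625]  # hypothetical per-neuron gradnas scores
print(degradation_when_pruned_to(scores, 2))  # prunes 0.0625 and 0.125 -> 0.1875
```

Because degradation is additive per neuron, the searcher can evaluate any pruning level from the precomputed scores without re-running validation, unlike the fastnas estimate.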
- property hparam_names_for_search: set[str]
We can only optimize over certain types of hparams in gradient binary search.
- sanitize_search_config(config)
Sanitize the search config dict.
- Parameters:
config (dict[str, Any] | None)
- Return type:
dict[str, Any]
- class GradientDataManager
Bases: object
Class for managing gradient data for an hparam.
- __init__(shape, model, reduce_func=<function GradientDataManager.<lambda>>)
Initialize GradientDataManager.
- process_gradient()
Process gradient of the mask.
- property score
The score of the hparam based on the stored gradients.
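A minimal PyTorch sketch of what such a manager might look like (`SimpleGradientData` is hypothetical and not the actual modelopt class; the real one also handles hooks and reduction across distributed ranks):

```python
import torch
import torch.nn as nn

class SimpleGradientData:
    """Minimal stand-in for GradientDataManager: owns a ones-mask for an
    hparam, accumulates squared mask gradients, and reports a score."""

    def __init__(self, shape, reduce_func=lambda g: g):
        self.mask = torch.ones(shape, requires_grad=True)
        self._sq_grad_sum = torch.zeros(shape)
        self.reduce_func = reduce_func

    def process_gradient(self):
        # Call after each backward pass; consumes and clears mask.grad.
        if self.mask.grad is not None:
            self._sq_grad_sum += self.reduce_func(self.mask.grad) ** 2
            self.mask.grad = None

    @property
    def score(self):
        return self._sq_grad_sum

# Usage: mask an output dimension and accumulate over a few batches.
torch.manual_seed(0)
layer = nn.Linear(8, 4)
gdm = SimpleGradientData(4)
for _ in range(3):
    out = layer(torch.randn(2, 8)) * gdm.mask
    out.sum().backward()
    gdm.process_gradient()
print(gdm.score)
```

The `reduce_func` hook mirrors the `reduce_func` parameter in `__init__` above: it lets an hparam aggregate raw mask gradients (e.g., over all channels belonging to one attention head) before squaring.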