gradnas

Module implementing the gradnas pruning algorithm for search.

Summary:

The gradnas algorithm ranks pruning choices with a gradient-based score, which for language models gives a better ordering than the L1-norm score used by fastnas.

Details:

Scores can also be computed for hparams that are implemented only abstractly. For example, the algorithm can rank the heads of a multi-head attention layer, even though an attention head has no unique tensor parameter associated with it.

The prunable choices of a particular hparam are ranked by Sum((gradient of loss wrt pruning mask)^2). The pruning mask of an hparam is a binary mask indicating which choices of the hparam are pruned (0 means pruned, 1 means not pruned).

When the gradient of the loss is computed in the backward pass, the masks are set to 1 for all tensors. For more on using masks to measure sensitivity, see this paper: https://arxiv.org/pdf/1905.10650.pdf
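
As a minimal illustration of the score above, the sketch below ranks three hypothetical attention heads. It is not the library's implementation: the toy model, its loss, and the names `head_weights`, `samples`, and `mask_gradients` are assumptions made for this example, and the gradient is derived by hand instead of through autograd. The toy output is y = sum_h mask[h] * w[h] * x with a squared-error loss, so dLoss/dmask[h] = 2 * (y - target) * w[h] * x, evaluated with every mask entry set to 1.

```python
# Toy model: y = sum_h mask[h] * head_weights[h] * x, loss = (y - target)^2.
# All names and numbers here are hypothetical, chosen only for illustration.
head_weights = [0.5, -2.0, 0.1]
samples = [(1.0, 0.0), (2.0, -1.0), (-1.0, 3.0)]  # (x, target) pairs


def mask_gradients(x, target, weights):
    """Return dLoss/dmask[h] for every head, with all masks fixed to 1."""
    y = sum(w * x for w in weights)  # forward pass with mask == 1 everywhere
    return [2.0 * (y - target) * w * x for w in weights]


# Score of each head: sum over samples of (gradient wrt its mask entry)^2.
scores = [0.0] * len(head_weights)
for x, target in samples:
    for h, g in enumerate(mask_gradients(x, target, head_weights)):
        scores[h] += g * g

# Heads with the smallest score are the first candidates for pruning.
pruning_order = sorted(range(len(head_weights)), key=lambda h: scores[h])
print(pruning_order)
```

In this toy setup the head with the smallest weight magnitude receives the smallest score, because every head shares the same error term and differs only in its weight.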

Classes

`GradNASModeDescriptor`	Class to describe the `"gradnas"` mode.
`GradientBinarySearcher`	Binary searcher for gradient algorithm.
`GradientDataManager`	Class for managing gradient data for an hparam.

ModeloptConfig GradNASConfig

Bases: ModeloptBaseRuleConfig

Configuration for the "gradnas" mode.

Default config (JSON):

{
   "nn.Conv1d": {
      "*": {
         "channel_divisor": 32,
         "channels_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ],
         "kernel_size": []
      }
   },
   "nn.Conv2d": {
      "*": {
         "channel_divisor": 32,
         "channels_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ],
         "kernel_size": []
      }
   },
   "nn.Conv3d": {
      "*": {
         "channel_divisor": 32,
         "channels_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ],
         "kernel_size": []
      }
   },
   "nn.ConvTranspose1d": {
      "*": {
         "channel_divisor": 32,
         "channels_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ],
         "kernel_size": []
      }
   },
   "nn.ConvTranspose2d": {
      "*": {
         "channel_divisor": 32,
         "channels_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ],
         "kernel_size": []
      }
   },
   "nn.ConvTranspose3d": {
      "*": {
         "channel_divisor": 32,
         "channels_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ],
         "kernel_size": []
      }
   },
   "nn.Linear": {
      "*": {
         "feature_divisor": 32,
         "features_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ]
      }
   },
   "nn.BatchNorm1d": {
      "*": {
         "feature_divisor": 32,
         "features_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ]
      }
   },
   "nn.BatchNorm2d": {
      "*": {
         "feature_divisor": 32,
         "features_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ]
      }
   },
   "nn.BatchNorm3d": {
      "*": {
         "feature_divisor": 32,
         "features_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ]
      }
   },
   "nn.SyncBatchNorm": {
      "*": {
         "feature_divisor": 32,
         "features_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ]
      }
   },
   "nn.InstanceNorm1d": {
      "*": {
         "feature_divisor": 32,
         "features_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ]
      }
   },
   "nn.InstanceNorm2d": {
      "*": {
         "feature_divisor": 32,
         "features_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ]
      }
   },
   "nn.InstanceNorm3d": {
      "*": {
         "feature_divisor": 32,
         "features_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ]
      }
   },
   "nn.LayerNorm": {
      "*": {
         "feature_divisor": 32,
         "features_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ]
      }
   },
   "nn.GroupNorm": {
      "*": {
         "channel_divisor": 32,
         "channels_ratio": [
            0.05,
            0.1,
            0.15000000000000002,
            0.2,
            0.25,
            0.30000000000000004,
            0.35000000000000003,
            0.4,
            0.45,
            0.5,
            0.55,
            0.6000000000000001,
            0.65,
            0.7000000000000001,
            0.75,
            0.8,
            0.8500000000000001,
            0.9,
            0.9500000000000001,
            1.0
         ]
      }
   }
}

field nn.BatchNorm1d: DynamicBatchNorm1dConfig | None | dict[str, DynamicBatchNorm1dConfig | None]

Configuration for dynamic nn.BatchNorm1d module.

If the "nn.BatchNorm1d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "features_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "feature_divisor": 32
  }
}

To deactivate any dynamic nn.BatchNorm1d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.BatchNorm1d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}
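
The pattern-resolution rule described above (patterns are checked in order, and a later match overwrites an earlier one) can be sketched as follows. This is an illustration only: `resolve_config` is a hypothetical helper, not a library function, and the use of `fnmatch.fnmatchcase` for glob matching is an assumption about how such patterns could be evaluated.

```python
from fnmatch import fnmatchcase

# Hypothetical per-layer config in the documented shape: glob pattern -> settings,
# with None meaning "do not convert this submodule to a dynamic module".
layer_config = {
    "*": {"features_ratio": [0.5, 1.0], "feature_divisor": 32},
    "*lm_head*": None,
}


def resolve_config(module_name, config):
    """Return the settings of the last pattern that matches module_name."""
    resolved = None  # also the result when no pattern matches at all
    for pattern, settings in config.items():  # dicts preserve insertion order
        if fnmatchcase(module_name, pattern):
            resolved = settings  # later matches overwrite earlier ones
    return resolved


print(resolve_config("encoder.layer.0.norm", layer_config))
print(resolve_config("model.lm_head.norm", layer_config))
```

Because `"*lm_head*"` is listed after `"*"`, the second lookup resolves to `None`; swapping the two keys would let `"*"` win for every submodule.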

field nn.BatchNorm2d: DynamicBatchNorm2dConfig | None | dict[str, DynamicBatchNorm2dConfig | None]

Configuration for dynamic nn.BatchNorm2d module.

If the "nn.BatchNorm2d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "features_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "feature_divisor": 32
  }
}

To deactivate any dynamic nn.BatchNorm2d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.BatchNorm2d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.BatchNorm3d: DynamicBatchNorm3dConfig | None | dict[str, DynamicBatchNorm3dConfig | None]

Configuration for dynamic nn.BatchNorm3d module.

If the "nn.BatchNorm3d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "features_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "feature_divisor": 32
  }
}

To deactivate any dynamic nn.BatchNorm3d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.BatchNorm3d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.Conv1d: DynamicConv1dConfig | None | dict[str, DynamicConv1dConfig | None]

Configuration for dynamic nn.Conv1d module.

If the "nn.Conv1d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "channels_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "kernel_size": [],
    "channel_divisor": 32
  }
}

To deactivate any dynamic nn.Conv1d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.Conv1d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.Conv2d: DynamicConv2dConfig | None | dict[str, DynamicConv2dConfig | None]

Configuration for dynamic nn.Conv2d module.

If the "nn.Conv2d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "channels_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "kernel_size": [],
    "channel_divisor": 32
  }
}

To deactivate any dynamic nn.Conv2d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.Conv2d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.Conv3d: DynamicConv3dConfig | None | dict[str, DynamicConv3dConfig | None]

Configuration for dynamic nn.Conv3d module.

If the "nn.Conv3d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "channels_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "kernel_size": [],
    "channel_divisor": 32
  }
}

To deactivate any dynamic nn.Conv3d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.Conv3d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.ConvTranspose1d: DynamicConvTranspose1dConfig | None | dict[str, DynamicConvTranspose1dConfig | None]

Configuration for dynamic nn.ConvTranspose1d module.

If the "nn.ConvTranspose1d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "channels_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "kernel_size": [],
    "channel_divisor": 32
  }
}

To deactivate any dynamic nn.ConvTranspose1d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.ConvTranspose1d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.ConvTranspose2d: DynamicConvTranspose2dConfig | None | dict[str, DynamicConvTranspose2dConfig | None]

Configuration for dynamic nn.ConvTranspose2d module.

If the "nn.ConvTranspose2d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "channels_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "kernel_size": [],
    "channel_divisor": 32
  }
}

To deactivate any dynamic nn.ConvTranspose2d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.ConvTranspose2d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.ConvTranspose3d: DynamicConvTranspose3dConfig | None | dict[str, DynamicConvTranspose3dConfig | None]

Configuration for dynamic nn.ConvTranspose3d module.

If the "nn.ConvTranspose3d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "channels_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "kernel_size": [],
    "channel_divisor": 32
  }
}

To deactivate any dynamic nn.ConvTranspose3d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.ConvTranspose3d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.GroupNorm: DynamicGroupNormConfig | None | dict[str, DynamicGroupNormConfig | None]

Configuration for dynamic nn.GroupNorm module.

If the "nn.GroupNorm" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "channels_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "channel_divisor": 32
  }
}

To deactivate any dynamic nn.GroupNorm module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.GroupNorm layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.InstanceNorm1d: DynamicInstanceNorm1dConfig | None | dict[str, DynamicInstanceNorm1dConfig | None]

Configuration for dynamic nn.InstanceNorm1d module.

If the "nn.InstanceNorm1d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "features_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "feature_divisor": 32
  }
}

To deactivate any dynamic nn.InstanceNorm1d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.InstanceNorm1d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.InstanceNorm2d: DynamicInstanceNorm2dConfig | None | dict[str, DynamicInstanceNorm2dConfig | None]

Configuration for dynamic nn.InstanceNorm2d module.

If the "nn.InstanceNorm2d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "features_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "feature_divisor": 32
  }
}

To deactivate any dynamic nn.InstanceNorm2d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.InstanceNorm2d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.InstanceNorm3d: DynamicInstanceNorm3dConfig | None | dict[str, DynamicInstanceNorm3dConfig | None]

Configuration for dynamic nn.InstanceNorm3d module.

If the "nn.InstanceNorm3d" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "features_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "feature_divisor": 32
  }
}

To deactivate any dynamic nn.InstanceNorm3d module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.InstanceNorm3d layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.LayerNorm: DynamicLayerNormConfig | None | dict[str, DynamicLayerNormConfig | None]

Configuration for dynamic nn.LayerNorm module.

If the "nn.LayerNorm" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "features_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "feature_divisor": 32
  }
}

To deactivate any dynamic nn.LayerNorm module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.LayerNorm layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.Linear: DynamicLinearConfig | None | dict[str, DynamicLinearConfig | None]

Configuration for dynamic nn.Linear module.

If the "nn.Linear" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "features_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "feature_divisor": 32
  }
}

To deactivate any dynamic nn.Linear module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.Linear layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

field nn.SyncBatchNorm: DynamicSyncBatchNormConfig | None | dict[str, DynamicSyncBatchNormConfig | None]

Configuration for dynamic nn.SyncBatchNorm module.

If the "nn.SyncBatchNorm" key is not specified, the default configuration (shown in JSON) will be used:

{
  "*": {
    "features_ratio": [
      0.05,
      0.1,
      0.15000000000000002,
      0.2,
      0.25,
      0.30000000000000004,
      0.35000000000000003,
      0.4,
      0.45,
      0.5,
      0.55,
      0.6000000000000001,
      0.65,
      0.7000000000000001,
      0.75,
      0.8,
      0.8500000000000001,
      0.9,
      0.9500000000000001,
      1.0
    ],
    "feature_divisor": 32
  }
}

To deactivate any dynamic nn.SyncBatchNorm module, use None instead of providing a dictionary {}.

To specify layer-specific configurations, you can specify a config for each submodule with the key specifying a glob pattern that matches the submodule name. For example, to convert to a dynamic module for all nn.SyncBatchNorm layers except for those in the "lm_head" submodule use:

{
    "*": {...},
    "*lm_head*": None,
}

Note that glob expressions are processed sequentially in the order they are specified. Later keys in the config will overwrite earlier keys if they match the same submodule name.

If you want to specify the same configuration for all submodules, you can provide an unnested dictionary as well:

{...}

which is short for

{
    "*": {...},
}

class GradNASModeDescriptor

Bases: FastNASModeDescriptor

Class to describe the "gradnas" mode.

The properties of this mode can be inspected via the source code.

property config_class: type[ModeloptBaseConfig]: Specifies the config class for the mode.

property name: str: Returns the value (str representation) of the mode.

property search_algorithm: type[BaseSearcher]: Specifies the search algorithm to use for this mode (if any).

class GradientBinarySearcher

Bases: BinarySearcher

Binary searcher for gradient algorithm.

SETUP_GRADIENT_FUNC: dict[type[DynamicModule], Callable[[DynamicModule], tuple[GradientDataManager, RemovableHandle]]]

before_search()

Set up the search with a gradient-based score.

Return type: None

property default_search_config: dict[str, Any]: Get the default config for the searcher.

static gradnas_score_func(model)

Score function for gradnas algorithm.

If we prune N neurons from layer L, the total degradation is the sum of the degradation values of the N pruned neurons. In the fast algorithm, by contrast, the degradation due to pruning is estimated directly from validation_score(model after pruning). The rest of the algorithm is exactly the same as the fast algorithm.

Parameters: model (Module)
Return type: float
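
The additive degradation estimate described above can be sketched with plain Python. The per-neuron values in `neuron_scores` are made up for illustration, and pruning the lowest-scoring neurons first is an assumption of this sketch; the source states only that the total degradation is the sum over the pruned neurons.

```python
from itertools import accumulate

# Hypothetical per-neuron degradation scores for one layer.
neuron_scores = [0.40, 0.05, 0.90, 0.10]

# Assumption: the least important neurons are pruned first, so sort ascending.
ascending = sorted(neuron_scores)

# degradation[n] = estimated total degradation after pruning the n
# lowest-scoring neurons (index 0 means nothing is pruned).
degradation = [0.0] + list(accumulate(ascending))

for n, total in enumerate(degradation):
    print(f"prune {n} neurons -> estimated degradation {total:.2f}")
```

Because the estimate is a running sum of non-negative scores, it never decreases as more neurons are pruned.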

property hparam_names_for_search: set[str]: We can only optimize over certain types of hparams in gradient binary search.

sanitize_search_config(config)

Sanitize the search config dict.

Parameters: config (dict[str, Any] | None)
Return type: dict[str, Any]

class GradientDataManager

Bases: object

Class for managing gradient data for an hparam.

__init__(shape, model, reduce_func=<function GradientDataManager.<lambda>>): Initialize GradientDataManager.

process_gradient(): Process gradient of the mask.

property score: The score of the hparam based on the stored gradients.