mcore_gpt_minitron

Module implementing the mcore_gpt_minitron pruning algorithm for NVIDIA Megatron-Core / NeMo models.

The Minitron pruning algorithm uses activation magnitudes to estimate the importance of neurons / attention heads in the model. More details on the Minitron pruning algorithm can be found here: https://arxiv.org/pdf/2407.14679
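The core idea of activation-based importance can be sketched in a few lines. The snippet below is an illustrative simplification, not the library's implementation: it scores each neuron of a linear layer by its mean absolute activation over a calibration batch and keeps only the top-scoring rows of the weight matrix (all names and shapes are assumptions for the sketch).

```python
import numpy as np

def neuron_importance(activations: np.ndarray) -> np.ndarray:
    """Score each neuron by its mean absolute activation over a
    calibration batch of shape (batch, seq, hidden)."""
    return np.abs(activations).mean(axis=(0, 1))

def prune_neurons(weight: np.ndarray, activations: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` highest-importance output neurons of a linear
    layer whose weight has shape (hidden_out, hidden_in)."""
    scores = neuron_importance(activations)
    # Take the top-`keep` indices, then restore original neuron order.
    keep_idx = np.sort(np.argsort(scores)[-keep:])
    return weight[keep_idx, :]

rng = np.random.default_rng(0)
acts = rng.normal(size=(2, 8, 16))   # calibration activations (batch, seq, hidden)
w = rng.normal(size=(16, 32))        # layer weight to prune (hidden_out, hidden_in)
pruned = prune_neurons(w, acts, keep=8)
print(pruned.shape)  # (8, 32)
```

The real searcher applies the same ranking idea across FFN neurons, attention heads, and layers rather than a single linear layer.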
Classes

- MCoreGPTMinitronSearcher: Searcher for the Minitron pruning algorithm.
- class MCoreGPTMinitronSearcher
Bases:
BaseSearcher
Searcher for the Minitron pruning algorithm.
- SUPPORTED_HPARAMS = {'ffn_hidden_size', 'hidden_size', 'num_attention_heads', 'num_layers', 'num_query_groups'}
- before_search()
Optional pre-processing steps before the search.
- Return type:
None
- property default_search_config: Dict[str, Any]
Get the default config for the searcher.
- property default_state_dict: Dict[str, Any]
Return default state dict.
- run_search()
Run the actual search.
- Return type:
None
- sanitize_search_config(config)
Sanitize the search config dict.
- Parameters:
config (Dict[str, Any] | None) – Search config dict to sanitize.
- Return type:
Dict[str, Any]