mcore_gpt_minitron

Module implementing the mcore_gpt_minitron pruning algorithm for NVIDIA Megatron-Core / NeMo models.

The Minitron pruning algorithm uses activation magnitudes to estimate the importance of neurons and attention heads in the model. More details on the algorithm can be found in the paper: https://arxiv.org/pdf/2407.14679
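The idea of ranking neurons by activation magnitude can be illustrated with a small NumPy sketch. This is a toy example, not this module's implementation: the helper names `neuron_importance` and `top_k_neurons` are hypothetical, and mean absolute activation is only one simple aggregation choice. The paper and this module may aggregate over batch and sequence dimensions differently.

```python
import numpy as np


def neuron_importance(activations: np.ndarray) -> np.ndarray:
    """Score each neuron by its mean absolute activation.

    activations has shape (num_samples, num_neurons). The aggregation
    used here is an assumption made for illustration only.
    """
    return np.abs(activations).mean(axis=0)


def top_k_neurons(activations: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest-scoring neurons, in ascending order."""
    scores = neuron_importance(activations)
    # Sort by descending score, keep the first k, then restore index order
    # so the surviving neurons keep their original relative layout.
    keep = np.argsort(-scores, kind="stable")[:k]
    return np.sort(keep)


# Two calibration samples, four neurons.
acts = np.array([[0.1, -2.0, 0.5, 0.0],
                 [-0.2, 1.5, -0.4, 0.1]])
kept = top_k_neurons(acts, k=2)
```

Here neurons 1 and 2 have the largest average magnitudes, so they are the ones retained when pruning from four neurons to two.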

Classes

MCoreGPTMinitronSearcher

Searcher for Minitron pruning algorithm.

class MCoreGPTMinitronSearcher

Bases: BaseSearcher

Searcher for Minitron pruning algorithm.

SUPPORTED_HPARAMS = {'ffn_hidden_size', 'num_attention_heads', 'num_query_groups'}
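As a rough illustration of how such a set might be used, the sketch below checks a requested set of pruned dimensions against the supported hyperparameter names. The helper `unsupported_keys` and the `export_config` dictionary are hypothetical names introduced for this example; they are not taken from this module.

```python
# Copied from the class attribute documented above.
SUPPORTED_HPARAMS = {"ffn_hidden_size", "num_attention_heads", "num_query_groups"}


def unsupported_keys(export_config: dict) -> list:
    """Return the requested hyperparameter names that are not prunable (hypothetical helper)."""
    return sorted(set(export_config) - SUPPORTED_HPARAMS)


# A request that mixes one supported and one unsupported hyperparameter.
export_config = {"ffn_hidden_size": 8192, "hidden_size": 3072}
bad = unsupported_keys(export_config)
```

A caller could raise an error or warn when `bad` is non-empty, rather than silently ignoring the unsupported entries.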

before_search()

Optional pre-processing steps before the search.

Return type:

None

property default_search_config: Dict[str, Any]

Get the default config for the searcher.

property default_state_dict: Dict[str, Any]

Return default state dict.

run_search()

Run the actual search.

Return type:

None

sanitize_search_config(config)

Sanitize the search config dict.

Parameters:

config (Dict[str, Any] | None)

Return type:

Dict[str, Any]