config

Configurations for speculative decoding modes.

ModeloptConfig EagleConfig

Bases: ModeloptBaseConfig

Eagle config.

Default config (JSON):

{
   "eagle_num_layers": 1,
   "use_input_layernorm_in_first_layer": true,
   "use_last_layernorm": false,
   "eagle_hidden_state_distillation": false,
   "use_aux_hidden_state": false,
   "eagle_aux_hidden_state_layer_ids": [],
   "eagle_disable_moe": false,
   "draft_vocab_size": 0,
   "use_mtp_layernorm": false,
   "ffn_hidden_size": 0,
   "parallel_draft_step": 1
}
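
The defaults above can be overridden field by field. The following is a minimal sketch using a plain dictionary, not the library's own API: `EAGLE_DEFAULTS` is copied from the JSON above, and `make_eagle_config` is a hypothetical helper introduced here only for illustration. It rejects misspelled field names instead of silently ignoring them.

```python
import copy

# Defaults copied from the JSON above.
EAGLE_DEFAULTS = {
    "eagle_num_layers": 1,
    "use_input_layernorm_in_first_layer": True,
    "use_last_layernorm": False,
    "eagle_hidden_state_distillation": False,
    "use_aux_hidden_state": False,
    "eagle_aux_hidden_state_layer_ids": [],
    "eagle_disable_moe": False,
    "draft_vocab_size": 0,
    "use_mtp_layernorm": False,
    "ffn_hidden_size": 0,
    "parallel_draft_step": 1,
}


def make_eagle_config(**overrides):
    """Hypothetical helper: return a copy of the defaults with overrides applied."""
    unknown = set(overrides) - set(EAGLE_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown EagleConfig fields: {sorted(unknown)}")
    config = copy.deepcopy(EAGLE_DEFAULTS)  # keep the defaults untouched
    config.update(overrides)
    return config


# Two decoder layers and a final layernorm; everything else stays at its default.
config = make_eagle_config(eagle_num_layers=2, use_last_layernorm=True)
```

Copying the defaults before updating keeps the shared list value (`eagle_aux_hidden_state_layer_ids`) from being mutated across configs.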

field draft_vocab_size: int

The vocabulary size of the EAGLE module. A value of 0 means the same vocabulary size as the base model.

field eagle_aux_hidden_state_layer_ids: list

The list of layer IDs whose auxiliary hidden states are used in EAGLE-3.

field eagle_disable_moe: bool

Whether to disable MoE in the EAGLE module.

field eagle_hidden_state_distillation: bool

Whether to use feature (hidden-state) distillation.

field eagle_num_layers: int

The number of decoder layers used in the EAGLE model.

field ffn_hidden_size: int

The ffn_hidden_size of the EAGLE module. If set to 0, the base model's ffn_hidden_size is used.
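
Both draft_vocab_size and ffn_hidden_size treat 0 as "inherit from the base model". A minimal sketch of how such a sentinel could be resolved is shown below. `resolve_eagle_sizes` is a hypothetical helper, not part of the library, and the base-model sizes are illustrative values only.

```python
def resolve_eagle_sizes(config, base_vocab_size, base_ffn_hidden_size):
    """Hypothetical helper: replace the 0 sentinels with the base model's sizes."""
    resolved = dict(config)  # shallow copy so the caller's dict is not modified
    if resolved.get("draft_vocab_size", 0) == 0:
        resolved["draft_vocab_size"] = base_vocab_size
    if resolved.get("ffn_hidden_size", 0) == 0:
        resolved["ffn_hidden_size"] = base_ffn_hidden_size
    return resolved


# Illustrative sizes; not taken from any particular model.
resolved = resolve_eagle_sizes(
    {"draft_vocab_size": 32000, "ffn_hidden_size": 0},
    base_vocab_size=128256,
    base_ffn_hidden_size=14336,
)
# The explicit draft vocabulary is kept; the FFN size falls back to the base model's.
```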

field parallel_draft_step: int

The number of tokens generated in a parallel draft. If set to 1, drafting is not in parallel mode.

field use_aux_hidden_state: bool

Whether to use auxiliary hidden states (EAGLE-3).

field use_input_layernorm_in_first_layer: bool

Whether to use input_layernorm in the first decoder layer.

field use_last_layernorm: bool

Whether to use a final layernorm before lm_head.

field use_mtp_layernorm: bool

Whether to use norms before input_hidden_states and the embedding in the EAGLE module.

ModeloptConfig MTPConfig

Bases: ModeloptBaseConfig

MTP config.

Default config (JSON):

{
   "mtp_num_layers": 1,
   "mtp_num_module": 1,
   "mtp_freeze_list": [],
   "use_last_layernorm": false
}

field mtp_freeze_list: list

The list of MTP modules to freeze.

field mtp_num_layers: int

The number of decoder layers used in the MTP model.

field mtp_num_module: int

The number of MTP modules used in the model.

field use_last_layernorm: bool

Whether to use a final layernorm before lm_head.
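
The MTP fields interact: mtp_freeze_list refers to modules counted by mtp_num_module. The sketch below checks that relationship on a plain dictionary. It assumes that mtp_freeze_list holds 0-based integer module indices, which the field description does not state, and `check_mtp_config` is a hypothetical helper rather than a library function.

```python
# Defaults copied from the MTPConfig JSON.
MTP_DEFAULTS = {
    "mtp_num_layers": 1,
    "mtp_num_module": 1,
    "mtp_freeze_list": [],
    "use_last_layernorm": False,
}


def check_mtp_config(config):
    """Hypothetical check; ASSUMES mtp_freeze_list holds 0-based module indices."""
    merged = {**MTP_DEFAULTS, **config}
    num_module = merged["mtp_num_module"]
    out_of_range = [i for i in merged["mtp_freeze_list"] if not 0 <= i < num_module]
    if out_of_range:
        raise ValueError(
            f"mtp_freeze_list entries {out_of_range} exceed mtp_num_module={num_module}"
        )
    return merged


# Two MTP modules, with the first one frozen (under the index assumption above).
config = check_mtp_config({"mtp_num_module": 2, "mtp_freeze_list": [0]})
```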

ModeloptConfig MedusaConfig

Bases: ModeloptBaseConfig

Medusa config.

Default config (JSON):

{
   "medusa_num_heads": 2,
   "medusa_num_layers": 1
}

field medusa_num_heads: int

The number of Medusa heads added to the model.

field medusa_num_layers: int

The number of ResBlocks used in each Medusa head.
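
To get a feel for how the two Medusa fields scale the model, the sketch below estimates the number of added parameters. It assumes each head is a stack of ResBlocks (one hidden_size x hidden_size linear layer with bias, plus a residual connection) followed by a bias-free projection to the vocabulary, as in the original Medusa reference implementation. The layout used by this library may differ, and `medusa_extra_params` is a hypothetical helper.

```python
def medusa_extra_params(config, hidden_size, vocab_size):
    """Estimate parameters added by the Medusa heads under the ASSUMED layout."""
    resblock = hidden_size * hidden_size + hidden_size  # linear weight + bias
    lm_head = hidden_size * vocab_size  # bias-free output projection
    per_head = config["medusa_num_layers"] * resblock + lm_head
    return config["medusa_num_heads"] * per_head


# Default MedusaConfig values, with toy model sizes chosen only for illustration.
config = {"medusa_num_heads": 2, "medusa_num_layers": 1}
extra = medusa_extra_params(config, hidden_size=8, vocab_size=10)
```

Under this assumption the cost grows linearly in medusa_num_heads, while medusa_num_layers only affects the ResBlock term of each head.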