gpt_oss_pruned_to_mxfp4

Create a HuggingFace checkpoint with MXFP4 MoE weights from the original gpt-oss-120b model.

This script:

1. Copies non-MoE weights from the student model (trained attention, embeddings, etc.)
2. Extracts MoE expert weights from the original gpt-oss-120b in MXFP4 format
3. Deduces expert mappings by comparing weights
4. Outputs a new pruned (heterogeneous) checkpoint with PACKED MXFP4 expert weights
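The four steps can be sketched end to end. This is a minimal, runnable illustration, not the script's actual API: it assumes a checkpoint is a plain dict of tensor name to array, that fused expert tensors carry the expert index as their leading dimension, and that the surviving expert indices are already known.

```python
import numpy as np

EXPERT_KEY = ".mlp.experts."  # marker for fused MoE expert tensors (assumption)

def copy_non_moe_weights(student):
    """Step 1: carry over every tensor that is not an MoE expert tensor."""
    return {k: v for k, v in student.items() if EXPERT_KEY not in k}

def extract_expert_rows(original, keep):
    """Steps 2-3: slice the kept experts out of the original fused expert
    tensors (expert index assumed to be the leading dimension)."""
    return {k: v[keep] for k, v in original.items() if EXPERT_KEY in k}

def build_pruned_checkpoint(student, original, keep):
    """Step 4: merge both parts into one pruned checkpoint."""
    ckpt = copy_non_moe_weights(student)
    ckpt.update(extract_expert_rows(original, keep))
    return ckpt
```

The real script keeps the expert weights packed in MXFP4 rather than as plain float arrays; only the control flow is mirrored here.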

Classes

Any

Special type indicating an unconstrained type.

safe_open

Opens a safetensors file lazily and returns tensors on request.

tqdm

Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested.

Functions

convert_moe_packed_tensors

Dequantize the packed MXFP4 weights, making them compatible with the GPT_OSS forward pass.
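In the MXFP4 format, each byte packs two FP4 (E2M1) codes and every 32-value block shares one E8M0 scale stored as a biased exponent. A NumPy sketch of the dequantization (the real function presumably operates on torch tensors, and the nibble order here is an assumption):

```python
import numpy as np

# The 16 FP4 E2M1 code points; the high bit of each 4-bit code is the sign.
FP4_VALUES = np.array(
    [+0.0, +0.5, +1.0, +1.5, +2.0, +3.0, +4.0, +6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0], dtype=np.float32)

def dequant_mxfp4(blocks: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Unpack uint8 `blocks` of shape (..., n_bytes), two FP4 codes per byte,
    and apply the per-block E8M0 scale of shape (...), a biased exponent."""
    lo = FP4_VALUES[blocks & 0x0F]   # low nibble first (assumed order)
    hi = FP4_VALUES[blocks >> 4]     # high nibble second
    out = np.empty(blocks.shape[:-1] + (blocks.shape[-1] * 2,), dtype=np.float32)
    out[..., 0::2] = lo
    out[..., 1::2] = hi
    exp = scales.astype(np.int32) - 127  # E8M0 is a pure power-of-two scale
    return out * np.exp2(exp)[..., None]
```

For example, the byte 0x21 with a scale byte of 127 (exponent 0) decodes to the pair (0.5, 1.0).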

copy_config_files

Copy configuration files from student model and update config.json.
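A sketch of the config-copy step, assuming only that the expert count lives in a `num_local_experts` field of config.json; the exact fields the script patches are an assumption:

```python
import json
import shutil
from pathlib import Path

def copy_config_files(student_dir, out_dir, num_experts):
    """Copy JSON config/tokenizer files from the student model and patch
    the expert count in config.json for the pruned checkpoint."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for f in Path(student_dir).glob("*.json"):
        shutil.copy(f, out / f.name)
    cfg_path = out / "config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["num_local_experts"] = num_experts  # illustrative field name
    cfg_path.write_text(json.dumps(cfg, indent=2))
```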

copy_non_moe_weights

Copy non-MoE weights from student model.

deduce_experts_for_layer

Deduce which original experts match the student experts by comparing weights.
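Because pruning can reorder experts, the mapping can be recovered by nearest-neighbour matching on the weights. A minimal sketch, assuming both sides are compared in dequantized (float) form with the expert index as the leading dimension:

```python
import numpy as np

def deduce_experts(student_experts: np.ndarray, original_experts: np.ndarray):
    """Return, for each student expert, the index of the original expert
    with the closest weights (smallest L2 distance)."""
    s = student_experts.reshape(len(student_experts), -1)
    o = original_experts.reshape(len(original_experts), -1)
    # Pairwise squared distances, shape [n_student, n_original].
    d2 = ((s[:, None, :] - o[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1).tolist()
```

If the student's experts are exact copies of a subset of the originals, each distance minimum is zero and the mapping is exact.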

load_layer_tensors

Load all MoE-related tensors for a layer, potentially from multiple files.

load_original_index

Load the original model's safetensors index.
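Sharded Hugging Face checkpoints ship a `model.safetensors.index.json` whose `weight_map` maps each tensor name to the shard file that holds it; grouping a layer's MoE tensors by shard lets each file be opened only once. A sketch (the tensor-name pattern is an assumption):

```python
import json
from collections import defaultdict
from pathlib import Path

def load_original_index(model_dir):
    """Read the shard index and return its tensor-name -> shard-file map."""
    path = Path(model_dir) / "model.safetensors.index.json"
    return json.loads(path.read_text())["weight_map"]

def moe_tensors_by_shard(weight_map, layer):
    """Group one layer's MoE tensor names by the shard file holding them."""
    prefix = f"model.layers.{layer}.mlp.experts."  # assumed name pattern
    by_file = defaultdict(list)
    for name, shard in weight_map.items():
        if name.startswith(prefix):
            by_file[shard].append(name)
    return dict(by_file)
```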

main

Run the full conversion pipeline described above.

process_single_layer

Process a single layer, loading its tensors from potentially multiple shard files.

save_file

Saves a dictionary of tensors into raw bytes in safetensors format.