gpt_oss_pruned_to_mxfp4

Create a HuggingFace checkpoint with MXFP4 MoE weights from the original gpt-oss-120b model.

This script:

1. Copies non-MoE weights from the student model (trained attention, embeddings, etc.)
2. Extracts MoE expert weights from the original gpt-oss-120b in MXFP4 format
3. Deduces expert mappings by comparing weights
4. Outputs a new pruned (heterogeneous) checkpoint with PACKED MXFP4 expert weights
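
The mapping step (deducing which original expert each kept expert corresponds to) can be illustrated with a minimal sketch. It assumes each expert is represented by a flat list of floats and that a kept expert matches exactly one original expert within a small tolerance. The function name `deduce_expert_mapping`, the `atol` parameter, and the choice of which tensor is compared are hypothetical and are not taken from the script itself.

```python
from typing import List, Sequence


def deduce_expert_mapping(
    pruned_experts: Sequence[Sequence[float]],
    original_experts: Sequence[Sequence[float]],
    atol: float = 1e-6,  # hypothetical tolerance, not from the script
) -> List[int]:
    """For each pruned expert, return the index of the matching original expert.

    Illustrative only: the real script may compare different tensors
    (for example router rows) or use a different matching criterion.
    """
    mapping: List[int] = []
    for p_idx, p_weights in enumerate(pruned_experts):
        # Collect every original expert whose weights agree element-wise.
        matches = [
            o_idx
            for o_idx, o_weights in enumerate(original_experts)
            if len(o_weights) == len(p_weights)
            and all(abs(a - b) <= atol for a, b in zip(p_weights, o_weights))
        ]
        # Refuse to guess when the match is missing or ambiguous.
        if len(matches) != 1:
            raise ValueError(
                f"pruned expert {p_idx}: expected exactly one match, "
                f"found {len(matches)}"
            )
        mapping.append(matches[0])
    return mapping


# Toy data: four original experts, of which experts 2 and 0 were kept.
original = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
pruned = [original[2], original[0]]
mapping = deduce_expert_mapping(pruned, original)  # -> [2, 0]
```

With a mapping like this, the packed MXFP4 blocks and scales for each kept expert could be gathered from the original checkpoint by index. Raising on a missing or ambiguous match is a deliberate choice in this sketch, since a wrong index would silently pair trained attention weights with the wrong expert.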