{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Accelerating Hugging Face Mixtral MoE Fine-Tuning with Transformer Engine\n", "\n", "
MixtralDecoderLayer (left) wrapped by TE modules (right). The MXFP8 path fuses multiple operations into one kernel before the down projection.