model_utils

Utility functions for model type detection and classification.

MODEL_NAME_TO_TYPE = {
    'GPT2': 'gpt',
    'Mllama': 'mllama',
    'Llama4': 'llama4',
    'Llama': 'llama',
    'Mistral': 'llama',
    'GPTJ': 'gptj',
    'FalconForCausalLM': 'falcon',
    'RWForCausalLM': 'falcon',
    'baichuan': 'baichuan',
    'MPT': 'mpt',
    'Bloom': 'bloom',
    'ChatGLM': 'chatglm',
    'QWen': 'qwen',
    'RecurrentGemma': 'recurrentgemma',
    'Gemma3': 'gemma3',
    'Gemma2': 'gemma2',
    'Gemma': 'gemma',
    'phi3small': 'phi3small',
    'phi3': 'phi3',
    'PhiMoEForCausalLM': 'phi3',
    'Phi4MMForCausalLM': 'phi4mm',
    'phi': 'phi',
    'TLGv4ForCausalLM': 'phi',
    'MixtralForCausalLM': 'llama',
    'ArcticForCausalLM': 'llama',
    'StarCoder': 'gpt',
    'Dbrx': 'dbrx',
    'T5': 't5',
    'Bart': 'bart',
    'GLM': 'glm',
    'InternLM2ForCausalLM': 'internlm',
    'ExaoneForCausalLM': 'exaone',
    'Nemotron': 'gpt',
    'Deepseek': 'deepseek',
    'Whisper': 'whisper',
    'gptoss': 'gptoss',
}
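The mapping groups architecturally similar model families under a shared type label, so a direct lookup shows, for instance, that Mistral and Mixtral models are handled as the 'llama' type:

>>> MODEL_NAME_TO_TYPE['Mistral']
'llama'
>>> MODEL_NAME_TO_TYPE['MixtralForCausalLM']
'llama'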

Functions

get_model_type

Try to get the model type from the model name.

is_multimodal_model

Check if a model is a Vision-Language Model (VLM) or multimodal model.

get_model_type(model)

Try to get the model type from the model name. If not found, return None.
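A minimal sketch of one way this lookup can work against MODEL_NAME_TO_TYPE (illustrative only; get_model_type_sketch and its matching rule are assumptions, not the library's actual implementation):

# Illustrative sketch only: match the model's class name against the keys
# of MODEL_NAME_TO_TYPE; the real get_model_type may differ.
def get_model_type_sketch(model):
    class_name = type(model).__name__.lower()
    for name, model_type in MODEL_NAME_TO_TYPE.items():
        # Insertion order matters: specific keys such as 'Llama4' and 'Mllama'
        # are checked before the generic 'Llama'.
        if name.lower() in class_name:
            return model_type
    return None  # unknown architecture

Under this sketch, a LlamaForCausalLM instance would resolve to 'llama', and an unrecognized architecture would return None, matching the documented behavior.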

is_multimodal_model(model)

Check if a model is a Vision-Language Model (VLM) or multimodal model.

This function detects various multimodal model architectures by checking for:

- Standard vision configurations (vision_config)
- Language model attributes (language_model)
- Specific multimodal model types (phi4mm)
- Vision LoRA configurations
- Audio processing capabilities
- Image embedding layers

Parameters:

model – The HuggingFace model instance to check

Returns:

True if the model is detected as multimodal, False otherwise

Return type:

bool

Examples

>>> from transformers import AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
>>> is_multimodal_model(model)
True
>>> model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-multimodal-instruct")
>>> is_multimodal_model(model)
True
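The checks described above can be approximated as follows (a hedged sketch; the attribute names vision_lora, audio_processor, and image_embd_layer are assumptions drawn from the criteria list, not confirmed API names):

# Hedged sketch of the detection criteria listed above; attribute names are
# assumptions and the actual implementation may differ.
def is_multimodal_model_sketch(model):
    config = getattr(model, "config", None)
    return bool(
        getattr(config, "vision_config", None) is not None        # vision configuration
        or hasattr(model, "language_model")                        # wrapped language model
        or getattr(config, "model_type", "") == "phi4mm"           # specific multimodal types
        or getattr(config, "vision_lora", None) is not None        # vision LoRA configuration
        or getattr(config, "audio_processor", None) is not None    # audio processing capability
        or getattr(config, "image_embd_layer", None) is not None   # image embedding layer
    )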