model_utils
Utility functions for model type detection and classification.
MODEL_NAME_TO_TYPE = {
    'GPT2': 'gpt', 'Mllama': 'mllama', 'Llama4': 'llama4', 'Llama': 'llama',
    'Mistral': 'llama', 'GPTJ': 'gptj', 'FalconForCausalLM': 'falcon',
    'RWForCausalLM': 'falcon', 'baichuan': 'baichuan', 'MPT': 'mpt',
    'Bloom': 'bloom', 'ChatGLM': 'chatglm', 'QWen': 'qwen',
    'RecurrentGemma': 'recurrentgemma', 'Gemma3': 'gemma3', 'Gemma2': 'gemma2',
    'Gemma': 'gemma', 'phi3small': 'phi3small', 'phi3': 'phi3',
    'PhiMoEForCausalLM': 'phi3', 'Phi4MMForCausalLM': 'phi4mm', 'phi': 'phi',
    'TLGv4ForCausalLM': 'phi', 'MixtralForCausalLM': 'llama',
    'ArcticForCausalLM': 'llama', 'StarCoder': 'gpt', 'Dbrx': 'dbrx',
    'T5': 't5', 'Bart': 'bart', 'GLM': 'glm', 'InternLM2ForCausalLM': 'internlm',
    'ExaoneForCausalLM': 'exaone', 'Nemotron': 'gpt', 'Deepseek': 'deepseek',
    'Whisper': 'whisper', 'gptoss': 'gptoss'}
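The keys are (sub)strings of Hugging Face model class names, mapped to short type identifiers. The sketch below shows one way such a lookup could be performed; the helper name lookup_model_type and the case-insensitive substring match are assumptions for illustration, not the library's actual code.

def lookup_model_type(model):
    """Illustrative only: match MODEL_NAME_TO_TYPE keys against the model class name."""
    class_name = type(model).__name__  # e.g. "LlamaForCausalLM"
    for key, model_type in MODEL_NAME_TO_TYPE.items():
        # Case-insensitive substring match; insertion order matters
        # (e.g. 'Mllama' and 'Llama4' are checked before 'Llama').
        if key.lower() in class_name.lower():
            return model_type
    return None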
Functions
- get_language_model_from_vl – Extract the language model component from a Vision-Language Model (VLM).
- get_model_type – Try to get the model type from the model name.
- is_multimodal_model – Check if a model is a Vision-Language Model (VLM) or multimodal model.
- get_language_model_from_vl(model)
Extract the language model component from a Vision-Language Model (VLM).
This function handles the common patterns for accessing the language model component in various VLM architectures. It checks multiple possible locations where the language model might be stored.
- Parameters:
model – The VLM model instance to extract the language model from
- Returns:
A tuple (language_model, parent_model) where:
- language_model – The extracted language model component, or None if not found
- parent_model – The parent model containing the language_model attribute
- Return type:
tuple
Examples
>>> # For LLaVA-style models
>>> lang_model, parent = get_language_model_from_vl(vlm_model)
>>> if lang_model is not None:
...     # Work with the language model component
...     quantized_lang_model = quantize(lang_model)
...     # Update the parent model
...     parent.language_model = quantized_lang_model
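A minimal sketch of the lookup pattern described above; the helper name find_language_model and the two locations checked (model.language_model and model.model.language_model) are illustrative assumptions, not an exhaustive list of what the function inspects.

def find_language_model(model):
    # Sketch: check the model itself, then a nested .model attribute,
    # for a language_model submodule; both are common VLM layouts.
    for parent in (model, getattr(model, "model", None)):
        if parent is not None and hasattr(parent, "language_model"):
            return parent.language_model, parent
    return None, None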
- get_model_type(model)
Try to get the model type from the model name. If not found, return None.
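A usage sketch (the checkpoint name is only an example; given the 'Llama' entry in MODEL_NAME_TO_TYPE above, a Llama-family model would be expected to resolve to 'llama'):

>>> model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
>>> get_model_type(model)
'llama'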
- is_multimodal_model(model)
Check if a model is a Vision-Language Model (VLM) or multimodal model.
This function detects various multimodal model architectures by checking for:
- Standard vision configurations (vision_config)
- Language model attributes (language_model)
- Specific multimodal model types (phi4mm)
- Vision LoRA configurations
- Audio processing capabilities
- Image embedding layers
- Parameters:
model – The HuggingFace model instance to check
- Returns:
True if the model is detected as multimodal, False otherwise
- Return type:
bool
Examples
>>> model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct") >>> is_multimodal_model(model) True
>>> model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-multimodal-instruct") >>> is_multimodal_model(model) True