model_utils
Utility functions for model type detection and classification.
MODEL_NAME_TO_TYPE={'GPT2': 'gpt', 'Mllama': 'mllama', 'Llama4': 'llama4', 'Llama': 'llama', 'Mistral': 'llama', 'GPTJ': 'gptj', 'FalconForCausalLM': 'falcon', 'RWForCausalLM': 'falcon', 'baichuan': 'baichuan', 'MPT': 'mpt', 'Bloom': 'bloom', 'ChatGLM': 'chatglm', 'QWen': 'qwen', 'RecurrentGemma': 'recurrentgemma', 'Gemma3': 'gemma3', 'Gemma2': 'gemma2', 'Gemma': 'gemma', 'phi3small': 'phi3small', 'phi3': 'phi3', 'PhiMoEForCausalLM': 'phi3', 'Phi4MMForCausalLM': 'phi4mm', 'phi': 'phi', 'TLGv4ForCausalLM': 'phi', 'MixtralForCausalLM': 'llama', 'ArcticForCausalLM': 'llama', 'StarCoder': 'gpt', 'Dbrx': 'dbrx', 'T5': 't5', 'Bart': 'bart', 'GLM': 'glm', 'InternLM2ForCausalLM': 'internlm', 'ExaoneForCausalLM': 'exaone', 'Nemotron': 'gpt', 'Deepseek': 'deepseek', 'Whisper': 'whisper', 'gptoss': 'gptoss'}
Functions
| get_language_model_from_vl(model) | Extract the language model lineage from a Vision-Language Model (VLM). |
| get_model_type(model) | Try to get the model type from the model name. |
| is_multimodal_model(model) | Check if a model is a Vision-Language Model (VLM) or multimodal model. |
- get_language_model_from_vl(model)
Extract the language model lineage from a Vision-Language Model (VLM).
This function handles the common patterns for accessing the language model component in various VLM architectures. It checks multiple possible locations where the language model might be stored.
- Parameters:
model – The VLM model instance to extract the language model from
- Returns:
The lineage path towards the language model, starting from the VLM itself
- Return type:
list
Examples
>>> # For LLaVA-style models
>>> lineage = get_language_model_from_vl(vlm_model)
>>> # lineage[0] is vlm_model
>>> # lineage[1] is vlm_model.language_model
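The "multiple possible locations" idea can be sketched as follows. This is a minimal illustration and not the library's implementation: the helper name `find_language_model_lineage`, the nested `model.model.language_model` location, the order of the checks, and the `None` fallback are all assumptions made for the example. Only the direct `language_model` attribute is confirmed by the example above.

```python
from types import SimpleNamespace


def find_language_model_lineage(model):
    """Hypothetical helper: return [model, ..., language_model], or None."""
    # Pattern 1: the language model is a direct attribute (LLaVA-style).
    if hasattr(model, "language_model"):
        return [model, model.language_model]
    # Pattern 2 (assumed): it sits one level down, under `model.model`.
    inner = getattr(model, "model", None)
    if inner is not None and hasattr(inner, "language_model"):
        return [model, inner, inner.language_model]
    # No known location matched.
    return None


# Stand-in objects instead of real checkpoints, so the sketch runs anywhere.
llm = SimpleNamespace(name="llm")
direct_vlm = SimpleNamespace(language_model=llm)
nested_vlm = SimpleNamespace(model=SimpleNamespace(language_model=llm))
text_only = SimpleNamespace()

direct_lineage = find_language_model_lineage(direct_vlm)
nested_lineage = find_language_model_lineage(nested_vlm)
```

Returning the whole path rather than only the final module lets a caller reach both the language model and each parent that holds it.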
- get_model_type(model)
Try to get the model type from the model name. If not found, return None.
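One plausible reading of this lookup, suggested by the way MODEL_NAME_TO_TYPE lists more specific keys before more general ones (for example 'Gemma3' before 'Gemma', 'phi3small' before 'phi3' before 'phi'), is a case-insensitive substring match against the model's class name where the first matching key wins. The sketch below assumes that behaviour. The function `guess_model_type`, the reduced mapping, and the placeholder classes are hypothetical and exist only for illustration.

```python
# Illustrative subset of MODEL_NAME_TO_TYPE; insertion order is preserved.
MODEL_NAME_TO_TYPE_SUBSET = {
    "Mllama": "mllama",
    "Llama4": "llama4",
    "Llama": "llama",
    "Gemma3": "gemma3",
    "Gemma2": "gemma2",
    "Gemma": "gemma",
    "phi3small": "phi3small",
    "phi3": "phi3",
    "phi": "phi",
}


def guess_model_type(model):
    """Hypothetical lookup: the first key found in the class name wins."""
    class_name = type(model).__name__.lower()
    for key, model_type in MODEL_NAME_TO_TYPE_SUBSET.items():
        if key.lower() in class_name:
            return model_type
    return None  # no key matched


# Placeholder classes standing in for real model classes.
class MllamaForConditionalGeneration:
    pass


class Phi3ForCausalLM:
    pass


class UnknownNet:
    pass
```

Under this assumption the key order matters: if 'Llama' came before 'Mllama', a class named `MllamaForConditionalGeneration` would be reported as 'llama'.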
- is_multimodal_model(model)
Check if a model is a Vision-Language Model (VLM) or multimodal model.
This function detects various multimodal model architectures by checking for:
  - Standard vision configurations (vision_config)
  - Language model attributes (language_model)
  - Specific multimodal model types (phi4mm)
  - Vision LoRA configurations
  - Audio processing capabilities
  - Image embedding layers
- Parameters:
model – The HuggingFace model instance to check
- Returns:
True if the model is detected as multimodal, False otherwise
- Return type:
bool
Examples
>>> from transformers import AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
>>> is_multimodal_model(model)
True
>>> model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-multimodal-instruct")
>>> is_multimodal_model(model)
True
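The first three checks listed above can be approximated without loading any checkpoint. The sketch below is illustrative only: `looks_multimodal` is a hypothetical name, and reading `vision_config` and `model_type` from `model.config` is an assumption about where those signals live. The Vision LoRA, audio, and image-embedding checks are left out because the text does not name the attributes involved.

```python
from types import SimpleNamespace


def looks_multimodal(model):
    """Hypothetical check covering three of the listed signals."""
    config = getattr(model, "config", None)
    return (
        hasattr(config, "vision_config")  # standard vision configuration
        or hasattr(model, "language_model")  # language model attribute
        or getattr(config, "model_type", "") == "phi4mm"  # specific type
    )


# Stand-in objects that mimic only the attributes being inspected.
text_only = SimpleNamespace(config=SimpleNamespace(model_type="llama"))
with_vision = SimpleNamespace(config=SimpleNamespace(vision_config={}))
with_llm_attr = SimpleNamespace(
    config=SimpleNamespace(), language_model=SimpleNamespace()
)
phi4mm_like = SimpleNamespace(config=SimpleNamespace(model_type="phi4mm"))
```

Chaining the checks with `or` means any single signal is enough to classify the model as multimodal, which matches the "detected as multimodal" wording of the return value.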