vlm_dataset_utils
Utility functions for getting samples and a forward loop function for different VLM datasets.
Functions
| get_supported_vlm_datasets | Retrieves a list of supported VLM datasets. |
| get_vlm_dataset_dataloader | Get a dataloader with the dataset name and processor of the target model. |
- get_supported_vlm_datasets()
- Retrieves a list of supported VLM datasets.
- Returns:
- A list of strings, where each string is the name of a supported dataset. 
- Return type:
- list[str] 
- Example usage:
    from modelopt.torch.utils import get_supported_vlm_datasets

    print("Supported datasets:", get_supported_vlm_datasets())
- get_vlm_dataset_dataloader(dataset_name='scienceqa', processor=None, batch_size=1, num_samples=512)
- Get a dataloader with the dataset name and processor of the target model.
- Parameters:
- dataset_name (str) – Name of the dataset to load. 
- processor (MllamaImageProcessor) – Processor used for encoding images and text data. 
- batch_size (int) – Batch size of the returned dataloader. 
- num_samples (int) – Number of samples to load from the dataset. 
 
- Returns:
- A dataloader instance. 
- Return type:
- DataLoader
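
The two functions above are naturally used together: check that a dataset name is supported, then request its dataloader. The following is a minimal sketch of that pattern, not part of the library. The helper name `load_calibration_batches` and its `list_datasets` / `make_dataloader` parameters are hypothetical; they are passed in as callables so the logic can be run without ModelOpt installed. The import path for `get_vlm_dataset_dataloader` inside `example_with_modelopt` is an assumption based on the import shown for `get_supported_vlm_datasets`, and the processor is assumed to be constructed elsewhere.

```python
from typing import Any, Callable, Iterable, List


def load_calibration_batches(
    dataset_name: str,
    processor: Any,
    list_datasets: Callable[[], List[str]],
    make_dataloader: Callable[..., Iterable],
    batch_size: int = 1,
    num_samples: int = 512,
) -> Iterable:
    """Validate the dataset name, then build a dataloader for it.

    `list_datasets` and `make_dataloader` are hypothetical injection points;
    in practice they would be get_supported_vlm_datasets and
    get_vlm_dataset_dataloader.
    """
    supported = list_datasets()
    if dataset_name not in supported:
        raise ValueError(
            f"Unsupported dataset {dataset_name!r}; choose one of {supported}"
        )
    # Forward the arguments using the keyword names from the documented signature.
    return make_dataloader(
        dataset_name=dataset_name,
        processor=processor,
        batch_size=batch_size,
        num_samples=num_samples,
    )


def example_with_modelopt(processor: Any) -> Iterable:
    """Sketch of real usage; requires ModelOpt and a ready-made processor."""
    # Assumption: both functions are importable from modelopt.torch.utils.
    from modelopt.torch.utils import (
        get_supported_vlm_datasets,
        get_vlm_dataset_dataloader,
    )

    return load_calibration_batches(
        "scienceqa",
        processor,
        get_supported_vlm_datasets,
        get_vlm_dataset_dataloader,
        batch_size=1,
        num_samples=512,
    )
```

Validating the name first surfaces a typo as a clear `ValueError` listing the supported names, rather than as whatever error the underlying dataset loading would raise.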