vlm_dataset_utils

Utility functions for getting samples and the forward loop function for different VLM datasets.

Functions

`get_supported_vlm_datasets`	Retrieves the list of supported VLM datasets.
`get_vlm_dataset_dataloader`	Gets a dataloader for the named dataset, using the processor of the target model.

get_supported_vlm_datasets()

Retrieves the list of supported VLM datasets.

Returns:

A list of strings, where each string is the name of a supported dataset.

Return type:

list[str]

Example usage:

from modelopt.torch.utils import get_supported_vlm_datasets

print("Supported datasets:", get_supported_vlm_datasets())

get_vlm_dataset_dataloader(dataset_name='scienceqa', processor=None, batch_size=1, num_samples=512)

Gets a dataloader for the named dataset, using the processor of the target model.

Parameters:

dataset_name (str) – Name of the dataset to load.
processor (MllamaImageProcessor) – Processor used for encoding images and text data.
batch_size (int) – Batch size of the returned dataloader.
num_samples (int) – Number of samples to load from the dataset.

Returns:

A DataLoader instance for the requested dataset.

Return type:

DataLoader
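
Example usage (a minimal sketch, not part of the library): the two functions can be combined so that the dataset name is checked against the supported list before a dataloader is requested. The helper name `build_vlm_calibration_loader` and its `list_datasets` / `make_loader` arguments are hypothetical; they stand in for `get_supported_vlm_datasets` and `get_vlm_dataset_dataloader` so the snippet runs without ModelOpt installed. It assumes only the keyword names shown in the signature above.

```python
from typing import Any, Callable, Sequence


def build_vlm_calibration_loader(
    dataset_name: str,
    processor: Any,
    list_datasets: Callable[[], Sequence[str]],
    make_loader: Callable[..., Any],
    batch_size: int = 1,
    num_samples: int = 512,
) -> Any:
    """Validate the dataset name, then delegate to the dataloader factory.

    Hypothetical helper: list_datasets and make_loader are injected so the
    sketch does not depend on ModelOpt being importable.
    """
    supported = list(list_datasets())
    if dataset_name not in supported:
        raise ValueError(
            f"Unsupported dataset {dataset_name!r}; choose one of {supported}"
        )
    # Keyword names mirror the documented signature of
    # get_vlm_dataset_dataloader.
    return make_loader(
        dataset_name=dataset_name,
        processor=processor,
        batch_size=batch_size,
        num_samples=num_samples,
    )


# Intended wiring (not executed here; assumes ModelOpt is installed and that
# both functions are importable from modelopt.torch.utils):
#   from modelopt.torch.utils import (
#       get_supported_vlm_datasets,
#       get_vlm_dataset_dataloader,
#   )
#   loader = build_vlm_calibration_loader(
#       "scienceqa", processor,
#       get_supported_vlm_datasets, get_vlm_dataset_dataloader,
#   )
```

Checking the name up front gives a clear error listing the valid choices, rather than relying on whatever the dataloader factory does with an unknown name.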