dataloaders
DataLoader utilities for language model training and validation.
Functions
- create_padded_tensor(tensor, desired_shape, padding_value=0)
  - Parameters:
    - tensor (TensorT)
    - desired_shape (Sequence[int])
    - padding_value (float)
  - Return type: TensorT
- create_validation_dataloader(accelerator, seed, tokenizer, block_size, dataset, content_field, fim_rate, fim_spm_rate, micro_batch_size, eval_samples=None, load_dataset_fn=<function load_from_disk_fn>, dataset_name='__auto__', keep_in_memory=False, source_datasets_to_discard=(), bos_rate=1.0, varlen=True, shuffle_seed=None)
  - Parameters:
    - accelerator (Accelerator | None)
    - seed (int)
    - tokenizer (PreTrainedTokenizerBase)
    - block_size (int)
    - dataset (str | Mapping[str, Dataset])
    - content_field (str)
    - fim_rate (float)
    - fim_spm_rate (float)
    - micro_batch_size (int)
    - eval_samples (int | None)
    - load_dataset_fn (LoadDatasetFn)
    - dataset_name (str)
    - keep_in_memory (bool)
    - source_datasets_to_discard (Sequence[str])
    - bos_rate (float)
    - varlen (bool)
    - shuffle_seed (int | None)
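
The create_padded_tensor entry above lists only a signature, so the following is a minimal sketch of what padding a tensor to a desired shape could look like. It is not the library's implementation. The name pad_to_shape_sketch is hypothetical, and the sketch assumes that TensorT is a PyTorch tensor, that desired_shape has one entry per dimension, and that padding is added at the end of each dimension. None of these assumptions is confirmed by this reference.

```python
from typing import Sequence

import torch


def pad_to_shape_sketch(
    tensor: torch.Tensor,
    desired_shape: Sequence[int],
    padding_value: float = 0,
) -> torch.Tensor:
    """Hypothetical stand-in for create_padded_tensor (assumed behavior)."""
    if tensor.dim() != len(desired_shape):
        raise ValueError("desired_shape must have one entry per tensor dimension")
    if any(cur > want for cur, want in zip(tensor.shape, desired_shape)):
        raise ValueError("desired_shape must be at least as large as tensor.shape")

    # Allocate the output filled with the padding value, matching dtype and device.
    padded = tensor.new_full(tuple(desired_shape), padding_value)

    # Copy the original values into the leading corner (assumed: trailing padding).
    region = tuple(slice(0, size) for size in tensor.shape)
    padded[region] = tensor
    return padded


# Usage: pad a 2x2 integer tensor to 3x4 with -1 as the padding value.
x = torch.tensor([[1, 2], [3, 4]])
out = pad_to_shape_sketch(x, (3, 4), padding_value=-1)
```

Allocating the output once and copying into a slice keeps the sketch independent of the number of dimensions. The actual function may pad differently, for example on the left or along a single dimension.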