Batching utils
pad_token_ids(token_ids, padding_value=0, padding_len=None, pad_size_divisible_by=1, **convert_to_kwargs)
Pads token ids with the padding value and returns the padded tokens together with the corresponding mask.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`token_ids` | `Union[List[int], List[Tensor]]` | List of token ids or tensors. | required |
`padding_value` | `int` | Value to pad with. Defaults to 0. | `0` |
`padding_len` | `Optional[int]` | Max length of the padded token ids. Defaults to None. | `None` |
`pad_size_divisible_by` | `int` | Pad the length of the token ids to be divisible by this number. Defaults to 1. | `1` |
`**convert_to_kwargs` | | Passed directly to `tensor.to(**kwargs)` if provided. | `{}` |
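The effect of `pad_size_divisible_by` amounts to a small piece of arithmetic. The sketch below assumes the padded length is rounded *up* to the next multiple of that number (the natural reading of "divisible by"). `round_up_to_multiple` is a hypothetical helper for illustration, not part of bionemo.

```python
def round_up_to_multiple(length: int, multiple: int = 1) -> int:
    """Return the smallest value >= length that is divisible by multiple.

    Hypothetical helper illustrating the assumed meaning of pad_size_divisible_by.
    """
    if multiple < 1:
        raise ValueError("multiple must be a positive integer")
    # Integer ceiling division: -(-a // b) == ceil(a / b) without floating point.
    return -(-length // multiple) * multiple


# A longest sequence of 13 tokens padded to a multiple of 8 gives length 16.
print(round_up_to_multiple(13, 8))  # 16
# A length that is already a multiple is left unchanged.
print(round_up_to_multiple(16, 8))  # 16
# The default of 1 never changes the length.
print(round_up_to_multiple(13))     # 13
```

Rounding to a multiple such as 8 is commonly used so that sequence dimensions line up with hardware-friendly tile sizes; the default of 1 disables it.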
Returns:
Type | Description |
---|---|
`Tuple[Tensor, Tensor]` | Padded token ids and the corresponding mask. |
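As a rough illustration of the returned pair, the following pure-Python sketch pads a batch of token-id lists and builds a matching mask. It is not the bionemo implementation: it returns nested lists instead of `Tensor`s and ignores `**convert_to_kwargs`. It also assumes that `padding_len=None` means "pad to the longest sequence" and that the mask is `True` for real tokens and `False` for padding; neither behavior is stated above. The name `pad_token_ids_sketch` is hypothetical.

```python
from typing import List, Optional, Tuple


def pad_token_ids_sketch(
    token_ids: List[List[int]],
    padding_value: int = 0,
    padding_len: Optional[int] = None,
    pad_size_divisible_by: int = 1,
) -> Tuple[List[List[int]], List[List[bool]]]:
    """Hypothetical list-based analogue of pad_token_ids (assumed semantics)."""
    lengths = [len(seq) for seq in token_ids]
    # Assumption: with no explicit length, pad to the longest sequence in the batch.
    if padding_len is None:
        padding_len = max(lengths, default=0)
    # Round the target length up to the next multiple of pad_size_divisible_by.
    padding_len = -(-padding_len // pad_size_divisible_by) * pad_size_divisible_by

    padded, mask = [], []
    for seq, n in zip(token_ids, lengths):
        # Sketch-only choice: reject sequences longer than the target length.
        if n > padding_len:
            raise ValueError(f"sequence of length {n} exceeds padding_len={padding_len}")
        pad = padding_len - n
        padded.append(list(seq) + [padding_value] * pad)
        # Assumption: True marks real tokens, False marks padding positions.
        mask.append([True] * n + [False] * pad)
    return padded, mask


ids, mask = pad_token_ids_sketch([[5, 6, 7], [8]], pad_size_divisible_by=4)
print(ids)   # [[5, 6, 7, 0], [8, 0, 0, 0]]
print(mask)  # [[True, True, True, False], [True, False, False, False]]
```

Raising on sequences longer than the target is a choice made for this sketch only; how the actual function handles that case is not documented here.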
Source code in bionemo/core/utils/batching_utils.py