Mask
DiscreteMaskedPrior
Bases: DiscretePriorDistribution
A subclass representing a Discrete Masked prior distribution.
Source code in bionemo/moco/distributions/prior/discrete/mask.py
__init__(num_classes=10, mask_dim=None, inclusive=True)
Discrete Masked prior distribution.
There are three ways to define the problem, and they are hard to reconcile:
- [..., M, ....] inclusive anywhere --> an existing LLM tokenizer where the mask token has a specific index that is not at the end
- [......, M] inclusive on end --> mask_dim = None with inclusive set to True (the default), so the mask is placed at the end
- [.....] + [M] exclusive --> the number of classes represents the number of data classes, and a separate MASK dimension is added
- Note: the pad_sample function is provided to help add this extra external dimension.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_classes | int | The number of classes in the distribution. Defaults to 10. | 10 |
mask_dim | int | The index for the mask token. Defaults to num_classes - 1 if inclusive or num_classes if exclusive. | None |
inclusive | bool | Whether the mask is included in the specified number of classes. If True, the mask is considered as one of the classes. If False, the mask is considered as an additional class. Defaults to True. | True |
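The defaults described above can be illustrated with a small, dependency-free sketch. The helper name resolve_mask_index and its bounds check are hypothetical and are not part of bionemo; the sketch only mirrors the documented defaults (num_classes - 1 when inclusive, num_classes when exclusive) and should not be read as the library's implementation.

```python
from typing import Optional, Tuple


def resolve_mask_index(
    num_classes: int = 10,
    mask_dim: Optional[int] = None,
    inclusive: bool = True,
) -> Tuple[int, int]:
    """Hypothetical helper: return (total_classes, mask_index).

    Mirrors the documented defaults only; it is not the bionemo code.
    """
    # Inclusive: the mask is one of the num_classes states.
    # Exclusive: the mask is an extra state appended after the data classes.
    total_classes = num_classes if inclusive else num_classes + 1
    if mask_dim is None:
        # Documented default: last index of the (possibly extended) state space.
        mask_dim = total_classes - 1
    # Assumption: an explicit mask index must lie inside the state space.
    if not 0 <= mask_dim < total_classes:
        raise ValueError("mask_dim must index a valid class")
    return total_classes, mask_dim


print(resolve_mask_index(10))                   # inclusive, mask on the end
print(resolve_mask_index(10, mask_dim=3))       # inclusive, mask anywhere
print(resolve_mask_index(10, inclusive=False))  # exclusive, extra MASK class
```

The three calls correspond to the three layouts listed in the description: a mask at the end of the existing vocabulary, a mask at a tokenizer-specific index, and a mask appended as a separate class.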
is_masked(sample)
Creates a mask for whether a state is masked.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | Tensor | The sample to check. | required |
Returns:
Name | Type | Description |
---|---|---|
Tensor | Tensor | A float tensor indicating whether the sample is masked. |
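As a rough illustration of the check described above, the following sketch uses plain Python lists in place of tensors. The function name is_masked_sketch and the explicit mask_index argument are hypothetical; the sketch assumes that a state counts as masked when it equals the mask index, and that the result is encoded as 1.0 or 0.0.

```python
from typing import List


def is_masked_sketch(sample: List[int], mask_index: int) -> List[float]:
    """Hypothetical stand-in: 1.0 where a state equals the mask index, else 0.0."""
    return [1.0 if state == mask_index else 0.0 for state in sample]


# With 10 inclusive classes the default mask index would be 9.
flags = is_masked_sketch([0, 9, 3, 9], mask_index=9)
print(flags)
```

A float result (rather than a boolean one) can be multiplied directly into losses or update terms, which is presumably why the documented return type is a float tensor.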
pad_sample(sample)
Pads the input sample with zeros along the last dimension.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample | Tensor | The input sample to be padded. | required |
Returns:
Name | Type | Description |
---|---|---|
Tensor | Tensor | The padded sample. |
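The padding step can be sketched without tensors as follows. The function name pad_last_dim is hypothetical, and the padding width of one is an assumption inferred from the note that pad_sample helps add the extra MASK dimension in the exclusive case; the library's actual behavior may differ.

```python
from typing import List, Sequence


def pad_last_dim(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical stand-in: append a single 0.0 to each row (the last dimension)."""
    return [list(row) + [0.0] for row in rows]


# Two one-hot rows over 3 data classes become rows over 3 + 1 classes,
# with zero weight on the new (mask) slot.
padded = pad_last_dim([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
print(padded)
```

Under this assumption, data encoded over num_classes entries gains a trailing slot so that its last dimension matches a state space that includes the separate mask class.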
sample(shape, mask=None, device='cpu', rng_generator=None)
Generates a specified number of samples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
shape | Tuple | The shape of the samples to generate. | required |
device | str | The device on which to create the samples: cpu or gpu. | 'cpu' |
mask | Optional[Tensor] | An optional mask to apply to the samples. Defaults to None. | None |
rng_generator | Optional[Generator] | An optional random number generator. Defaults to None. | None |
Returns:
Name | Type | Description |
---|---|---|
Float | Tensor | A tensor of samples. |
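The following sketch shows one plausible reading of sampling from a masked prior, using nested lists instead of tensors. It rests on two assumptions that this page does not state: that the prior places all of its mass on the mask index, so every sampled position is the mask token, and that the optional mask is a 0/1 array multiplied elementwise into the samples. The function name sample_all_masked is hypothetical.

```python
from typing import List, Optional, Tuple


def sample_all_masked(
    shape: Tuple[int, int],
    mask_index: int,
    mask: Optional[List[List[int]]] = None,
) -> List[List[int]]:
    """Hypothetical sketch: fill a (batch, length) grid with the mask index."""
    batch, length = shape
    samples = [[mask_index] * length for _ in range(batch)]
    if mask is not None:
        # Assumption: positions where mask is 0 are zeroed out (e.g. padding).
        samples = [
            [state * keep for state, keep in zip(row, mask_row)]
            for row, mask_row in zip(samples, mask)
        ]
    return samples


print(sample_all_masked((2, 3), mask_index=9))
print(sample_all_masked((2, 3), mask_index=9, mask=[[1, 1, 0], [1, 0, 0]]))
```

Because such a prior would be deterministic, a random number generator would have no effect in this sketch; the rng_generator parameter is omitted here for that reason.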