API
ESM2Config
dataclass
Bases: ESM2GenericConfig, IOMixinWithGettersSetters
Configuration class for the ESM2 model.
Source code in bionemo/esm2/model/model.py, lines 356-362.
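A minimal usage sketch (not taken from the library's own docs): because ESM2Config is a dataclass, it can be instantiated with its defaults or with selected fields overridden at construction time. The field names below come from the inherited ESM2GenericConfig attributes documented further down; the particular sizes are illustrative, not recommended values.

```python
# Sketch only: instantiate ESM2Config, overriding a few inherited fields.
# Import path follows the source location noted above; sizes are illustrative.
from bionemo.esm2.model.model import ESM2Config

config = ESM2Config(
    num_layers=6,            # number of transformer layers
    hidden_size=320,         # model hidden size
    num_attention_heads=20,  # attention heads per layer
    ffn_hidden_size=1280,    # feed-forward hidden size
    seq_length=1024,         # input sequence length
)
```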
ESM2GenericConfig
dataclass
Bases: BioBertConfig[ESM2ModelT, MegatronLossType]
Generic configuration class for the ESM2 model.
Attributes:

Name | Type | Description |
---|---|---|
num_layers | int | Number of layers in the model. |
hidden_size | int | Hidden size of the model. |
num_attention_heads | int | Number of attention heads in the model. |
ffn_hidden_size | int | Hidden size of the feed-forward network. |
hidden_dropout | float | Dropout rate for hidden layers. |
attention_dropout | float | Dropout rate for attention layers. |
apply_residual_connection_post_layernorm | bool | Whether to apply the residual connection after layer normalization. |
layernorm_epsilon | float | Epsilon value for layer normalization. |
layernorm_zero_centered_gamma | bool | Whether to zero-center the gamma parameter in layer normalization. |
activation_func | Callable | Activation function used in the model. |
init_method_std | float | Standard deviation for weight initialization. |
apply_query_key_layer_scaling | bool | Whether to apply scaling to the query and key layers. |
masked_softmax_fusion | bool | Whether to use a kernel that fuses attention softmax with its mask. |
fp16_lm_cross_entropy | bool | Whether to move the unreduced cross-entropy loss calculation for the LM head to fp16. |
share_embeddings_and_output_weights | bool | Whether to share embeddings and output weights. |
enable_autocast | bool | Whether to enable autocast for mixed precision. |
biobert_spec_option | BiobertSpecOption | BiobertSpecOption for the model. |
position_embedding_type | PositionEmbeddingKinds | Type of position embedding used in the model. |
seq_length | int | Length of the input sequence. |
make_vocab_size_divisible_by | int | Make the vocabulary size divisible by this value. |
token_dropout | bool | Whether to apply token dropout. |
use_attention_mask | bool | Whether to use an attention mask. |
use_esm_attention | bool | Whether to use ESM attention. |
attention_softmax_in_fp32 | bool | Whether to compute the attention softmax in fp32. |
optimizer_fn | Optional[Callable[[MegatronBioBertModel], Optimizer]] | Optional optimizer function for the model. |
parallel_output | bool | Whether to use parallel output. |
rotary_base | int | Base value for rotary positional encoding. |
rotary_percent | float | Percentage of the rotary dimension to use for rotary positional encoding. |
seq_len_interpolation_factor | Optional[float] | Interpolation factor for sequence length. |
get_attention_mask_from_fusion | bool | Whether to get the attention mask from fusion. |
nemo1_ckpt_path | str \| None | Path to a NeMo 1 checkpoint. |
return_only_hidden_states | bool | Whether to return only the hidden states. |
loss_reduction_class | Type[MegatronLossType] | Loss reduction class for the model. Defaults to BERTMLMLossWithReduction. |
Source code in bionemo/esm2/model/model.py, lines 241-353.
__post_init__()
Check configuration compatibility.
Source code in bionemo/esm2/model/model.py, lines 330-353.
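A short, hedged sketch of how the attributes documented above behave in practice: they are ordinary dataclass fields on the concrete ESM2Config subclass, so they can be overridden at construction and read back afterwards, and constructing an instance is also what triggers the __post_init__ compatibility check. Nothing below asserts what the library's default values actually are.

```python
# Sketch only: the attributes listed for ESM2GenericConfig are plain dataclass fields
# on the concrete ESM2Config. Constructing the instance also runs the __post_init__
# compatibility check documented above.
from bionemo.esm2.model.model import ESM2Config

config = ESM2Config(token_dropout=True)  # override one documented field; others keep their defaults

# Inspect a few of the documented attributes.
print(config.num_layers, config.hidden_size, config.num_attention_heads)
print(config.position_embedding_type, config.seq_length)
print(config.loss_reduction_class)  # defaults to BERTMLMLossWithReduction per the table above
```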
ESM2Model
Bases: MegatronBioBertModel
ESM2 Transformer language model.
Source code in bionemo/esm2/model/model.py, lines 53-219.
__init__(config, num_tokentypes, transformer_layer_spec, vocab_size, max_sequence_length, tokenizer=None, pre_process=True, post_process=True, fp16_lm_cross_entropy=False, parallel_output=True, share_embeddings_and_output_weights=False, position_embedding_type='learned_absolute', rotary_percent=1.0, seq_len_interpolation_factor=None, add_binary_head=True, return_embeddings=False, include_embeddings=False, use_full_attention_mask=False, include_hiddens=False, skip_logits=False)
Initialize the ESM2 model.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
config | TransformerConfig | Transformer config. | required |
num_tokentypes | int | Set to 2 when args.bert_binary_head is True, and 0 otherwise. Defaults to 0. | required |
transformer_layer_spec | ModuleSpec | Specifies the module to use for the transformer layers. | required |
vocab_size | int | Vocabulary size. | required |
max_sequence_length | int | Maximum sequence length. This is used for positional embeddings. | required |
tokenizer | AutoTokenizer | Optional tokenizer object (currently only used in the constructor of ESM2Model). | None |
pre_process | bool | Include the embedding layer (used with pipeline parallelism). | True |
post_process | bool | Include an output layer (used with pipeline parallelism). | True |
fp16_lm_cross_entropy | bool | Whether to move the unreduced cross-entropy loss calculation for the LM head to fp16. | False |
parallel_output | bool | Do not gather the outputs; keep them split across tensor-parallel ranks. | True |
share_embeddings_and_output_weights | bool | When True, input embeddings and output logit weights are shared. Defaults to False. | False |
position_embedding_type | str | Position embedding type. Options: 'learned_absolute', 'rope'. Defaults to 'learned_absolute'. | 'learned_absolute' |
rotary_percent | float | Percent of the rotary dimension to use for rotary position embeddings. Defaults to 1.0 (100%). Ignored unless position_embedding_type is 'rope'. | 1.0 |
seq_len_interpolation_factor | Optional[float] | Interpolation factor for sequence length. Defaults to None. | None |
add_binary_head | bool | Whether to add a binary head. Defaults to True. | True |
return_embeddings | bool | Whether to return embeddings. Defaults to False. | False |
include_embeddings | bool | Whether to include embeddings in the output dictionary. Defaults to False. | False |
use_full_attention_mask | bool | Whether to use a full attention mask. Defaults to False. | False |
include_hiddens | bool | Whether to include hidden states in the output dictionary. Defaults to False. | False |
skip_logits | bool | Whether to skip writing the token logits to the output dictionary. Defaults to False. | False |
Source code in bionemo/esm2/model/model.py, lines 56-199.
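The sketch below only illustrates how the documented constructor parameters map onto a direct call; it is not the library's documented entry point. In practice the model is usually built from an ESM2Config, and Megatron's model-parallel state must already be initialized before construction. The helper build_esm2_model, its argument names, and the literal values are hypothetical.

```python
# Sketch only: a hypothetical helper showing the keyword arguments documented above.
# transformer_config, layer_spec, and tokenizer are assumed to be built elsewhere.
from megatron.core.transformer.spec_utils import ModuleSpec
from megatron.core.transformer.transformer_config import TransformerConfig

from bionemo.esm2.model.model import ESM2Model


def build_esm2_model(
    transformer_config: TransformerConfig,
    layer_spec: ModuleSpec,
    vocab_size: int,
    tokenizer=None,
) -> ESM2Model:
    """Hypothetical helper: construct ESM2Model with the documented keyword arguments."""
    return ESM2Model(
        config=transformer_config,
        num_tokentypes=0,                # 0 unless a BERT-style binary head is used
        transformer_layer_spec=layer_spec,
        vocab_size=vocab_size,
        max_sequence_length=1024,        # illustrative value
        tokenizer=tokenizer,
        position_embedding_type="rope",  # one of the documented options
        rotary_percent=1.0,
        include_hiddens=False,
        skip_logits=False,
    )
```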
embedding_forward(input_ids, position_ids, tokentype_ids=None, attention_mask=None)
Forward pass of the embedding layer.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input_ids | Tensor | The input tensor of shape (batch_size, sequence_length) containing the input IDs. | required |
position_ids | Tensor | The tensor of shape (batch_size, sequence_length) containing the position IDs. | required |
tokentype_ids | Tensor | The tensor of shape (batch_size, sequence_length) containing the token type IDs. Defaults to None. | None |
attention_mask | Tensor | The tensor of shape (batch_size, sequence_length) containing the attention mask. Defaults to None. | None |
Returns:

Type | Description |
---|---|
Tensor | The output tensor of shape (batch_size, sequence_length, hidden_size) containing the embedded representations. |
Source code in bionemo/esm2/model/model.py, lines 201-219.
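A hedged sketch of calling the method with the documented tensor shapes. The helper embed_tokens is hypothetical (not part of the library), and it assumes an already-initialized ESM2Model such as the one sketched in the constructor section above; the token-ID range and shapes are illustrative.

```python
# Sketch: embedding_forward takes (batch_size, sequence_length) ID tensors and, per the
# docstring above, returns (batch_size, sequence_length, hidden_size) embeddings.
import torch

from bionemo.esm2.model.model import ESM2Model


def embed_tokens(model: ESM2Model, input_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: build position IDs and run the embedding layer."""
    batch_size, seq_len = input_ids.shape
    position_ids = (
        torch.arange(seq_len, device=input_ids.device).unsqueeze(0).expand(batch_size, -1)
    )
    return model.embedding_forward(input_ids=input_ids, position_ids=position_ids)
```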