Llama

Eden11BConfig dataclass

Bases: EdenConfig

Eden-flavoured Llama-3.1 ~11B (keeps all Eden behaviors).

Source code in bionemo/evo2/models/llama.py
@dataclass
class Eden11BConfig(EdenConfig):
    """Eden-flavoured Llama-3.1 ~14B (keeps all Eden behaviors)."""

    # If you want long context like Eden-long, bump this; else inherit 8192.
    seq_length: int = 8192  # or remove this line to keep 8192

    # ~11B sizing (head_dim = 128)
    num_layers: int = 36
    hidden_size: int = 5120
    ffn_hidden_size: int = 13824
    num_attention_heads: int = 40
    num_query_groups: int = 8  # GQA (inherited value is also fine if already 8)
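
The ~11B figure follows from the fields above. A back-of-the-envelope sketch (the 128,256-token vocabulary and untied embedding/LM-head weights are assumptions carried over from the Llama-3.1 8B base, not shown in this file):

def approx_param_count(num_layers, hidden, ffn, heads, groups, vocab=128_256):
    """Rough Llama-style parameter estimate (illustration only)."""
    head_dim = hidden // heads                   # 5120 // 40 = 128
    attn = 2 * hidden * hidden                   # Q and output projections
    attn += 2 * hidden * head_dim * groups       # K and V, shrunk by GQA
    mlp = 3 * hidden * ffn                       # gate, up, and down projections
    embed = 2 * vocab * hidden                   # embedding table + LM head
    return num_layers * (attn + mlp) + embed

print(f"{approx_param_count(36, 5120, 13824, 40, 8) / 1e9:.1f}B")  # -> 11.2B

Norm and bias parameters are omitted; they contribute well under 1% at this scale.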

Eden18BConfig dataclass

Bases: EdenConfig

Eden-flavoured Llama-3.1 ~18B (keeps all Eden behaviors).

Source code in bionemo/evo2/models/llama.py
@dataclass
class Eden18BConfig(EdenConfig):
    """Eden-flavoured Llama-3.1 ~18B (keeps all Eden behaviors)."""

    # If you want long context like Eden-long, bump this; else inherit 8192.
    seq_length: int = 8192  # or remove this line to keep 8192

    # ~18B sizing (head_dim ≈ 128)
    num_layers: int = 48
    hidden_size: int = 6144
    ffn_hidden_size: int = 16384
    num_attention_heads: int = 48
    num_query_groups: int = 8  # GQA (inherited value is also fine if already 8)
    old_context_len: int = 8192  # or remove this line to keep 8192

Eden21BConfig dataclass

Bases: EdenConfig

Eden-flavoured Llama-3.1 ~21B (keeps all Eden behaviors).

Source code in bionemo/evo2/models/llama.py
@dataclass
class Eden21BConfig(EdenConfig):
    """Eden-flavoured Llama-3.1 ~21B (keeps all Eden behaviors)."""

    seq_length: int = 8192

    # ~21B sizing (head_dim = 128)
    num_layers: int = 42  # 42 layers for 21B target
    hidden_size: int = 7168  # 56 * 128 = 7168 for exact head_dim
    ffn_hidden_size: int = 19456  # ~2.7x hidden_size
    num_attention_heads: int = 56  # Divisible by 8
    num_query_groups: int = 8  # GQA
    old_context_len: int = 8192

Eden24BConfig dataclass

Bases: EdenConfig

Eden-flavoured Llama-3.1 ~24B (keeps all Eden behaviors).

Source code in bionemo/evo2/models/llama.py
@dataclass
class Eden24BConfig(EdenConfig):
    """Eden-flavoured Llama-3.1 ~8B (keeps all Eden behaviors)."""

    # If you want long context like Eden-long, bump this; else inherit 8192.
    seq_length: int = 32768  # or remove this line to keep 8192

    # ~24B sizing (head_dim ≈ 128)
    num_layers: int = 46
    hidden_size: int = 6144
    ffn_hidden_size: int = 23296
    num_attention_heads: int = 48
    num_query_groups: int = 8  # GQA (inherited value is also fine if already 8)
    old_context_len: int = 8192

Eden27BConfig dataclass

Bases: EdenConfig

Eden-flavoured Llama-3.1 ~27B (keeps all Eden behaviors).

Source code in bionemo/evo2/models/llama.py
@dataclass
class Eden27BConfig(EdenConfig):
    """Eden-flavoured Llama-3.1 ~8B (keeps all Eden behaviors)."""

    # If you want long context like Eden-long, bump this; else inherit 8192.
    seq_length: int = 32768  # or remove this line to keep 8192

    # ~27B sizing (head_dim ≈ 128)
    num_layers: int = 46
    hidden_size: int = 6656
    ffn_hidden_size: int = 23296
    num_attention_heads: int = 52
    num_query_groups: int = 8  # GQA (inherited value is also fine if already 8)
    old_context_len: int = 8192

Eden28BConfig dataclass

Bases: EdenConfig

Eden-flavoured Llama-3.1 ~28B (keeps all Eden behaviors).

Source code in bionemo/evo2/models/llama.py
@dataclass
class Eden28BConfig(EdenConfig):
    """Eden-flavoured Llama-3.1 ~28B (keeps all Eden behaviors)."""

    # If you want long context like Eden-long, bump this; else inherit 8192.
    seq_length: int = 8192  # or remove this line to keep 8192

    # ~28B sizing (head_dim ≈ 128)
    num_layers: int = 48
    hidden_size: int = 6144
    ffn_hidden_size: int = 26368
    num_attention_heads: int = 48
    num_query_groups: int = 8  # GQA (inherited value is also fine if already 8)
    old_context_len: int = 8192  # or remove this line to keep 8192

Eden35BConfig dataclass

Bases: EdenConfig

Eden-flavoured Llama-3.1 ~35B (keeps all Eden behaviors).

Source code in bionemo/evo2/models/llama.py
@dataclass
class Eden35BConfig(EdenConfig):
    """Eden-flavoured Llama-3.1 ~35B (keeps all Eden behaviors)."""

    seq_length: int = 8192

    # ~35B sizing (head_dim ≈ 128)
    num_layers: int = 64
    hidden_size: int = 7168
    ffn_hidden_size: int = 20480
    num_attention_heads: int = 56
    num_query_groups: int = 8  # GQA
    old_context_len: int = 8192

EdenConfig dataclass

Bases: Llama3Config8B

Eden-flavoured Llama-3.1 ~8B (keeps all Eden behaviors).

Source code in bionemo/evo2/models/llama.py
@dataclass
class EdenConfig(llm.Llama3Config8B):
    """Eden-flavoured Llama-3.1 ~8B (keeps all Eden behaviors)."""

    rotary_base: int = 500_000
    seq_length: int = 8192
    num_layers: int = 32
    hidden_size: int = 4096
    ffn_hidden_size: int = 14336
    num_attention_heads: int = 32

    scale_factor: int = 1
    low_freq_factor: int = 1
    high_freq_factor: int = 4
    old_context_len: int = 8192
    init_method_std: float = 0.02
    embedding_init_method_std: Optional[float] = None

    def configure_model(self, *args, **kwargs):
        """Configure and instantiate a Megatron Core Llama 3.1 model.

        Extends the base configuration with Llama 3.1 specific RoPE scaling.
        """
        model = super(EdenConfig, self).configure_model(*args, **kwargs)
        # Apply rope scaling for Llama3.1 model
        model.rotary_pos_emb.inv_freq = apply_rope_scaling(
            model.rotary_pos_emb.inv_freq,
            factor=self.scale_factor,
            low_freq_factor=self.low_freq_factor,
            high_freq_factor=self.high_freq_factor,
            old_context_len=self.old_context_len,
        )
        return model
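
For orientation, a minimal sketch of how these configs are used (only dataclass fields are touched directly; the note about NeMo invoking configure_model reflects the usual NeMo 2.0 flow and is an assumption, since the training entry point is not shown here):

from bionemo.evo2.models.llama import Eden24BConfig

cfg = Eden24BConfig()
print(cfg.seq_length, cfg.old_context_len)          # 32768 8192
print(cfg.hidden_size // cfg.num_attention_heads)   # head_dim: 128

# configure_model() is normally not called by hand: the NeMo model wrapper
# invokes it during setup, which is when the RoPE rescaling below is applied.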

configure_model(*args, **kwargs)

Configure and instantiate a Megatron Core Llama 3.1 model.

Extends the base configuration with Llama 3.1 specific RoPE scaling.

Source code in bionemo/evo2/models/llama.py
def configure_model(self, *args, **kwargs):
    """Configure and instantiate a Megatron Core Llama 3.1 model.

    Extends the base configuration with Llama 3.1 specific RoPE scaling.
    """
    model = super(EdenConfig, self).configure_model(*args, **kwargs)
    # Apply rope scaling for Llama3.1 model
    model.rotary_pos_emb.inv_freq = apply_rope_scaling(
        model.rotary_pos_emb.inv_freq,
        factor=self.scale_factor,
        low_freq_factor=self.low_freq_factor,
        high_freq_factor=self.high_freq_factor,
        old_context_len=self.old_context_len,
    )
    return model
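
apply_rope_scaling itself is defined elsewhere in the package. As a reference point, here is a sketch of the published Llama 3.1 frequency-rescaling recipe the helper presumably follows; treat it as an assumption about the function's behavior, not its actual body:

import math

import torch


def rope_scaling_sketch(
    inv_freq: torch.Tensor,
    factor: float,
    low_freq_factor: float,
    high_freq_factor: float,
    old_context_len: int,
) -> torch.Tensor:
    """Llama 3.1-style RoPE rescaling (hypothetical stand-in for apply_rope_scaling)."""
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    wavelen = 2 * math.pi / inv_freq
    # Long-wavelength (low-frequency) components are slowed by `factor`;
    # short-wavelength components pass through unchanged.
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)
    # The mid band interpolates smoothly between the two regimes.
    smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
    smoothed = (1 - smooth) * inv_freq / factor + smooth * inv_freq
    mid = (wavelen >= high_freq_wavelen) & (wavelen <= low_freq_wavelen)
    return torch.where(mid, smoothed, scaled)

Under this recipe, the default scale_factor of 1 used by the configs above makes the transform an identity (inv_freq / 1 == inv_freq, and the interpolation collapses to inv_freq), so raising scale_factor matters only when extending context beyond old_context_len.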