Loss

ClassifierLossReduction

Bases: BERTMLMLossWithReduction

A class for calculating the cross entropy loss of classification output.

This class is used for calculating the loss and for logging the reduced loss across micro-batches.

Source code in bionemo/esm2/model/finetune/loss.py
class ClassifierLossReduction(BERTMLMLossWithReduction):
    """A class for calculating the cross entropy loss of classification output.

    This class is used for calculating the loss and for logging the reduced loss across micro-batches.
    """

    def forward(
        self, batch: Dict[str, Tensor], forward_out: Dict[str, Tensor]
    ) -> Tuple[Tensor, PerTokenLossDict | SameSizeLossDict]:
        """Calculates the loss within a micro-batch. A micro-batch is a batch of data on a single GPU.

        Args:
            batch: A batch of data that gets passed to the original forward inside LitAutoEncoder.
            forward_out: the output of the forward method inside classification head.

        Returns:
            A tuple where the loss tensor will be used for backpropagation and the dict will be passed to
            the reduce method, which currently only works for logging.
        """
        targets = batch["labels"].squeeze()  # [b] or [b, s] for sequence-level or token-level classification

        classification_output = forward_out["classification_output"]  # [b, num_class] or [b, s, num_class]
        # [b, s, num_class] -> [b, num_class, s] to satisfy token-level input dims for cross_entropy loss
        if classification_output.dim() == 3:
            classification_output = classification_output.permute(0, 2, 1)

        loss_mask = batch["loss_mask"]  # [b, s]

        cp_size = parallel_state.get_context_parallel_world_size()
        if cp_size == 1:
            losses = torch.nn.functional.cross_entropy(classification_output, targets, reduction="none")
            # token-level losses may contain NaNs at masked locations. We use masked_select to filter out these NaNs
            if classification_output.dim() == 3:
                masked_loss = torch.masked_select(losses, loss_mask)
                loss = masked_loss.sum() / loss_mask.sum()
            else:
                loss = losses.mean()  # sequence-level single value classification
        else:
            raise NotImplementedError("Context Parallel support is not implemented for this loss")

        return loss, {"avg": loss}

    def reduce(self, losses_reduced_per_micro_batch: Sequence[SameSizeLossDict]) -> Tensor:
        """Works across micro-batches (each micro-batch is data on a single GPU).

        Note: This currently only works for logging and this loss will not be used for backpropagation.

        Args:
            losses_reduced_per_micro_batch: a list of the outputs of forward

        Returns:
            A tensor that is the mean of the losses. (used for logging).
        """
        losses = torch.stack([loss["avg"] for loss in losses_reduced_per_micro_batch])
        return losses.mean()
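
The snippet below is a minimal, standalone sketch of the token-level branch of this loss, re-created with plain PyTorch on synthetic tensors. The shapes, permute, and masking follow the source above; the batch sizes are invented for illustration, and it does not instantiate ClassifierLossReduction itself, since the class queries Megatron's parallel_state and therefore expects model-parallel state to be initialized.

    import torch

    # Synthetic token-level classification batch: 2 sequences of 8 tokens, 3 classes (shapes assumed).
    b, s, num_class = 2, 8, 3
    classification_output = torch.randn(b, s, num_class)  # [b, s, num_class]
    targets = torch.randint(0, num_class, (b, s))         # [b, s]
    loss_mask = torch.zeros(b, s, dtype=torch.bool)
    loss_mask[:, :5] = True                                # only the first 5 tokens of each sequence count

    # cross_entropy expects [b, num_class, s] for token-level input, hence the permute.
    logits = classification_output.permute(0, 2, 1)
    losses = torch.nn.functional.cross_entropy(logits, targets, reduction="none")  # [b, s]

    # Keep only the unmasked positions, then average over them.
    loss = torch.masked_select(losses, loss_mask).sum() / loss_mask.sum()
    print(loss)  # scalar tensor, analogous to the first element of the tuple forward() returns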

forward(batch, forward_out)

Calculates the loss within a micro-batch. A micro-batch is a batch of data on a single GPU.

Parameters:

batch (Dict[str, Tensor], required): A batch of data that gets passed to the original forward inside LitAutoEncoder.

forward_out (Dict[str, Tensor], required): The output of the forward method of the classification head.

Returns:

Tuple[Tensor, PerTokenLossDict | SameSizeLossDict]: A tuple where the loss tensor will be used for backpropagation and the dict will be passed to the reduce method, which currently only works for logging.

Source code in bionemo/esm2/model/finetune/loss.py
def forward(
    self, batch: Dict[str, Tensor], forward_out: Dict[str, Tensor]
) -> Tuple[Tensor, PerTokenLossDict | SameSizeLossDict]:
    """Calculates the loss within a micro-batch. A micro-batch is a batch of data on a single GPU.

    Args:
        batch: A batch of data that gets passed to the original forward inside LitAutoEncoder.
        forward_out: the output of the forward method inside classification head.

    Returns:
        A tuple where the loss tensor will be used for backpropagation and the dict will be passed to
        the reduce method, which currently only works for logging.
    """
    targets = batch["labels"].squeeze()  # [b] or [b, s] for sequence-level or token-level classification

    classification_output = forward_out["classification_output"]  # [b, num_class] or [b, s, num_class]
    # [b, s, num_class] -> [b, num_class, s] to satisfy token-level input dims for cross_entropy loss
    if classification_output.dim() == 3:
        classification_output = classification_output.permute(0, 2, 1)

    loss_mask = batch["loss_mask"]  # [b, s]

    cp_size = parallel_state.get_context_parallel_world_size()
    if cp_size == 1:
        losses = torch.nn.functional.cross_entropy(classification_output, targets, reduction="none")
        # token-level losses may contain NaNs at masked locations. We use masked_select to filter out these NaNs
        if classification_output.dim() == 3:
            masked_loss = torch.masked_select(losses, loss_mask)
            loss = masked_loss.sum() / loss_mask.sum()
        else:
            loss = losses.mean()  # sequence-level single value classification
    else:
        raise NotImplementedError("Context Parallel support is not implemented for this loss")

    return loss, {"avg": loss}
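
For sequence-level classification the output has no token dimension, so no permute or loss mask is involved. The sketch below re-creates that branch with plain PyTorch on synthetic shapes (taken from the comments in the source; the sizes are assumptions).

    import torch

    # Sequence-level classification: one label per sequence.
    b, num_class = 4, 2
    classification_output = torch.randn(b, num_class)  # [b, num_class]
    targets = torch.randint(0, num_class, (b,))        # [b], i.e. batch["labels"] after squeeze()

    losses = torch.nn.functional.cross_entropy(classification_output, targets, reduction="none")  # [b]
    loss = losses.mean()  # single scalar, as in the sequence-level branch of forward()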

reduce(losses_reduced_per_micro_batch)

Works across micro-batches (each micro-batch is data on a single GPU).

Note: This currently only works for logging and this loss will not be used for backpropagation.

Parameters:

losses_reduced_per_micro_batch (Sequence[SameSizeLossDict], required): A list of the outputs of forward.

Returns:

Tensor: A tensor that is the mean of the losses (used for logging).

Source code in bionemo/esm2/model/finetune/loss.py
def reduce(self, losses_reduced_per_micro_batch: Sequence[SameSizeLossDict]) -> Tensor:
    """Works across micro-batches (each micro-batch is data on a single GPU).

    Note: This currently only works for logging and this loss will not be used for backpropagation.

    Args:
        losses_reduced_per_micro_batch: a list of the outputs of forward

    Returns:
        A tensor that is the mean of the losses. (used for logging).
    """
    losses = torch.stack([loss["avg"] for loss in losses_reduced_per_micro_batch])
    return losses.mean()
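
The reduction itself is just a mean over the per-micro-batch averages returned by forward. A minimal illustration with hand-made loss dicts (the values are arbitrary):

    import torch

    # Each dict is the second element of a forward() return, one per micro-batch.
    losses_reduced_per_micro_batch = [{"avg": torch.tensor(0.7)}, {"avg": torch.tensor(0.5)}]
    losses = torch.stack([d["avg"] for d in losses_reduced_per_micro_batch])
    print(losses.mean())  # tensor(0.6000), used only for logging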

RegressorLossReduction

Bases: BERTMLMLossWithReduction

A class for calculating the MSE loss of regression output.

This class is used for calculating the loss and for logging the reduced loss across micro-batches.

Source code in bionemo/esm2/model/finetune/loss.py
class RegressorLossReduction(BERTMLMLossWithReduction):
    """A class for calculating the MSE loss of regression output.

    This class is used for calculating the loss and for logging the reduced loss across micro-batches.
    """

    def forward(
        self, batch: Dict[str, Tensor], forward_out: Dict[str, Tensor]
    ) -> Tuple[Tensor, PerTokenLossDict | SameSizeLossDict]:
        """Calculates the loss within a micro-batch. A micro-batch is a batch of data on a single GPU.

        Args:
            batch: A batch of data that gets passed to the original forward inside LitAutoEncoder.
            forward_out: the output of the forward method inside the regression head.

        Returns:
            A tuple containing [<loss_tensor>, ReductionT] where the loss tensor will be used for
                backpropagation and the ReductionT will be passed to the reduce method
                (which currently only works for logging.).
        """
        regression_output = forward_out["regression_output"]
        targets = batch["labels"].to(dtype=regression_output.dtype)  # [b, 1]

        cp_size = parallel_state.get_context_parallel_world_size()
        if cp_size == 1:
            loss = torch.nn.functional.mse_loss(regression_output, targets)
        else:
            raise NotImplementedError("Context Parallel support is not implemented for this loss")

        return loss, {"avg": loss}

    def reduce(self, losses_reduced_per_micro_batch: Sequence[SameSizeLossDict]) -> Tensor:
        """Works across micro-batches (each micro-batch is data on a single GPU).

        Note: This currently only works for logging and this loss will not be used for backpropagation.

        Args:
            losses_reduced_per_micro_batch: a list of the outputs of forward

        Returns:
            A tensor that is the mean of the losses. (used for logging).
        """
        losses = torch.stack([loss["avg"] for loss in losses_reduced_per_micro_batch])
        return losses.mean()
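
As with the classifier, the sketch below re-creates the single-GPU branch with plain PyTorch rather than instantiating the class (which queries Megatron's parallel_state). The [b, 1] shapes follow the comment in the source; the batch size is an assumption.

    import torch

    b = 4
    regression_output = torch.randn(b, 1)                            # regression head output, [b, 1]
    targets = torch.randn(b, 1).to(dtype=regression_output.dtype)    # labels cast to the output dtype
    loss = torch.nn.functional.mse_loss(regression_output, targets)  # scalar MSE, returned alongside {"avg": loss}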

forward(batch, forward_out)

Calculates the loss within a micro-batch. A micro-batch is a batch of data on a single GPU.

Parameters:

batch (Dict[str, Tensor], required): A batch of data that gets passed to the original forward inside LitAutoEncoder.

forward_out (Dict[str, Tensor], required): The output of the forward method of the regression head.

Returns:

Tuple[Tensor, PerTokenLossDict | SameSizeLossDict]: A tuple containing [loss_tensor, ReductionT] where the loss tensor will be used for backpropagation and the ReductionT will be passed to the reduce method (which currently only works for logging).

Source code in bionemo/esm2/model/finetune/loss.py
def forward(
    self, batch: Dict[str, Tensor], forward_out: Dict[str, Tensor]
) -> Tuple[Tensor, PerTokenLossDict | SameSizeLossDict]:
    """Calculates the loss within a micro-batch. A micro-batch is a batch of data on a single GPU.

    Args:
        batch: A batch of data that gets passed to the original forward inside LitAutoEncoder.
        forward_out: the output of the forward method inside the regression head.

    Returns:
        A tuple containing [<loss_tensor>, ReductionT] where the loss tensor will be used for
            backpropagation and the ReductionT will be passed to the reduce method
            (which currently only works for logging.).
    """
    regression_output = forward_out["regression_output"]
    targets = batch["labels"].to(dtype=regression_output.dtype)  # [b, 1]

    cp_size = parallel_state.get_context_parallel_world_size()
    if cp_size == 1:
        loss = torch.nn.functional.mse_loss(regression_output, targets)
    else:
        raise NotImplementedError("Context Parallel support is not implemented for this loss")

    return loss, {"avg": loss}

reduce(losses_reduced_per_micro_batch)

Works across micro-batches (each micro-batch is data on a single GPU).

Note: This currently only works for logging and this loss will not be used for backpropagation.

Parameters:

losses_reduced_per_micro_batch (Sequence[SameSizeLossDict], required): A list of the outputs of forward.

Returns:

Tensor: A tensor that is the mean of the losses (used for logging).

Source code in bionemo/esm2/model/finetune/loss.py
def reduce(self, losses_reduced_per_micro_batch: Sequence[SameSizeLossDict]) -> Tensor:
    """Works across micro-batches (each micro-batch is data on a single GPU).

    Note: This currently only works for logging and this loss will not be used for backpropagation.

    Args:
        losses_reduced_per_micro_batch: a list of the outputs of forward

    Returns:
        A tensor that is the mean of the losses. (used for logging).
    """
    losses = torch.stack([loss["avg"] for loss in losses_reduced_per_micro_batch])
    return losses.mean()