Statistics and Metrics#
Use statistics to reduce data over dimensions (for example, computing means or variances). Use metrics to compare two inputs (for example, correlation or bias between forecast and observation).
Statistics#
Statistics are distinct from prognostic and diagnostic models in principle because we assume that statistics reduce existing coordinates so that the output tensors have a coordinate system that is a subset of the input coordinate system. This makes statistics less flexible than diagnostic models while having fewer API requirements.
In this section, “statistic” refers to a single reduction operation; “statistics” refers to the class of such operations.
Statistics Interface#
Statistics API only specifies a __call__() method that matches similar methods
across the package.
@runtime_checkable
class Statistic(Protocol):
"""Statistic interface."""
@property
def reduction_dimensions(self) -> list[str]:
"""Gives the input dimensions of which the statistic performs a reduction
over. The is used to determine, a priori, the output dimensions of a statistic.
"""
pass
def output_coords(self, input_coords: CoordSystem) -> CoordSystem:
"""Output coordinate system of the computed statistic, corresponding to the given input coordinates
Parameters
----------
input_coords : CoordSystem
Input coordinate system to transform into output_coords
Returns
The base API hints at, and inspection of the earth2studio.statistics.moments
examples reveals, the use of a few properties to make statistic handling easier:
reduction_dimensions, which are a list of dimensions that will be reduced overweights, which must be broadcastable withreduction_dimensionsbatch_update, which is useful for applying statistics when data comes in streams and batches
Where applicable, specified reduction_dimensions set a requirement for the
coordinates passed in the call method.
Custom Statistics#
To integrate your own statistic, satisfy the interface above. We recommend that you review the custom statistic example in Extending Earth2Studio.
Metrics#
Like statistics, metrics are reductions across existing dimensions. Unlike statistics, which are usually defined over a single input, we define metrics to take a pair of inputs. Otherwise, the API and requirements are similar to the statistics requirements.
Metrics Interface#
) -> tuple[torch.Tensor, CoordSystem]:
"""Apply statistic to data `x`, with coordinates `coords` and reduce
over dimensions `reduction_dimensions`.
Parameters
----------
x : torch.Tensor
Input tensor intended to apply statistic to.
coords : CoordSystem
Ordered dict representing coordinate system that describes the tensor.
`reduction_dimensions` must be in coords.
"""
pass
@runtime_checkable
class Metric(Protocol):
"""Metrics interface."""
@property
def reduction_dimensions(self) -> list[str]:
pass
def output_coords(self, input_coords: CoordSystem) -> CoordSystem:
"""Output coordinate system of the computed statistic, corresponding to the given input coordinates
Parameters
----------
input_coords : CoordSystem
Input coordinate system to transform into output_coords
Returns
-------
CoordSystem
Coordinate system dictionary
"""
pass
def __call__(
self,
x: torch.Tensor,
x_coords: CoordSystem,
y: torch.Tensor,
y_coords: CoordSystem,
) -> tuple[torch.Tensor, CoordSystem]:
"""Apply metric to data `x` and `y`, checking that their coordinates
are broadcastable. While reducing over `reduction_dimensions`.
Parameters
----------
x : torch.Tensor
Input tensor #1 intended to apply metric to. `x` is typically understood
to be the forecast or prediction tensor.
x_coords : CoordSystem
Ordered dict representing coordinate system that describes the `x` tensor.
`reduction_dimensions` must be in coords.
y : torch.Tensor
Input tensor #2 intended to apply statistic to. `y` is typically the observation
or validation tensor.
y_coords : CoordSystem
Ordered dict representing coordinate system that describes the `y` tensor.
`reduction_dimensions` must be in coords.
"""
pass
Contributing Statistics and Metrics#
Want to add your own statistics or metrics to the package? We are happy to
work with you. At the minimum we expect the statistic or metric to abide by the interfaces defined
above. We can also work with you to ensure that there are reduction_dimensions
applicable and, if possible, weight and batching support.