bionemo-core
Common code that all BioNeMo framework packages depend on. Contains highly reusable, battle-tested abstractions and implementations that are valuable across a wide variety of domains and applications.
Crucially, the bionemo-core
Python package (namespace bionemo.core
) depends on PyTorch and PyTorch
Lightning. Other key BioNeMo component libraries, such as bionemo-llm
and
bionemo-geometric
, obtain their PyTorch dependencies via bionemo-core
.
Developer Setup
After following the setup specified in the README, you may install this project's code in your environment via executing:
pip install -e .
To run unit tests with code coverage, execute:
pytest -v --cov=bionemo --cov-report=term .
Package Highlights
In bionemo.core.model.config
:
- ModelOutput
: A Model's forward pass may produce a tensor, multiple tensors, or named tensors.
- LossType
: A generic type parameter for a loss function.
- Model
: An interface for any ML model that accepts and produces torch.Tensor
s.
- ModelType
: A generic type parameter that is constrained to the Model
interface.
- BionemoModelConfig
: An abstract class that enables parameterizable model instantiation that is compatible with Megatron.
- BionemoTrainableModelConfig
: An extension that includes the loss function to use with the model during training.
In bionemo.core.utils
:
- the batching_utils
module's pad_token_ids
, which pads token ids with padding value & returns a mask.
- the dtype
module's get_autocast_dtype
, which converts from nemo/nvidia datatypes to their PyTorch equivalents.
- the random_utils
module, which includes functions for managing random seeds and performing sampling.
In the bionemo.data
package, there is:
- multi_epoch_dataset
: contains many dataset implements that are useful for mutli-epoch training.
- resamplers
: contains a P-RNG based Dataset implementation.
There's a constant global value, bionemo.core.BIONEMO_CACHE_DIR
, which is used as a local on-disk cache for resources.