Typing module (type aliases, enums, protocols)

Type definitions for molecular and crystal graph representations.

For tensors, we use jaxtyping to annotate the types, particularly to note their data type and shapes.

The notation we will use for shapes is as follows:

- B: Batch size
- V: Number of nodes (vertices)
- E: Number of edges
- H: Hidden feature dimensionality
- A: Number of attributes
- C: Number of centroids
- M: Number of ensemble members
- K: Maximum number of neighbors
- 3: Number of dimensions for coordinates

The notation for data types is as follows:

- Dimensionalities are assumed to be batch-able; i.e. there are redundant dimensions for things like charges, spins, and energy. Concatenated properties such as atomic numbers and masses do not have a redundant dimension.
- Masses refers to the atomic masses, assumed to be in amu.
- Coordinates can refer to fractional or Cartesian coordinates.
- Charges can refer to partial or total charges, and can be graph-level or node-level.

class nvalchemi._typing.AbstractQueue(*args, **kwargs)[source]

Bases: Protocol[T]

Represents a generic queue interface; the requirements are that the queue can be used to put and get items of type T.

We do not require that the queue be thread-safe or process-safe, nor do we prescribe the ordering of items within the queue.

get()[source]

Remove an item from the queue.

Returns: item – The item removed from the queue.
Return type: T

put(item)[source]

Add an item to the queue.

The item can be placed anywhere in the queue by the concrete implementation.

Parameters: item (T) – The item to add to the queue.
Return type: None
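As a minimal sketch of what satisfying this interface could look like, the example below defines a local stand-in protocol with the same put/get shape and checks two containers against it. The names QueueLike, StackQueue, and drain are hypothetical and not part of nvalchemi; the stand-in is marked runtime_checkable purely so the sketch can use isinstance, which is an assumption and not something the documented protocol is known to support.

```python
import queue
from collections import deque
from typing import Generic, Protocol, TypeVar, runtime_checkable

T = TypeVar("T")


@runtime_checkable
class QueueLike(Protocol[T]):
    """Hypothetical local stand-in mirroring the documented put/get interface."""

    def put(self, item: T) -> None: ...

    def get(self) -> T: ...


class StackQueue(Generic[T]):
    """LIFO container: the interface leaves item ordering to the implementation."""

    def __init__(self) -> None:
        self._items = deque()

    def put(self, item: T) -> None:
        self._items.append(item)

    def get(self) -> T:
        return self._items.pop()


def drain(q: QueueLike[int], n: int) -> list[int]:
    """Remove n items from any object exposing get()."""
    return [q.get() for _ in range(n)]


stack = StackQueue()
fifo = queue.Queue()  # the standard-library queue also exposes put/get
for value in (1, 2, 3):
    stack.put(value)
    fifo.put(value)

drained_stack = drain(stack, 3)  # last in, first out
drained_fifo = drain(fifo, 3)    # first in, first out
```

Both containers are accepted by drain even though they return items in different orders, which is the flexibility the protocol description allows.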

class nvalchemi._typing.AtomCategory(*values)[source]

Bases: Enum

Categorical mapping for atom classifications within a system.

This can be used to distinguish between different atoms during modeling, such as applying different kinds of constraints to them during training or inference (e.g. freezing dynamics for surface and bulk atoms).

This Enum should encompass as many modeling types as possible, and is not limited to condensed phase modeling.

The categories are as follows:

- GAS: Gas phase atoms.
- LIQUID: Liquid phase, or solvent atoms.
- GL_INTERFACE: Gas-liquid interface atoms.
- SURFACE: Surface atoms.
- GS_INTERFACE: Gas-surface interface atoms.
- BULK: Bulk atoms; typically those that can be assumed to be non-interacting.
- SB_INTERFACE: Solid-bulk interface atoms.
- FRAGMENT: Fragment atoms.
- CLUSTER: Atoms constituting clusters.
- TERMINAL: Terminal atoms in molecules.
- CENTRAL: Central atoms in molecules.
- SPECIAL: Atoms designated to be generically special.

While the categories are meant to map to their respective chemistries, it is also valid to treat the Enum as a set of distinct labels without that mapping (e.g. 0/1 to differentiate between arbitrary types). In the binary case, with only two atom categories, we recommend using 0 and -1, with SPECIAL (-1) marking the atoms targeted by your particular operation.

BULK = 5

CENTRAL = 10

CLUSTER = 8

FRAGMENT = 7

GAS = 0

GL_INTERFACE = 2

GS_INTERFACE = 4

LIQUID = 1

SB_INTERFACE = 6

SPECIAL = -1

SURFACE = 3

TERMINAL = 9
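To illustrate how these categories might drive a constraint such as freezing surface and bulk atoms, here is a small sketch. It declares a local replica of the enum using the values listed above so that it runs without the library installed; the FROZEN set, the per-atom category list, and the resulting mask are hypothetical choices made for the example and not part of the documented API.

```python
from enum import Enum


class AtomCategory(Enum):
    """Local replica of the documented values, for illustration only."""

    GAS = 0
    LIQUID = 1
    GL_INTERFACE = 2
    SURFACE = 3
    GS_INTERFACE = 4
    BULK = 5
    SB_INTERFACE = 6
    FRAGMENT = 7
    CLUSTER = 8
    TERMINAL = 9
    CENTRAL = 10
    SPECIAL = -1


# Hypothetical per-atom integer labels for a five-atom system.
categories = [3, 3, 5, 0, -1]

# Assumed policy for this sketch: freeze surface and bulk atoms.
FROZEN = {AtomCategory.SURFACE, AtomCategory.BULK}

# Map each integer label back to its enum member and test membership.
frozen_mask = [AtomCategory(label) in FROZEN for label in categories]
```

Looking members up by value, as in AtomCategory(label), is standard Enum behavior, so integer labels stored alongside the atoms can be converted back to named categories when needed.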

class nvalchemi._typing.AtomsLike(*args, **kwargs)[source]

Bases: Protocol

Represents the minimum viable data structure that is agnostic to batched and unbatched atomic data.

This is intended only for type hinting; when a concrete class can be used (e.g. AtomicData or Batch), it should be used instead of this protocol.

atomic_numbers

1D tensor containing atomic numbers.

Type: AtomicNumbers

positions

2D tensor containing atomic positions.

Type: NodePositions

cell

3D tensor containing the lattice vectors for each structure within a batch.

Type: LatticeVectors

atomic_numbers: Integer[Tensor, 'V']

cell: Float[Tensor, 'B 3 3'] | None

energy: Float[Tensor, 'B 1'] | None

forces: Float[Tensor, 'V 3'] | None

positions: Float[Tensor, 'V 3']
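The sketch below shows how a structural protocol of this kind can be used for type hinting, and how the V and 3 dimensions from the shape notation relate to each other. To stay runnable without a tensor library, nested Python lists stand in for tensors; AtomsLikeSketch, ToyAtoms, and num_nodes are hypothetical names introduced here and are not part of nvalchemi.

```python
from dataclasses import dataclass
from typing import Optional, Protocol

Matrix = list[list[float]]


class AtomsLikeSketch(Protocol):
    """Hypothetical stand-in for the documented protocol; lists replace tensors."""

    atomic_numbers: list[int]      # shape (V,)
    positions: Matrix              # shape (V, 3)
    cell: Optional[list[Matrix]]   # shape (B, 3, 3), or None if non-periodic


@dataclass
class ToyAtoms:
    """Any class with matching attributes satisfies the protocol structurally."""

    atomic_numbers: list[int]
    positions: Matrix
    cell: Optional[list[Matrix]] = None


def num_nodes(atoms: AtomsLikeSketch) -> int:
    """Return V after checking that positions has one (x, y, z) row per atom."""
    n = len(atoms.atomic_numbers)
    if len(atoms.positions) != n or any(len(row) != 3 for row in atoms.positions):
        raise ValueError("positions must have shape (V, 3)")
    return n


# Illustrative coordinates for a three-atom molecule (O, H, H).
water = ToyAtoms(
    atomic_numbers=[8, 1, 1],
    positions=[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]],
)
n_atoms = num_nodes(water)
```

ToyAtoms never inherits from the protocol; a static type checker accepts it in num_nodes because its attributes match, which is the sense in which the protocol is useful only for type hinting.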

class nvalchemi._typing.EmbeddingModel(*args, **kwargs)[source]

Bases: Protocol

A protocol that defines an abstract interface for retrieving graph-level embeddings from a model, given some data samples.

compute_embeddings(samples)[source]

Interface that computes embeddings for a single sample or a batch of samples.

Parameters: samples (AtomicData | Batch) – The samples to compute the embeddings for.
Returns: graph_embeddings – The graph embeddings for the samples.
Return type: GraphEmbeddings
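As a rough sketch of how a model might satisfy this interface, the toy class below returns one fixed-length vector per structure, matching the 'B H' shape convention with H = 2. Everything here is a stand-in: samples are plain lists of atomic numbers and not AtomicData or Batch objects, and EmbeddingModelSketch, CompositionEmbedder, and embed_all are hypothetical names that do not exist in the library.

```python
from typing import Protocol

Sample = list[int]               # atomic numbers of one structure (stand-in type)
Embeddings = list[list[float]]   # one row per structure, i.e. shape (B, H)


class EmbeddingModelSketch(Protocol):
    """Hypothetical local stand-in mirroring the documented method name."""

    def compute_embeddings(self, samples: list[Sample]) -> Embeddings: ...


class CompositionEmbedder:
    """Toy model: embeds each structure as [atom count, mean atomic number]."""

    def compute_embeddings(self, samples: list[Sample]) -> Embeddings:
        return [[float(len(s)), sum(s) / len(s)] for s in samples]


def embed_all(model: EmbeddingModelSketch, samples: list[Sample]) -> Embeddings:
    """Works with any object that provides compute_embeddings."""
    return model.compute_embeddings(samples)


# Two three-atom structures, so B = 2 and each embedding has H = 2 entries.
embeddings = embed_all(CompositionEmbedder(), [[8, 1, 1], [6, 8, 8]])
```

Downstream code written against the protocol, such as embed_all, does not need to know which model produced the embeddings, only that the method exists.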