nvalchemi.data.AtomicData#

class nvalchemi.data.AtomicData(*, atomic_numbers, positions, atomic_masses=None, atom_categories=None, neighbor_list=None, shifts=None, neighbor_list_shifts=None, neighbor_matrix=None, neighbor_matrix_shifts=None, num_neighbors=None, cell=None, pbc=None, forces=None, energy=None, stress=None, virial=None, dipole=None, charges=None, charge=None, node_attrs=None, node_alpha_spins=None, node_beta_spins=None, spin=None, graph_alpha_spins=None, node_embeddings=None, edge_embeddings=None, graph_embeddings=None, velocities=None, momenta=None, kinetic_energies=None, info=<factory>, **extra_data)[source]#

Atomic data structure for molecular systems.

Represents molecular systems as graphs with atomic properties and interactions. Uses Pydantic for validation and serialization, with DataMixin for graph functionality.

Parameters:
  • atomic_numbers (Integer[Tensor, 'V']) – Atomic numbers for each node [n_nodes]

  • positions (Float[Tensor, 'V 3']) – Cartesian coordinates for each atom [n_nodes, 3]

  • atomic_masses (Float[Tensor, 'V'] | None) – Atomic masses [n_nodes]

  • atom_categories (list[AtomCategory] | Integer[Tensor, 'V'] | None) – Atom categorical index, based on _typing.AtomCategory Enum [n_nodes]

  • neighbor_list (Integer[Tensor, 'E 2'] | None) – Neighbor list [n_edges, 2]

  • shifts (Float[Tensor, 'E 3'] | None) – Cartesian displacement vectors for each edge (neighbor_list_shifts @ cell) [n_edges, 3]

  • neighbor_list_shifts (Num[Tensor, 'E 3'] | None) – Integer lattice image indices for periodic edges [n_edges, 3]

  • neighbor_matrix (Integer[Tensor, 'V K'] | None) – Dense neighbor matrix [n_nodes, max_neighbors]

  • neighbor_matrix_shifts (Num[Tensor, 'V K 3'] | None) – Periodic shifts for the dense neighbor matrix [n_nodes, max_neighbors, 3]

  • num_neighbors (Integer[Tensor, 'V'] | None) – Number of valid neighbors per atom [n_nodes]

  • cell (Float[Tensor, 'B 3 3'] | None) – Unit cell vectors [3, 3]

  • pbc (Bool[Tensor, 'B 3'] | None) – Boolean tensor indicating periodic boundary conditions along each dimension

  • forces (Float[Tensor, 'V 3'] | None) – Atomic forces [n_nodes, 3]

  • energy (Float[Tensor, 'B 1'] | None) – Total energy [1]

  • stress (Float[Tensor, 'B 3 3'] | None) – Cauchy stress W/V (eV/A^3) [1, 3, 3]

  • virial (Float[Tensor, 'B 3 3'] | None) – Virial tensor [1, 3, 3]

  • dipole (Float[Tensor, 'B 3'] | None) – Dipole moment of the system.

  • charges (Float[Tensor, 'V'] | None) – Partial atomic charges [n_nodes]

  • charge (Float[Tensor, 'B 1'] | None) – Total system charge [1]

  • node_attrs (Float[Tensor, 'V A'] | None) – Node attributes [n_nodes, n_node_attrs]

  • node_alpha_spins (Float[Tensor, 'V 1'] | None) – Alpha spins for each atom, [n_nodes, 1]. Use this field for closed-shell spins.

  • node_beta_spins (Float[Tensor, 'V 1'] | None) – Beta spins for each atom, [n_nodes, 1]. For restricted spin, use node_alpha_spins instead.

  • spin (Float[Tensor, 'B 1'] | None) – Spin or multiplicity value for the system, [1, 1]

  • graph_alpha_spins (Float[Tensor, 'B 1'] | None) – Alpha spins for the entire graph, [1, 1]

  • node_embeddings (Float[Tensor, 'V H'] | None) – Embeddings for each node within the batch/graph.

  • edge_embeddings (Float[Tensor, 'E H'] | None) – Embeddings for each edge within the batch/graph.

  • graph_embeddings (Float[Tensor, 'B H'] | None) – Embeddings for the entire graph/graphs within a batch.

  • velocities (Float[Tensor, 'V 3'] | None) – Atomic velocities [n_nodes, 3], in units set by positions.

  • momenta (Float[Tensor, 'V 3'] | None) – Atomic momenta [n_nodes, 3], in units set by positions.

  • kinetic_energies (Float[Tensor, 'V 1'] | None) – Per-atom kinetic energies [n_nodes, 1], with the same units as energy.

  • info (dict[str, Tensor])

  • extra_data (Any)

Only atomic_numbers and positions are required; every other tensor field defaults to None. Each field from atomic_numbers through kinetic_energies is declared with a Pydantic PlainSerializer that calls nvalchemi.data.atomic_data._tensor_serialization when serializing to JSON.

atomic_numbers#

Atomic numbers of each atom [n_nodes]

Type:

torch.Tensor

positions#

Cartesian coordinates [n_nodes, 3]

Type:

torch.Tensor

atomic_masses#

Atomic masses [n_nodes]

Type:

torch.Tensor

neighbor_list#

Neighbor list [n_edges, 2]

Type:

torch.Tensor

node_attrs#

Node attributes [n_nodes, n_node_attrs]

Type:

torch.Tensor

shifts#

Cartesian displacement vectors for each edge [n_edges, 3], computed as neighbor_list_shifts @ cell.

Type:

torch.Tensor
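
The relation shifts = neighbor_list_shifts @ cell can be checked by hand on a small case. The sketch below uses plain nested lists instead of tensors and assumes that each row of cell is one lattice vector, a convention this page does not state. The helper `matmul` is hypothetical and not part of nvalchemi.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Orthorhombic cell; each row is assumed to be one lattice vector.
cell = [
    [4.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
    [0.0, 0.0, 6.0],
]

# Integer lattice image indices for three edges, shape [n_edges, 3].
neighbor_list_shifts = [
    [0, 0, 0],   # neighbor in the home cell
    [1, 0, 0],   # one image forward along the first lattice vector
    [-1, 0, 2],  # one image back along the first, two forward along the third
]

# Cartesian displacement for each edge, shape [n_edges, 3].
shifts = matmul(neighbor_list_shifts, cell)
print(shifts)
```

The second edge crosses one cell boundary along the first lattice vector, so its Cartesian shift is the full length of that vector (4.0) along x.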

neighbor_list_shifts#

Integer lattice image indices for periodic edges [n_edges, 3].

Type:

torch.Tensor

neighbor_matrix#

Dense neighbor matrix [n_nodes, max_neighbors]

Type:

torch.Tensor

neighbor_matrix_shifts#

Periodic shifts for the dense neighbor matrix [n_nodes, max_neighbors, 3]

Type:

torch.Tensor

num_neighbors#

Number of valid neighbors per atom [n_nodes]

Type:

torch.Tensor
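
To illustrate how neighbor_matrix and num_neighbors relate to neighbor_list, here is a minimal sketch that pads an edge list into a dense per-atom layout. It makes two assumptions not confirmed by this page: the first column of neighbor_list holds the central atom, and unused slots are filled with -1. The function `to_dense_neighbors` is hypothetical and is not the library's conversion routine.

```python
def to_dense_neighbors(neighbor_list, n_nodes, fill_value=-1):
    """Convert an [n_edges, 2] edge list into a padded [n_nodes, max_neighbors] matrix.

    Returns the dense matrix and the per-atom count of valid entries.
    """
    per_atom = [[] for _ in range(n_nodes)]
    for center, neighbor in neighbor_list:
        per_atom[center].append(neighbor)

    num_neighbors = [len(row) for row in per_atom]
    max_neighbors = max(num_neighbors, default=0)

    # Pad short rows so every atom has max_neighbors columns.
    neighbor_matrix = [
        row + [fill_value] * (max_neighbors - len(row)) for row in per_atom
    ]
    return neighbor_matrix, num_neighbors


# Three atoms: atom 0 sees atoms 1 and 2; atoms 1 and 2 each see atom 0.
edges = [(0, 1), (0, 2), (1, 0), (2, 0)]
neighbor_matrix, num_neighbors = to_dense_neighbors(edges, n_nodes=3)
print(neighbor_matrix)
print(num_neighbors)
```

Because rows are padded, num_neighbors is what tells a consumer how many leading entries in each row are real neighbors.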

cell#

Unit cell vectors [3, 3]

Type:

torch.Tensor

pbc#

Periodic boundary conditions [3]

Type:

torch.Tensor

forces#

Atomic forces [n_nodes, 3]

Type:

torch.Tensor

energy#

Total energy [1]

Type:

torch.Tensor

stress#

Stress tensor [1, 3, 3]

Type:

torch.Tensor

virial#

Virial tensor [1, 3, 3]

Type:

torch.Tensor

dipole#

Dipole moment [1, 3]

Type:

torch.Tensor

charges#

Partial atomic charges [n_nodes]

Type:

torch.Tensor

charge#

Total system charge [1]

Type:

torch.Tensor

info#

Additional information about the system

Type:

dict

add_edge_property(key, value)[source]#

Add an edge property to the graph.

Parameters:
  • key (str)

  • value (Any)

Return type:

None

add_node_property(key, value, node_dim=0)[source]#

Add a node property to the graph.

Parameters:
  • key (str)

  • value (Tensor)

  • node_dim (int)

Return type:

None

add_system_property(key, value)[source]#

Add a system property to the graph.

Parameters:
  • key (str)

  • value (Any)

Return type:

None

check_edge_consistency()[source]#

Validate that all edge-level properties have consistent edge counts.

This validator runs after all field validators and checks that any edge-level property that is set has the same number of edges as neighbor_list.

Returns:

Returns self if validation passes.

Return type:

Self

Raises:

ValueError – If any edge-level property has an inconsistent number of edges.
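
The kind of check described here can be sketched in a few lines. This is an illustration only, using plain lists in place of tensors; `check_leading_dim` is a hypothetical helper, and the actual validator's error message and internals are not shown on this page.

```python
def check_leading_dim(reference_name, reference, properties):
    """Raise ValueError if any set property disagrees with the reference length."""
    expected = len(reference)
    for name, value in properties.items():
        if value is None:
            continue  # unset optional fields are skipped
        if len(value) != expected:
            raise ValueError(
                f"{name} has {len(value)} entries, but {reference_name} has {expected}"
            )


neighbor_list = [(0, 1), (1, 0)]  # two edges
edge_props = {
    "shifts": [[0.0, 0.0, 0.0]] * 2,  # consistent: two rows
    "edge_embeddings": None,          # unset, so not checked
}
check_leading_dim("neighbor_list", neighbor_list, edge_props)  # passes silently

edge_props["shifts"] = [[0.0, 0.0, 0.0]] * 3  # now one row too many
message = None
try:
    check_leading_dim("neighbor_list", neighbor_list, edge_props)
except ValueError as err:
    message = str(err)
print(message)
```

The same pattern applies to node-level properties, with atomic_numbers as the reference instead of neighbor_list.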

check_fp_dtype_consistency()[source]#

Ensure all floating-point tensors have the same precision as the positions tensor.

Return type:

AtomicData

check_node_consistency()[source]#

Validate that all node-level properties have consistent atom counts.

This validator runs after all field validators and checks that any node-level property that is set has the same number of nodes as atomic_numbers.

Returns:

Returns self if validation passes.

Return type:

Self

Raises:

ValueError – If any node-level property has an inconsistent number of nodes.

property chemical_hash: str#

Generate a unique hash for the chemical system using the blake2s hashing algorithm.

The hash is unique to a given atomic composition and structure, and is invariant to the ordering of atoms in the data. It also distinguishes periodic from non-periodic systems and, for periodic systems, different lattice vectors and directions of periodicity.

Returns:

A blake2s hash string representing the chemical system.

Return type:

str

Notes

The hash is generated by:

1. Sorting atoms by atomic number to ensure invariance to atom ordering
2. Including atomic numbers and positions of sorted atoms
3. Including periodic boundary conditions and cell parameters if present
4. Computing a BLAKE2s hash of the formatted string representation
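
The steps above can be sketched with the standard library's hashlib.blake2s. This is not the library's implementation: the string format, the six-decimal rounding, and the tie-breaking by position for atoms of the same element are all assumptions made for illustration, and `order_invariant_hash` is a hypothetical name.

```python
import hashlib


def order_invariant_hash(atomic_numbers, positions, cell=None, pbc=None):
    """Hash a structure so that reordering its atoms does not change the result."""
    # Sort by atomic number; ties are broken by position (an assumption).
    atoms = sorted(zip(atomic_numbers, map(tuple, positions)))
    parts = [
        f"{number}:" + ",".join(f"{c:.6f}" for c in pos) for number, pos in atoms
    ]
    # Periodic information is appended only when it is present.
    if pbc is not None and cell is not None:
        parts.append("pbc:" + ",".join(str(bool(flag)) for flag in pbc))
        parts.append("cell:" + ",".join(f"{c:.6f}" for row in cell for c in row))
    return hashlib.blake2s("|".join(parts).encode("utf-8")).hexdigest()


# A water molecule listed in two different atom orders.
numbers_a = [8, 1, 1]
positions_a = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [0.24, 0.93, 0.0]]
numbers_b = [1, 8, 1]
positions_b = [[0.24, 0.93, 0.0], [0.0, 0.0, 0.0], [0.96, 0.0, 0.0]]

hash_a = order_invariant_hash(numbers_a, positions_a)
hash_b = order_invariant_hash(numbers_b, positions_b)

# The same molecule placed in a periodic box.
box = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
hash_periodic = order_invariant_hash(numbers_a, positions_a, cell=box, pbc=[True] * 3)

print(hash_a == hash_b)
print(hash_a == hash_periodic)
```

The two orderings hash identically, while adding a cell and periodicity flags produces a different digest.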

property device: device#

Get the device of the positions tensor.

property dtype: dtype#

Get the dtype of the positions tensor.

property edge_properties: dict[str, Any]#

Get the edge properties of the graph.

enforce_device_consistency()[source]#

Ensure all tensors are on the same device.

If atomic_numbers and positions are on different devices, the validator tries to promote them to the offload device in preference to the host CPU.

Return type:

AtomicData

classmethod from_atoms(atoms, energy_key='energy', forces_key='forces', stress_key='stress', virials_key='virials', dipole_key='dipole', charges_key='charges', device='cpu', dtype=torch.float32, z_table=None)[source]#

Create an AtomicData from an ASE-like Atoms object.

Only fields that are actually present in the input object are populated; absent optional fields (energy, forces, stress, virials, dipole, charges) remain None. The input atoms object is not mutated.

The returned info dict contains only tensor-convertible entries from atoms.info (np.ndarray, list, int, float, and their numpy equivalents). bool, np.bool_, strings, and other types are dropped.
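
The filtering rule can be illustrated with a standard-library sketch. It covers only the built-in types named above (list, int, float) and leaves out np.ndarray and the numpy scalar types so that it runs without numpy; `tensor_convertible` is a hypothetical helper, not the function from_atoms uses.

```python
def tensor_convertible(info):
    """Keep entries whose values are numeric scalars or lists; drop the rest."""
    kept = {}
    for key, value in info.items():
        # bool is a subclass of int in Python, so it must be excluded explicitly.
        if isinstance(value, bool):
            continue
        if isinstance(value, (int, float, list)):
            kept[key] = value
    return kept


info = {
    "temperature": 300.0,
    "step": 12,
    "converged": True,      # bool: dropped
    "label": "run-a",       # str: dropped
    "weights": [0.5, 0.5],
}
filtered = tensor_convertible(info)
print(sorted(filtered))
```

The explicit bool check matters: without it, isinstance(True, int) is true and boolean flags would slip through as integers.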

Parameters:
  • atoms (ase.Atoms) – An ASE Atoms object.

  • energy_key (str) – Key in atoms.info for total energy.

  • forces_key (str) – Key in atoms.arrays for atomic forces.

  • stress_key (str) – Key in atoms.info for the stress tensor.

  • virials_key (str) – Key in atoms.info for the virial tensor.

  • dipole_key (str) – Key in atoms.info for the dipole moment.

  • charges_key (str) – Key in atoms.arrays for per-atom partial charges.

  • device (str | torch.device) – Target device for all output tensors.

  • dtype (torch.dtype) – Target floating-point dtype for all output tensors.

  • z_table (AtomicNumberTable | None) – Atomic number table used to build one-hot node attributes.

Return type:

AtomicData

classmethod from_structure(structure, energy_key='energy', forces_key='forces', stress_key='stress', virials_key='virials', dipole_key='dipole', charges_key='charges', device='cpu', dtype=torch.float32, z_table=None)[source]#

Create an AtomicData from a pymatgen Structure or Molecule.

Only fields that are actually present in the input are populated; absent optional fields (energy, forces, stress, virials, dipole, charges) remain None. The input object is not mutated.

The returned info dict contains tensor-convertible entries from structure.properties (np.ndarray, list, int, float, and their numpy equivalents), excluding keys already consumed into dedicated fields. Unsupported types raise TypeError.

Stress and virials accept 3×3 matrices, 6-component Voigt vectors, or 9-component flat vectors (see voigt_to_matrix()).
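
As an illustration of normalizing these three input shapes to a 3×3 matrix, consider the sketch below. It assumes the common Voigt ordering (xx, yy, zz, yz, xz, xy); this page does not state which ordering voigt_to_matrix() uses, so check that function before relying on the component order. `to_3x3` is a hypothetical name.

```python
def to_3x3(values):
    """Normalize a 3x3 nested list, a 9-component flat vector, or a
    6-component Voigt vector into a 3x3 nested list of floats."""
    # Flatten one level so nested and flat inputs are handled the same way.
    flat = []
    for item in values:
        if isinstance(item, (list, tuple)):
            flat.extend(float(c) for c in item)
        else:
            flat.append(float(item))

    if len(flat) == 9:
        return [flat[0:3], flat[3:6], flat[6:9]]
    if len(flat) == 6:
        # Assumed Voigt order: xx, yy, zz, yz, xz, xy.
        xx, yy, zz, yz, xz, xy = flat
        return [[xx, xy, xz], [xy, yy, yz], [xz, yz, zz]]
    raise ValueError(f"expected 6 or 9 components, got {len(flat)}")


from_voigt = to_3x3([1, 2, 3, 4, 5, 6])
from_flat = to_3x3(list(range(9)))
from_matrix = to_3x3([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
print(from_voigt)
```

A Voigt vector always expands to a symmetric matrix, whereas the 9-component and 3×3 forms are passed through without symmetrization in this sketch.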

Parameters:
  • structure (pymatgen.core.Structure | pymatgen.core.Molecule) – A pymatgen Structure (periodic) or Molecule (non-periodic). For Molecule, cell and pbc are set to None.

  • energy_key (str) – Key in structure.properties for total energy.

  • forces_key (str) – Key in structure.site_properties for atomic forces.

  • stress_key (str) – Key in structure.properties for the stress tensor.

  • virials_key (str) – Key in structure.properties for the virial tensor.

  • dipole_key (str) – Key in structure.properties for the dipole moment.

  • charges_key (str) – Key in structure.site_properties for per-atom partial charges.

  • device (str | torch.device) – Target device for all output tensors.

  • dtype (torch.dtype) – Target floating-point dtype for all output tensors.

  • z_table (AtomicNumberTable | None) – Atomic number table used to build one-hot node attributes.

Return type:

AtomicData

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'validate_assignment': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(_AtomicData__context)[source]#

Create per-instance mutable copies of the key sets.

The class-level defaults are frozen to prevent accidental mutation. Each instance gets its own mutable set so that add_node_property and friends only affect the instance they are called on.

Uses model_post_init rather than model_validator because validate_assignment=True causes model validators to re-run on every setattr call, which would reset the key sets and lose previously added custom keys.
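
The pattern of frozen class-level defaults with per-instance mutable copies can be shown without Pydantic. The sketch below is a plain-Python analogy: the class `KeyedGraph` and the attribute names are hypothetical, and a regular `__init__` stands in for model_post_init.

```python
class KeyedGraph:
    # Class-level default is immutable, so no instance can alter it by accident.
    _default_node_keys = frozenset({"atomic_numbers", "positions"})

    def __init__(self):
        # Each instance works on its own mutable copy of the defaults.
        self.node_keys = set(self._default_node_keys)

    def add_node_property(self, key):
        # Only this instance's key set changes.
        self.node_keys.add(key)


a = KeyedGraph()
b = KeyedGraph()
a.add_node_property("magmoms")
print("magmoms" in a.node_keys, "magmoms" in b.node_keys)
```

Adding a key on one instance leaves both the other instance and the class-level default untouched, which is the isolation the per-instance copies are meant to provide.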

Parameters:

_AtomicData__context (Any)

Return type:

None

property node_properties: dict[str, Any]#

Get the node properties of the graph.

property num_edges: int#

Return the number of edges in the graph.

property num_nodes: int#

Return the number of nodes in the graph.

property system_properties: dict[str, Any]#

Get the system properties of the graph.

use_default_categories()[source]#

Ensure that atom categories are set.

If a list is passed (which Pydantic should already have validated), it is converted to a tensor.

Return type:

AtomicData

use_default_masses()[source]#

If no atomic masses are set, fill them in automatically with default masses from periodictable.

Returns:

Returns self if validation passes.

Return type:

Self

use_default_velocities()[source]#

If no velocities are set, initialize as zeros with proper shape and dtype.

Returns:

Returns self if validation passes.

Return type:

Self