nvalchemi.data.AtomicData#
- class nvalchemi.data.AtomicData(*, atomic_numbers, positions, atomic_masses=None, atom_categories=None, neighbor_list=None, shifts=None, neighbor_list_shifts=None, neighbor_matrix=None, neighbor_matrix_shifts=None, num_neighbors=None, cell=None, pbc=None, forces=None, energy=None, stress=None, virial=None, dipole=None, charges=None, charge=None, node_attrs=None, node_alpha_spins=None, node_beta_spins=None, spin=None, graph_alpha_spins=None, node_embeddings=None, edge_embeddings=None, graph_embeddings=None, velocities=None, momenta=None, kinetic_energies=None, info=<factory>, **extra_data)[source]#
Atomic data structure for molecular systems.
Represents molecular systems as graphs with atomic properties and interactions. Uses Pydantic for validation and serialization, with DataMixin for graph functionality.
- Parameters:
atomic_numbers (Integer[Tensor, 'V']) – Atomic numbers for each node [n_nodes]
positions (Float[Tensor, 'V 3']) – Cartesian coordinates for each atom [n_nodes, 3]
atomic_masses (Float[Tensor, 'V'] | None) – Atomic masses [n_nodes]
atom_categories (list[AtomCategory] | Integer[Tensor, 'V'] | None) – Atom categorical index, based on _typing.AtomCategory Enum [n_nodes]
neighbor_list (Integer[Tensor, 'E 2'] | None) – Neighbor list [n_edges, 2]
shifts (Float[Tensor, 'E 3'] | None) – Cartesian displacement vectors for each edge (neighbor_list_shifts @ cell) [n_edges, 3]
neighbor_list_shifts (Num[Tensor, 'E 3'] | None) – Integer lattice image indices for periodic edges [n_edges, 3]
neighbor_matrix (Integer[Tensor, 'V K'] | None) – Dense neighbor matrix [n_nodes, max_neighbors]
neighbor_matrix_shifts (Num[Tensor, 'V K 3'] | None) – Periodic shifts for the dense neighbor matrix [n_nodes, max_neighbors, 3]
num_neighbors (Integer[Tensor, 'V'] | None) – Number of valid neighbors per atom [n_nodes]
cell (Float[Tensor, 'B 3 3'] | None) – Unit cell vectors [3, 3]
pbc (Bool[Tensor, 'B 3'] | None) – Boolean tensor indicating periodic boundary conditions along each dimension
forces (Float[Tensor, 'V 3'] | None) – Atomic forces [n_nodes, 3]
energy (Float[Tensor, 'B 1'] | None) – Total energy [1]
stress (Float[Tensor, 'B 3 3'] | None) – Cauchy stress W/V (eV/A^3) [1, 3, 3]
virial (Float[Tensor, 'B 3 3'] | None) – Virial tensor [1, 3, 3]
dipole (Float[Tensor, 'B 3'] | None) – Dipole moment of the system.
charges (Float[Tensor, 'V'] | None) – Partial atomic charges [n_nodes]
charge (Float[Tensor, 'B 1'] | None) – Total system charge [1]
node_attrs (Float[Tensor, 'V A'] | None) – Node attributes [n_nodes, n_node_attrs]
node_alpha_spins (Float[Tensor, 'V 1'] | None) – Alpha spins for each atom, [n_nodes, 1]. Use this field for closed-shell spins.
node_beta_spins (Float[Tensor, 'V 1'] | None) – Beta spins for each atom, [n_nodes, 1]. For restricted spin, use node_alpha_spins instead.
spin (Float[Tensor, 'B 1'] | None) – Spin or multiplicity value for the system, [1, 1]
graph_alpha_spins (Float[Tensor, 'B 1'] | None) – Alpha spins for the entire graph, [1, 1]
node_embeddings (Float[Tensor, 'V H'] | None) – Embeddings for each node within the batch/graph.
edge_embeddings (Float[Tensor, 'E H'] | None) – Embeddings for each edge within the batch/graph.
graph_embeddings (Float[Tensor, 'B H'] | None) – Embeddings for the entire graph/graphs within a batch.
velocities (Float[Tensor, 'V 3'] | None) – Atomic velocities [n_nodes, 3], in units set by positions.
momenta (Float[Tensor, 'V 3'] | None) – Atomic momenta [n_nodes, 3], in units set by positions.
kinetic_energies (Float[Tensor, 'V 1'] | None) – Per-atom kinetic energies [n_nodes, 1], with the same units as energy.
info (dict[str, Tensor])
extra_data (Any)
Every field from atomic_numbers through kinetic_energies is serialized with nvalchemi.data.atomic_data._tensor_serialization when dumping to JSON.
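The parameters describe the same graph in two layouts: a sparse edge list (neighbor_list) and a dense, padded form (neighbor_matrix with num_neighbors). The plain-Python sketch below shows one way the two could relate. It is an illustration only, not the library's implementation: the helper name to_neighbor_matrix and the padding value -1 are assumptions, since the source does not say how unused slots are filled.

```python
def to_neighbor_matrix(neighbor_list, n_nodes, pad=-1):
    """Convert [n_edges, 2] (source, target) pairs into a padded [n_nodes, K] matrix.

    `pad` marks unused slots; the value -1 is an assumption for this sketch.
    """
    per_node = [[] for _ in range(n_nodes)]
    for src, dst in neighbor_list:
        per_node[src].append(dst)

    # Number of valid neighbors per atom, shape [n_nodes]
    num_neighbors = [len(nbrs) for nbrs in per_node]
    max_neighbors = max(num_neighbors, default=0)

    # Pad every row to the same width so the result is rectangular
    matrix = [nbrs + [pad] * (max_neighbors - len(nbrs)) for nbrs in per_node]
    return matrix, num_neighbors


edges = [(0, 1), (0, 2), (1, 0), (2, 0)]
neighbor_matrix, num_neighbors = to_neighbor_matrix(edges, n_nodes=3)
```

Here atom 0 has two neighbors while atoms 1 and 2 have one each, so num_neighbors tells a consumer how many entries of each row are valid.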
- atomic_numbers#
Atomic numbers of each atom [n_nodes]
- Type:
- positions#
Cartesian coordinates [n_nodes, 3]
- Type:
- atomic_masses#
Atomic masses [n_nodes]
- Type:
- neighbor_list#
Neighbor list [n_edges, 2]
- Type:
- node_attrs#
Node attributes [n_nodes, n_node_feats]
- Type:
- shifts#
Cartesian displacement vectors for each edge [n_edges, 3], computed as neighbor_list_shifts @ cell.
- Type:
- neighbor_list_shifts#
Integer lattice image indices for periodic edges [n_edges, 3].
- Type:
- neighbor_matrix#
Dense neighbor matrix [n_nodes, max_neighbors]
- Type:
- neighbor_matrix_shifts#
Periodic shifts for the dense neighbor matrix [n_nodes, max_neighbors, 3]
- Type:
- num_neighbors#
Number of valid neighbors per atom [n_nodes]
- Type:
- cell#
Unit cell vectors [3, 3]
- Type:
- pbc#
Periodic boundary conditions [3]
- Type:
- forces#
Atomic forces [n_nodes, 3]
- Type:
- energy#
Total energy [1]
- Type:
- stress#
Stress tensor [1, 3, 3]
- Type:
- virial#
Virial tensor [1, 3, 3]
- Type:
- dipole#
Dipole moment [1, 3]
- Type:
- charges#
Partial atomic charges [n_nodes]
- Type:
- charge#
Total system charge [1]
- Type:
- info#
Additional information about the system
- Type:
dict
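The shifts attribute above is defined as neighbor_list_shifts @ cell. As a minimal sketch of that product in plain Python (no tensors), assuming a single [3, 3] cell whose rows are the lattice vectors and a hypothetical helper named cartesian_shifts:

```python
def cartesian_shifts(image_indices, cell):
    """Row-vector product: each [3] integer image index times the [3, 3] cell."""
    return [
        [sum(idx[k] * cell[k][j] for k in range(3)) for j in range(3)]
        for idx in image_indices
    ]


# Orthorhombic cell; rows are assumed to be the lattice vectors
cell = [[4.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 6.0]]
# One integer image index per edge, shape [n_edges, 3]
neighbor_list_shifts = [[0, 0, 0], [1, 0, 0], [-1, 1, 0]]

shifts = cartesian_shifts(neighbor_list_shifts, cell)
```

An edge that stays inside the home cell gets a zero shift, while an edge crossing into the image at index (-1, 1, 0) is displaced by minus one first lattice vector plus one second lattice vector.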
- add_edge_property(key, value)[source]#
Add an edge property to the graph.
- Parameters:
key (str)
value (Any)
- Return type:
None
- add_node_property(key, value, node_dim=0)[source]#
Add a node property to the graph.
- Parameters:
key (str)
value (Tensor)
node_dim (int)
- Return type:
None
- add_system_property(key, value)[source]#
Add a system property to the graph.
- Parameters:
key (str)
value (Any)
- Return type:
None
- check_edge_consistency()[source]#
Validate that all edge-level properties have consistent edge counts.
This validator runs after all field validators and checks that any edge-level property that is set has the same number of edges as neighbor_list.
- Returns:
Returns self if validation passes.
- Return type:
Self
- Raises:
ValueError – If any edge-level property has an inconsistent number of edges.
- check_fp_dtype_consistency()[source]#
Ensures all floating point tensors are at the same precision as the positions tensor.
- Return type:
- check_node_consistency()[source]#
Validate that all node-level properties have consistent atom counts.
This validator runs after all field validators and checks that any node-level property that is set has the same number of nodes as atomic_numbers.
- Returns:
Returns self if validation passes.
- Return type:
Self
- Raises:
ValueError – If any node-level property has an inconsistent number of nodes.
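The edge and node consistency validators both compare the leading dimension of every property that is set against a reference field (neighbor_list for edges, atomic_numbers for nodes). A rough stand-alone sketch of that idea follows. It uses plain lists rather than tensors, and the function name check_leading_dim is hypothetical; the real validators run inside Pydantic and return self.

```python
def check_leading_dim(reference_name, fields):
    """Raise ValueError if any set field's length differs from the reference's."""
    expected = len(fields[reference_name])
    for name, value in fields.items():
        if value is None:
            continue  # unset optional fields are skipped
        if len(value) != expected:
            raise ValueError(
                f"{name} has {len(value)} entries, expected {expected} "
                f"(from {reference_name})"
            )


node_fields = {
    "atomic_numbers": [8, 1, 1],
    "positions": [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]],
    "forces": None,
}
check_leading_dim("atomic_numbers", node_fields)  # consistent, no exception

node_fields["charges"] = [-0.8, 0.4]  # one entry short
try:
    check_leading_dim("atomic_numbers", node_fields)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```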
- property chemical_hash: str#
Generate a unique hash for the chemical system using the blake2s hashing algorithm.
The hash is specific to a given atomic composition and structure and is invariant to the ordering of atoms in the data. It also distinguishes periodic from non-periodic systems and, for periodic systems, accounts for the lattice vectors and the directions of periodicity.
- Returns:
A blake2s hash string representing the chemical system.
- Return type:
str
Notes
The hash is generated by:
1. Sorting atoms by atomic number to ensure invariance to atom ordering
2. Including atomic numbers and positions of sorted atoms
3. Including periodic boundary conditions and cell parameters if present
4. Computing a BLAKE2s hash of the formatted string representation
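The four steps in the Notes can be sketched with the standard library alone. This is an approximation under stated assumptions, not the property's actual code: the function name chemical_hash_sketch, the string format, the rounding to six decimals, and the use of coordinates as a tie-break between atoms of the same element are all choices made for this example.

```python
import hashlib


def chemical_hash_sketch(atomic_numbers, positions, pbc=None, cell=None, decimals=6):
    # Step 1: sort by atomic number; coordinates break ties (assumption)
    atoms = sorted(
        (z, tuple(round(x, decimals) for x in pos))
        for z, pos in zip(atomic_numbers, positions)
    )
    # Step 2: atomic numbers and positions of the sorted atoms
    parts = [
        f"{z}:{x:.{decimals}f},{y:.{decimals}f},{zc:.{decimals}f}"
        for z, (x, y, zc) in atoms
    ]
    # Step 3: periodic boundary conditions and cell, if present
    if pbc is not None and cell is not None:
        parts.append("pbc:" + ",".join(str(bool(p)) for p in pbc))
        parts.append("cell:" + ",".join(f"{v:.{decimals}f}" for row in cell for v in row))
    # Step 4: BLAKE2s digest of the formatted string
    return hashlib.blake2s("|".join(parts).encode("utf-8")).hexdigest()


water = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
h_original = chemical_hash_sketch([8, 1, 1], water)
# Same atoms listed in a different order
h_permuted = chemical_hash_sketch([1, 8, 1], [water[1], water[0], water[2]])
# Same atoms placed in a periodic box
box = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
h_periodic = chemical_hash_sketch([8, 1, 1], water, pbc=[True, True, True], cell=box)
```

Reordering the atoms leaves the digest unchanged, while adding periodicity produces a different one.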
- property edge_properties: dict[str, Any]#
Get the edge properties of the graph.
- enforce_device_consistency()[source]#
Enforces all tensors to be on the same device.
If atomic_numbers and positions are on different devices, the method attempts to promote them to the offload device in preference to the host CPU.
- Return type:
- classmethod from_atoms(atoms, energy_key='energy', forces_key='forces', stress_key='stress', virials_key='virials', dipole_key='dipole', charges_key='charges', device='cpu', dtype=torch.float32, z_table=None)[source]#
Create an AtomicData from an ASE-like Atoms object.
Only fields that are actually present in the input object are populated; absent optional fields (energy, forces, stress, virials, dipole, charges) remain None. The input atoms object is not mutated.
The returned info dict contains only tensor-convertible entries from atoms.info (np.ndarray, list, int, float, and their numpy equivalents). bool, np.bool_, strings, and other types are dropped.
- Parameters:
atoms (ase.Atoms) – An ASE Atoms object.
energy_key (str) – Key in atoms.info for total energy.
forces_key (str) – Key in atoms.arrays for atomic forces.
stress_key (str) – Key in atoms.info for the stress tensor.
virials_key (str) – Key in atoms.info for the virial tensor.
dipole_key (str) – Key in atoms.info for the dipole moment.
charges_key (str) – Key in atoms.arrays for per-atom partial charges.
device (str | torch.device) – Target device for all output tensors.
dtype (torch.dtype) – Target floating-point dtype for all output tensors.
z_table (AtomicNumberTable | None) – Atomic number table used to build one-hot node attributes.
- Return type:
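The filtering rule for info described in from_atoms (keep tensor-convertible entries, drop bool, strings, and other types) has one subtlety in Python: bool is a subclass of int. The sketch below illustrates that rule with standard-library types only. The helper name filter_info is hypothetical, numpy arrays and scalars are left out to keep the example dependency-free, and the real method additionally converts the kept values to tensors.

```python
def filter_info(info):
    """Keep int, float, and list entries; drop bool, str, and everything else."""
    kept = {}
    for key, value in info.items():
        # bool is a subclass of int, so it must be rejected before the int check
        if isinstance(value, bool):
            continue
        if isinstance(value, (int, float, list)):
            kept[key] = value
    return kept


raw_info = {
    "energy": -76.4,
    "step": 12,
    "converged": True,   # dropped: bool
    "label": "water",    # dropped: str
    "weights": [0.5, 0.5],
}
filtered = filter_info(raw_info)
```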
- classmethod from_structure(structure, energy_key='energy', forces_key='forces', stress_key='stress', virials_key='virials', dipole_key='dipole', charges_key='charges', device='cpu', dtype=torch.float32, z_table=None)[source]#
Create an AtomicData from a pymatgen Structure or Molecule.
Only fields that are actually present in the input are populated; absent optional fields (energy, forces, stress, virials, dipole, charges) remain None. The input object is not mutated.
The returned info dict contains tensor-convertible entries from structure.properties (np.ndarray, list, int, float, and their numpy equivalents), excluding keys already consumed into dedicated fields. Unsupported types raise TypeError.
Stress and virials accept 3×3 matrices, 6-component Voigt vectors, or 9-component flat vectors (see voigt_to_matrix()).
- Parameters:
structure (pymatgen.core.Structure | pymatgen.core.Molecule) – A pymatgen Structure (periodic) or Molecule (non-periodic). For Molecule, cell and pbc are set to None.
energy_key (str) – Key in structure.properties for total energy.
forces_key (str) – Key in structure.site_properties for atomic forces.
stress_key (str) – Key in structure.properties for the stress tensor.
virials_key (str) – Key in structure.properties for the virial tensor.
dipole_key (str) – Key in structure.properties for the dipole moment.
charges_key (str) – Key in structure.site_properties for per-atom partial charges.
device (str | torch.device) – Target device for all output tensors.
dtype (torch.dtype) – Target floating-point dtype for all output tensors.
z_table (AtomicNumberTable | None) – Atomic number table used to build one-hot node attributes.
- Return type:
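from_structure accepts stress and virials as 3×3 matrices, 6-component Voigt vectors, or 9-component flat vectors. The sketch below shows how such inputs might be normalized to a 3×3 matrix. It is not the library's voigt_to_matrix(): the helper name to_matrix is hypothetical, and the Voigt ordering (xx, yy, zz, yz, xz, xy, the convention ASE uses) is an assumption because the source does not state which ordering is expected.

```python
def to_matrix(values):
    """Normalize a 3x3, 6-component Voigt, or 9-component flat input to 3x3 lists."""
    # Already a 3x3 nested sequence
    if len(values) == 3 and all(
        isinstance(row, (list, tuple)) and len(row) == 3 for row in values
    ):
        return [list(row) for row in values]

    flat = list(values)
    if len(flat) == 9:
        # Row-major flat vector
        return [flat[0:3], flat[3:6], flat[6:9]]
    if len(flat) == 6:
        # Assumed Voigt order: xx, yy, zz, yz, xz, xy
        xx, yy, zz, yz, xz, xy = flat
        return [[xx, xy, xz], [xy, yy, yz], [xz, yz, zz]]
    raise ValueError(f"expected 3x3, 6, or 9 components, got {len(flat)}")


from_voigt = to_matrix([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
from_flat = to_matrix([0, 1, 2, 3, 4, 5, 6, 7, 8])
from_nested = to_matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

The Voigt branch yields a symmetric matrix by construction, which matches the symmetry expected of a stress tensor.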
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'validate_assignment': True}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(_AtomicData__context)[source]#
Create per-instance mutable copies of the key sets.
The class-level defaults are frozen to prevent accidental mutation. Each instance gets its own mutable set so that add_node_property and friends only affect the instance they are called on.
Uses model_post_init rather than model_validator because validate_assignment=True causes model validators to re-run on every setattr call, which would reset the key sets and lose previously added custom keys.
- Parameters:
_AtomicData__context (Any)
- Return type:
None
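The pattern described for model_post_init (frozen class-level defaults, with a mutable copy made once per instance) can be shown without Pydantic. The sketch below uses a plain class, and every name in it (Record, _default_node_keys, node_keys) is hypothetical. It only demonstrates why adding a key on one instance does not leak into another instance or into the class default.

```python
class Record:
    # Frozen class-level default: cannot be mutated by accident
    _default_node_keys = frozenset({"atomic_numbers", "positions"})

    def __init__(self):
        # Per-instance mutable copy, created once at construction time
        self.node_keys = set(self._default_node_keys)

    def add_node_property(self, key):
        # Only this instance's key set changes
        self.node_keys.add(key)


a = Record()
b = Record()
a.add_node_property("magmom")
```

Making the copy at construction time, rather than in a hook that re-runs on every attribute assignment, is what keeps custom keys such as "magmom" from being reset.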
- property node_properties: dict[str, Any]#
Get the node properties of the graph.
- property num_edges: int#
Return the number of edges in the graph.
- property num_nodes: int#
Return the number of nodes in the graph.
- property system_properties: dict[str, Any]#
Get the system properties of the graph.
- use_default_categories()[source]#
Check to make sure categories for atoms are set.
If a list is passed (which pydantic should already have validated), it is converted to a tensor.
- Return type: