Crash Simulation ETL Pipeline#

This example demonstrates a Source → Filter → Sink pipeline for curating LS-DYNA crash simulation data.

Automotive crash simulations produce multi-timestep shell meshes stored in the d3plot binary format. The pipeline reads these files with D3PlotSource, removes non-deforming wall nodes with WallNodeFilter, logs mesh metadata, converts fields to single precision, and writes the processed meshes to disk.

Note

This example requires the lasso-python package for reading d3plot files. Install it with pip install lasso-python.

Imports#

Import the pipeline building blocks: a Source for LS-DYNA d3plot data, the WallNodeFilter for removing non-deforming boundary nodes, informational and precision filters, a Sink for writing outputs, and run_pipeline() for parallel execution.

from physicsnemo_curator.domains.mesh.filters.mesh_info import MeshInfoFilter
from physicsnemo_curator.domains.mesh.filters.precision import PrecisionFilter
from physicsnemo_curator.domains.mesh.filters.wall_node import WallNodeFilter
from physicsnemo_curator.domains.mesh.sinks.mesh_writer import MeshSink
from physicsnemo_curator.domains.mesh.sources.d3plot import D3PlotSource
from physicsnemo_curator.run import run_pipeline

Configure the Source#

D3PlotSource scans input_dir for subdirectories containing a d3plot file. Each subdirectory corresponds to one crash simulation run.

Set read_stress=True to include von Mises stress and effective plastic strain as cell data fields. Set read_k_file=True to parse companion .k keyword files for per-node shell thickness.

INPUT_DIR = "/data/crash_simulations"

source = D3PlotSource(
    input_dir=INPUT_DIR,
    read_stress=True,
    read_k_file=True,
)

Build the Pipeline#

Chain several filters in order:

  1. WallNodeFilter — Removes non-deforming “wall” nodes whose maximum displacement variation across all timesteps falls below a threshold. This typically removes 30–60% of nodes, significantly reducing dataset size while preserving the structural response.

  2. MeshInfoFilter — Logs mesh metadata (node counts, cell counts, field names) and writes a JSON-lines summary.

  3. PrecisionFilter — Converts floating-point fields from float64 to float32 to halve memory and storage requirements.
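The two numerical operations above can be sketched with NumPy. This is a conceptual illustration of the criteria, not the filters' actual implementation; the array shapes and field names are made up for the example:

```python
import numpy as np

# Synthetic displacement history: (timesteps, nodes, xyz components).
# Nodes 0 and 1 stay put (wall nodes); node 2 deforms over time.
disp = np.zeros((5, 3, 3))
disp[:, 2, 0] = np.linspace(0.0, 4.0, 5)

# Wall-node criterion (sketch): a node's displacement variation is the
# largest max-minus-min range of any component across all timesteps.
variation = (disp.max(axis=0) - disp.min(axis=0)).max(axis=1)
keep = variation >= 1.0          # threshold=1.0, as in the pipeline below
print(keep)                      # only the deforming node survives

# Precision conversion (sketch): casting float64 fields to float32
# halves the bytes per field.
kept = disp[:, keep]
print(kept.nbytes, kept.astype(np.float32).nbytes)
```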

Finally, a MeshSink writes each processed mesh as a TensorDict memory-mapped directory.

OUTPUT_DIR = "/data/crash_processed"

pipeline = (
    source.filter(WallNodeFilter(threshold=1.0))
    .filter(MeshInfoFilter(output=f"{OUTPUT_DIR}/mesh_info.jsonl"))
    .filter(PrecisionFilter(target_dtype="float32"))
    .write(MeshSink(output_dir=OUTPUT_DIR))
)

Run the Pipeline#

Process the first 3 runs in parallel using a process pool with 2 workers. Crash simulations can be memory-intensive, so a modest worker count helps avoid out-of-memory conditions.

results = run_pipeline(
    pipeline,
    n_jobs=2,
    backend="process_pool",
    indices=range(min(3, len(source))),
    progress=True,
)

Inspect Results#

run_pipeline() returns one entry per processed index; each entry is the list of files the sink wrote for that run.

for idx, paths in enumerate(results):
    print(f"Run {idx}: {len(paths)} output(s)")
    for p in paths:
        print(f"  {p}")

Summary#

This example showed how to:

  • Read LS-DYNA d3plot crash simulation data with D3PlotSource.

  • Remove non-deforming wall nodes with WallNodeFilter.

  • Log mesh metadata and convert precision in a composable filter chain.

  • Write processed meshes in parallel with run_pipeline.

For production workloads, increase n_jobs and remove the indices limit to process the full dataset.

Gallery generated by Sphinx-Gallery