# Frequently Asked Questions

## General

### How do I get started?

For installation instructions, check the [installation guide](install). For a quick working example, see the [User Guide](../index). For detailed API usage, refer to the [API documentation](../../modules/index).

If your question is not answered here, please submit a GitHub [Issue][issues_].

### What hardware does this support?

ALCHEMI Toolkit-Ops runs on:

- CUDA-capable NVIDIA GPUs (Compute Capability 8.0+, i.e. A100 and newer)
- CPU execution via NVIDIA Warp (x86 and ARM, including Apple Silicon)

For best performance, we recommend CUDA 12+ with driver version 570.xx or newer. See the [installation guide](install) for full prerequisites.

### I need a kernel that does not exist yet

If the existing API is missing functionality you need and you think it would benefit the community, please start a discussion on GitHub [Issues][issues_].

## Neighbor Lists

### What is the difference between cell_list and naive algorithms?

The two algorithm families have different computational complexity:

- `cell_list()` uses spatial decomposition for O(N) scaling. It is optimized for large systems (roughly more than 5,000 atoms) where the cutoff is small relative to the simulation box.
- `naive_neighbor_list()` computes all pairwise distances for O(N²) scaling. It has lower overhead and can be faster for smaller systems.

The crossover point depends on hardware, system density, and cutoff radius. We recommend benchmarking both on your specific workload.

### How does this compare to ASE neighbor lists?

[ASE](https://wiki.fysik.dtu.dk/ase/) provides CPU-based neighbor list implementations.
ALCHEMI Toolkit-Ops differs in several ways:

- GPU acceleration via NVIDIA Warp kernels
- Native batch processing for multiple systems
- `torch.compile` compatibility for ML training loops
- Both dense (neighbor matrix) and sparse (COO) output formats

The acceleration is substantial, particularly for larger systems, where the fixed overhead of GPU execution is amortized.

## Troubleshooting

### Using `torch.compile`

Select kernels support `torch.compile`; those that do say so in their docstrings. For `torch.compile` to work without graph breaks, you typically need to pre-allocate output tensors. See the [neighbor list documentation](../components/neighborlist) for details on pre-allocation patterns.

We recommend reading the general [`torch.compile` troubleshooting guide](https://docs.pytorch.org/docs/stable/torch.compiler_troubleshooting.html) and the [PhysicsNeMo performance tuning guide](https://docs.nvidia.com/physicsnemo/latest/user-guide/performance_docs/torch_compile_support.html#torch-compile).

If a kernel is expected to be `torch.compile` compatible but is not working, please open a GitHub [Issue][issues_].

[issues_]: https://www.github.com/NVIDIA/nvalchemi-toolkit-ops/issues/new/choose
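
One common reason pre-allocation helps is that a neighbor search finds a data-dependent number of neighbors, while graph capture is simplest when every output has a fixed shape. The sketch below illustrates that idea in plain Python, without PyTorch or this library. The function names, the `max_neighbors` capacity, and the `-1` padding value are all assumptions made for illustration, not the ALCHEMI Toolkit-Ops API; the linked neighbor list documentation describes the actual pattern.

```python
import math

FILL = -1  # assumed padding value for unused neighbor slots


def allocate_outputs(num_atoms, max_neighbors):
    """Create fixed-shape output buffers once, outside the hot loop."""
    neighbor_matrix = [[FILL] * max_neighbors for _ in range(num_atoms)]
    num_neighbors = [0] * num_atoms
    return neighbor_matrix, num_neighbors


def fill_neighbor_matrix(positions, cutoff, neighbor_matrix, num_neighbors):
    """Naive O(N^2) search that writes into caller-provided buffers.

    The buffers keep their shape no matter how many neighbors are found.
    Neighbors beyond a row's capacity are dropped but still counted, so the
    caller can detect overflow by comparing counts against the capacity.
    """
    for i, p_i in enumerate(positions):
        row = neighbor_matrix[i]
        # Reset the row in place so stale entries from a previous call vanish.
        for k in range(len(row)):
            row[k] = FILL
        count = 0
        for j, p_j in enumerate(positions):
            if i != j and math.dist(p_i, p_j) < cutoff:
                if count < len(row):
                    row[count] = j
                count += 1
        num_neighbors[i] = count


# Hypothetical four-atom system: three atoms near the origin, one far away.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
neighbor_matrix, num_neighbors = allocate_outputs(len(positions), max_neighbors=3)
fill_neighbor_matrix(positions, 1.2, neighbor_matrix, num_neighbors)
```

Because the buffers are allocated once and only overwritten afterwards, repeated calls with different positions or cutoffs never change the output shapes. A row whose count exceeds `max_neighbors` signals that the buffers should be re-allocated with a larger capacity.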