Parallel mesh loading and partitioning for large-scale simulation

Applicant

PD Dr. Florian Frank
Lehrstuhl für Wissenschaftliches Rechnen
Department Mathematik
Friedrich-Alexander-Universität Erlangen-Nürnberg

Project Overview

The objective of this project was to improve the parallelization of mesh loading and distribution in our Digital Rock CFD simulator Simulicious. The meshes are derived from segmented high-resolution 3D images of rock samples obtained by micro-computed tomography. Each voxel represents either the pore space or the solid matrix of the rock, and the union of the pore voxels forms the computational domain.
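To illustrate the voxel model described above, the following is a minimal sketch of how pore voxels of a segmented image could be numbered as mesh elements. It is not taken from the Simulicious source: the struct VoxelImage, the function numberPoreVoxels, the memory layout (x running fastest), and the convention that a nonzero value marks a pore voxel are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for a segmented image (assumption: nonzero = pore,
// zero = solid matrix; x runs fastest, then y, then z).
struct VoxelImage {
    std::size_t nx, ny, nz;
    std::vector<std::uint8_t> data;

    // Linear index of voxel (i, j, k).
    std::size_t index(std::size_t i, std::size_t j, std::size_t k) const {
        return i + nx * (j + ny * k);
    }
    bool isPore(std::size_t i, std::size_t j, std::size_t k) const {
        return data[index(i, j, k)] != 0;
    }
};

// Marker for voxels that are not part of the computational domain.
constexpr std::int64_t kSolid = -1;

// Assign consecutive element numbers to the pore voxels in storage order.
// Solid voxels receive kSolid, so the result maps voxel index -> element id.
std::vector<std::int64_t> numberPoreVoxels(const VoxelImage& img) {
    assert(img.data.size() == img.nx * img.ny * img.nz);
    std::vector<std::int64_t> element(img.data.size(), kSolid);
    std::int64_t next = 0;
    for (std::size_t v = 0; v < img.data.size(); ++v) {
        if (img.data[v] != 0) {
            element[v] = next++;
        }
    }
    return element;
}
```

In this sketch only the pore voxels receive an element number, so the size of the computational mesh scales with the porosity of the sample rather than with the full image size.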

Simulicious is written in C++ using the finite element library MFEM, which is known for its parallel scalability, HPC efficiency, and support for high-order finite elements. The sparse matrix assembly and the linear solvers and preconditioners, such as MINRES and AMG, have demonstrated strong scaling with MPI parallelization on thousands of CPU cores. However, as of MFEM version 4.3, the parallel mesh class ParMesh primarily relies on global mesh information derived from the serial (global) mesh object passed to the ParMesh constructor. As a consequence, each rank loads the full global mesh from an ASCII mesh file, constructs the global mesh topology, and calls the partitioner METIS. While this approach is reasonable for coarse meshes that are later refined in parallel, it is a major bottleneck in terms of performance and memory consumption for the primary application of Simulicious, namely Digital Rock simulations on large predefined voxel meshes.

With the support of the KONWIHR funding, we were able to make significant improvements to the mesh loading and distribution process. To reduce disk storage and speed up file loading, we introduced the ability to store and load meshes in a 1-bit format, in which each bit represents a single voxel. Furthermore, only the master rank now loads the voxel data from the 1-bit binary file and calls the partitioner METIS. The master rank then computes the mesh topology of the local mesh fragments based on the partitioning information and sends each fragment over the network to the rank that owns it. By avoiding the creation of a global mesh on every rank, we accelerated the mesh loading procedure and reduced memory consumption. As a result of these optimizations, we can now run simulations faster and on larger meshes than previously possible.
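The 1-bit storage and the rank-wise grouping described above can be sketched as follows. This is an illustrative example under stated assumptions, not the Simulicious implementation: the function names, the bit order within a byte (least significant bit first), and the convention that a set bit marks a pore voxel are hypothetical, and the partition array is simply passed in as it might be returned by a graph partitioner such as METIS, with one owning rank per pore voxel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack one flag per voxel (nonzero = pore) into bytes, 8 voxels per byte.
// Assumption: voxel v is stored in bit (v % 8) of byte (v / 8).
std::vector<std::uint8_t> packVoxels(const std::vector<std::uint8_t>& voxels) {
    std::vector<std::uint8_t> packed((voxels.size() + 7) / 8, 0);
    for (std::size_t v = 0; v < voxels.size(); ++v) {
        if (voxels[v] != 0) {
            packed[v / 8] |= static_cast<std::uint8_t>(1u << (v % 8));
        }
    }
    return packed;
}

// Read the flag of a single voxel from the packed representation.
bool voxelIsPore(const std::vector<std::uint8_t>& packed, std::size_t v) {
    assert(v / 8 < packed.size());
    return ((packed[v / 8] >> (v % 8)) & 1u) != 0;
}

// Expand the packed representation back to one byte (0 or 1) per voxel.
std::vector<std::uint8_t> unpackVoxels(const std::vector<std::uint8_t>& packed,
                                       std::size_t numVoxels) {
    std::vector<std::uint8_t> voxels(numVoxels, 0);
    for (std::size_t v = 0; v < numVoxels; ++v) {
        voxels[v] = voxelIsPore(packed, v) ? 1 : 0;
    }
    return voxels;
}

// Group the global indices of pore voxels by owning rank.
// `part` holds one rank per pore voxel, in the order in which the pore
// voxels appear in the image (hypothetical partitioner output).
std::vector<std::vector<std::size_t>> groupPoreVoxelsByRank(
    const std::vector<std::uint8_t>& packed, std::size_t numVoxels,
    const std::vector<int>& part, int numRanks) {
    std::vector<std::vector<std::size_t>> fragments(
        static_cast<std::size_t>(numRanks));
    std::size_t pore = 0;
    for (std::size_t v = 0; v < numVoxels; ++v) {
        if (!voxelIsPore(packed, v)) continue;
        assert(pore < part.size());
        assert(part[pore] >= 0 && part[pore] < numRanks);
        fragments[static_cast<std::size_t>(part[pore])].push_back(v);
        ++pore;
    }
    assert(pore == part.size());
    return fragments;
}
```

Compared with storing one byte per voxel, the packed form needs one eighth of the space. In a setting like the one described above, the master rank would build the local topology from each list in the returned vector and send the result to the corresponding rank; that communication step is omitted here.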

Publications that benefited from the project

A. Meier, E. Bänsch, F. Frank (2022). Schur preconditioning of the Stokes equations in channel-dominated domains. Computer Methods in Applied Mechanics and Engineering 398(1). https://doi.org/10.1016/j.cma.2022.115264

S. Gärttner, F. O. Alpak, A. Meier, N. Ray, F. Frank (2023). Estimating permeability of 3D micro-CT images by physics-informed CNNs based on DNS. Computational Geosciences. https://doi.org/10.1007/s10596-022-10184-0