Effective OpenFOAM MPI-I/O library for HDF5 archive output


Prof. Dr.-Ing. habil. Stefan Becker
Institut für Prozessmaschinen und Anlagentechnik
Friedrich-Alexander-Universität Erlangen-Nürnberg

Project Overview

A central challenge in large, three-dimensional, time-dependent computational fluid dynamics (CFD) simulations is writing and storing the computed data in an efficient and practical way. OpenFOAM, a leading open-source CFD package, writes and structures its output in a particular way when a case runs in parallel on distributed processors. OpenFOAM parallelizes by domain decomposition: the geometry and the associated fields are broken into pieces, and each piece is assigned to a separate processor for solution. A parallel computation therefore involves two steps: decomposing the mesh and fields, and then running the simulation in parallel. The parallel run uses the Open MPI implementation of the standard Message Passing Interface (MPI). To use the full capability of the post-processing methods, the decomposed mesh and fields have to be reconstructed into a complete domain after the parallel run.
The OpenFOAM utility decomposePar performs the decomposition. It splits the domain into a specified number of subdomains, one per processor, with the aim of a fairly economical solution on each processor, and it creates one sub-directory per processor. While the simulation is running, each of these sub-directories accumulates time directories, each of which holds the decomposed field variables and mesh descriptions for the corresponding time step.
This file structure is well organized, but for large parallel cases it produces a very large number of files. For example, a case distributed over 1000 cores that stores 5 flow quantities (velocity, pressure, etc.) writes 1000 × 5 = 5000 files per time step. Storing 2000 time steps then yields 5000 × 2000 = ten million files in total.
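
The file-count arithmetic in the example above can be checked with a short Python sketch. The function name and its parameters are hypothetical and are not part of OpenFOAM. The sketch counts field files only and ignores mesh files, as the example in the text does.

```python
def decomposed_file_count(n_processors: int, n_fields: int, n_time_steps: int) -> int:
    """Number of field files in the default one-file-per-process layout.

    Hypothetical helper for illustration: each processor directory holds
    one file per stored field in every time directory.
    """
    return n_processors * n_fields * n_time_steps


# Values taken from the example in the text.
files_per_step = decomposed_file_count(n_processors=1000, n_fields=5, n_time_steps=1)
files_total = decomposed_file_count(n_processors=1000, n_fields=5, n_time_steps=2000)

print(files_per_step)  # 5000
print(files_total)     # 10000000
```

The count grows linearly in each of the three factors, so doubling the core count alone doubles the number of files.
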
Therefore, in large simulations, users can run into problems with the parallel file system of a cluster and/or hit the operating system's limit on the number of open files. In addition, reconstructing the decomposed domain is time-consuming, and its cost cannot be neglected for highly parallelized cases in which a very large number of time steps has to be stored (e.g. in fluid-structure-acoustic interaction simulations).
Recent versions of OpenFOAM introduced the so-called collated file format to mitigate the problem of generating large numbers of individual files. With this alternative format, the data for each decomposed field and mesh are collated into a single file that is written by a master processor. This results in one file per time step per flow quantity (velocity, pressure, etc.) instead of the default one file per process per time step per flow variable. However, when many time steps have to be stored, the number of generated files can still be quite high. Furthermore, the collated file format does not address the time-consuming reconstruction process.
Within this project, an OpenFOAM library specialized for highly parallelized simulations is to be developed that writes output efficiently into single HDF5 archives instead of using the default “one-file-per-process-per-timestep-per-variable” approach. The aim is to avoid both the large number of individual files and the slow reconstruction process, thereby facilitating post-processing and the coupling with other software packages that already use HDF5 as their default I/O data format (e.g. the coupled field simulation software CFS++).
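
To make the comparison between the output layouts concrete, the following Python sketch counts the field files each layout would produce for a case with 1000 cores, 5 flow quantities, and 2000 stored time steps. The function and the layout labels are hypothetical. The entry for the HDF5 layout assumes that the whole run is written into one archive, which is an assumption made here for illustration and not a statement about the library's final design.

```python
def files_per_layout(n_processors: int, n_fields: int, n_time_steps: int) -> dict:
    """Field-file counts for three output layouts (illustrative only)."""
    return {
        # Default: one file per process per time step per flow variable.
        "uncollated": n_processors * n_fields * n_time_steps,
        # Collated: one file per time step per flow quantity.
        "collated": n_fields * n_time_steps,
        # Assumption: a single HDF5 archive holds the entire run.
        "single_archive": 1,
    }


counts = files_per_layout(n_processors=1000, n_fields=5, n_time_steps=2000)
for layout, n_files in counts.items():
    print(f"{layout}: {n_files}")
```

Under these assumptions the collated format removes the factor given by the number of processes, but the file count still scales with the number of stored time steps.
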