Dr. Denis Davydov
Chair of Applied Mechanics
University of Erlangen-Nürnberg
The goal of this project was to improve and further develop algorithms and data structures for matrix-free finite element (FE) operators acting on MPI-parallel sparse multivectors, as required in quantum mechanics calculations.
To that end, we adopted block compressed sparse row (BCSR) matrices with row-wise distribution over MPI and SIMD vectorization over columns.
Tailored algorithms and data structures have been developed to support matrix-free FE operators within the deal.II FE library.
Loop overhead due to the BCSR data structures was shown to be less than 5% of the wall time.
The matrix-free FE operator with BCSR multivectors achieves around one quarter of the peak performance of a 10-core Intel Ivy Bridge processor and around one eighth on a 20-core Intel Cascade Lake processor.
The performance gap was explained by the cost of data access, such as zeroing the destination multivector,
whereas the BCSR-specific components were shown to only marginally increase the run time.
Preliminary inter-node scaling results confirm that the adopted BCSR storage format is a good candidate for efficient and scalable operations on sparse multivectors in the context of the FE method applied to problems in quantum mechanics.
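
To make the BCSR storage idea concrete, the following is a minimal serial sketch in C++. It is not the data structure developed in the project or any part of the deal.II API: the type BcsrMultivector and the functions make_bcsr and add_scaled are hypothetical names chosen for illustration. The sketch assumes that each stored block is a small dense matrix kept in row-major order, so that the innermost loop runs over contiguous columns and can be vectorized by the compiler. MPI distribution of block rows is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal BCSR container (illustration only, not deal.II code).
// Only the blocks listed in the sparsity pattern are stored.
struct BcsrMultivector
{
  std::vector<std::size_t> row_block_sizes; // number of rows in each block row
  std::vector<std::size_t> col_block_sizes; // number of columns in each block column
  std::vector<std::size_t> row_start;       // CSR-style pointers over block rows
  std::vector<std::size_t> col_index;       // block-column index of each stored block
  std::vector<std::size_t> block_start;     // offset of each stored block in `values`
  std::vector<double>      values;          // dense blocks, row-major, stored back to back
};

// Build a zero-initialized multivector; pattern[I] lists the block columns
// that are stored in block row I.
BcsrMultivector
make_bcsr(const std::vector<std::size_t>              &row_block_sizes,
          const std::vector<std::size_t>              &col_block_sizes,
          const std::vector<std::vector<std::size_t>> &pattern)
{
  assert(pattern.size() == row_block_sizes.size());
  BcsrMultivector v;
  v.row_block_sizes = row_block_sizes;
  v.col_block_sizes = col_block_sizes;
  v.row_start.push_back(0);
  v.block_start.push_back(0);
  for (std::size_t I = 0; I < pattern.size(); ++I)
    {
      for (const std::size_t J : pattern[I])
        {
          assert(J < col_block_sizes.size());
          v.col_index.push_back(J);
          v.block_start.push_back(v.block_start.back() +
                                  row_block_sizes[I] * col_block_sizes[J]);
        }
      v.row_start.push_back(v.col_index.size());
    }
  v.values.assign(v.block_start.back(), 0.0);
  return v;
}

// y += alpha * x for two multivectors that share the same sparsity pattern.
void
add_scaled(BcsrMultivector &y, const double alpha, const BcsrMultivector &x)
{
  assert(y.row_start == x.row_start && y.col_index == x.col_index &&
         y.block_start == x.block_start);
  for (std::size_t I = 0; I + 1 < y.row_start.size(); ++I)
    for (std::size_t b = y.row_start[I]; b < y.row_start[I + 1]; ++b)
      {
        const std::size_t n_cols = y.col_block_sizes[y.col_index[b]];
        for (std::size_t r = 0; r < y.row_block_sizes[I]; ++r)
          {
            double       *yr = y.values.data() + y.block_start[b] + r * n_cols;
            const double *xr = x.values.data() + x.block_start[b] + r * n_cols;
            // Contiguous loop over the columns of one block row.
            for (std::size_t c = 0; c < n_cols; ++c)
              yr[c] += alpha * xr[c];
          }
      }
}
```

In this layout the bookkeeping (row_start, col_index, block_start) is touched once per block, while the arithmetic runs over whole dense rows, which is one plausible reason why block-wise storage keeps loop overhead small relative to the floating-point work.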