Jun.-Prof. Dr. Stephan Gekle
Fakultät für Mathematik, Physik und Informatik
Nachwuchsgruppe Theoretische Physik
Starting from a non-optimized OpenMP version of our code, we implemented several node-level improvements together with a hybrid OpenMP and MPI parallelization during the six-month KONWIHR project. The overall scaling behavior of the code was found to be similar to that of a comparable GROMACS setup. We identified the multiple FFTs as the main bottleneck. Furthermore, the biggest data structures (Verlet lists and Fourier grids) are now distributed over several compute nodes, enabling the simulation of larger systems that would otherwise not fit on a single machine. On the whole, all proposed goals have been achieved (MPI parallelization of the real and Fourier spaces, and a scheme to find the optimal parameter set). Beyond that, several other parts of the program received additional parallelization and node-level optimizations. Together, these amount to a speed-up of around 2.5× on a single compute node compared to the original version. Although optimizing the existing FFT libraries is not feasible, other performance improvements could be addressed in a future project. For example, the Verlet list could be changed to use less memory and to scale better, the memory access patterns in the real-space part could be optimized, and SIMD instructions could be used to further improve the node-level performance. Finally, switching the underlying SPME method to a fast multipole method would eliminate the FFT bottleneck, but we estimate this to be a rather lengthy and complex task.
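For orientation, a Verlet list stores, for each particle, the indices of all particles within the cutoff plus a skin distance, which is why its memory footprint grows with both system size and list radius. The following is a minimal sketch in Python and not the project's code: the function names `minimum_image` and `build_verlet_list`, the brute-force pair search, the cubic periodic box, and the half-list layout (each pair stored once) are assumptions made for illustration only.

```python
import itertools


def minimum_image(d, box):
    """Wrap a coordinate difference into [-box/2, box/2] for a periodic box."""
    return d - box * round(d / box)


def build_verlet_list(positions, box, cutoff, skin):
    """Brute-force half Verlet list (illustrative, O(N^2)).

    neighbors[i] holds every index j > i whose minimum-image distance
    to particle i is below cutoff + skin, so each pair is stored once.
    """
    r_list_sq = (cutoff + skin) ** 2
    neighbors = [[] for _ in positions]
    for i, j in itertools.combinations(range(len(positions)), 2):
        dist_sq = sum(
            minimum_image(a - b, box) ** 2
            for a, b in zip(positions[i], positions[j])
        )
        if dist_sq < r_list_sq:
            neighbors[i].append(j)
    return neighbors


def count_pairs(neighbors):
    """Number of stored pairs, a rough proxy for the list's memory use."""
    return sum(len(n) for n in neighbors)


# Hypothetical three-particle example in a box of edge length 10.
positions = [(0.0, 0.0, 0.0), (9.5, 0.0, 0.0), (5.0, 5.0, 5.0)]
verlet = build_verlet_list(positions, box=10.0, cutoff=1.0, skin=0.2)
print(verlet, count_pairs(verlet))
```

In this toy example, particles 0 and 1 are neighbors only through the periodic boundary. Storing each pair once is one common way to reduce the memory of such a list; whether it matches the layout used in the project is not stated in the report.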