Thinking beyond SuperMUC-NG: Porting MGLET to GPUs


Dr. Yoshiyuki Sakai
Simon von Wenczowski
Prof. Michael Manhart
Professorship of Hydromechanics
Technische Universität München

Project Overview

Adapting to the fast-moving developments in modern HPC hardware is essential for well-established codes to maintain their performance and efficiency. In this proposed project, we port our in-house CFD code MGLET (Multi Grid Large Eddy Turbulence) to GPU-accelerated heterogeneous systems.

Over the two KONWIHR projects in 2015 and 2017, MGLET improved its parallel-scaling performance by a factor of ≈4 and its I/O performance by a factor of ≈25. At the time of writing (July 2020), the code exhibits satisfactory strong scaling up to a problem size of ≈17 billion discrete cells distributed over approximately 32000 parallel processes, whilst satisfactory weak scaling was demonstrated up to 135000 parallel processes.

Subsequently, within the third KONWIHR project in 2019, we applied a SIMD optimisation to our two pressure solvers. This was motivated by the recent trend for modern HPC processors to be equipped with increasingly powerful yet energy-efficient vector units, which sustain performance growth despite stagnating clock frequencies and help contain the ever-growing energy consumption of HPC systems. One important example of such a system for us is SuperMUC-NG at LRZ, which is based on Intel Skylake processors equipped with 512-bit vector registers. By exploiting Skylake's extensive SIMD capability, our optimised code achieves an overall performance improvement of up to 20%, even at O(10⁴) MPI processes. This SIMD optimisation alone is expected to save up to ≈8 million CPU-h within our latest SuperMUC project. The accompanying positive environmental impact of the improved energy efficiency should not be overlooked either.

Although the vector units integrated in general-purpose CPUs have significantly improved performance, and more importantly the performance-to-energy ratio, over recent years, there is a hard limit to such improvements, and it may already have been reached. Accordingly, the HPC community has witnessed a persistent trend towards GPU-accelerated heterogeneous systems in recent years. As several experts have boldly claimed, Moore's law may have lost its validity, at least for conventional CPUs. GPU accelerators, in contrast, offer a high degree of thread-level parallelism combined with high-bandwidth memory. It is therefore of essential interest to us, and consequently the objective of this project, to enable our SIMD-optimised code to run on such heterogeneous systems. We believe this upgrade is a logical step forward for MGLET, since the modifications applied during the SIMD-optimisation project are expected to benefit performance on GPUs as well.