Yoshiyuki Sakai and Michael Manhart
Chair of Hydromechanics
Technical University of Munich
80333 Munich, Germany
In this proposed project, we aim to perform a node-level (SIMD) optimisation of our in-house CFD code MGLET (Multi Grid Large Eddy Turbulence). In the two preceding, successful performance optimisation projects funded by KONWHIR, the parallel-scaling performance of MGLET improved by a factor of ≈ 4 (cf. Section 4.1) and its I/O performance by a factor of ≈ 25 (cf. Section 4.2). At the time of writing (Feb. 2018), the code exhibits satisfactory parallel efficiency for problem sizes of up to ≈ 17 billion grid cells distributed over approximately 32000 parallel processes.
As the relative share of MPI communication overhead in the execution time decreases, the raw computing performance at node level becomes increasingly decisive for the overall performance of MGLET. Moreover, modern HPC processors are equipped with ever more powerful vectorisation hardware, which sustains performance growth despite stagnating clock frequencies and growing concern about the energy consumption of HPC systems. For instance, the Intel Sandy Bridge processors in SuperMUC Phase 1 provide 256-bit vector registers driven by the AVX instruction set, the Intel Haswell processors in SuperMUC Phase 2 use the extended AVX2 instruction set, and the Intel Knights Landing (KNL) and Skylake processors, which will be used in the next-generation SuperMUC-NG supercomputer, are equipped with 512-bit vector registers and the AVX-512 instruction set. To prepare MGLET for these developments, in particular for the new SuperMUC-NG system, it is critical to carry out an extensive SIMD optimisation of the code now; this is the purpose of the present project.
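The widening gap between scalar and vector execution can be made concrete with a back-of-the-envelope calculation. The sketch below assumes double-precision (64-bit) floating-point data, which is typical for CFD solvers but is an assumption here, and uses only the register widths named above. It computes how many values one register holds and what fraction of that width a purely scalar loop exploits. It is an idealised upper bound: it ignores memory bandwidth, which often limits stencil-type codes, and it does not model or measure MGLET itself.

```python
# Vector register widths (bits) of the x86 instruction sets named in the text.
# AVX2 extends the AVX instruction set but keeps its 256-bit register width.
REGISTER_BITS = {"AVX": 256, "AVX2": 256, "AVX-512": 512}

# Assumption: double-precision (IEEE 754, 64-bit) floating-point data.
DOUBLE_BITS = 64


def lanes(isa: str, element_bits: int = DOUBLE_BITS) -> int:
    """Number of elements that fit into one vector register of the given ISA."""
    return REGISTER_BITS[isa] // element_bits


def scalar_fraction(isa: str, element_bits: int = DOUBLE_BITS) -> float:
    """Fraction of a register's arithmetic width used by purely scalar code."""
    return 1.0 / lanes(isa, element_bits)


if __name__ == "__main__":
    for isa in REGISTER_BITS:
        print(f"{isa:8s}: {lanes(isa)} doubles per register, "
              f"scalar code uses {scalar_fraction(isa):.1%} of the width")
```

Under these assumptions, moving from AVX/AVX2 to AVX-512 doubles the number of double-precision lanes from 4 to 8, so the share of the hardware left idle by non-vectorised loops grows from 75 % to 87.5 %.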