Continuous Benchmarking for the GHODDESS framework

Applicants

Prof. Dr. Harald Köstler
Prof. Dr. Ulrich Rüde
Chair for Computer Science 10 – System Simulation
Friedrich-Alexander-Universität Erlangen-Nürnberg

Project Overview

The simulation of ocean currents, tides, and coastal ocean circulation is an important field of research, and the shallow water equations (SWE) are a common model for it. The discontinuous Galerkin (DG) method is an efficient approach to discretizing the SWE because of its parallelizability, its ability to use high-order approximation spaces, its robustness for problems with shocks, and its natural support for h- and p-adaptivity. GHODDESS (Generation of Higher-Order Discretizations Deployed as ExaSlang Specifications) is a Python-based front-end to ExaSlang that makes it possible to express the DG discretization of whole ocean simulations in a compact way.

GHODDESS builds on the symbolic algebra package SymPy and combines it with domain-specific abstractions for the application at hand. Simulation specifications are automatically mapped to corresponding ExaSlang variants. From there, the full power of the ExaStencils toolchain can be used to generate optimized and parallelized code. Recently, considerable effort has gone into optimizing the performance of the p-adaptive method through algorithmic adaptations and the distribution of the compute kernels across CPUs and GPUs. First results already show a speedup compared to a pure CPU or a pure GPU implementation.

The goals of the project are to integrate our GHODDESS framework into the continuous benchmarking infrastructure available in Erlangen and then to carry out performance analysis and optimization for the different architectures included in the test cluster. Our existing continuous integration framework already checks the physical correctness of our code on every change. Beyond that, it is very important to ensure that code changes do not degrade computational performance. The continuous benchmarking pipeline will ensure this by measuring performance on an ongoing basis. In our case, automation is particularly important because the code generation emits a large number of variants. Since performance on different types of hardware architectures (including hybrid ones) is a special focus of our application, in-depth knowledge of parallelization techniques on HPC systems and of hardware-specific performance engineering is necessary.
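
To illustrate the kind of automated check such a pipeline could run across many generated variants, the following is a minimal sketch in Python. It is not the Erlangen infrastructure or any GHODDESS interface: the function name `find_regressions`, the variant labels, the runtimes, and the 5% tolerance are all hypothetical and chosen only for illustration. The sketch compares the median runtime of each variant against a stored baseline and reports the variants that became slower than the tolerance allows.

```python
from statistics import median


def find_regressions(baseline, current, tolerance=0.05):
    """Return {variant: relative slowdown} for variants that got slower.

    baseline, current: dicts mapping a variant name to a list of measured
    runtimes in seconds (hypothetical data layout, assumed for this sketch).
    tolerance: relative slowdown that is still accepted (0.05 means 5%).
    """
    regressions = {}
    for variant, new_times in current.items():
        old_times = baseline.get(variant)
        if not old_times or not new_times:
            continue  # no baseline or no new measurement for this variant
        old = median(old_times)
        new = median(new_times)
        if old <= 0:
            continue  # guard against invalid baseline measurements
        slowdown = new / old - 1.0
        if slowdown > tolerance:
            regressions[variant] = slowdown
    return regressions


# Hypothetical measurements for two generated variants.
baseline = {
    "p1_cpu": [1.00, 1.02, 0.98],
    "p2_gpu": [0.50, 0.51, 0.49],
}
current = {
    "p1_cpu": [1.01, 1.00, 1.03],
    "p2_gpu": [0.60, 0.61, 0.59],
}

report = find_regressions(baseline, current)
for variant, slowdown in report.items():
    print(f"{variant}: {slowdown:.1%} slower than baseline")
```

Using the median rather than the mean makes the comparison less sensitive to single outlier runs. A real pipeline would additionally need to record the hardware, compiler, and code version for each measurement so that results stay comparable across architectures.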