Improving the Parallelism of Ultrasoft Pseudopotential Calculations within the CPMD code

Applicant

Prof. Dr. Bernd Meyer
Interdisciplinary Center for Molecular Materials and Computer-Chemistry-Center
Department of Chemistry and Pharmacy
FAU Erlangen-Nürnberg

Project Overview

The Car-Parrinello molecular dynamics (CPMD) program is one of the most popular codes for ab initio MD simulations in quantum chemistry. A significant amount of computer time at many supercomputer centers worldwide is used for CPMD simulations. For example, in this year's “Summer of Simulation” project at LRZ, three out of seven projects were related to CPMD. These projects alone were granted 15 million CPU hours.

In CPMD, the atomic interactions are determined by solving the quantum-mechanical electronic structure problem using density-functional theory (DFT). Specifically, the electronic wave functions are represented by plane waves, and the Fast Fourier Transform (FFT) algorithm is exploited for an efficient evaluation of the quantum-mechanical expressions. The MPI parallelization in CPMD relies mostly on distributing the 3D FFT grid over the MPI processes. As long as the number of FFT grid points in one direction of the mesh is larger than the number of MPI processes, CPMD scales almost linearly. However, as soon as the number of processes reaches the number of FFT grid points in that direction, hardly any further speedup can be achieved.
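The scaling limit described above can be illustrated with a minimal sketch. It assumes that the grid is cut into planes along one direction and that each plane is assigned to exactly one MPI process, which is what the stated limit suggests. The function names are hypothetical and are not taken from the CPMD source; the model ignores communication and all non-FFT work.

```python
def planes_per_rank(n_planes: int, n_ranks: int) -> list[int]:
    """Distribute n_planes FFT planes over n_ranks as evenly as possible.

    Assumption: one-dimensional (slab) decomposition, whole planes only.
    """
    base, extra = divmod(n_planes, n_ranks)
    # The first `extra` ranks receive one additional plane.
    return [base + 1 if rank < extra else base for rank in range(n_ranks)]


def ideal_speedup(n_planes: int, n_ranks: int) -> float:
    """Upper bound on the speedup of the FFT part.

    The rank holding the most planes determines the run time, so the
    bound is n_planes divided by the largest per-rank plane count.
    """
    busiest = max(planes_per_rank(n_planes, n_ranks))
    return n_planes / busiest


if __name__ == "__main__":
    # Hypothetical grid with 200 planes in the distributed direction.
    for ranks in (50, 100, 200, 400, 800):
        print(f"{ranks:4d} ranks: ideal speedup <= {ideal_speedup(200, ranks):.0f}")
```

In this model the bound grows with the number of processes only until every process holds a single plane. Additional processes then receive no planes at all and cannot contribute to the FFT part.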

In most CPMD applications, so-called “norm-conserving pseudopotentials” are used. This approach requires a large number of plane waves but has the advantage that essentially everything can be evaluated by FFTs. The FFT grids are large, with a thousand or more grid points in one direction, so that good scaling up to thousands of cores is achieved.

There is, however, an alternative: the less frequently used “ultrasoft Vanderbilt pseudopotentials”. They require a much smaller number of plane waves, at the cost that additional expressions, not related to FFTs, have to be evaluated. Nevertheless, this approach requires much less computer time than norm-conserving pseudopotentials; savings can be up to a factor of 10. The drawback is that, because of the smaller number of plane waves, the FFT grids are much smaller, typically about 200 grid points in one direction in HPC applications, so that scaling of CPMD ends at about this number of processor cores (see benchmark below). The bottleneck is now the additional, Vanderbilt-specific parts of the code, which are poorly parallelized or not parallelized at all. The scalability of CPMD could be extended significantly by improving the MPI parallelization of these subroutines and, in particular, by adding OpenMP parallelization.
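Why poorly parallelized code sections limit the overall scaling can be made plausible with Amdahl's law. The following is only a sketch of that textbook formula, not a model of CPMD: the function name is hypothetical, and the parallel fractions used are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: speedup on n_cores when only parallel_fraction
    of the single-core run time can be distributed over the cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Illustrative fractions only; real values would have to be profiled.
    for fraction in (0.90, 0.99, 0.999):
        for cores in (200, 1000):
            speedup = amdahl_speedup(fraction, cores)
            print(f"parallel fraction {fraction}: {cores:5d} cores -> {speedup:7.1f}x")
```

The formula shows that the achievable speedup is bounded by the inverse of the serial fraction, regardless of the number of cores. Reducing the share of serial work is therefore what raises the scaling limit.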