Improving FAIRness of HPC research data


Prof. Christian Stemmer
Chair of Aerodynamics and Fluid Mechanics
Technische Universität München

Project Overview

The project is part of the DFG-funded NFDI4Ing consortium (National Research Data Infrastructure for Engineering Sciences). All major Tier-0 HPC facilities within the Gauss Centre for Supercomputing (GCS), namely HLRS, JSC, and LRZ, participate in NFDI4Ing.
Within NFDI4Ing, the task area DORIS develops research data concepts and software infrastructure for data from high-performance measurement and computing. The main goal is to make Tier-0 HPC research data findable, accessible, interoperable, and reusable (FAIR). Because of their high storage demands, the data are currently immobile: they are too large to be copied to local workstations for evaluation, so post-processing is generally done on the HPC systems themselves.
Data from HPC applications are currently only partially accessible to other researchers because they are too large to be included in state-of-the-art repositories (e.g. DaRUS). Instead, the data are archived on tape at the HPC centers in the personal account of the data generator. Often, these data are neither documented nor accompanied by metadata. This is in part due to the absence of thorough semantics for describing engineering sciences in HPC. We are working on solutions to make HPC data accessible to and reusable by other research groups. Within DORIS, custom-fit software solutions for research data management that can be transferred to other HPC systems are developed and tested.
Therefore, we will test software that simplifies access to the data (e.g. GridFTP, Globus) and investigate new sharing possibilities (e.g. via Compute Cloud). Furthermore, new ways of reusing, reproducing, and post-processing HPC data, for example via Charliecloud or Singularity, will be examined. Finally, we will investigate how existing and new HPC datasets can be automatically annotated with meaningful metadata and published to repositories.
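
As an illustration of what automated metadata annotation could look like, the following minimal Python sketch harvests basic facts about a data file (name, size, checksum, modification time) and stores them together with user-supplied descriptive fields in a JSON sidecar file next to the data. It is not the DORIS implementation: the function names, the sidecar suffix `.meta.json`, and all metadata field names are assumptions chosen for this example and do not follow any particular metadata standard.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_metadata(path: Path, descriptive: dict) -> dict:
    """Combine automatically harvested file facts with user-supplied fields.

    All key names below are hypothetical, chosen only for this sketch.
    """
    stat = path.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "file_name": path.name,
        "size_bytes": stat.st_size,
        "sha256": sha256_of(path),
        "modified_utc": modified.isoformat(),
        "descriptive": descriptive,
    }


def write_sidecar(path: Path, descriptive: dict) -> Path:
    """Write the metadata as '<file name>.meta.json' next to the data file."""
    sidecar = path.with_name(path.name + ".meta.json")
    sidecar.write_text(json.dumps(build_metadata(path, descriptive), indent=2))
    return sidecar
```

A call such as `write_sidecar(Path("run_001.h5"), {"solver": "example"})` would leave the dataset itself untouched and place `run_001.h5.meta.json` beside it. Keeping the metadata in a small separate file means it can be copied to a repository or search index even when the dataset itself stays on the HPC system.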