Improving FAIRness of HPC research data

Applicant

Prof. Christian Stemmer
Chair of Aerodynamics and Fluid Mechanics
Technische Universität München

Project Overview

The goal of this project was to enhance the FAIRness (findability, accessibility, interoperability, and reusability) of research data in high-performance computing (HPC). Large datasets are difficult to transfer to local workstations for analysis because of their substantial storage requirements. Post-processing therefore takes place primarily on the HPC systems themselves, which limits access for other researchers. HPC software is often tied to one specific system and is incompatible with other architectures. In addition, collecting metadata from completed simulations is difficult to automate.

The project pursues three key initiatives: container usage in HPC, data accessibility via the LRZ Compute Cloud, and the development of the HPMC tool for Ontology-based Metadata Extraction and Re-use (HOMER).

Container Usage in HPC

To comply with FAIR principles, software must be compatible with diverse systems, but HPC software faces challenges in portability and compatibility.
A user guide for building software containers is being developed. Containers allow programs to run independently of the host system, an approach that is still novel in the HPC community.
Container software has been successfully tailored for HPC centers in Garching and Juelich, with plans for expansion.

Data Accessibility via LRZ Compute Cloud

Because HPC simulations produce vast amounts of data, a cloud-based solution using the LRZ Compute Cloud has been implemented to address storage and sharing challenges.
A tool enables LRZ users to grant external users access to their data through the compute cloud, enhancing the ability to share extensive datasets among research colleagues.

Metadata Standards and Metadata Crawler (HOMER)

Building upon the recently released NFDI4Ing ontology (Metadata4Ing), we have created a workflow-based HPMC sub-ontology that defines and promotes metadata standards specific to high-performance and scientific computing environments.
The HPMC tool, HOMER, is a Python-based metadata crawler developed to streamline the metadata collection process in HPC simulations.
The crawler, based on a specified ontology, gathers metadata from diverse input sources, including file formats, terminal commands, and user inputs.
The crawler can be automated through shell scripts, which keeps metadata collection efficient and consistent, saves time, and avoids redundant work.
HOMER’s output is a JSON file organized according to the ontology, which makes it both machine- and human-readable and supports efficient data retrieval.
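The collection pattern described above (an ontology-driven schema, several input sources, one JSON output) can be illustrated with a minimal Python sketch. This is not HOMER's actual code or interface: the schema layout, the field names, the file name solver.cfg, and the functions collect and to_json are all assumptions made for illustration only.

```python
import json
import subprocess
import sys
from pathlib import Path

# Hypothetical schema excerpt: each metadata field names the source it is read from.
# Field names and layout are illustrative, not taken from the HPMC sub-ontology.
SCHEMA = {
    "simulation": {
        "interpreter_version": {"source": "command", "argv": [sys.executable, "--version"]},
        "time_step": {"source": "file", "path": "solver.cfg", "key": "dt"},
        "operator": {"source": "user", "prompt": "Operator name"},
    }
}


def read_key_value_file(path, key):
    """Return the value for `key` from a simple 'key = value' text file, or None."""
    for line in Path(path).read_text().splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def run_command(argv):
    """Run a terminal command and return its trimmed output."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return (result.stdout or result.stderr).strip()


def collect(schema, base_dir=".", answers=None):
    """Walk the schema and gather one value per field from its declared source.

    `answers` may pre-supply user inputs so the crawl can run unattended,
    for example when it is called from a shell script.
    """
    answers = answers or {}
    record = {}
    for group, fields in schema.items():
        record[group] = {}
        for name, spec in fields.items():
            if spec["source"] == "file":
                value = read_key_value_file(Path(base_dir) / spec["path"], spec["key"])
            elif spec["source"] == "command":
                value = run_command(spec["argv"])
            elif spec["source"] == "user":
                value = answers[name] if name in answers else input(spec["prompt"] + ": ")
            else:
                raise ValueError(f"unknown source type: {spec['source']!r}")
            record[group][name] = value
    return record


def to_json(record):
    """Serialize the collected metadata, keeping the schema's grouping."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Calling collect(SCHEMA, base_dir="run_01", answers={"operator": "jdoe"}) and passing the result to to_json would yield a nested JSON document that mirrors the schema's groups; the directory name and operator value here are placeholders. Keeping the source type inside the schema means new fields can be added without changing the collection loop.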