BG.DAF: Bavarian Genome Data and Analysis Facility


Prof. Dr. Burkhard Rost
TU München

Prof. Dr. Erika von Mutius
Dr. von Haunersches Kinderspital
LMU München

Project Overview

To provide a centralized data storage and computing facility for genome research, we propose the installation of the Bavarian Genome Data and Analysis Facility (BG.DAF), consisting of hardware, software, and human resources. The facility is located at the LRZ. In the initial phase it can support several researchers as test users; later it will be transformed into a permanent LRZ service for Bavarian life science researchers. It is aimed at scientists in the field of genome research, who can use it as a one-stop shop spanning the whole data life cycle: data storage, data preprocessing, data analytics, long-term data availability, and data publication.

The facility consists of three layers: a basic services layer, which provides the infrastructure for authentication and access to the data; a hardware layer, which consists of the LRZ Cloud and LRZ HPC resources; and a software application layer, which provides demonstrator-like implementations of the most in-demand state-of-the-art software suites for genomic analysis.

Security aspects are already evaluated in the planning phase, so that the facility will provide security by design.

As a prototypical implementation, this KONWIHR project will provide basic functional units of the final BG.DAF to two selected user groups. The first sub-project is led by Professor Ege from the Dr. von Hauner Children’s Hospital of Ludwig Maximilian University and will carry out the work package “Genetic and microbial determinants of childhood asthma and their interactions”. From a computational point of view, this work package focuses on interactive supercomputing for statistical data analysis using web interfaces such as RStudio and Shiny. The second sub-project, managed by the Rostlab group at TU Munich, addresses a new implementation of the web service PredictProtein at the LRZ. The Rostlab has run this web service for over 20 years, during which its data volume has grown enormously. This exponential growth in data volume has severely impaired the maintainability and performance of the service. This sub-project therefore plans to lay the foundation for the service's operation over the next decade by porting it to the LRZ, with its ample resources for storing and processing big data.