Computational methods for mapping of regulatory elements from large-scale RNA-sequencing compendia

Applicant

Prof. Dr. Julien Gagneur
Faculty of Informatics
Technical University of Munich

Project Summary

Machine learning models that interpret genetic variations, such as those arising in cancer or rare diseases, must be trained on large sets of explanatory variables and corresponding target variables. A major source of such data is compendia of transcriptome sequencing (RNA-seq) data, typically tens to hundreds of terabytes in size. The overarching goal of this project was to develop software that leverages high-performance computing platforms to effectively compute explanatory features and target variables for machine learning models from massive RNA-seq datasets. The project, led by Julien Gagneur (TUM) in collaboration with the LRZ, resulted in the successful development of parallelized RNA-seq processing pipelines now routinely used for RNA-seq-based rare disease diagnostics, genetic variant feature extraction tools that are now part of the Kipoi ecosystem, and MMSplice, a challenge-winning machine learning algorithm for predicting the impact of genetic variants on RNA splicing.