Scalable framework for scoring deleteriousness of genetic variants with CADD and Kipoi

Applicant

Prof. Dr. Julien Gagneur
Faculty of Informatics
Technical University of Munich

Project Summary

Predicting which genetic variants are deleterious is of major importance in human genetics. One of the most successful computational tools for this task is CADD (Combined Annotation Dependent Depletion), which integrates multiple annotations into a single score. However, CADD's scalability is challenged by the rapid development of a plethora of machine learning models that predict the impact of genetic variants on various molecular and cellular processes. The aim of this project was to address these limitations by providing a scalable feature-generation and scoring scheme that leverages the LRZ HPC cluster. The project resulted in working pipelines that generate the state-of-the-art CADD features as well as new features derived from models in Kipoi, the model zoo for genomics. An improved version of CADD itself could not be achieved. However, the project produced a new model, MTSplice, which predicts the effects of genetic variants on splicing in a tissue-specific fashion.