A deep-learning-based RNA-seq germline variant caller

Aarti Venkat
Andrew Carroll
Daniel Cook
Dennis Yelizarov
Francisco De La Vega
Yannick Pouliot
Bioinformatics Advances (2023)

Abstract

RNA-seq is a widely used technology for quantifying and studying gene expression. Many other applications have also been developed for RNA-seq, such as identifying quantitative trait loci or gene fusion events. However, germline variant calling from RNA-seq data has not been widely adopted, because RNA-seq data tend to have high error rates and require special processing by variant callers. Here, we introduce a DeepVariant RNA-seq model capable of producing highly accurate variant calls from RNA-sequencing data. Our model outperforms existing approaches such as Platypus and GATK. We examine factors that influence accuracy, how our model addresses RNA editing events, and how additional thresholding can enable use of our model in a production pipeline.