DeepVariant over the years

Andrew Carroll
Daniel Cook
Gunjan Baid
Howard Yang
Maria Nattestad
(2021)

Abstract

The development of DeepVariant was motivated by the following question: if computational biologists can look at pileup images of reads to identify variants, can we train an image classification model to perform this task? To answer this question, we began working on DeepVariant in 2015, and the first open-source version (v0.4) of the software was released in late 2017. Since v0.4, the project has come a long way, and there have been eight additional releases. We originally began development on Illumina whole-genome sequencing (WGS) data, and the first release included one model for this data type. Over the years, we have added support for additional sequencing technologies, and we now provide models for Illumina whole-exome sequencing (WES) data, Pacific Biosciences (PacBio) HiFi data, and a hybrid model for combined Illumina and PacBio WGS data. We have also collaborated with a team at UC Santa Cruz to train DeepVariant using Oxford Nanopore data. The resulting tool, PEPPER-DeepVariant, uses PEPPER to generate candidates more effectively for Nanopore data. In addition to new models, we have added new capabilities, such as best practices for cohort calling in v0.9 and DeepTrio, a trio and duo caller, in v1.1. For each release, we focus on building highly accurate models, reducing runtime, and improving the user experience. In this post, we summarize the improvements in accuracy and runtime over the years and highlight a few categories of changes that have led to these improvements.