- Aaron Wenger
- Alexey Kolesnikov
- Andrew Walker Carroll
- Armin Töpfer
- Ashish Teku Vaswani
- Cory McLean
- Daniel Cook
- Felipe Llinares
- Gunjan Baid
- Howard Cheng-Hao Yang
- Jean-Philippe Vert
- Kishwar Shafin
- Maria Nattestad
- Pi-Chuan Chang
- Quentin Berthet
- Taedong Yun
- Waleed Ammar
- William J. Rowell
Abstract
Genomic analysis requires accurate sequencing at sufficient coverage and over difficult genome regions. By repeatedly sampling a circular template, Pacific Biosciences (PacBio) developed long (10-25 kb) reads with high overall accuracy but lower accuracy in homopolymers. Here, we introduce DeepConsensus, a transformer-based approach that uses a unique alignment loss to correct sequencing errors. DeepConsensus reduces errors in PacBio HiFi reads by 42% compared to the current approach. We show that this increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27%, and at Q40 by 90%. With two SMRT cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 Mb to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), and improve assembly base accuracy (QV43 to QV45). They also reduce variant calling errors by 24%.
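The Q20, Q30, Q40, and QV figures in the abstract are Phred-scaled qualities, where Q = -10 * log10(p) and p is the error probability. The sketch below only illustrates that standard conversion; it is not part of DeepConsensus, and the function names are hypothetical helpers chosen for this example.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q into an error probability p."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability p (0 < p <= 1) into a Phred-scaled quality."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def phred_gain_from_error_reduction(fraction: float) -> float:
    """Phred gain implied by removing `fraction` of the errors (0 <= fraction < 1).

    Pure arithmetic on a single average error rate; it says nothing about
    how individual reads or bases are scored.
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    return -10 * math.log10(1 - fraction)


if __name__ == "__main__":
    # Quality thresholds and assembly QVs mentioned in the abstract.
    for q in (20, 30, 40, 43, 45):
        print(f"Q{q}: error probability {phred_to_error_prob(q):.2e}")
    # Phred gain that a 42% reduction in an average error rate would imply.
    print(f"42% fewer errors: +{phred_gain_from_error_reduction(0.42):.2f} Phred")
```

Under this conversion, Q20 corresponds to one expected error per 100 bases, Q30 to one per 1,000, and Q40 to one per 10,000.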