Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome

Aaron Wenger
Andrew Carroll
Arkarachai Fungtammasan
Armin Töpfer
Chen-Shan Chin
David R. Rank
Fritz J. Sedlazeck
Gene Myers
Gregory T. Concepcion
Heng Li
Jana Ebler
Jue Ruan
Justin Zook
Mark DePristo
Medhat Mahmoud
Michael Alonge
Michael C. Schatz
Michael W. Hunkapiller
Nathan D. Olson
Paul Peluso
Richard J. Hall
Sergey Koren
Tobias Marschall
William J. Rowell
Yufeng Qian
Nature Biotechnology (2019)

Abstract

The major DNA sequencing technologies in use today produce either highly accurate short reads or noisy long reads. We develop a protocol based on single-molecule circular consensus sequencing (CCS) to generate highly accurate long reads, and apply it to sequence the well-characterized human genome HG002/NA24385 to 28-fold coverage with 13.5 kb CCS reads that average 99.5% accuracy. We apply existing tools to comprehensively detect variants, achieving precision and recall above 99.9% for single-nucleotide variants (SNVs), 95.9% for indels, and 95.2% for structural variants. Nearly all (99.6%) variants are phased into haplotypes, which further improves variant detection. De novo assembly produces a highly contiguous and accurate genome, with a contig N50 above 15 Mb and concordance over Q45 (99.997%). From manual curation of discordances, we estimate that 1,283 mistakes in the high-quality Genome in a Bottle benchmark are correctable with CCS reads. With CCS reads alone, we match or exceed the performance of variant detection with accurate short reads and of assembly with noisy long reads.
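The abstract quotes two summary metrics without defining them: a Phred-scaled concordance (Q45, given as 99.997%) and a contig N50. As a minimal sketch, assuming the standard definitions (Q = -10 log10 of the error rate; N50 is the length of the shortest contig in the smallest set of longest contigs that covers at least half the assembly), the relationships can be checked as follows. The function names and the toy contig lengths are illustrative only and do not come from the paper.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred-scaled quality Q into an accuracy fraction."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert an accuracy fraction (strictly below 1) into a Phred quality."""
    return -10.0 * math.log10(1.0 - accuracy)


def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    if not lengths:
        raise ValueError("n50 requires at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Q45 corresponds to the 99.997% concordance quoted in the abstract.
q45_percent = round(phred_to_accuracy(45) * 100, 3)

# The 99.5% average read accuracy corresponds to roughly Q23.
read_q = round(accuracy_to_phred(0.995), 1)

# Hypothetical toy contig lengths (not data from the paper).
toy_n50 = n50([2, 3, 4, 5, 6])
```

Under these definitions, a 99.5% accurate read is about Q23, while a Q45 assembly has roughly one discordant base per 30,000.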
