Google Research

Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome

  • Aaron M. Wenger
  • Paul Peluso
  • William J. Rowell
  • Pi-Chuan Chang
  • Richard J. Hall
  • Gregory T. Concepcion
  • Jana Ebler
  • Arkarachai Fungtammasan
  • Alexey Kolesnikov
  • Nathan D. Olson
  • Armin Töpfer
  • Michael Alonge
  • Medhat Mahmoud
  • Yufeng Qian
  • Chen-Shan Chin
  • Adam M. Phillippy
  • Michael C. Schatz
  • Gene Myers
  • Mark A. DePristo
  • Jue Ruan
  • Tobias Marschall
  • Fritz J. Sedlazeck
  • Justin M. Zook
  • Heng Li
  • Sergey Koren
  • Andrew Carroll
  • David R. Rank
  • Michael W. Hunkapiller
Nature Biotechnology (2019)

Abstract

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the ‘genome in a bottle’ (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
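
Two of the figures quoted above are summary metrics that are easy to misread: contig N50 and percent accuracy. The following is a minimal sketch of how each is conventionally computed, not the authors' pipeline. The function names `contig_n50` and `phred_quality` and the toy contig lengths are hypothetical, chosen only for illustration. The Phred conversion assumes the standard definition Q = -10 log10(error rate).

```python
import math


def contig_n50(lengths):
    """Return the contig N50 of an assembly.

    N50 is the length L of the contig at which, walking from the longest
    contig to the shortest, the running sum first reaches at least half
    of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_n50 requires at least one contig of non-zero length")


def phred_quality(accuracy):
    """Convert a fractional accuracy (e.g. 0.998) to a Phred-scaled quality."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


if __name__ == "__main__":
    # Hypothetical toy assembly, lengths in megabases (not data from the paper).
    toy_contigs_mb = [2, 3, 4, 5, 6]
    print("toy N50 (Mb):", contig_n50(toy_contigs_mb))

    # Accuracy figures quoted in the abstract, expressed on the Phred scale.
    print("read accuracy 99.8%  -> Q%.1f" % phred_quality(0.998))
    print("concordance 99.997% -> Q%.1f" % phred_quality(0.99997))
```

On the Phred scale the reported 99.8% read accuracy is roughly Q27 and the 99.997% assembly concordance is roughly Q45, which makes the gap between per-read and consensus accuracy easier to compare.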
