Scalable nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation

Benedict Paten
Kimberley Billingsley
Kishwar Shafin
Mikhail Kolmogorov
Nature Methods (2023)

Abstract

Long-read sequencing technologies substantially overcome the limitations of short reads but
have to date been too expensive, not scalable enough, or too error-prone to be
considered a feasible replacement at scale. Here we develop an efficient and scalable wet-lab
and computational protocol for Oxford Nanopore Technologies long-read sequencing that seeks
to provide a true alternative to short reads for whole-genome sequencing. We applied this
protocol to cell line and brain tissue samples as part of a pilot project for the NIH Center for
Alzheimer’s and Related Dementias (CARD). Using a single PromethION flow cell, we can
detect SNPs with an F1-score better than that of Illumina short-read sequencing (the standard for
large-scale genomic projects). Further, we can discover structural variants with performance
comparable to state-of-the-art long-read-based de novo assembly methods involving Pacific
Biosciences HiFi sequencing and trio information, but at a much lower cost and far greater
throughput. We can then combine and phase small and structural variants at megabase scales
using long-read-based phasing. The
protocol also produces highly accurate, haplotype-specific methylation calls. This makes
large-scale long-read sequencing projects feasible; the protocol is currently being used to
sequence thousands of brain-based genomes as part of the NIH CARD initiative. This protocol
uses a de novo assembly-based framework for structural variant discovery that improves over
reference-based methods. We provide the protocol and software as open-source integrated
pipelines for generating phased variant calls and assemblies.

Research Areas