A draft human pangenome reference

Wen-Wei Liao
Mobin Asri
Jana Ebler
Daniel Doerr
Marina Haukness
Shuangjia Lu
Julian K. Lucas
Jean Monlong
Haley J. Abel
Silvia Buonaiuto
Xian Chang
Haoyu Cheng
Justin Chu
Vincenza Colonna
Jordan M. Eizenga
Xiaowen Feng
Christian Fischer
Robert S. Fulton
Shilpa Garg
Cristian Groza
Andrea Guarracino
William T. Harvey
Simon Heumos
Kerstin Howe
Miten Jain
Tsung-Yu Lu
Charles Markello
Fergal J. Martin
Matthew W. Mitchell
Katherine M. Munson
Moses Njagi Mwaniki
Adam M. Novak
Hugh E. Olsen
Trevor Pesout
David Porubsky
Pjotr Prins
Jonas A. Sibbesen
Jouni Sirén
Chad Tomlinson
Flavia Villani
Mitchell R. Vollger
Lucinda L. Antonacci-Fulton
Gunjan Baid
Carl A. Baker
Anastasiya Belyaeva
Konstantinos Billis
Andrew Carroll
Sarah Cody
Daniel Cook
Robert M. Cook-Deegan
Omar E. Cornejo
Mark Diekhans
Peter Ebert
Susan Fairley
Olivier Fedrigo
Adam L. Felsenfeld
Giulio Formenti
Adam Frankish
Yan Gao
Nanibaa’ A. Garrison
Carlos Garcia Giron
Richard E. Green
Leanne Haggerty
Kendra Hoekzema
Thibaut Hourlier
Hanlee P. Ji
Eimear E. Kenny
Barbara A. Koenig
Jan O. Korbel
Jennifer Kordosky
Sergey Koren
HoJoon Lee
Alexandra P. Lewis
Hugo Magalhães
Santiago Marco-Sola
Pierre Marijon
Ann McCartney
Jennifer McDaniel
Jacquelyn Mountcastle
Maria Nattestad
Sergey Nurk
Nathan D. Olson
Alice B. Popejoy
Daniela Puiu
Mikko Rautiainen
Allison A. Regier
Arang Rhie
Samuel Sacco
Ashley D. Sanders
Valerie A. Schneider
Baergen I. Schultz
Kishwar Shafin
Michael W. Smith
Heidi J. Sofia
Ahmad N. Abou Tayoun
Françoise Thibaud-Nissen
Francesca Floriana Tricomi
Justin Wagner
Brian Walenz
Jonathan M. D. Wood
Aleksey V. Zimin
Guillaume Bourque
Mark J. P. Chaisson
Paul Flicek
Adam M. Phillippy
Justin Zook
Evan E. Eichler
David Haussler
Ting Wang
Erich D. Jarvis
Karen H. Miga
Glenn Hickey
Erik Garrison
Tobias Marschall
Ira M. Hall
Heng Li
Benedict Paten
Nature (2023)

Abstract

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Research Areas