The complete sequence of a human Y chromosome

Arang Rhie
Sergey Nurk
Monika Cechova
Savannah Hoyt
Dylan Taylor
Nicolas Altemose
Mikko Rautiainen
Ivan Alexandrov
Kishwar Shafin
Jamie Allen
Michael C. Schatz
Sergey Koren
Karen Miga
Adam M. Phillippy
Nature, TBD (2023)

Abstract

The human Y chromosome has been notoriously difficult to sequence and assemble because of
its complex repeat structure, which includes long palindromes, tandem repeats, and segmental
duplications. As a result, more than half of the Y chromosome is missing from the GRCh38
reference sequence and it remains the last human chromosome to be finished. Here, the
Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence
of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in
GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the
complete ampliconic structures of the TSPY, DAZ, and RBMY gene families; 41 additional protein-coding genes,
mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks
in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the
CHM13 genome and mapped available population variation, clinical variants, and functional
genomics data to produce a complete and comprehensive reference sequence for all 24 human
chromosomes.

Research Areas