The complete sequence of a human Y chromosome
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of
its complex repeat structure, including long palindromes, tandem repeats, and segmental
duplications. As a result, more than half of the Y chromosome is missing from the GRCh38
reference sequence and it remains the last human chromosome to be finished. Here, the
Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence
of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in
GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the
complete ampliconic structures of TSPY, DAZ, and RBMY; 42 additional protein-coding genes,
mostly from the TSPY gene family; and an alternating pattern of human satellite 1 and 3 blocks
in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the
CHM13 genome and mapped available population variation, clinical variants, and functional
genomics data to produce a complete and comprehensive reference sequence for all 24 human
chromosomes.