A population-specific reference panel for improved genotype imputation in African Americans

Jared O’Connell
Meghan Moreno
Helen Li
Nadia Litterman
Elizabeth Noblin
Anjali Shastri
Elizabeth H. Dorfman
Suyash Shringarpure
23andMe Research Team
Adam Auton
Andrew Carroll
Communications Biology (2021)


There is currently a dearth of accessible whole genome sequencing (WGS) data for individuals residing in the Americas with Sub-Saharan African ancestry. We generated WGS data at intermediate (15×) coverage for 2,294 individuals with large amounts of Sub-Saharan African ancestry, predominantly Atlantic African admixed with varying amounts of European and American ancestry. We performed extensive comparisons of variant callers, phasing algorithms, and variant filtration on these data to construct a high-quality imputation panel containing data from 2,269 unrelated individuals. With the exception of the TOPMed imputation server (whose panel notably cannot be downloaded), our panel substantially outperformed other available panels when imputing African American individuals. The raw sequencing data, variant calls and imputation panel for this cohort are all freely available via dbGaP and should prove an invaluable resource for further study of admixed African genetics.