Covering All Your Bases: Strategies to Expand Training Data for Specialized Genomics Problems
Abstract
We explore three training strategies that leverage whole-genome sequencing (WGS) data to improve model performance on the specialized task of variant calling from whole-exome sequencing (WES) data: 1) jointly training with both WGS and WES data, 2) warm-starting from a pre-trained WGS model, and 3) including sequencing type as an input to the model.