GAP Shared Task Overview
Abstract
We give an overview of a shared task we ran on the GAP coreference challenge, covering the logistics of running a task with 263 active participants and the modeling trends we observed. We found that fine-tuning BERT on gender-balanced data produced a fair model, which we offer to the community as one recommended way to approach fairness in NLP modeling.