Towards Globally Responsible Generative AI Benchmarks
Abstract
As generative AI globalizes, there is an opportunity to reorient our nascent development frameworks and evaluative practices towards a global context. This paper draws lessons from a community-centered study of the failure modes of text-to-image models in the South Asian context to offer suggestions on how the AI/ML community can develop culturally and contextually situated benchmarks. We present three mitigations for culturally situated evaluations: 1) diversifying our diversity measures, 2) participatory prompt dataset curation, and 3) multi-tiered evaluation structures for community engagement. Through these mitigations, we present concrete methods to make our evaluation processes more holistic and human-centered while also engaging with the demands of deployment at a global scale.