Edem Wornyo

Authored Publications
    This course explores how researchers and practitioners can engage ethically with Indigenous communities when developing AI- and data-intensive applications. Key issues such as fair engagement, legal constraints, reciprocity, and informed consent are discussed based on examples drawn from the instructors’ experience. The course also examines good practices in co-design and co-development processes, data governance and sovereignty issues and systems, decolonial software licensing, and processes of technology transfer and appropriation. In its practical part, the course critically discusses examples and cases gathered from the audience to explore the diversity of issues and solutions when working with Indigenous communities.
    Socially Responsible Data for Large Multilingual Language Models
    Zara Wudiri
    Mbangula Lameck Amugongo
    Alex
    Stanley Uwakwe
    João Sedoc
    Seyi Olojo
    Amber Ebinama
    Suzanne Dikker
    2024
    Large Language Models (LLMs) have rapidly increased in size and apparent capabilities in the last three years, but their training data is largely English text. There is growing interest in language inclusivity in LLMs, and various efforts are striving for models to accommodate language communities outside of the Global North, which include many languages that have been historically underrepresented digitally. These languages have been termed “low resource languages” or “long tail languages”, and LLM performance on these languages is generally poor. While expanding the use of LLMs to more languages may bring many potential benefits, such as assisting cross-community communication and language preservation, great care must be taken to ensure that data collection on these languages is not extractive and that it does not reproduce exploitative practices of the past. Collecting data from languages spoken by previously colonized people, indigenous people, and non-Western languages raises many complex sociopolitical and ethical questions, e.g., around consent, cultural safety, and data sovereignty. Furthermore, linguistic complexity and cultural nuances are often lost in LLMs. This position paper builds on recent scholarship, and our own work, and outlines several relevant social, cultural, and ethical considerations and potential ways to mitigate them through qualitative research, community partnerships and participatory design approaches. We provide twelve recommendations for consideration when collecting language data on underrepresented language communities outside of the Global North.