Brian Roark

Improving Informally Romanized Language Identification

Adrian Benton

Christo Kirov

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Suzhou, China, 2318–2336

Context-aware Transliteration of Romanized South Asian Languages

Christo Kirov

Cibu Johny

Anna Katanova

Alexander Gutkin

Brian Roark

Computational Linguistics, 50 (2) (2024), 475–534

Spelling convention sensitivity in neural language models

Elizabeth Nielsen

Christo Kirov

Brian Roark

Findings of EACL (2023), pp. 1304-1316

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

Sebastian Ruder

Jon Clark

Alexander Gutkin

Mihir Sanjay Kale

Min Ma

Massimo Nicosia

Shruti Rijhwani

Parker Riley

Jean-Michel Sarr

Cindy Wang

John Wieting

Nitish Gupta

Anna Katanova

Christo Kirov

Dana L. Dickinson

Brian Roark

Bidisha Samanta

Connie Tao

David Adelani

Vera Axelrod

Isaac Caswell

Colin Cherry

Dan Garrette

Reeve Ingle

Melvin Johnson

Dmitry Panteleev

Partha Talukdar

Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics, Singapore, pp. 1856-1884

Design principles of an open-source language modeling microservice package for AAC text-entry applications

Brian Edward Roark

Alexander Gutkin

9th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022), Association for Computational Linguistics (ACL), Dublin, Ireland, pp. 1-16

Beyond Arabic: Software for Perso-Arabic Script Manipulation

Alexander Gutkin

Cibu Johny

Raiomond Doctor

Brian Roark

Richard Sproat

Proceedings of the 7th Arabic Natural Language Processing Workshop (WANLP2022) at EMNLP, Association for Computational Linguistics (ACL), Abu Dhabi, United Arab Emirates (Hybrid), pp. 381-387

Graphemic Normalization of the Perso-Arabic Script

Raiomond Doctor

Alexander Gutkin

Cibu Johny

Brian Roark

Richard Sproat

Proceedings of Grapholinguistics in the 21st Century, 2022 (G21C, Grafematik), Fluxus Editions, Brest, France, pp. 315-376

Criteria for Useful Automatic Romanization in South Asian Languages

Isin Demirsahin

Cibu Johny

Alexander Gutkin

Brian Edward Roark

Proceedings of the 13th Language Resources and Evaluation Conference (LREC), European Language Resources Association (ELRA), 20-25 June, Marseille, France (2022), pp. 6662-6673

Extensions to Brahmic script processing within the Nisaba library: new scripts, languages and utilities

Alexander Gutkin

Cibu Johny

Raiomond Doctor

Lawrence Wolf-Sonkin

Brian Edward Roark

Proceedings of the 13th Language Resources and Evaluation Conference (LREC), European Language Resources Association (ELRA), 20-25 June, Marseille, France (2022), pp. 6450-6460

Approximating probabilistic models as weighted finite automata

Ananda Theertha Suresh

Brian Edward Roark

Michael D. Riley

Vlad Schogol

Computational Linguistics, 47 (2021), pp. 221-254

Defining the technology of today and tomorrow.

Philosophy

People

Research areas

Foundational ML & Algorithms

Computing Systems & Quantum AI

Science, AI & Society

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Brian Roark

Research Areas

Join us
