
Brian Roark
Brian Roark is a computational linguist working on various topics in natural language processing. His research interests include: language modeling for automatic speech recognition, text entry and other applications; text normalization and transliteration; text entry, accessibility and augmentative and alternative communication (AAC).
Before joining Google, he was a faculty member for 9 years in the Center for Spoken Language Understanding (CSLU) at Oregon Health & Science University (OHSU) – part of what used to be the Oregon Graduate Institute (OGI). Before that, he was in the Speech Algorithms Department at AT&T Labs - Research from 2001–2004. He received his Ph.D. in the Department of Cognitive and Linguistic Sciences at Brown University in 2001.
More information, including publications, his CV and other links, can be found on his external webpage.
Research Areas
Authored Publications
Context-aware Transliteration of Romanized South Asian Languages
Christo Kirov
Computational Linguistics, 50 (2) (2024), 475–534
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
Sebastian Ruder
Mihir Sanjay Kale
Shruti Rijhwani
Jean-Michel Sarr
Cindy Wang
John Wieting
Christo Kirov
Dana L. Dickinson
Bidisha Samanta
Connie Tao
David Adelani
Reeve Ingle
Dmitry Panteleev
Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics, Singapore, pp. 1856-1884
Graphemic Normalization of the Perso-Arabic Script
Raiomond Doctor
Richard Sproat
Proceedings of Grapholinguistics in the 21st Century, 2022 (G21C, Grafematik), Paris, France
Criteria for Useful Automatic Romanization in South Asian Languages
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), European Language Resources Association (ELRA), 20-25 June, Marseille, France (2022), pp. 6662-6673
Extensions to Brahmic script processing within the Nisaba library: new scripts, languages and utilities
Raiomond Doctor
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), European Language Resources Association (ELRA), 20-25 June, Marseille, France (2022), pp. 6450-6460
Design principles of an open-source language modeling microservice package for AAC text-entry applications
9th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022), Association for Computational Linguistics (ACL), Dublin, Ireland, pp. 1-16
Beyond Arabic: Software for Perso-Arabic Script Manipulation
Raiomond Doctor
Richard Sproat
Proceedings of the 7th Arabic Natural Language Processing Workshop (WANLP2022) at EMNLP, Association for Computational Linguistics (ACL), Abu Dhabi, United Arab Emirates (Hybrid), pp. 381-387
Finite-state script normalization and processing utilities: The Nisaba Brahmic library
The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021): System Demonstrations, Association for Computational Linguistics, Kyiv, Ukraine (online), April 2021, pp. 14-23