Proper Name Transcription/Transliteration with ICU Transforms

Sascha Brawer
Martin Jansche
Hiroshi Takenaka
Yui Terashima
34th Internationalization & Unicode Conference (2010)

Abstract

We describe our experience with a deep localization of Google Maps™, in which millions of geographic names of diverse origins had to be represented in several target languages, including Russian, Mandarin, and Japanese. For example, a map of Western Europe on maps.google.co.jp shows Japanese labels for almost all labeled features. We tackle the problem of transliterating from several source languages into several target languages by pivoting through an explicit intermediate phonetic representation. Each transliteration scheme is implemented as a sequence of ICU transforms. A few of these are existing transforms from ICU and CLDR, but most are transforms we wrote specifically for this problem. Dividing the problem this way yields many reusable components: rules that map a source language into the pivot representation can be combined with rules that map the pivot into any target language, which makes it simple to transliterate between multiple languages. We discuss the steps that go into building transliteration rules, describe existing official and de facto standards and guidelines, and give suggestions for what to do when no consistent guidelines are available. We provide general recommendations for developing and testing custom ICU transforms.
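To make the pivoting idea concrete, here is a minimal, stdlib-only Python sketch. It does not use ICU. Instead, a simple longest-match rewriter (`rule_transform`, a hypothetical helper) stands in for ICU's rule-based transforms, and `chain` plays the role of a transform sequence. The rule tables are tiny hypothetical examples that cover only the letters in the two sample names. They are not the rules from our system. The point is the shape of the design: each source language needs only a source-to-pivot stage, while the pivot-to-Russian stage is written once and shared.

```python
from typing import Callable, Dict, List

Transform = Callable[[str], str]


def rule_transform(rules: Dict[str, str]) -> Transform:
    """Rewrite text left to right, applying the longest matching pattern
    at each position; characters with no rule pass through unchanged.
    (A simplified stand-in for an ICU rule-based transform.)"""
    longest = max(len(pattern) for pattern in rules)

    def apply(text: str) -> str:
        out: List[str] = []
        i = 0
        while i < len(text):
            for n in range(min(longest, len(text) - i), 0, -1):
                replacement = rules.get(text[i:i + n])
                if replacement is not None:
                    out.append(replacement)
                    i += n
                    break
            else:
                out.append(text[i])  # no rule matched: copy one character
                i += 1
        return "".join(out)

    return apply


def chain(*stages: Transform) -> Transform:
    """Compose transforms so the output of each stage feeds the next."""
    def apply(text: str) -> str:
        for stage in stages:
            text = stage(text)
        return text
    return apply


# Hypothetical toy rule tables, far from complete.
# Source orthography -> pivot (IPA-like symbols).
GERMAN_TO_PIVOT = rule_transform({
    "sch": "ʃ", "ei": "aɪ", "w": "v", "u": "ʊ", "r": "ʁ",
    "n": "n", "f": "f", "t": "t",
})
ITALIAN_TO_PIVOT = rule_transform({
    "z": "ʦ",  # single affricate symbol, so it cannot be misread as t + s
    "v": "v", "e": "e", "n": "n", "i": "i", "a": "a",
})
# Pivot -> Russian orthography; shared by every source language.
PIVOT_TO_RUSSIAN = rule_transform({
    "ʃ": "ш", "v": "в", "aɪ": "ай", "ʊ": "у", "ʁ": "р", "n": "н",
    "f": "ф", "t": "т", "ʦ": "ц", "e": "е", "i": "и", "ia": "ия",
    "a": "а",
})

# Full schemes: normalize case, map into the pivot, map out of it, re-case.
german_to_russian = chain(str.lower, GERMAN_TO_PIVOT, PIVOT_TO_RUSSIAN, str.capitalize)
italian_to_russian = chain(str.lower, ITALIAN_TO_PIVOT, PIVOT_TO_RUSSIAN, str.capitalize)

print(german_to_russian("Schweinfurt"))  # Швайнфурт
print(italian_to_russian("Venezia"))     # Венеция
```

Under this design, adding a Japanese target would mean writing one pivot-to-katakana stage and reusing both source stages unchanged. Adding a new source language likewise requires only its own source-to-pivot table.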