Statistical machine translation live

April 28, 2006

Posted by Franz Och, Research Scientist

Because we want to provide everyone with access to all the world's information, including information written in every language, one of the exciting projects at Google Research is machine translation. Most state-of-the-art commercial machine translation systems in use today have been developed using a rule-based approach and require a lot of work by linguists to define vocabularies and grammars.

Several research systems, including ours, take a different approach: we feed the computer billions of words of text, both monolingual text in the target language and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model. We have achieved very good results in research evaluations.
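To make the idea concrete, here is a toy sketch of how the two kinds of text might be combined. It is not our production system: the German-English sentences are invented, the function names are hypothetical, and simple co-occurrence counts stand in for a real alignment model. Monolingual target-language text trains a bigram language model that rewards fluent output, aligned sentence pairs train a word translation table, and a candidate translation is scored by adding the two log-probabilities.

```python
import math
from collections import Counter, defaultdict

# Toy data, invented purely for illustration.
MONOLINGUAL = ["the house is small", "the house is big", "the book is small"]
ALIGNED = [("das haus", "the house"), ("das buch", "the book"), ("ein buch", "a book")]


def train_language_model(sentences):
    """Count bigrams in target-language text, with a sentence-start marker."""
    bigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, cur in zip(words, words[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    vocab = {w for sentence in sentences for w in sentence.split()}
    return bigrams, contexts, vocab


def lm_logprob(sentence, lm):
    """Add-one smoothed bigram log-probability of a target sentence."""
    bigrams, contexts, vocab = lm
    size = len(vocab) + 1  # one extra slot for unseen words
    words = ["<s>"] + sentence.split()
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + size))
        for prev, cur in zip(words, words[1:])
    )


def train_translation_table(pairs):
    """Relative co-occurrence frequency of source word f given target word e."""
    cooc = defaultdict(Counter)
    for source, target in pairs:
        for e in target.split():
            for f in source.split():
                cooc[e][f] += 1
    return {
        e: {f: count / sum(fs.values()) for f, count in fs.items()}
        for e, fs in cooc.items()
    }


def tm_logprob(source, candidate, table, floor=1e-6):
    """Score each source word by its best-matching word in the candidate."""
    total = 0.0
    for f in source.split():
        best = max(
            (table.get(e, {}).get(f, 0.0) for e in candidate.split()),
            default=0.0,
        )
        total += math.log(max(best, floor))  # floor avoids log(0)
    return total


def best_translation(source, candidates, lm, table):
    """Pick the candidate with the highest combined log-score."""
    return max(
        candidates,
        key=lambda c: lm_logprob(c, lm) + tm_logprob(source, c, table),
    )


lm = train_language_model(MONOLINGUAL)
table = train_translation_table(ALIGNED)
print(best_translation("das haus", ["the house", "house the", "the book"], lm, table))
```

In this sketch the language model is what separates "the house" from "house the", since both contain the same words, while the translation table is what separates "the house" from "the book". A real system searches over a vast space of candidates rather than a hand-written list, and learns from billions of words rather than a handful of sentences.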

Now you can see the results for yourself. We recently launched an online version of our system for Arabic-English and English-Arabic. Try it out! Arabic is a very challenging language to translate to and from: it requires long-distance reordering of words and has a very rich morphology. Our system works better for some types of text (e.g., news) than for others (e.g., novels), and you probably should not try to translate poetry ... but do stay tuned for more exciting developments.

Update: We've just opened a discussion forum for all topics related to machine translation.

Update: Fixed broken link to NIST results.