Mia Chen
Authored Publications
Building Machine Translation Systems for the Next Thousand Languages
Julia Kreutzer
Mengmeng Niu
Pallavi Nikhil Baljekar
Xavier Garcia
Maxim Krikun
Pidong Wang
Apu Shah
Macduff Richard Hughes
Google Research (2022)
Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation
Naveen Ari
ACL 2020 (2020)
Over the last few years, two promising research directions in low-resource neural machine translation (NMT) have emerged. The first focuses on utilizing high-resource languages to improve the quality of low-resource languages via multilingual NMT. The second direction employs monolingual data with self-supervision to pre-train translation models, followed by fine-tuning on small amounts of supervised data. In this work, we join these two lines of research and demonstrate the efficacy of monolingual data with self-supervision in multilingual NMT. We offer three major results: (i) using monolingual data significantly boosts the translation quality of low-resource languages in multilingual models; (ii) self-supervision improves zero-shot translation quality in multilingual models; and (iii) leveraging monolingual data with self-supervision provides a viable path towards adding new languages to multilingual models, reaching up to 28 BLEU on ro-en translation without any parallel data or back-translation.
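The abstract above does not spell out which self-supervised objective is paired with the parallel data. Purely as a hedged sketch (the span-masking scheme, the <mask> token, and the function names below are assumptions, not details from the paper), one common way to use monolingual text is to corrupt a sentence and train the model to reconstruct it, mixing such examples with ordinary translation pairs:

```python
import random

MASK = "<mask>"  # assumed sentinel token; the paper's actual objective may differ


def span_corrupt(tokens, mask_ratio=0.5, seed=None):
    """Build a (corrupted_source, target) pair from one monolingual sentence by
    replacing a contiguous span with MASK symbols; the model learns to
    reconstruct the original sentence from the corrupted input."""
    rng = random.Random(seed)
    n = len(tokens)
    span_len = max(1, int(round(n * mask_ratio)))
    start = rng.randint(0, n - span_len)
    corrupted = tokens[:start] + [MASK] * span_len + tokens[start + span_len:]
    return corrupted, list(tokens)


def mix_batch(parallel_pairs, monolingual_sentences):
    """Interleave supervised translation pairs with self-supervised pairs so a
    single multilingual model is trained on both kinds of examples."""
    examples = [(src, tgt, "supervised") for src, tgt in parallel_pairs]
    for sent in monolingual_sentences:
        corrupted, target = span_corrupt(sent)
        examples.append((corrupted, target, "self-supervised"))
    return examples


if __name__ == "__main__":
    parallel = [(["this", "is", "an", "example"], ["acesta", "este", "un", "exemplu"])]
    monolingual = [["acesta", "este", "un", "alt", "exemplu"]]
    for example in mix_batch(parallel, monolingual):
        print(example)
```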
Towards End-to-End In-Image Neural Machine Translation
Elman Mansimov
Jakob Uszkoreit
Mitchell Stern
Puneet Jain
EMNLP, NLP Beyond Text workshop, 2020 (2020)
In this paper, we offer a preliminary investigation into the task of in-image machine translation: transforming an image containing text in one language into an image containing the same text in another language. We propose an end-to-end neural model for this task inspired by recent approaches to neural machine translation, and demonstrate promising initial results based purely on pixel-level supervision. We then offer a qualitative evaluation of our system outputs and discuss some common failure modes. Finally, we conclude with directions for future work.
Motivated by the fact that most of the information relevant to the prediction of target tokens is drawn from the source sentence S = s_1, ..., s_S, we propose truncating the target-side context used for incremental predictions by making a Markov (N-gram) assumption. Experiments on the WMT EnDe and EnFr data sets show that the N-gram masked self-attention model loses very little in BLEU score for N in the range 4, ..., 8, depending on the task.
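To make the Markov truncation concrete, the NumPy sketch below (an illustration only, not the paper's implementation) builds the banded causal mask in which query position i may attend to itself and at most the N-1 preceding target positions, and applies it in plain scaled dot-product attention:

```python
import numpy as np


def ngram_causal_mask(seq_len: int, n: int) -> np.ndarray:
    """Boolean mask where entry (i, j) is True iff query position i may attend
    to key position j under an N-gram (Markov) assumption: j <= i and i - j < n."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < n)


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed positions set to -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    L, d, N = 6, 4, 3
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(L, d)) for _ in range(3))
    print(ngram_causal_mask(L, N).astype(int))   # banded lower-triangular mask
    print(masked_attention(q, k, v, ngram_causal_mask(L, N)).shape)  # (6, 4)
```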
Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
Dmitry (Dima) Lepikhin
George Foster
Maxim Krikun
Naveen Ari
(2019)
We describe our efforts towards building a universal neural machine translation (NMT) system capable of translating between any language pair. We set a milestone towards this goal by building a single massively multilingual NMT model handling 103 languages, trained on over 25 billion examples. Our system demonstrates effective transfer learning, significantly improving the translation quality of low-resource languages while keeping high-resource translation quality on par with competitive bilingual baselines. We provide an in-depth analysis of various aspects of model building that are crucial to the quality and practicality of universal NMT. While we prototype a high-quality universal translation system, our extensive empirical analysis exposes issues that need to be further addressed, and we suggest directions for future research.
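The abstract does not describe how one model is told which of the 103 languages to produce. A device commonly used in this line of work, shown below strictly as an assumption rather than a detail taken from the paper, is to prepend a target-language tag to the source sentence so that a single shared model can serve every language pair:

```python
def tag_example(src_tokens, tgt_lang):
    """Prepend a target-language tag (e.g. "<2fr>") to the source so a single
    multilingual model knows which language to translate into. This tagging
    scheme is an illustrative assumption, not a detail from the abstract."""
    return [f"<2{tgt_lang}>"] + list(src_tokens)


if __name__ == "__main__":
    print(tag_example(["How", "are", "you", "?"], "de"))
    # ['<2de>', 'How', 'are', 'you', '?']
    print(tag_example(["How", "are", "you", "?"], "hi"))
    # ['<2hi>', 'How', 'are', 'you', '?']
```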
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Youlong Cheng
Dehao Chen
HyoukJoong Lee
Jiquan Ngiam
NeurIPS (2019)
Scaling up deep neural network capacity is known to be an effective approach to improving model quality across several machine learning tasks. In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other tasks. To address the need for efficient and task-independent model parallelism, we introduce GPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers. By pipelining different sub-sequences of layers on separate accelerators, GPipe provides the flexibility to scale a variety of different networks to gigantic sizes efficiently. Moreover, GPipe utilizes a novel batch-splitting pipelining algorithm, resulting in almost linear speedup when a model is partitioned across multiple accelerators. We demonstrate the advantages of GPipe by training large-scale neural networks on two tasks with distinct architectures: (i) image classification, where we train a 557-million-parameter AmoebaNet model and attain a top-1 accuracy of 84.4% on ImageNet-2012; and (ii) multilingual neural machine translation, where we train a single 6-billion-parameter, 128-layer Transformer model on a corpus spanning over 100 languages and achieve better quality than all bilingual models.
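To illustrate the batch-splitting pipelining idea, the toy schedule below (a sketch under simplified assumptions, not GPipe's actual implementation) splits a mini-batch into micro-batches and streams them through K pipeline stages; after a short fill phase of K-1 ticks every stage stays busy, which is where the near-linear speedup comes from:

```python
def pipeline_schedule(num_stages: int, num_microbatches: int):
    """Return, for each clock tick, the (stage, microbatch) pairs that run in
    parallel under a simple forward-only pipeline: micro-batch m runs on stage
    s at tick s + m."""
    ticks = num_stages + num_microbatches - 1
    schedule = []
    for t in range(ticks):
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_microbatches]
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    K, M = 4, 8  # 4 pipeline stages, mini-batch split into 8 micro-batches
    for t, active in enumerate(pipeline_schedule(K, M)):
        print(f"tick {t}: " + ", ".join(f"stage{s}<-mb{m}" for s, m in active))
    # Total ticks = K + M - 1 = 11, versus K * M = 32 micro-batch steps if the
    # micro-batches went through the stages one at a time with no overlap:
    # the speedup approaches K as M grows, i.e. the "almost linear" behaviour
    # described in the abstract.
```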
Gmail Smart Compose: Real-Time Assisted Writing
Jackie Tsay
Justin Lu
Shuyuan Zhang
Tim Sohn
Yinan Wang
KDD 2019 (2019)
In this paper, we present Smart Compose, a novel system for generating interactive, real-time suggestions in Gmail that assists users in writing emails by reducing repetitive typing. In the design and deployment of such a large-scale and complex system, we faced several challenges, including model selection, performance evaluation, serving, and other practical issues. At the core of Smart Compose is a large-scale neural language model. We leveraged state-of-the-art machine learning techniques for language model training, which enabled high-quality suggestion prediction, and constructed a novel serving infrastructure for high-throughput, real-time inference. Experimental results show the effectiveness of our proposed system design and deployment approach. The system is currently deployed in Gmail.
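As a rough illustration of the core idea of suggestion prediction (the toy bigram model, the confidence threshold, and all names below are assumptions; the production system relies on a large neural language model and dedicated serving infrastructure), the sketch completes a typed prefix greedily and only surfaces the suggestion when the model is confident:

```python
from collections import defaultdict


def train_bigram_lm(corpus):
    """Tiny word-level bigram model standing in for the large neural LM."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def suggest(counts, prefix, max_words=5, min_prob=0.6):
    """Greedily extend the prefix; stop when the model is not confident enough,
    mimicking the idea of only surfacing high-confidence suggestions."""
    tokens = prefix.split()
    suggestion = []
    while len(suggestion) < max_words and tokens:
        nexts = counts.get(tokens[-1])
        if not nexts:
            break
        total = sum(nexts.values())
        word, freq = max(nexts.items(), key=lambda kv: kv[1])
        if word == "</s>" or freq / total < min_prob:
            break
        suggestion.append(word)
        tokens.append(word)
    return " ".join(suggestion)


if __name__ == "__main__":
    lm = train_bigram_lm([
        "thank you for your help",
        "thank you for the update",
        "thank you for your time",
    ])
    print(suggest(lm, "thank you"))  # -> "for your"
```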
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
George Foster
Llion Jones
Macduff Hughes
Mike Schuster
Niki J. Parmar
ACL'18 (2018)
The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first outperformed by the convolutional seq2seq model, which was in turn outperformed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all three fundamental architectures on the benchmark WMT'14 English-to-French and English-to-German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.
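The abstract does not enumerate the modeling and training techniques that were ported to the RNN architecture. As one example of an architecture-agnostic training technique widely used in this family of seq2seq models (offered as an illustration, not as a claim about the paper's exact recipe), here is label-smoothed cross-entropy in NumPy:

```python
import numpy as np


def log_softmax(x):
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))


def label_smoothed_loss(logits, targets, epsilon=0.1):
    """Cross-entropy against a smoothed target distribution: the gold token gets
    probability 1 - epsilon and the remaining epsilon is spread uniformly over
    the other vocabulary entries. The loss only touches the output layer, so it
    applies identically to RNN, CNN, or Transformer seq2seq models.

    logits: (batch, vocab) unnormalized scores; targets: (batch,) gold ids."""
    batch, vocab = logits.shape
    log_probs = log_softmax(logits)
    smooth = np.full((batch, vocab), epsilon / (vocab - 1))
    smooth[np.arange(batch), targets] = 1.0 - epsilon
    return float(-(smooth * log_probs).sum(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(2, 5))
    targets = np.array([1, 3])
    print(label_smoothed_loss(logits, targets))
```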
While current state-of-the-art NMT models, both LSTM-based and Transformer-based, are much deeper than their early counterparts, they are still shallow in comparison to the convolutional models used for both text and vision applications. In this work, we attempt to train significantly (2-3x) deeper Transformer and BiLSTM encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models and results in significant improvements on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.
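The abstract leaves the attention modification unspecified. One possibility consistent with this line of work, sketched below purely as an assumption, is to let the decoder attend to a learned softmax-weighted combination of all encoder layer outputs rather than only the top layer, which shortens the gradient path into the lower layers of a deep encoder:

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def combine_encoder_layers(layer_outputs, layer_logits):
    """Combine the outputs of all encoder layers (each of shape (src_len, d_model))
    into a single memory for decoder attention, using learned scalar weights.
    Gradients then flow directly into every encoder layer, which is one way to
    ease optimization of a deep encoder stack."""
    weights = softmax(np.asarray(layer_logits, dtype=float))  # one scalar per layer
    stacked = np.stack(layer_outputs)                         # (num_layers, src_len, d_model)
    return np.tensordot(weights, stacked, axes=1)             # (src_len, d_model)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_layers, src_len, d_model = 12, 7, 16
    outputs = [rng.normal(size=(src_len, d_model)) for _ in range(num_layers)]
    logits = np.zeros(num_layers)   # learned parameters; initialized uniform here
    memory = combine_encoder_layers(outputs, logits)
    print(memory.shape)             # (7, 16)
```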