![Heng-Tze Cheng](https://storage.googleapis.com/gweb-research2023-media/pubtools/3466.png)
Heng-Tze Cheng
Heng-Tze Cheng is a Technical Lead Manager and Senior Staff Software Engineer on the Google Brain team, part of Google Research & AI. Heng-Tze currently leads a research team focusing on Neural Sequence Modeling research for Task-oriented Dialogues, Personalized Semantic Search, and Recommender Systems productionized across Google, such as Google Duplex Assistant, YouTube, and more. Heng-Tze also founded and led the Wide & Deep Learning project in TensorFlow, and has worked on large-scale machine learning platforms that are widely used for retrieval, ranking, and recommender systems.
Prior to joining Google in 2014, Heng-Tze received his Ph.D. from Carnegie Mellon University in 2013 and B.S. from National Taiwan University in 2008. His research interests range across machine learning, information retrieval, user behavior modeling, and human activity recognition.
Authored Publications
HyperPrompt: Prompt-based Task-Conditioning of Transformers
Cosmo Du
Steven Zheng
Vamsi Aribandi
Yi Tay
Yun He
Zhao Chen
Zhe Zhao
ICML(2022)
Prompt-tuning is becoming a new paradigm for finetuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate prompts. We propose a novel architecture of HyperPrompt: prompt-based task-conditioned parameterization of self-attention in Transformers. We show that HyperPrompt is very competitive against strong multi-task learning baselines with only 1% of additional task-conditioning parameters. The prompts are end-to-end learnable via generation by a HyperNetwork. The additional parameters scale sub-linearly with the number of downstream tasks, which makes it very parameter efficient for multi-task learning. HyperPrompt allows the network to learn task-specific feature maps where the prompts serve as task global memories. Information sharing is enabled among tasks through the HyperNetwork to alleviate task conflicts during co-training. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior performance over strong T5 multi-task learning baselines and parameter-efficient adapter variants including Prompt-Tuning on Natural Language Understanding benchmarks of GLUE and SuperGLUE across all the model sizes explored.
LaMDA: Language Models for Dialog Applications
Aaron Daniel Cohen
Alena Butryna
Alicia Jin
Apoorv Kulshreshtha
Ben Zevenbergen
Chung-ching Chang
Cosmo Du
Daniel De Freitas Adiwardana
Dehao Chen
Dmitry (Dima) Lepikhin
Erin Hoffman-John
Hongrae Lee
Igor Krivokon
James Qin
Jamie Hall
Joe Fenton
Johnny Soraker
Lora Mois Aroyo
Maarten Paul Bosma
Marc Joseph Pickett
Marcelo Amorim Menegali
Marian Croak
Maxim Krikun
Meredith Ringel Morris
Noam Shazeer
Rachel Bernstein
Ravi Rajakumar
Ray Kurzweil
Romal Thoppilan
Steven Zheng
Taylor Bos
Toju Duke
Tulsee Doshi
Vincent Y. Zhao
Will Rusch
Yuanzhong Xu
arXiv(2022)
We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvement on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model’s responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.
Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search Queries
Ajit Apte
Ambarish Jash
Amol H Wankhede
Ankit Kumar
Ayooluwakunmi Jeje
Dima Kuzmin
Ellie Ka In Chio
Harry Fung
Jon Effrat
Nitin Jindal
Pei Cao
Senqiang Zhou
Sukhdeep S. Sodhi
Tameen Khan
Tarush Bali
KDD(2021)
As more and more online search queries come from voice, automatic speech recognition becomes a key component to deliver relevant search results. Errors introduced by automatic speech recognition (ASR) lead to irrelevant search results returned to the user, thus causing user dissatisfaction. In this paper, we introduce an approach, Mondegreen, to correct voice queries in text space without depending on audio signals, which may not always be available due to system constraints, privacy, or bandwidth considerations (for example, some ASR systems run on-device). We focus on voice queries transcribed via several proprietary commercial ASR systems. These queries come from users making internet or online-service search queries. We first present an analysis showing how different the language distribution coming from user voice queries is from that in traditional text corpora used to train off-the-shelf ASR systems. We then demonstrate that Mondegreen can achieve significant improvements in user interaction by correcting user voice queries in one of the largest search systems in Google. Finally, we see Mondegreen as complementing existing highly-optimized production ASR systems, which may not be frequently retrained and thus lag behind due to vocabulary drift.
Zero-Shot Transfer Learning for Query-Item Cold Start in Search Retrieval and Recommendations
Ankit Kumar
Cosmo Du
Dima Kuzmin
Ellie Chio
John Roberts Anderson
Li Zhang
Nitin Jindal
Pei Cao
Ritesh Agarwal
Steffen Rendle
Tao Wu
Wen Li
CIKM(2020)
Most search retrieval and recommender systems predict top-K items given a query by learning directly from a large training set of (query, item) pairs, where a query can include natural language (NL), user, and context features. These approaches fall into the traditional supervised learning framework where the algorithm trains on labeled data from the target task. In this paper, we propose a new zero-shot transfer learning framework, which first learns representations of items and their NL features by predicting (item, item) correlation graphs as an auxiliary task, followed by transferring learned representations to solve the target task (query-to-item prediction), without having seen any (query, item) pairs in training. The advantages of applying this new framework include: (1) Cold-starting search and recommenders without abundant query-item data; (2) Generalizing to previously unseen or rare (query, item) pairs and alleviating the "rich get richer" problem; (3) Transferring knowledge of (item, item) correlation from domains outside of search. We show that the framework is effective on a large-scale search and recommender system.
Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology
Vihan Jain
Jing Wang
Sanmit Narvekar
Ritesh Agarwal
Rui Wu
Morgane Lustman
Vince Gatto
Paul Covington
Jim McFadden
arXiv(2019)
Most practical recommender systems focus on estimating immediate user engagement without considering the long-term effects of recommendations on user behavior. Reinforcement learning (RL) methods offer the potential to optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items---which may have interacting effects on user choice---methods are required to deal with the combinatorics of the RL action space. In this work, we address the challenge of making slate-based recommendations to optimize long-term value using RL. Our contributions are three-fold. (i) We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user-choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. (ii) We outline a methodology that leverages existing myopic learning-based recommenders to quickly develop a recommender that handles LTV. (iii) We demonstrate our methods in simulation, and validate the scalability of decomposed TD-learning using SlateQ in live experiments on YouTube.
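The decomposition in (i) can be illustrated with a minimal sketch. It assumes the user selects at most one item from the slate according to a conditional logit choice model, and that selecting nothing contributes zero value; both are assumptions made here for illustration. The function names, scores, and item-wise LTVs are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def slate_value(scores, item_ltvs, slate, null_score=1.0):
    """Slate LTV as a choice-weighted sum of item-wise LTVs.

    Q(s, A) = sum over i in A of P(i | s, A) * Q_bar(s, i), where
    P(i | s, A) = scores[i] / (null_score + sum of scores in A).
    Assumption: choosing nothing (the null item) contributes zero value.
    """
    denom = null_score + sum(scores[i] for i in slate)
    return sum(scores[i] / denom * item_ltvs[i] for i in slate)


def best_slate(scores, item_ltvs, k):
    """Brute-force search over all size-k slates; only viable for tiny catalogs."""
    candidates = combinations(range(len(scores)), k)
    return max(candidates, key=lambda slate: slate_value(scores, item_ltvs, slate))


# Hypothetical numbers: item 0 is attractive but has low long-term value.
scores = [3.0, 1.0, 1.0]
item_ltvs = [1.0, 5.0, 4.0]
chosen = best_slate(scores, item_ltvs, k=2)
```

With these numbers the search selects items 1 and 2: including the attractive item 0 would shift choice probability toward an item with low long-term value. The exhaustive enumeration is used only to keep the sketch short and does not scale to realistic catalogs.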
SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets
Vihan Jain
Jing Wang
Sanmit Narvekar
Ritesh Agarwal
Rui Wu
Proceedings of the Twenty-eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macau, China(2019), pp. 2592-2599
Reinforcement learning (RL) methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items---which may have interacting effects on user choice---methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
TensorFlow Estimators: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks
Cassandra Xia
Clemens Mewald
George Roumpos
Illia Polosukhin
Jamie Alexander Smith
Jianwei Xie
Lichan Hong
Mustafa Ispir
Philip Daniel Tucker
Yuan Tang
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Canada(2017)
We present a framework for specifying, training, evaluating, and deploying machine learning models. Our focus is to simplify writing cutting-edge machine learning models in a way that enables bringing those models into production. Recognizing the fast evolution of the field of deep learning, we make no attempt to capture the design space of all possible model architectures in a DSL or similar configuration. We allow users to write code to define their models, but provide abstractions that guide developers to write models in ways conducive to productionization, as well as providing a unifying Estimator interface that makes it possible to write downstream infrastructure (distributed training, hyperparameter tuning, …) independent of the model implementation.
We balance the competing demands for flexibility and simplicity by offering APIs at different levels of abstraction, making common model architectures available “out of the box”, while providing a library of utilities designed to speed up experimentation with model architectures. To make out of the box models flexible and usable across a wide range of problems, these canned Estimators are parameterized not only over traditional hyperparameters, but also using feature columns, a declarative specification describing how to interpret input data.
We discuss our experience in using this framework in research and production environments, and show the impact on code health, maintainability, and development speed.
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
Akshay Naresh Modi
Chiu Yuen Koo
Chuan Yu Foo
Clemens Mewald
Denis M. Baylor
Levent Koc
Lukasz Lew
Martin A. Zinkevich
Mustafa Ispir
Neoklis Polyzotis
Steven Whang
Sudip Roy
Sukriti Ramesh
Vihan Jain
Xin Zhang
KDD(2017)
Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful orchestration of many components—a learner for generating models based on training data, modules for analyzing and validating both data as well as models, and finally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such orchestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt.
We present TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions.
We present the case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis.
Wide & Deep Learning for Recommender Systems
Levent Koc
Tal Shaked
Glen Anderson
Wei Chai
Mustafa Ispir
Rohan Anil
Lichan Hong
Vihan Jain
Xiaobing Liu
Hemal Shah
arXiv:1606.07792(2016)
Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models.
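The way the wide and deep parts combine can be sketched as a single forward pass: a linear model over raw and cross-product features plus a small feed-forward network over embeddings, with the two logits summed before a sigmoid. This is a minimal pure-Python illustration, not the TensorFlow implementation; all function names, weights, and feature values are hypothetical, the embeddings are simply summed, and joint training is not shown.

```python
import math


def cross_product(x, pairs):
    """Cross-product features: 1 only when both binary features in a pair are 1."""
    return [x[i] * x[j] for i, j in pairs]


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


def wide_and_deep_logit(x, pairs, w_wide, embeddings, hidden_w, hidden_b, w_deep, bias):
    # Wide part: linear model over raw binary features and their cross-products.
    wide_logit = dot(w_wide, list(x) + cross_product(x, pairs))

    # Deep part: sum the embeddings of active features (an assumption made
    # for brevity), then apply one ReLU hidden layer.
    dim = len(embeddings[0])
    dense = [sum(embeddings[i][d] for i, v in enumerate(x) if v) for d in range(dim)]
    hidden = [max(0.0, dot(row, dense) + b) for row, b in zip(hidden_w, hidden_b)]
    deep_logit = dot(w_deep, hidden)

    # Both parts feed one logit, so a shared loss can train them jointly.
    return wide_logit + deep_logit + bias


def predict(logit):
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical example: three binary features with one crossed pair (0, 1).
logit = wide_and_deep_logit(
    x=[1, 1, 0],
    pairs=[(0, 1)],
    w_wide=[0.1, 0.2, 0.3, 0.5],
    embeddings=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    hidden_w=[[1.0, -1.0], [0.5, 0.5]],
    hidden_b=[0.0, 0.0],
    w_deep=[2.0, -0.3],
    bias=-0.5,
)
probability = predict(logit)
```

In this example the wide part contributes 0.8 (including 0.5 from the memorized cross feature), the deep part contributes -0.3, and the bias is -0.5, so the combined logit is approximately zero.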
Nonparametric Discovery of Human Routine from Sensor Data
Feng-Tso Sun
Yi-Ting Yeh
Cynthia Kuo
Martin Griss
IEEE International Conference on Pervasive Computing and Communications (PerCom)(2014)
People engage in routine behaviors. Automatic routine discovery goes beyond low-level activity recognition such as sitting or standing and analyzes human behaviors at a higher level (e.g., commuting to work). With recent developments in ubiquitous sensor technologies, it becomes easier to acquire a massive amount of sensor data. One main line of research is to mine human routines from sensor data using parametric topic models such as latent Dirichlet allocation. The main shortcoming of parametric models is that they assume a fixed, pre-specified parameter regardless of the data. Choosing an appropriate parameter usually requires an inefficient trial-and-error model selection process. Furthermore, it is even more difficult to find optimal parameter values in advance for personalized applications. In this paper, we present a novel nonparametric framework for human routine discovery that can infer high-level routines without knowing the number of latent topics beforehand. Our approach is evaluated on public datasets in two routine domains: a 34-daily-activity dataset and a transportation mode dataset. Experimental results show that our nonparametric framework can automatically learn the appropriate model parameters from sensor data without any form of model selection procedure and can outperform traditional parametric approaches for human routine discovery tasks.
Towards zero-shot learning for human activity recognition using semantic attribute sequence model
Martin Griss
Paul Davis
Jianguo Li
Di You
UbiComp '13 Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing, ACM
Understanding human activities is important for user-centric and context-aware applications. Previous studies showed promising results using various machine learning algorithms. However, most existing methods can only recognize the activities that were previously seen in the training data. In this paper, we present a new zero-shot learning framework for human activity recognition that can recognize an unseen new activity even when there are no training samples of that activity in the dataset. We propose a semantic attribute sequence model that takes into account both the hierarchical and sequential nature of activity data. Evaluation on datasets in two activity domains shows that the proposed zero-shot learning approach achieves 70-75% precision and recall recognizing unseen new activities, and outperforms supervised learning with limited labeled data for the new classes.
NuActiv: Recognizing Unseen New Activities Using Semantic Attribute-Based Learning
Feng-Tso Sun
Martin Griss
Paul Davis
Jianguo Li
Di You
Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, and Services, ACM, New York, NY, USA(2013), pp. 361-374
We study the problem of how to recognize a new human activity when we have never seen any training example of that activity before. Recognizing human activities is an essential element for user-centric and context-aware applications. Previous studies showed promising results using various machine learning algorithms. However, most existing methods can only recognize the activities that were previously seen in the training data. A previously unseen activity class cannot be recognized if there were no training samples in the dataset. Even if all of the activities can be enumerated in advance, labeled samples are often time consuming and expensive to get, as they require huge effort from human annotators or experts. In this paper, we present NuActiv, an activity recognition system that can recognize a human activity even when there are no training data for that activity class. Firstly, we designed a new representation of activities using semantic attributes, where each attribute is a human readable term that describes a basic element or an inherent characteristic of an activity. Secondly, based on this representation, a two-layer zero-shot learning algorithm is developed for activity recognition. Finally, to reinforce recognition accuracy using minimal user feedback, we developed an active learning algorithm for activity recognition. Our approach is evaluated on two datasets, including a 10-exercise-activity dataset we collected, and a public dataset of 34 daily life activities. Experimental results show that using semantic attribute-based learning, NuActiv can generalize knowledge to recognize unseen new activities. Our approach achieved up to 79% accuracy in unseen activity recognition.
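The idea of recognizing an unseen activity through semantic attributes can be sketched as follows. This is a simplified nearest-signature illustration, not NuActiv's actual two-layer algorithm: it assumes a first stage has already produced per-attribute scores from sensor data, and the attribute names, activity signatures, and function name are all hypothetical.

```python
# Hypothetical attribute order: [arms_up, arms_out_to_side, knees_bent]
ACTIVITY_SIGNATURES = {
    "bench_press": [1, 0, 0],
    "squat": [0, 0, 1],
    # No sensor training data is assumed for this class; only its
    # human-written attribute description is available.
    "jumping_jack": [1, 1, 0],
}


def classify_by_attributes(attribute_scores, signatures):
    """Return the activity whose attribute signature is closest to the scores."""
    def squared_distance(signature):
        return sum((a - s) ** 2 for a, s in zip(attribute_scores, signature))

    return min(signatures, key=lambda name: squared_distance(signatures[name]))


# Scores that hypothetical attribute detectors might emit for one sensor window.
predicted = classify_by_attributes([0.9, 0.8, 0.1], ACTIVITY_SIGNATURES)
```

Because the class is described only by its attributes, a new activity can be added by writing a new signature rather than collecting labeled sensor data for it.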
SensOrchestra: Collaborative Sensing for Symbolic Location Recognition
Feng-Tso Sun
Senaka Buthpitiya
Martin Griss
International Conference on Mobile Computing, Applications, and Services, 2010
The symbolic location of a user, like a store name in a mall, is essential for context-based mobile advertising. Existing fingerprint-based localization using only a single phone is susceptible to noise, and has a major limitation in that the phone has to be held in the hand at all times. In this paper, we present SensOrchestra, a collaborative sensing framework for symbolic location recognition that groups nearby phones to recognize ambient sounds and images of a location collaboratively. We investigated audio and image features, and designed a classifier fusion model to integrate estimates from different phones. We also evaluated the energy consumption, bandwidth, and response time of the system. Experimental results show that SensOrchestra achieved 87.7% recognition accuracy, which reduces the error rate of the single-phone approach by 2X, and eliminates the limitations on how users carry their phones. We believe general location or activity recognition systems can all benefit from this collaborative framework.
Automatic Chord Recognition for Music Classification and Retrieval
Yi-Hsuan Yang
Yu-Ching Lin
I-Bin Liao
Homer H. Chen
IEEE International Conference on Multimedia and Expo (ICME), 2008
As one of the most important mid-level features of music, the chord contains rich information about harmonic structure that is useful for music information retrieval. In this paper, we present a chord recognition system based on the N-gram model. The system is time-efficient, and its accuracy is comparable to existing systems. We further propose a new method to construct chord features for music emotion classification and evaluate its performance on commercial song recordings. Experimental results demonstrate the advantage of using chord features for music classification and retrieval.