 
                Ed H. Chi
Ed H. Chi is a Distinguished Scientist at Google, leading several machine learning research teams on the Google Brain team that focus on neural modeling, reinforcement learning, dialog modeling, reliable/robust machine learning, and recommendation systems. His teams have delivered more than 420 product improvements to YouTube, News, Ads, and the Google Play Store since 2013. With 39 patents and more than 150 research articles, he is also known for his research on user behavior in web and social media.
Prior to Google, he was the Area Manager and a Principal Scientist at Palo Alto Research Center's Augmented Social Cognition Group, where he led the team in understanding how social systems help groups of people to remember, think, and reason. Ed completed his three degrees (B.S., M.S., and Ph.D.) in 6.5 years at the University of Minnesota. Recognized as an ACM Distinguished Scientist and elected into the CHI Academy, he recently received a 20-year Test of Time award for research in information visualization. He has been featured and quoted in the press, including the Economist, Time Magazine, the LA Times, and the Associated Press. An avid swimmer, photographer, and snowboarder in his spare time, he also has a black belt in Taekwondo. See Ed's personal website.
        
Authored Publications

Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems
Shuo Yang, Aniruddh Nath, Yang Liu, Li Wei, Shawn Andrews, Maciej Kula, Jarrod Kahn, Zhe Zhao, Lichan Hong
2024

Knowledge Distillation (KD) is a powerful approach for compressing large models into smaller, more efficient models, particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring the consistent and reliable generation of high-quality teacher labels from continuous data streams.
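The paper focuses on the systems challenges around distillation for ranking; as background, the sketch below illustrates the kind of distillation objective such a student might optimize, mixing logged click labels with teacher soft labels. The function names, the binary click objective, and the mixing weight are illustrative assumptions, not the paper's implementation.

# Minimal sketch of a distillation loss for a CTR-style ranking student (assumptions noted above).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_ce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def distillation_loss(student_logits, click_labels, teacher_probs, alpha=0.5):
    """Mix the hard-label loss with a soft-label loss against teacher predictions."""
    p_student = sigmoid(student_logits)
    hard = binary_ce(p_student, click_labels)    # supervised ranking loss
    soft = binary_ce(p_student, teacher_probs)   # distillation term
    return np.mean((1.0 - alpha) * hard + alpha * soft)

# Example: three impressions with logged clicks and teacher click probabilities.
loss = distillation_loss(
    student_logits=np.array([0.2, -1.0, 2.3]),
    click_labels=np.array([1.0, 0.0, 1.0]),
    teacher_probs=np.array([0.7, 0.1, 0.9]),
)
print(loss)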
          
        
      
    
        
          
            
Improving Training Stability for Multitask Ranking Models in Recommender Systems
Justin Gilmer, Li Wei, Lichan Hong, Mahesh Sathiamoorthy
KDD 2023 (2023)

Recommender systems play an important role in YouTube, one of the largest online video platforms in the world. In this paper, we focus on a real-world multitask ranking model for YouTube recommendations. While most recommendation research is dedicated to designing better models to improve user engagement and satisfaction, we found that research on stabilizing the training of such models is severely under-explored. As recommendation models become larger and more sophisticated, they are more vulnerable to training instability issues, i.e., the loss diverges (instead of converging), which can make the model unusable, waste significant resources, and block model iterations. In this paper, we share the understanding and best practices we learned for improving the training stability of a multitask ranking model used in production. We show some properties of the model that lead to unstable training and speculate on the cause. Furthermore, we propose an effective solution to improve training stability based on our observations of training dynamics when model training starts to become unstable. Our experiments on a proprietary dataset show the effectiveness of the proposed method over several commonly used baseline methods.
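The abstract does not spell out the proposed stabilization method, so the sketch below shows a generic, commonly used safeguard (global gradient-norm clipping plus skipping non-finite updates) purely to illustrate the kind of intervention involved in keeping such training from diverging; it is not the paper's solution, and the threshold is an assumption.

# Generic illustration of gradient-norm clipping, NOT the method proposed in the paper.
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm

def sgd_step(params, grads, lr=0.01, max_norm=1.0):
    clipped, norm = clip_by_global_norm(grads, max_norm)
    if not np.isfinite(norm):            # skip obviously diverging steps
        return params
    return [p - lr * g for p, g in zip(params, clipped)]

params = [np.array([0.5, -0.2]), np.array([1.0])]
grads = [np.array([30.0, -40.0]), np.array([5.0])]   # an unusually large gradient spike
print(sgd_step(params, grads))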
          
        
      
    
        
          
            
Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems
Ben Coleman, Ruoxi Wang, Lichan Hong
Advances in Neural Information Processing Systems (2023), pp. 56234-56255

Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, which introduces hundreds of billions of parameters for extremely high-cardinality features. This bottleneck has led to substantial progress in alternative embedding algorithms. Many of these methods, however, make the assumption that each feature uses an independent embedding table. This work introduces a simple yet highly effective framework, Feature Multiplexing, where one single representation space is used for many different categorical features. Our theoretical and empirical analysis reveals that multiplexed embeddings can be decomposed into components from each constituent feature, allowing models to distinguish between features. We show that multiplexed representations give Pareto-optimal space-accuracy tradeoffs for three public benchmark datasets. Further, we propose a highly practical approach called Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. Unified embedding gives significant improvements in offline and online metrics compared to highly competitive baselines across five web-scale search, ads, and recommender systems, where it serves billions of users across the world in industry-leading products.
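A minimal sketch of the feature-multiplexing idea described in the abstract, assuming a hash-based lookup into a single shared table; the table size, dimension, and hashing scheme are illustrative choices, not the production configuration reported in the paper.

# One shared embedding table serving many categorical features (sketch, assumptions noted above).
import hashlib
import numpy as np

class UnifiedEmbedding:
    def __init__(self, num_rows=100_000, dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(scale=0.05, size=(num_rows, dim))  # one shared table
        self.num_rows = num_rows

    def lookup(self, feature_name, value):
        # Salt the hash with the feature name so each feature has its own collision pattern.
        key = f"{feature_name}:{value}".encode()
        row = int(hashlib.md5(key).hexdigest(), 16) % self.num_rows
        return self.table[row]

emb = UnifiedEmbedding()
v1 = emb.lookup("video_id", "abc123")
v2 = emb.lookup("channel_id", "abc123")   # same token under a different feature -> different row
print(v1.shape, np.allclose(v1, v2))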
          
        
      
    
        
          
            
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Jason Wei, Sharan Narang, Aakanksha Chowdhery
ICLR 2023 (to appear)

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).
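As a rough illustration of the decoding strategy described above, the sketch below samples several reasoning paths and returns the answer they agree on most often. sample_reasoning_path and extract_answer are hypothetical stand-ins for the language-model call and the answer parser; they are not part of the paper's code.

# Self-consistency decoding sketch: sample paths, then majority-vote on final answers.
from collections import Counter
import random

def self_consistent_answer(prompt, sample_reasoning_path, extract_answer, num_samples=20):
    answers = []
    for _ in range(num_samples):
        path = sample_reasoning_path(prompt)   # e.g. a sampled chain-of-thought
        answers.append(extract_answer(path))   # e.g. parse the final "the answer is X"
    # Marginalize over reasoning paths by majority vote on the extracted answer.
    return Counter(answers).most_common(1)[0][0]

# Toy usage with canned samples standing in for model outputs.
canned = ["... so the answer is 18", "... therefore 18", "... the answer is 17"]
answer = self_consistent_answer(
    "Janet has ...",
    sample_reasoning_path=lambda p: random.choice(canned),
    extract_answer=lambda path: path.split()[-1],
)
print(answer)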
          
        
      
    
        
          
            
LaMDA: Language Models for Dialog Applications
Aaron Daniel Cohen, Alena Butryna, Alicia Jin, Apoorv Kulshreshtha, Ben Zevenbergen, Chung-ching Chang, Cosmo Du, Daniel De Freitas Adiwardana, Dehao Chen, Dmitry (Dima) Lepikhin, Erin Hoffman-John, Igor Krivokon, James Qin, Jamie Hall, Joe Fenton, Johnny Soraker, Kathy Meier-Hellstern, Maarten Paul Bosma, Marc Joseph Pickett, Marcelo Amorim Menegali, Marian Croak, Maxim Krikun, Noam Shazeer, Rachel Bernstein, Ravi Rajakumar, Ray Kurzweil, Romal Thoppilan, Steven Zheng, Taylor Bos, Toju Duke, Tulsee Doshi, Vincent Y. Zhao, Will Rusch, Yanping Huang, Yuanzhong Xu, Zhifeng Chen
arXiv (2022)

We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvement on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model's responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.
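One concrete step in the abstract is filtering candidate responses with a fine-tuned safety classifier. The sketch below illustrates that filtering step only, with hypothetical generation and scoring functions and an assumed threshold; it is not LaMDA's actual pipeline.

# Candidate-filtering sketch: keep only responses the safety scorer accepts, then pick the best.
def respond(context, generate_candidates, safety_score, quality_score, threshold=0.8):
    candidates = generate_candidates(context)                      # sampled responses
    safe = [c for c in candidates if safety_score(c) >= threshold] # drop unsafe candidates
    if not safe:                                                   # fall back if everything is filtered
        return "I'm not able to help with that."
    return max(safe, key=quality_score)                            # pick the best remaining candidate

# Toy usage with stand-in scoring functions.
print(respond(
    context="How do I learn to swim?",
    generate_candidates=lambda ctx: ["Take lessons at a local pool.", "Just jump in the deep end."],
    safety_score=lambda c: 0.2 if "deep end" in c else 0.95,
    quality_score=len,
))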
          
        
      
    
        
          
            
Surrogate for Long-Term User Experience in Recommender Systems
Can Xu, Lisa Mijung Chung, Mohit Sharma, Qian Sun, Sriraj Badam, Yuyan Wang
KDD 2022 (2022)

Over the years we have seen recommender systems shifting focus from optimizing short-term engagement toward improving long-term user experience on the platforms. While defining good long-term user experience is still an active research area, we focus on one specific aspect of improved long-term user experience here, which is users revisiting the platform. These long-term outcomes, however, are much harder to optimize due to the sparsity in observing these events and the low signal-to-noise ratio (weak connection) between these long-term outcomes and a single recommendation. To address these challenges, we propose to establish the association between these long-term outcomes and a set of more immediate-term user behavior signals that can serve as surrogates for optimization. To this end, we conduct a large-scale study of user behavior logs on one of the largest industrial recommendation platforms serving billions of users. We study a broad set of sequential user behavior patterns and standardize a procedure to pinpoint the subset that has strong predictive power of the change in users' long-term visiting frequency. Specifically, they are predictive of users' increased visits to the platform in 5 months among the group of users with the same visiting frequency to begin with. We validate the identified subset of user behaviors by incorporating them as reward surrogates for long-term user experience in a reinforcement learning (RL) based recommender. Results from multiple live experiments on the industrial recommendation platform demonstrate the effectiveness of the proposed set of surrogates in improving long-term user experience.
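A minimal sketch of how surrogate behavior signals could enter the reward of an RL-based recommender, as the abstract describes. The specific signals and weights below are assumptions for illustration, not the surrogates identified in the paper.

# Reward sketch: mix immediate engagement with weighted surrogate signals for long-term value.
def surrogate_reward(event, weights=None):
    weights = weights or {
        "watch_time": 1.0,           # immediate engagement
        "diverse_topic_click": 0.5,  # hypothetical surrogate signal
        "returned_next_day": 2.0,    # hypothetical surrogate signal
    }
    return sum(weights[k] * float(event.get(k, 0.0)) for k in weights)

# Toy usage: reward the agent would receive for one logged interaction.
event = {"watch_time": 0.8, "diverse_topic_click": 1, "returned_next_day": 0}
print(surrogate_reward(event))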
          
        
      
    
        
          
            
Emergent abilities of large language models
Barret Zoph, Colin Raffel, Dani Yogatama, Jason Wei, Liam B. Fedus, Maarten Paul Bosma, Percy Liang, Sebastian Borgeaud, Tatsunori B. Hashimoto, Yi Tay
TMLR (2022)

Scaling up language models has been shown to predictably confer a range of benefits such as improved performance and sample efficiency. This paper discusses an unpredictable phenomenon that we call emergent abilities of large language models. Such emergent abilities have close to random performance until evaluated on a model of sufficiently large scale, and hence their emergence cannot be predicted by extrapolating a scaling law based on small-scale models. The emergence of such abilities suggests that additional scaling could further expand the range of tasks that language models can perform. We discuss the implications of these phenomena and suggest directions for future research.
          
        
      
    
        
          
            
Learning to Augment for Casual User Recommendation
Elaine Le, Jianling Wang, Yuyan Wang
The ACM Web Conference 2022 (2022)

Users who come to recommendation platforms are heterogeneous in activity levels. There usually exists a group of core users who visit the platform regularly and consume a large body of content upon each visit, while others are casual users who tend to visit the platform occasionally and consume less each time. As a result, consumption activities from core users often dominate the training data used for learning. Because core users can exhibit different activity patterns from casual users, recommender systems trained on historical user activity data usually achieve much worse performance on casual users than on core users. To bridge the gap, we propose a model-agnostic framework, L2Aug, to improve recommendations for casual users through data augmentation, without sacrificing core user experience. L2Aug is powered by a data augmentor that learns to generate augmented interaction sequences, in order to fine-tune and optimize the performance of the recommendation system for casual users. On four real-world public datasets, the proposed L2Aug outperforms other treatment methods and achieves the best sequential recommendation performance for both casual and core users. We also test L2Aug in an online simulation environment with real-time feedback to further validate its efficacy, and showcase its flexibility in supporting different augmentation actions.
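As a rough illustration of the augmentation idea (not the learned augmentation policy that L2Aug actually trains), the sketch below derives casual-user-like training sequences by randomly subsampling a core user's interaction history; the keep probability and minimum length are assumptions.

# Fixed random subsampling as a stand-in for learned sequence augmentation.
import random

def augment_sequence(interactions, keep_prob=0.4, min_len=2, rng=random.Random(0)):
    """Drop items at random to mimic a sparser, casual-user interaction history."""
    kept = [item for item in interactions if rng.random() < keep_prob]
    return kept if len(kept) >= min_len else interactions[:min_len]

core_user_history = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"]
print(augment_sequence(core_user_history))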
          
        
      
    
        
        
          
A goal for multi-task learning from a multi-objective optimization perspective is to find the Pareto solutions that are not dominated by others. In this paper, we provide some insights on understanding the trade-off between Pareto efficiency and generalization, as a result of parameterization in deep learning: as a multi-objective optimization problem, enough parameterization is needed for handling task conflicts in a constrained solution space; however, from a multi-task generalization perspective, over-parameterization undermines the benefit of learning a shared representation which helps harder tasks or tasks with limited training examples. A delicate balance between multi-task generalization and multi-objective optimization is therefore needed for finding a better trade-off between efficiency and generalization. To this end, we propose a method of under-parameterized self-auxiliaries for multi-task models to achieve the best of both worlds. It is model-agnostic, task-agnostic and works with other multi-task learning algorithms. Empirical results show our method improves Pareto efficiency over existing popular algorithms on several multi-task applications.
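A loose sketch of the "under-parameterized self-auxiliaries" idea in the abstract above: tiny auxiliary towers on the shared representation contribute an extra loss term per task. Tower sizes, losses, and weights here are illustrative assumptions, not the paper's exact design, and the weights are left untrained for brevity.

# Per-task main tower plus a much smaller auxiliary tower whose loss acts as a regularizer.
import numpy as np

rng = np.random.default_rng(0)
d_shared, d_main, d_aux, num_tasks = 32, 64, 4, 2   # auxiliary towers are deliberately tiny

shared = rng.normal(size=(16, d_shared))            # shared representation for a batch
labels = [rng.normal(size=(16,)) for _ in range(num_tasks)]

def tower(x, hidden):
    """A one-hidden-layer regression tower with random (untrained) weights."""
    w1 = rng.normal(size=(x.shape[1], hidden)) * 0.1
    w2 = rng.normal(size=(hidden, 1)) * 0.1
    return (np.maximum(x @ w1, 0.0) @ w2).ravel()

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

total_loss = 0.0
for t in range(num_tasks):
    main_loss = mse(tower(shared, d_main), labels[t])
    aux_loss = mse(tower(shared, d_aux), labels[t])   # under-parameterized self-auxiliary
    total_loss += main_loss + 0.1 * aux_loss          # auxiliary term added to the task loss
print(total_loss)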
              
  
          
        
      
    
        
          
            
HyperPrompt: Prompt-based Task-Conditioning of Transformers
Cosmo Du, Steven Zheng, Vamsi Aribandi, Yi Tay, Yun He, Zhao Chen, Zhe Zhao
ICML (2022)

Prompt-tuning is becoming a new paradigm for finetuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate prompts. We propose a novel architecture of HyperPrompt: prompt-based task-conditioned parameterization of self-attention in Transformers. We show that HyperPrompt is very competitive against strong multi-task learning baselines with only 1% of additional task-conditioning parameters. The prompts are end-to-end learnable via generation by a HyperNetwork. The additional parameters scale sub-linearly with the number of downstream tasks, which makes it very parameter efficient for multi-task learning. HyperPrompt allows the network to learn task-specific feature maps where the prompts serve as task global memories. Information sharing is enabled among tasks through the HyperNetwork to alleviate task conflicts during co-training. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior performance over strong T5 multi-task learning baselines and parameter-efficient adapter variants including Prompt-Tuning on Natural Language Understanding benchmarks of GLUE and SuperGLUE across all the model sizes explored.
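A hedged sketch of the prompt-generation idea in the abstract: a small hypernetwork maps a learned task embedding to prompt vectors, which are then concatenated to a self-attention layer's keys and values as extra attendable memory. The shapes and the single-linear-layer hypernetwork are assumptions for illustration, not the paper's architecture.

# Hypernetwork-generated, task-conditioned key/value prompts for one attention layer.
import numpy as np

rng = np.random.default_rng(0)
d_model, prompt_len, num_tasks, task_dim = 16, 4, 3, 8

task_embeddings = rng.normal(size=(num_tasks, task_dim))           # one embedding per task
W_hyper = rng.normal(size=(task_dim, prompt_len * 2 * d_model))    # tiny hypernetwork

def task_prompts(task_id):
    """Generate key/value prompt blocks for one task from its embedding."""
    out = task_embeddings[task_id] @ W_hyper
    k_p, v_p = np.split(out.reshape(prompt_len, 2 * d_model), 2, axis=-1)
    return k_p, v_p

def attention_with_prompts(q, k, v, task_id):
    k_p, v_p = task_prompts(task_id)
    k = np.concatenate([k_p, k], axis=0)   # prompts act as task-specific global memory
    v = np.concatenate([v_p, v], axis=0)
    scores = q @ k.T / np.sqrt(d_model)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq = rng.normal(size=(5, d_model))
print(attention_with_prompts(seq, seq, seq, task_id=1).shape)      # (5, 16)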
          
        
      
    