Noah Fiedel
Noah Fiedel is the Technical Lead of TensorFlow Serving and a Senior Staff Software Engineer at Google Research. During his Google career he has helped build the systems powering YouTube, Blogger, Google+, and Hangouts. Before Google, he worked on projects ranging from Boeing-Jeppesen flight planning software to Color Labs, a 25-person mobile startup he co-founded. Noah holds a B.S. in EECS from UC Berkeley.
Authored Publications
Levels of AGI for Operationalizing Progress on the Path to AGI
Jascha Sohl-Dickstein
Allan Dafoe
Clement Farabet
Shane Legg
(2023)
Abstract
We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy. It is our hope that this framework will be useful in an analogous way to the levels of autonomous driving, by providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. These principles include focusing on capabilities rather than mechanisms; separately evaluating generality and performance; and defining stages along the path toward AGI, rather than focusing on the endpoint. With these principles in mind, we propose “Levels of AGI” based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.
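As a rough illustration of the two axes, the sketch below encodes the performance-by-generality grid as a small Python data structure; the level names paraphrase the paper's proposed labels, and the example entry is purely hypothetical.

```python
# Illustrative sketch of the two axes of the proposed ontology; the level
# names paraphrase the paper's labels and the example entry is hypothetical.
from dataclasses import dataclass
from enum import Enum


class Performance(Enum):
    """Depth of capability, rated against human performance on the same tasks."""
    NO_AI = 0
    EMERGING = 1     # comparable to or somewhat better than an unskilled human
    COMPETENT = 2    # at least median skilled-adult performance
    EXPERT = 3       # well above median skilled-adult performance
    VIRTUOSO = 4     # near the very best human performance
    SUPERHUMAN = 5   # outperforms all humans


class Generality(Enum):
    """Breadth of capability."""
    NARROW = "narrow"    # a clearly scoped task or set of tasks
    GENERAL = "general"  # a wide range of cognitive tasks


@dataclass
class Classification:
    """One cell of the performance x generality grid; autonomy is tracked separately."""
    system: str
    performance: Performance
    generality: Generality


# Hypothetical example entry, not an official rating:
example = Classification("frontier chatbot", Performance.EMERGING, Generality.GENERAL)
```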
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Aitor Lewkowycz
Daniel Freeman
Guy Gur-Ari
Jaehoon Lee
Jascha Sohl-Dickstein
Liam B. Fedus
TBD (2022)
Abstract
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to direct future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models.
To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench consists of 207 tasks, contributed by over 400 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on capabilities that are believed to be beyond current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. A team of human experts also performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with human performance); model performance is remarkably similar across model classes; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit “breakthrough” behavior at a critical scale often involve a significant reasoning or algorithmic component; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
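To illustrate how such tasks are typically consumed, the sketch below scores an arbitrary text-completion model on a tiny exact-match task; the task dictionary and the `generate` callable are illustrative assumptions, not BIG-bench's actual schema or API.

```python
# Hypothetical sketch: scoring a model on a small exact-match task.
# The task format and the `generate` callable are illustrative assumptions,
# not the exact BIG-bench schema or evaluation API.

task = {
    "name": "toy_arithmetic",
    "examples": [
        {"input": "What is 17 + 25?", "target": "42"},
        {"input": "What is 6 * 7?", "target": "42"},
    ],
}


def exact_match_accuracy(generate, task):
    """Fraction of examples where the model's output matches the target exactly."""
    hits = 0
    for example in task["examples"]:
        prediction = generate(example["input"]).strip()
        hits += prediction == example["target"]
    return hits / len(task["examples"])


# Usage with any callable that maps a prompt string to a completion string:
# accuracy = exact_match_accuracy(my_model.generate, task)
```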
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Hyung Won Chung
Sebastian Gehrmann
Parker Schuh
Sasha Tsvyashchenko
Abhishek Rao
Yi Tay
Noam Shazeer
Nan Du
Reiner Pope
James Bradbury
Guy Gur-Ari
Toju Duke
Henryk Michalewski
Xavier Garcia
Liam Fedus
David Luan
Barret Zoph
Ryan Sepassi
David Dohan
Shivani Agrawal
Mark Omernick
Marie Pellat
Aitor Lewkowycz
Erica Moreira
Rewon Child
Oleksandr Polozov
Zongwei Zhou
Brennan Saeta
Michele Catasta
Jason Wei
Kathy Meier-Hellstern
arXiv:2204.02311 (2022)
Abstract
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
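As a reminder of what few-shot learning looks like in practice, the sketch below specifies a task entirely in the prompt, with no gradient updates; the `complete` callable stands in for any large language model's text-completion interface and is a hypothetical placeholder, not PaLM's API.

```python
# Minimal sketch of few-shot prompting: the task is specified with a handful of
# in-context examples rather than task-specific training. The `complete` function
# is a hypothetical stand-in for a text-completion interface.

FEW_SHOT_PROMPT = """\
Translate English to French.

English: cheese
French: fromage

English: house
French: maison

English: {query}
French:"""


def translate(complete, word: str) -> str:
    """Build the few-shot prompt for `word` and return the model's completion."""
    prompt = FEW_SHOT_PROMPT.format(query=word)
    return complete(prompt).strip()


# Usage: translate(my_llm.complete, "bread")  # expected to return "pain"
```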
Towards a Human-like Open-Domain Chatbot
Apoorv Kulshreshtha
Daniel De Freitas Adiwardana
David Richard So
Gaurav Nemade
Jamie Hall
Romal Thoppilan
Yifeng Lu
Zi Yang
arXiv (2020)
Abstract
We present Meena, a multi-turn end-to-end open-domain chatbot trained on data mined and filtered from public social media conversations. The model was trained to minimize the perplexity of the next token, and we find evidence that this automatic metric correlates with human judgements of conversation quality. We propose a human-judgement metric called Sensibleness and Specificity Average (SSA), which captures key elements of good conversation. Extensive experiments show a strong correlation between perplexity and SSA. The fact that Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-like chatbot with an SSA score of 82% is potentially within reach if we can optimize perplexity further.
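SSA averages per-response sensibleness and specificity judgements; a minimal sketch of the computation, assuming binary human labels for each model response, looks like this.

```python
# Minimal sketch of Sensibleness and Specificity Average (SSA), assuming each
# model response has been labeled by human raters as sensible (does it make
# sense in context?) and specific (is it specific to the given context?).

def ssa(labels):
    """labels: list of (sensible: bool, specific: bool) pairs, one per response."""
    n = len(labels)
    sensibleness = sum(sensible for sensible, _ in labels) / n
    specificity = sum(specific for _, specific in labels) / n
    return (sensibleness + specificity) / 2


# Example: 4 rated responses -> sensibleness 0.75, specificity 0.50, SSA 0.625.
print(ssa([(True, True), (True, False), (True, True), (False, False)]))
```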
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
Akshay Naresh Modi
Chiu Yuen Koo
Chuan Yu Foo
Clemens Mewald
Denis M. Baylor
Jarek Wilkiewicz
Levent Koc
Lukasz Lew
Martin A. Zinkevich
Mustafa Ispir
Neoklis Polyzotis
Steven Whang
Sudip Roy
Sukriti Ramesh
Vihan Jain
Xin Zhang
KDD 2017
Abstract
Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful orchestration of many components: a learner for generating models based on training data, modules for analyzing and validating both data and models, and infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such orchestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt.
We present TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions.
We present the case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis.
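The component vocabulary above maps fairly directly onto the later open-source TFX library. As a rough sketch under that library's pipeline DSL, a minimal ingest-train-push pipeline might look as follows; the paths, trainer module file, and step counts are placeholder assumptions.

```python
# Minimal sketch using the open-source TFX pipeline DSL; paths, the trainer
# module file, and step counts are placeholders for illustration.
from tfx import v1 as tfx


def make_pipeline(data_root, module_file, serving_dir, pipeline_root, metadata_path):
    # Ingest training data from CSV files under data_root.
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)

    # Train a model defined in a user-provided module file.
    trainer = tfx.components.Trainer(
        module_file=module_file,
        examples=example_gen.outputs["examples"],
        train_args=tfx.proto.TrainArgs(num_steps=1000),
        eval_args=tfx.proto.EvalArgs(num_steps=100),
    )

    # Push the trained model to a directory watched by a serving system.
    pusher = tfx.components.Pusher(
        model=trainer.outputs["model"],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(base_directory=serving_dir)
        ),
    )

    return tfx.dsl.Pipeline(
        pipeline_name="toy_pipeline",
        pipeline_root=pipeline_root,
        metadata_connection_config=tfx.orchestration.metadata.sqlite_metadata_connection_config(
            metadata_path
        ),
        components=[example_gen, trainer, pusher],
    )


# Usage: tfx.orchestration.LocalDagRunner().run(make_pipeline(...))
```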
TensorFlow-Serving: Flexible, High-Performance ML Serving
Christopher Olston
Fangwei Li
Jordan Soyke
Kiril Gorovoy
Li Lao
Sukriti Ramesh
Vinu Rajashekhar
Workshop on ML Systems at NIPS 2017
Abstract
We describe TensorFlow-Serving, a system to serve machine learning models inside Google, which is also available in the cloud and as open source. It is extremely flexible in the types of ML platforms it supports and in the ways it integrates with systems that convey new models and updated versions from training to serving. At the same time, the core code paths around model lookup and inference have been carefully optimized to avoid performance pitfalls observed in naive implementations.
The paper covers the architecture of the extensible serving library, as well as the distributed system for multi-tenant model hosting. Along the way it points out which extensibility points and performance optimizations turned out to be especially important based on production experience.
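To make the open-source serving path concrete, here is a minimal sketch of querying an exported SavedModel through the model server's REST API; the model name, base path, and input shape are placeholder assumptions.

```python
# Minimal sketch: querying a SavedModel served by TensorFlow Serving over REST.
# Model name, base path, and input shape are placeholder assumptions.
#
# Start the stock server binary first, e.g.:
#   tensorflow_model_server --rest_api_port=8501 \
#       --model_name=my_model --model_base_path=/models/my_model
import requests


def predict(instances):
    """Send a batch of inputs to the default serving signature and return predictions."""
    response = requests.post(
        "http://localhost:8501/v1/models/my_model:predict",
        json={"signature_name": "serving_default", "instances": instances},
    )
    response.raise_for_status()
    return response.json()["predictions"]


# Usage: predict([[1.0, 2.0, 3.0]])  # the inner shape must match the model's signature
```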