Jump to Content
Fei Sha

Fei Sha

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract Probabilistic forecasting is crucial to decision-making under uncertainty about future weather. The dominant approach is to use an ensemble of forecasts to represent and quantify uncertainty in operational numerical weather prediction. However, generating ensembles is computationally costly. In this paper, we propose to generate ensemble forecasts at scale by leveraging recent advances in generative artificial intelligence. Our approach learns a data-driven probabilistic diffusion model from the 5-member ensemble GEFS reforecast dataset. The model can then be sampled efficiently to produce realistic weather forecasts, conditioned on a few members of the operational GEFS forecasting system. The generated ensembles have similar predictive skill as the full GEFS 31-member ensemble, evaluated against ERA5 reanalysis, and emulate well the statistics of large physics-based ensembles. We also apply the same methodology to developing a diffusion model for generative post-processing: the model directly learns to correct biases present in the emulated forecasting system by leveraging reanalysis data as labels during training. Ensembles from this generative post-processing model show greater reliability and accuracy, particularly in extreme event classification. In general, they are more reliable and forecast the probability of extreme weather more accurately than the GEFS operational ensemble. Our models achieve these results at less than 1/10th of the computational cost incurred by the operational GEFS system. View details
    Evolve Smoothly, Fit Consistently: Learning Smooth Latent Dynamics For Advection-Dominated Systems
    Leonardo Zepeda-Núñez
    Anudhyan Boral
    International Conference on Learning Representations (2023) (to appear)
    Preview abstract We present a data-driven, space-time continuous framework to learn surrogate models for complex physical systems described by advection-dominated partial differential equations. Those systems have slow-decaying Kolmogorov n-width that hinders standard methods, including reduced order modeling, from producing high-fidelity simulations at low cost. In this work, we construct hypernetwork-based latent dynamical models directly on the parameter space of a compact representation network. We leverage the expressive power of the network and a specially designed consistency-inducing regularization to obtain latent trajectories that are both low-dimensional and smooth. These properties render our surrogate models highly efficient at inference time. We show the efficacy of our framework by learning models that generate accurate multi-step rollout predictions at much faster inference speed compared to competitors, for several challenging examples. View details
    Preview abstract We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evidence to support each answer. Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset. Moreover, we experimentally show that progress on answering our encyclopedic questions can be achieved by augmenting large models with a mechanism that retrieves relevant information from the knowledge base. An oracle experiment with perfect retrieval achieves 87.0% accuracy on the single-hop portion of our dataset, and an automatic retrieval-augmented prototype yields 48.8%. We believe that our dataset enables future research on retrieval-augmented vision+language models. It is available at https://github.com/google-research/google-research/tree/master/encyclopedic_vqa. View details
    WeatherBench 2: A benchmark for the next generation of data-driven global weather models
    Alex Merose
    Peter Battaglia
    Tyler Russell
    Alvaro Sanchez
    Vivian Yang
    Matthew Chantry
    Zied Ben Bouallegue
    Peter Dueben
    Carla Bromberg
    Jared Sisk
    Luke Barrington
    Aaron Bell
    arXiv (2023) (to appear)
    Preview abstract WeatherBench 2 is an update to the global, medium-range (1-14 day) weather forecasting benchmark proposed by Rasp et al. (2020), designed with the aim to accelerate progress in data-driven weather modeling. WeatherBench 2 consists of an open-source evaluation framework, publicly available training, ground truth and baseline data as well as a continuously updated website with the latest metrics and state-of-the-art models: https://sites.research.google/weatherbench. This paper describes the design principles of the evaluation framework and presents results for current state-of-the-art physical and data-driven weather models. The metrics are based on established practices for evaluating weather forecasts at leading operational weather centers. We define a set of headline scores to provide an overview of model performance. In addition, we also discuss caveats in the current evaluation setup and challenges for the future of data-driven weather forecasting. View details
    Mention Memory: incorporating textual knowledge into Transformers through entity mention attention
    Michiel de Jong
    Yury Zemlyanskiy
    10th International Conference on Learning Representations, ICLR 2022, Virtual Conference , April 25-29, 2022, OpenReview.net
    Preview abstract Natural language understanding tasks such as open-domain question answering often require retrieving and assimilating factual information from multiple sources. We propose to address this problem by integrating a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge. Specifically, our method represents knowledge as a ``mention memory" containing a dense vector representation of every entity mention in a corpus. The Transformer model accesses the information through internal memory layers in which each entity mention in the passage being read attends to the mention memory. This approach enables synthesis of and reasoning over many disparate sources of information \textit{within} a single Transformer model. In experiments using a memory of ~150 million Wikipedia mentions, our model provides to strong improvements in performance on several open-domain knowledge-intensive tasks, including the claim verification benchmarks FEVER and HoVeR and several entity-based QA benchmarks. We also show that the model learns to attend to informative mentions without any direct supervision. Finally we show that the model can be adapted to generalize to new unseen entities by updating the memory, without retraining. View details
    Generate-and-Retrieve: use your predictions to improve retrieval for semantic parsing
    Ice Pasupat
    Joshua Ainslie
    Linlu Qiu
    Michiel de Jong
    Yury Zemlyanskiy
    Proceedings of COLING (2022)
    Preview abstract A common recent approach to semantic parsing augments sequence-to-sequence models by retrieving and appending a set of training samples, called exemplars. The effectiveness of this recipe is limited by the ability to retrieve informative exemplars that help produce the correct parse, which is especially challenging in low-resource settings. Existing retrieval is commonly based on similarity of query and exemplar inputs. We propose GandR, a retrieval procedure that retrieves exemplars for which outputs are also similar. GandR first generates a preliminary prediction with input-based retrieval. Then, it retrieves exemplars with outputs similar to the preliminary prediction which are used to generate a final prediction. GandR sets the state of the art on multiple low-resource semantic parsing tasks. View details
    ReadTwice: Reading Very Large Documents with Memories
    Yury Zemlyanskiy
    Joshua Ainslie
    Michiel de Jong
    Ilya Eckstein
    Proceedings of NAACL (2021) (to appear)
    Preview abstract Knowledge-intensive tasks such as question answering often require assimilating information from different sections of large inputs, like books or collections of articles. We propose ReadTwice, a simple and effective approach to combine the advantages of existing approaches that modify Transformers to model long-range dependencies. The main idea is to read smaller segments of the text and summarize them into a memory table to be used in a second read of the text. We show that the model outperforms models of comparable size on several QA datasets and sets the state of the art on the challenging NarrativeQA dataset which asks questions about entire books. View details
    CoMSum: Dataset and Neural Model for Contextual Multi-Document Summarization
    Sheide Chammas
    Wan Zhu
    International Conference on Document Analysis and Recognition, International Conference on Document Analysis and Recognition (2021)
    Preview abstract Summarization is the task of compressing source document(s) into coherent and succinct passages. Query-based (contextual) multi-document summarization (qMDS) is a variant that targets summaries to specific informational needs with queries providing additional contexts. Progress in qMDS has been hampered by limited availability of corresponding types of datasets. In this work, we make two contributions. First, we develop an automatic approach for creating both extractive and abstractive qMDS examples from existing language resources. We use this approach to create \qmds, a qMDS dataset for public use. Secondly, to validate the utility of \qmds, we propose a neural model for extractive summarization that exploits the hierarchical nature of the input from multiple documents. It also infuses queries into the modeling to extract query-specific summaries. The experimental results show that modeling the queries and the multiple documents hierarchically improve the performance of qMDS on this datasets. This is consitent with our intuition and supports using \qmds for developing learning methods for qMDS. View details
    When MAML Can Adapt Fast and How to Assist When It Cannot
    Sebastien Arnold
    Shariq Iqbal
    Proceedings of AISTATS 2021
    Preview abstract Model-Agnostic Meta-Learning (MAML) and its variants have achieved success in meta-learning tasks on many datasets and settings. On the other hand, we have just started to understand and analyze how they are able to adapt fast to new tasks. For example, one popular hypothesis is that the algorithms learn good representations for transfer, as in multi-task learning. In this work, we contribute by providing a series of empirical and theoretical studies, and discover several interesting yet previously unknown properties of the algorithm. We find MAML adapts better with a deep architecture even if the tasks need only a shallow one (and thus, no representation learning is needed). While echoing previous findings by others that the bottom layers in deep architectures enable representation learning, we also find that upper layers enable fast adaptation by being meta-learned to perform adaptive gradient update when generalizing to new tasks. Motivated by these findings, we study several meta-optimization approaches and propose a new one for learning to optimize adaptively. Those approaches attain stronger performance in meta-learning both shallower and deeper architectures than MAML. View details
    DOCENT: Learning Self-Supervised Entity Representations from Large Document Collections
    Yury Zemlyanskiy
    Sudeep Gandhe
    Ruining He
    Anirudh Ravula
    Juro Gottweis
    Ilya Eckstein
    Proceedings of EACL (2021) (to appear)
    Preview abstract This paper explores learning rich self-supervised entity representations from large amounts of associated text. Once pre-trained, these models become applicable to multiple entity-centric tasks such as search ranked retrieval, knowledge base completion, question answering and more. Unlike other methods that harvest self-supervision signals based merely on a local context within a sentence, we radically expand the notion of context to include {\em any} available text related to an entity. With the breadth and depth of textual content available on the web, this approach enables a new class of powerful, high-capacity representations that can ultimately ``remember" any useful information about an entity, without the need for human annotations. We present several training strategies that jointly learn to predict words and entities --- strategies we compare experimentally on downstream tasks in the TV-Movies domain, such as MovieLens tag prediction from user reviews and natural language movie search. As evidenced by results, our models outperform competitive baselines, sometimes with little or no fine-tuning, and are also able to scale to very large corpora. Finally, we make our datasets and pre-trained models publicly available\footnote{To be released after the review period.}. This includes {\em Reviews2Movielens}, mapping the 1B word corpus of Amazon movie reviews to MovieLens tags, as well as Reddit Movie Suggestions containing natural language queries and corresponding community recommendations. View details
    Classifier and Exemplar Synthesis for Zero-Shot Learning
    Wei-Lun Chao
    International Journal of Computer Vision, vol. 128 (2020), pp. 166-201
    Preview abstract Zero-shot learning (ZSL) enables solving a task without the need to see its examples. In this paper, we propose two ZSL frameworks that learn to synthesize parameters for novel unseen classes. First, we propose to cast the problem of ZSL as learning manifold embeddings from graphs composed of object classes, leading to a flexible approach that synthesizes “classifiers” for the unseen classes. Then, we define an auxiliary task of synthesizing “exemplars” for the unseen classes to be used as an automatic denoising mechanism for any existing ZSL approaches or as an effective ZSL model by itself. On five visual recognition benchmark datasets, we demonstrate the superior performances of our proposed frameworks in various scenarios of both conventional and generalized ZSL. Finally, we provide valuable insights through a series of empirical analyses, among which are a comparison of semantic representations on the full ImageNet benchmark as well as a comparison of metrics used in generalized ZSL. Our code and data are publicly available at https://github.com/pujols/Zero-shot-learning-journal. View details
    BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps
    Wang Zhu
    Hexiang Hu
    Jiacheng Chen
    Zhiwei Deng
    Vihan Jain
    Proceedings of ACL 2020
    Preview abstract Learning to follow instructions is of fundamental importance to autonomous agents for vision-and-language navigation (VLN). In this paper, we study how an agent can navigate long paths when learning from a corpus that consists of shorter ones. We show that existing state-of-the-art agents do not generalize well. To this end, we propose BabyWalk, a new VLN agent that is learned to navigate by decomposing long instructions into shorter ones (BabySteps) and completing them sequentially. A special design memory buffer is used by the agent to turn its past experiences into contexts for future steps. The learning process is composed of two phases. In the first phase, the agent uses imitation learning from demonstration to accomplish BabySteps. In the second phase, the agent uses curriculum-based reinforcement learning to maximize rewards on navigation tasks with increasingly longer instructions. We create two new benchmark datasets (of long navigation tasks) and use them in conjunction with existing ones to examine BabyWalk’s generalization ability. Empirical results show that BabyWalk achieves state-of-the-art results on several metrics, in particular, is able to follow long instructions better. View details
    Amortized Inference of Variational Bounds for Learning Noisy-OR
    Melissa Ailem
    Yiming Yan
    Proceedings of AISTATS 2020 (to appear)
    Preview abstract Classical approaches for approximate inference depend on cleverly designed variational distributions and bounds. Modern approaches employ amortized variational inference, which uses a neural network to approximate any posterior without leveraging the structures of the generative models. In this paper, we propose Amortized Conjugate Posterior (ACP), a hybrid approach taking advantages of both types of approaches. Specifically, we use the classical methods to derive specific forms of posterior distributions and then learn the variational parameters using amortized inference. We study the effectiveness of the proposed approach on the noisy-or model and compare to both the classical and the modern approaches for approximate inference and parameter learning. Our results show that the proposed method outperforms or are at par with other approaches. View details
    Learning to Represent Images and Texts with Denotation Graphs
    Bowen Zhang
    Hexiang Hu
    Vihan Jain
    Proceedings of EMNLP 2020 (to appear)
    Preview abstract Learning to fuse vision and language information and represent them is an important topic with many applications. Recent progresses have leveraged the ideas of pre-training (from language modeling) and attention layers in Transformers to learn representation from datasets with images aligned with linguistic expressions that describe the images. In this paper, we propose learning representations from a set of implied visually grounded expressions between image and text, automatically mined from those datasets. In particular, we use denotation graphs to represent how specific concepts (such as sentences describing images) can be linked to abstract and generic concepts (such as short phrases) that are also visually grounded. This type of generic-to-specific relations can be discovered using linguistic analysis tools. We propose methods to incorporate such relations into learning representation. We show that state-of-the-art multimodal learning methods such as ViLBERT can be further improved by leveraging automatically harvested structural relations. The representations lead to stronger empirical results on downstream tasks of text-based image retrieval, and referral expression localization. We will release to the public both our codes and the extracted denotation graphs on both the Flickr30K and the COCO datasets. View details
    Preview abstract We describe a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identify the most suitable \emph{text} describing a scene, given several similar options. Accomplishing the task entails demonstrating comprehension beyond just recognizing ``keywords'' (or key-phrases) and their corresponding visual concepts, and instead requires an alignment between the representations of the two modalities that achieves a visually-grounded ``understanding'' of various linguistic elements and their dependencies. This new task also admits an easy-to-compute and well-understood metric: the accuracy in detecting the true target among the decoys. The paper makes several contributions: a generic mechanism for generating decoys from (human-created) image captions; an instance of applying this mechanism, yielding a large-scale machine comprehension dataset (based on the COCO images and captions) that we make publicly available; results on a human evaluation on this dataset, thus providing a performance ceiling; and several baseline and competitive learning approaches that illustrate the utility of the proposed framework in advancing both image and language machine comprehension. In particular, there is a large gap between human performance and state-of-the-art learning methods, suggesting a fruitful direction for future research. View details
    No Results Found