Eugene Ie
Authored Publications
CoMSum: Dataset and Neural Model for Contextual Multi-Document Summarization
Sheide Chammas
Wan Zhu
International Conference on Document Analysis and Recognition (2021)
Summarization is the task of compressing source document(s) into coherent and succinct passages. Query-based (contextual) multi-document summarization (qMDS) is a variant that targets summaries to specific informational needs, with queries providing additional context. Progress in qMDS has been hampered by the limited availability of suitable datasets. In this work, we make two contributions. First, we develop an automatic approach for creating both extractive and abstractive qMDS examples from existing language resources, and use it to create CoMSum, a qMDS dataset for public use. Second, to validate the utility of CoMSum, we propose a neural model for extractive summarization that exploits the hierarchical nature of the multi-document input. It also infuses queries into the modeling to extract query-specific summaries. The experimental results show that modeling the queries and the multiple documents hierarchically improves qMDS performance on this dataset. This is consistent with our intuition and supports using CoMSum to develop learning methods for qMDS.
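As an aside on the extractive setup: at its simplest, query-focused extraction scores each candidate sentence against the query and keeps the top-ranked ones. The sketch below illustrates that baseline with cosine similarity over precomputed embeddings; it is a generic stand-in, not the paper's hierarchical model.

import numpy as np

def extractive_qmds(query_vec, sentence_vecs, k=3):
    # Toy query-focused extractive summarizer: rank sentences pooled from
    # all source documents by cosine similarity to the query, keep top k.
    # A generic baseline, not the hierarchical model proposed in the paper.
    q = query_vec / np.linalg.norm(query_vec)
    s = sentence_vecs / np.linalg.norm(sentence_vecs, axis=1, keepdims=True)
    scores = s @ q                    # query-sentence similarities
    top = np.argsort(-scores)[:k]     # indices of the k best sentences
    return sorted(top.tolist())       # restore original document order

# Embeddings would come from any sentence encoder; random vectors here.
rng = np.random.default_rng(0)
print(extractive_qmds(rng.normal(size=64), rng.normal(size=(10, 64))))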
On the Evaluation of Vision-and-Language Navigation Instructions
Ming Zhao
Peter Anderson
Vihan Jain
Su Wang
Conference of the European Chapter of the Association for Computational Linguistics (EACL) (2021)
RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems
Martin Mladenov
Vihan Jain
Christopher Colby
Nicolas Mayoraz
Hubert Pham
Ivan Vendrov
arXiv (2021)
The development of recommender systems that optimize multi-turn interaction with users, and that model the interactions of different agents (e.g., users, content providers, vendors) in the recommender ecosystem, has drawn increasing attention in recent years. Developing and training models and algorithms for such recommenders can be especially difficult using static datasets, which often fail to offer the types of counterfactual predictions needed to evaluate policies over extended horizons. To address this, we develop RecSim NG, a probabilistic platform for the simulation of multi-agent recommender systems. RecSim NG is a scalable, modular, differentiable simulator implemented in Edward2 and TensorFlow. It offers a powerful, general probabilistic programming language for agent-behavior specification; tools for probabilistic inference and latent-variable model learning, backed by automatic differentiation and tracing; and a TensorFlow-based runtime for running simulations on accelerated hardware. We describe RecSim NG and illustrate how it can be used to create transparent, configurable, end-to-end models of a recommender ecosystem, complemented by a small set of simple use cases that demonstrate how RecSim NG can help both researchers and practitioners easily develop and train novel algorithms for recommender systems.
A short version of this paper was published at RecSys 2020.
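To give a flavor of what a differentiable, probabilistic user simulation involves, here is a minimal TensorFlow/TensorFlow Probability sketch of a multinomial-logit user choosing from recommended items while a latent interest state drifts. All names are illustrative; this is not the recsim_ng API.

import tensorflow as tf
import tensorflow_probability as tfp

def simulate_session(user_interest, item_features, num_steps=5):
    # Toy differentiable simulation loop: at each step the user picks one
    # recommended item with multinomial-logit probabilities driven by a
    # latent interest vector, which then drifts toward the chosen item.
    # Illustrative only; not the recsim_ng API.
    for _ in range(num_steps):
        logits = tf.linalg.matvec(item_features, user_interest)
        choice = tfp.distributions.Categorical(logits=logits).sample()
        user_interest = 0.9 * user_interest + 0.1 * item_features[choice]
    return user_interest

items = tf.random.normal([20, 8])   # 20 candidate items, 8 features each
print(simulate_session(tf.random.normal([8]), items).numpy())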
Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP) (2020)
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and instructions) than other VLN datasets. It emphasizes the role of language in VLN by addressing known biases in paths and eliciting more references to visible entities. Furthermore, each word in an instruction is time-aligned to the virtual poses of instruction creators and validators. We establish baseline scores for monolingual and multilingual settings, and for multitask learning when including Room-to-Room annotations. We also provide results for a model that learns from synchronized pose traces by focusing only on portions of the panorama attended to in human demonstrations. The size, scope, and detail of RxR dramatically expand the frontier for research on embodied language agents in simulated, photo-realistic environments.
Demonstrating Principled Uncertainty Modeling for Recommender Ecosystems with RecSim NG
Martin Mladenov
Vihan Jain
Christopher Colby
Nicolas Mayoraz
Hubert Pham
Ivan Vendrov
RecSys '20: Fourteenth ACM Conference on Recommender Systems (2020), pp. 591-593
We develop RecSim NG, a probabilistic platform that supports natural, concise specification and learning of models for multi-agent recommender system simulation. RecSim NG is a scalable, modular, differentiable simulator implemented in Edward2 and TensorFlow.
An extended version of this paper is available as arXiv:2103.08057.
Learning to Represent Image and Text with Denotation Graph
Conference on Empirical Methods in Natural Language Processing (EMNLP) (2020)
Learning to fuse and represent vision and language information is an important topic with many applications. Recent progress has leveraged the ideas of pre-training (from language modeling) and attention layers in Transformers to learn representations from datasets of images paired with linguistic expressions that describe them. In this paper, we propose learning representations from a set of implied, visually grounded expressions between image and text, automatically mined from those datasets. In particular, we use denotation graphs to represent how specific concepts (such as sentences describing images) can be linked to abstract and generic concepts (such as short phrases) that are also visually grounded. These generic-to-specific relations can be discovered using linguistic analysis tools. We propose methods to incorporate such relations into representation learning, and show that state-of-the-art multimodal learning methods such as ViLBERT can be further improved by leveraging the automatically harvested structural relations. The resulting representations lead to stronger empirical results on the downstream tasks of text-based image retrieval and referring expression localization. We will release to the public both our code and the extracted denotation graphs for the Flickr30K and COCO datasets.
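The generic-to-specific structure can be pictured with a toy construction: dropping a modifier from a caption yields a more generic phrase whose denotation (the set of images it describes) is a superset. The abstraction rule below is a deliberately crude stand-in for the linguistic analysis the paper relies on.

from collections import defaultdict

def toy_denotation_graph(captions_by_image):
    # Map each phrase (a specific caption or its crude abstraction) to the
    # set of images it describes; generic phrases denote image supersets.
    # Dropping the first word stands in for real linguistic abstraction rules.
    denotation = defaultdict(set)
    edges = set()                              # (generic, specific) pairs
    for image_id, caption in captions_by_image.items():
        denotation[caption].add(image_id)
        generic = " ".join(caption.split()[1:])
        denotation[generic].add(image_id)
        edges.add((generic, caption))
    return denotation, edges

d, e = toy_denotation_graph({1: "small dog running", 2: "large dog running"})
print(d["dog running"])   # {1, 2}: the generic phrase covers both images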
Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View
arXiv (2020)
The Touchdown dataset (Chen et al., 2019) provides instructions by human annotators for navigation through New York City streets and for resolving spatial descriptions at a given location. To enable the wider research community to work effectively with the Touchdown tasks, we are publicly releasing the 29k raw Street View panoramas needed for Touchdown. We follow the process used for the StreetLearn data release (Mirowski et al., 2019) to check panoramas for personally identifiable information and blur them as necessary. These have been added to the StreetLearn dataset and can be obtained via the same process as used previously for StreetLearn. We also provide a reference implementation for both of the Touchdown tasks: vision and language navigation (VLN) and spatial description resolution (SDR). We compare our model results to those given in Chen et al. (2019) and show that the panoramas we have added to StreetLearn fully support both Touchdown tasks and can be used effectively for further research and comparison.
BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps
Association for Computational Linguistics (2020)
Learning to follow instructions is of fundamental importance to autonomous agents for vision-and-language navigation (VLN). In this paper, we study how an agent can navigate long paths when learning from a corpus that consists of shorter ones. We show that existing state-of-the-art agents do not generalize well. To this end, we propose BabyWalk, a new VLN agent that learns to navigate by decomposing long instructions into shorter ones (BabySteps) and completing them sequentially. A specially designed memory buffer is used by the agent to turn its past experiences into context for future steps. The learning process has two phases. In the first phase, the agent uses imitation learning from demonstrations to accomplish BabySteps. In the second phase, the agent uses curriculum-based reinforcement learning to maximize rewards on navigation tasks with increasingly longer instructions. We create two new benchmark datasets (of long navigation tasks) and use them in conjunction with existing ones to examine BabyWalk's generalization ability. Empirical results show that BabyWalk achieves state-of-the-art results on several metrics and, in particular, follows long instructions better.
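The decompose-and-remember control flow can be sketched in a few lines: chop a long instruction into BabySteps, execute them in order, and carry a memory of summarized past steps forward as context. Everything here (the agent, the splitting rule, the summaries) is a toy placeholder, not the paper's model or training procedure.

def split_into_babysteps(instruction, max_len=4):
    # Chop a long instruction (a word list here) into short BabySteps.
    return [instruction[i:i + max_len]
            for i in range(0, len(instruction), max_len)]

class ToyAgent:
    # Placeholder agent; the real model is a trained neural policy.
    def act(self, step, context):
        return {"step": step, "context_size": len(context)}
    def summarize(self, trajectory):
        return trajectory["step"][-1]   # remember the last word as a stand-in

def babywalk_style_episode(agent, long_instruction):
    # Complete BabySteps sequentially, feeding a memory of summarized past
    # steps forward as context for the next step.
    memory = []
    for step in split_into_babysteps(long_instruction):
        trajectory = agent.act(step, context=memory)
        memory.append(agent.summarize(trajectory))
    return memory

print(babywalk_style_episode(
    ToyAgent(), "turn left walk past the sofa then exit the room".split()))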
Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology
Vihan Jain
Jing Wang
Sanmit Narvekar
Ritesh Agarwal
Rui Wu
Morgane Lustman
Vince Gatto
Paul Covington
Jim McFadden
arXiv (2019)
Most practical recommender systems focus on estimating immediate user engagement without considering the long-term effects of recommendations on user behavior. Reinforcement learning (RL) methods offer the potential to optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items---which may have interacting effects on user choice---methods are required to deal with the combinatorics of the RL action space. In this work, we address the challenge of making slate-based recommendations to optimize long-term value using RL. Our contributions are three-fold. (i) We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user-choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. (ii) We outline a methodology that leverages existing myopic learning-based recommenders to quickly develop a recommender that handles LTV. (iii) We demonstrate our methods in simulation, and validate the scalability of decomposed TD-learning using SlateQ in live experiments on YouTube.
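The core decomposition in (i) is easy to state concretely: under a single-choice user model, the long-term value of a slate is the choice-probability-weighted sum of item-wise long-term values. A minimal sketch with a multinomial-logit choice model (the no-click option is omitted for brevity; inputs are toy values):

import numpy as np

def slate_q_value(item_scores, item_ltvs):
    # SlateQ-style decomposition: Q(s, A) = sum_i P(i | s, A) * Qbar(s, i),
    # with P given by a multinomial logit over the slate's items.
    # A sketch of the decomposition; scores and LTVs would be learned.
    v = np.exp(item_scores)          # unnormalized choice scores v(s, i)
    choice_probs = v / v.sum()       # P(user picks item i | slate A)
    return float(choice_probs @ item_ltvs)

print(slate_q_value(np.array([0.2, 1.5, -0.3]), np.array([3.0, 1.0, 2.5])))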
Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
Vihan Jain
Gabriel Magalhaes
Ashish Vaswani
Association for Computational Linguistics (2019)
Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation (VLN), in which agents interpret natural language instructions and visual scenes to move through environments and reach goals. Despite recent progress, current research leaves unclear how much of a role language understanding plays in this task, especially because dominant evaluation metrics have focused on goal completion rather than the sequence of actions corresponding to the instructions. Here, we highlight shortcomings of current metrics for the Room-to-Room dataset (Anderson et al. 2018b) and propose a new metric, Coverage weighted by Length Score (CLS). We also show that the existing paths in the dataset are not ideal for evaluating instruction following because they are direct-to-goal shortest paths. We join existing short paths to form more challenging extended paths, creating a new dataset, Room-for-Room (R4R). Using R4R and CLS, we show that agents that receive rewards for instruction fidelity outperform agents that focus on goal completion.
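For concreteness, a sketch of CLS following the paper's definition: CLS = PC * LS, where PC is a soft coverage of reference-path nodes and LS penalizes mismatch between the agent's path length and the coverage-scaled reference length. Points live in R^n here (the paper works over a navigation graph), and the distance threshold d_th is a free parameter.

import numpy as np

def cls_score(path, ref, d_th=3.0):
    # Coverage weighted by Length Score: CLS = PC * LS.
    # PC: mean soft coverage exp(-d(r, path) / d_th) over reference nodes r.
    # LS: EPL / (EPL + |EPL - PL(path)|), with EPL = PC * PL(ref).
    path, ref = np.asarray(path, float), np.asarray(ref, float)
    dists = np.linalg.norm(ref[:, None, :] - path[None, :, :], axis=-1)
    pc = np.mean(np.exp(-dists.min(axis=1) / d_th))
    def plen(p):                      # total length of a piecewise path
        return np.sum(np.linalg.norm(np.diff(p, axis=0), axis=-1))
    epl = pc * plen(ref)              # expected path length
    ls = epl / (epl + abs(epl - plen(path)))
    return pc * ls

# A perfect retrace of the reference scores ~1.0.
print(cls_score([[0, 0], [1, 0], [2, 0]], [[0, 0], [1, 0], [2, 0]]))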