Srini Narayanan
Authored Publications
UGIF-DataSet: A New Dataset for Cross-lingual, Cross-modal Sequential actions on the UI
Findings of the Association for Computational Linguistics: NAACL 2024
Help documents are supposed to aid smartphone users in resolving queries such as "How to block calls from unknown numbers?". However, given a query, identifying the right help document, understanding instructions from the document, and using them to resolve the issue at hand is challenging. The user experience may be enhanced by converting the instructions in the help document to a step-by-step tutorial overlaid on the phone UI. Successful execution of this task requires overcoming research challenges in retrieval, parsing, and grounding in the multilingual-multimodal setting. For example, user queries in one language may have to be matched against instructions in another language, which in turn need to be grounded in a multimodal UI in yet another language. Moreover, there isn’t any relevant dataset for such a task. In order to bridge this gap, we introduce UGIF-DataSet, a multilingual, multimodal, UI-grounded dataset for step-by-step task completion on the smartphone, containing 4,184 tasks across 8 languages. The instruction steps in UGIF-DataSet are available only in English, so the challenge involves operations in the cross-modal, cross-lingual setting. We compare the performance of different large language models for this task and find that the end-to-end task completion rate drops from 48% in English to 32% for other languages, demonstrating significant overall headroom for improvement. We are hopeful that UGIF-DataSet and our analysis will aid further research on the important problem of sequential task completion in the multilingual and multimodal setting.
Preview abstract
We propose a benchmark to assess the capability of large language models to reason with metaphor. Our benchmark combines the previously isolated topics of metaphor detection and commonsense reasoning into a single task that requires a model to make inferences by accurately selecting between the literal and metaphorical register. We examine the performance of state-of-the-art pretrained models on forced-choice tasks and find a large discrepancy between small and very large models, going from chance- to human-level performance. However, upon examining the generative performance of the largest model, we find that there is still a gap to bridge before human performance is reached in a more natural conversational setting.
Real-Time Sign Language Detection using Human Pose Estimation
Amit Moryossef
Sarah Ebling
SLRTP2020 (2020)
We propose a lightweight real-time sign language detection model, motivated by the need for such detection in videoconferencing. We extract optical flow features based on human pose estimation and, using a linear classifier, show that these features are meaningful, reaching an accuracy of 80% on the DGS Corpus. Using a recurrent model directly on the input, we see accuracy improve to as much as 91%, while still running in under 4 ms. We describe a demo application for sign language detection in the browser to demonstrate its potential use in videoconferencing applications.
Points, Paths, and Playscapes: Large-scale Spatial Language Understanding Tasks Set in the Real World
Daphne Luong
Proceedings of the First International Workshop on Spatial Language Understanding, Association for Computational Linguistics, New Orleans, Louisiana, USA (2018), pp. 46-52
Spatial language understanding is important for practical applications and as a building block for better abstract language understanding. Much progress has been made through work on understanding spatial relations and values in images and texts as well as on giving and following navigation instructions in restricted domains. We argue that the next big advances in spatial language understanding can be best supported by creating large-scale datasets that focus on points and paths based in the real world, and then extending these to create online, persistent playscapes that mix human and bot players. The bot players can begin play having undergone a prior training regime, but then must learn, evolve, and survive according to their depth of understanding of scenes, navigation, and interactions.
Multilingual Metaphor Processing: Experiments with Semi-Supervised and Unsupervised Learning
Ekaterina Shutova
Lin Sun
Dario Gutierrez
Patricia Lichtenstein
Computational Linguistics (2017)
Highly frequent in language and communication, metaphor represents a significant challenge for Natural Language Processing (NLP) applications. Computational work on metaphor has traditionally evolved around the use of hand-coded knowledge, making the systems hard to scale. Recent years have witnessed a rise in statistical approaches to metaphor processing. However, these approaches often require extensive human annotation effort and are predominantly evaluated within a limited domain. In contrast, we experiment with weakly supervised and unsupervised techniques, with little or no annotation, to generalize higher-level mechanisms of metaphor from distributional properties of concepts. We investigate different levels and types of supervision (learning from linguistic examples vs. learning from a given set of metaphorical mappings vs. learning without annotation) in flat and hierarchical, unconstrained and constrained clustering settings. Our aim is to identify the optimal type of supervision for a learning algorithm that discovers patterns of metaphorical association from text. In order to investigate the scalability and adaptability of our models, we applied them to data in three languages from different language groups (English, Spanish and Russian), achieving state-of-the-art results with little supervision. Finally, we demonstrate that statistical methods can facilitate and scale up cross-linguistic research on metaphor.
Bridging Text and Knowledge with Frames
ACL Workshop on Frame Semantics (in honor of Charles Fillmore) (2014)
FrameNet is the current best operational version of Chuck Fillmore’s Frame Semantics. As FrameNet has evolved over the years, we have been building a series of increasingly ambitious prototype applications that exploit the ideas of frame semantics and FrameNet as a resource. Results from this work suggest that frames are a natural semantic representation linking issues of textual meaning and world knowledge.