Anoop K. Sinha

Anoop Sinha is Research Director at Google. His current research interests include future user interfaces, with a focus on large model quality, and society-centered AI: applications that have the potential for beneficial impact on society. Prior to Google, he was at FAIR at Meta and led Machine Learning for Siri at Apple. Anoop has a PhD in Computer Science from UC Berkeley, with a focus on Human-Computer Interaction, and a BS from Stanford University, where he received Honors in Science, Technology & Society.
Authored Publications
While non-verbal behaviors and expressive movements are essential for natural human-robot interaction, existing methods often overlook a crucial element: the human’s internal cognitive state. Consequently, proactive multi-agent systems frequently interrupt humans at inopportune moments, leading to cognitive overload and decreased task performance. This paper introduces a framework for generating “cognitively aligned” multi-agent interactions, enhancing the ability of robotic systems to contextually defer communications during moments of high human mental workload. We present the design and implementation of a closed-loop architecture that explores the interplay between autonomous task execution and real-time neurophysiological focus. Utilizing a consumer-grade Brain-Computer Interface (BCI), our approach continuously monitors Electroencephalography (EEG) spectral band powers while a human performs a cognitive-load-inducing task. We propose a workload-driven pipeline where an HTTP-based signaling mechanism places a primary agent’s sensory inputs and audio outputs into a holding state upon detecting high cognitive load. This allows secondary agents to seamlessly process complex, delegated tasks in the background. Once the human’s cognitive state returns to a baseline, the primary agent releases the queued agent message. Our preliminary results demonstrate the feasibility of leveraging real-time signal processing, Large Language Models (LLMs), and physical robotic embodiments to create interrupt-aware, non-intrusive multi-agent systems.
Levels of Multimodal Interaction
Chinmay Kulkarni
Alex Olwal
ICMI Companion '24: Companion Proceedings of the 26th International Conference on Multimodal Interaction (2024)
Large Multimodal Models (LMMs) like OpenAI's GPT-4o and Google's Gemini, introduced in 2024, process multiple modalities, enabling significant advances in multimodal interaction. Inspired by frameworks for self-driving cars and AGI, this paper proposes "Levels of Multimodal Interaction" to guide research and development. The four levels are: basic multimodality (0), single modalities in turn-taking; combined multimodality (1), fused interpretation of multiple modalities; humanlike (2), natural interaction flow with additional communication signals; and beyond humanlike (3), surpassing human capabilities and including underlying hidden signals, with the potential for transformational human-AI integration. LMMs have progressed from Level 0 to 1, with Level 2 next. Level 3 sets a speculative target that multimodal interaction research could help achieve, where interaction becomes more natural and ultimately surpasses human capabilities. Eventually, such Level 3 multimodal interaction could lead to greater human-AI integration and transform human performance. This anticipated shift, in turn, introduces considerations, particularly around safety, agency and control of AI systems.