Sean Kirmani

Sean Kirmani is currently a Senior Research Engineer at Google DeepMind. His research interests involve computer vision, natural language processing, and robotics. Sean has also spent time working at X, the Moonshot Factory, as part of The Everyday Robot Project.
Authored Publications
Scalable Multi-Sensor Robot Imitation Learning via Task-Level Domain Consistency
Armando Fuentes, Daniel Ho, Eric Victor Jang, Matt Bennice, Mohi Khansari, Nicolas Sievers, Yuqing Du
ICRA (2023) (to appear)
Abstract: Recent work in visual end-to-end learning for robotics has shown the promise of imitation learning across a variety of tasks. However, such approaches are often expensive and require vast amounts of real-world training demonstrations. Additionally, they rely on a time-consuming evaluation process for identifying the best model to deploy in the real world. These challenges can be mitigated by simulation: supplementing real-world data with simulated demonstrations and using simulated evaluations to identify strong policies. However, this introduces the well-known "reality gap" problem, where simulator inaccuracies decorrelate performance in simulation from performance in reality. In this paper, we build on top of prior work in GAN-based domain adaptation and introduce the notion of a Task Consistency Loss (TCL), a self-supervised contrastive loss that encourages sim and real alignment at both the feature and action-prediction levels. We demonstrate the effectiveness of our approach on the challenging task of latched-door opening with a 9 degree-of-freedom (DoF) mobile manipulator from raw RGB and depth images. While most prior work in vision-based manipulation operates from a fixed, third-person view, mobile manipulation couples the challenges of locomotion and manipulation with greater visual diversity and action-space complexity. We find that we are able to achieve 77% success on seen and unseen scenes, a +30% increase over the baseline, using only ~16 hours of teleoperation demonstrations in sim and real.
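
The abstract describes a Task Consistency Loss that couples a contrastive feature-alignment term with an action-prediction consistency term across simulated and real observations. The snippet below is a minimal, hypothetical sketch of that general idea in Python (PyTorch), assuming paired sim/real frames and per-branch action predictions; the function name, the InfoNCE-plus-MSE formulation, and the `temperature` and `action_weight` parameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def task_consistency_loss(sim_feats, real_feats, sim_actions, real_actions,
                          temperature=0.1, action_weight=1.0):
    """Hypothetical sketch of a sim/real task-consistency loss.

    sim_feats, real_feats: (B, D) encoder features for paired sim/real frames.
    sim_actions, real_actions: (B, A) action predictions from each branch.
    """
    # Normalize features so the dot product acts as a cosine similarity.
    sim = F.normalize(sim_feats, dim=-1)
    real = F.normalize(real_feats, dim=-1)

    # InfoNCE-style contrastive term: the paired real frame is the positive;
    # all other real frames in the batch serve as negatives.
    logits = sim @ real.t() / temperature               # (B, B) similarity matrix
    targets = torch.arange(sim.size(0), device=sim.device)
    feature_loss = F.cross_entropy(logits, targets)

    # Action-prediction consistency: both branches should predict the same action
    # for the same underlying state.
    action_loss = F.mse_loss(sim_actions, real_actions)

    return feature_loss + action_weight * action_loss
```

In this sketch, the contrastive term aligns sim and real features at the representation level, while the MSE term enforces agreement at the action-prediction level, mirroring the two levels of alignment the abstract mentions.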