Abhinav Gupta
I am a Research Scientist at Google DeepMind, working with the Gemini reasoning and code teams; I was previously part of the post-training team based in London, UK. I'm interested in the intersection of reinforcement learning and language, focusing on fine-tuning LLMs with machine/execution feedback and building robust evaluation metrics. I received my PhD from MILA, where I worked on improving self-play in emergent communication, and I also hold a Master's degree from NYU. My personal website can be found at guabhinav.com.
Authored Publications
    Dynamic population-based meta-learning for multi-agent communication with natural language
    Marc Lanctot
    Angeliki Lazaridou
    Advances in Neural Information Processing Systems, Curran Associates, Inc. (2021), pp. 16899-16912
    Abstract: In this work, our goal is to train agents that can coordinate with seen, unseen, and human partners in a multi-agent communication environment involving natural language. Previous work using a single set of agents has shown great progress in generalizing to known partners; however, it struggles when coordinating with unfamiliar agents. To mitigate this, recent work has explored population-based approaches, where multiple agents interact with each other with the goal of learning more generic protocols. While these methods can produce good coordination between unseen partners, they do so only for simple languages, and thus fail to adapt to human partners using natural language. We attribute this to the use of static populations and instead propose a dynamic population-based meta-learning approach that builds such a population in an iterative manner. We perform a holistic evaluation of our method on two different referential games, and show that our agents outperform all prior work when communicating with seen partners and humans. Furthermore, we analyze the natural language generation skills of our agents, finding that they also outperform strong baselines. Finally, we test the robustness of our agents when communicating with out-of-population agents, and carefully assess the importance of each component of our method through ablation studies.