Siqi Liu

Authored Publications
    Emergent Coordination through Competition
    Guy Lever
    Josh Merel
    Nicolas Heess
    Saran Tunyasuvunakool
    Thore Graepel
    ICLR (2019)
    Abstract: We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics. We demonstrate that decentralized, population-based training with co-play can lead to a progression in agents' behaviors: from random, to simple ball chasing, and finally showing evidence of cooperation. Our study highlights several of the challenges encountered in large-scale multi-agent training in continuous control. In particular, we demonstrate that the automatic optimization of simple shaping rewards, not themselves conducive to cooperative behavior, can lead to long-horizon team behavior. We further apply an evaluation scheme, grounded in game-theoretic principles, that can assess agent performance in the absence of pre-defined evaluation tasks or human baselines.
    Observational Learning by Reinforcement Learning
    Nicolas Heess
    Bilal Piot
    Remi Munos
    Olivier Pietquin
    Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2019)
    Abstract: Observational learning is a type of learning that occurs as a function of observing, retaining and possibly imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been found to be employed in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other agents is necessary to achieve observational learning through machine learning. In particular, we argue that observational learning can emerge from pure Reinforcement Learning (RL), potentially coupled with memory. Through simple scenarios, we demonstrate that an RL agent can leverage the information provided by the observations of another agent performing a task in a shared environment. The other agent is only observed through the effect of its actions on the environment and never explicitly modeled. Two key aspects are borrowed from observational learning: i) the observer's behaviour needs to change as a result of viewing a 'teacher' (another agent) and ii) the observer needs to be motivated somehow to engage in making use of the other agent's behaviour. The latter is naturally modeled by RL, by correlating the learning agent's reward with the teacher agent's behaviour.
    Abstract: Current image captioning methods are usually trained via (penalized) maximum likelihood estimation. However, the log-likelihood score of a caption does not correlate well with human assessments of quality. Standard syntactic evaluation metrics, such as BLEU, METEOR and ROUGE, are also not well correlated. The SPICE and CIDEr metrics are better correlated, but have traditionally been hard to optimize for. In this paper, we show how to use a policy gradient (PG) algorithm to directly optimize a combination of SPICE and CIDEr (a combination we call SPIDEr): the SPICE score ensures our captions are semantically faithful to the image, and the CIDEr score ensures our captions are syntactically fluent. The PG algorithm we propose improves on the prior MIXER approach, by using Monte Carlo rollouts instead of mixing ML training with PG. We show empirically that our algorithm leads to improved results compared to MIXER. Finally, we show that using our PG algorithm to optimize the novel SPIDEr metric results in image captions that are strongly preferred by human raters compared to captions generated by the same model but trained using different objective functions.