Cooperation through Reinforcement Learning

Abstract

Can cooperation be learnt through reinforcement learning? This is the central question we pose in this paper. Answering it first requires an examination of what constitutes reinforcement learning. We also examine some of the issues associated with the design of a reinforcement learning system, including the choice of an update rule and whether or not to implement an eligibility trace.

In this paper we set ourselves four tasks, each of which illustrates certain aspects of reinforcement learning. The tasks increase in complexity: the first two allow us to explore reinforcement learning on its own, while the last two allow us to examine reinforcement learning in a multi-agent setting. We begin with a system that learns to play blackjack; it allows us to examine how robust reinforcement learning algorithms are. The second system learns to run through a maze; here we learn how to correctly implement an eligibility trace and explore different update rules.
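To make the idea of an update rule with an eligibility trace concrete, the following is a minimal sketch of tabular TD(lambda) with accumulating traces on a short corridor. It is an illustration under stated assumptions, not the implementation used in the report: the corridor environment, the function names td_lambda_step and run_corridor_episode, and the parameter values are all hypothetical.

```python
def td_lambda_step(values, traces, state, reward, next_state,
                   alpha=0.5, gamma=1.0, lam=0.9, terminal=False):
    """Apply one TD(lambda) update with accumulating traces, in place.

    Hypothetical helper for illustration; not taken from the report.
    """
    # TD error: difference between the one-step target and the current estimate.
    target = reward if terminal else reward + gamma * values[next_state]
    delta = target - values[state]
    # Mark the state just visited as eligible for credit.
    traces[state] += 1.0
    for s in range(len(values)):
        # Every state is updated in proportion to its eligibility.
        values[s] += alpha * delta * traces[s]
        # Eligibility decays so that older visits receive less credit.
        traces[s] *= gamma * lam
    return delta


def run_corridor_episode(values, alpha=0.5, gamma=1.0, lam=0.9):
    """Walk left to right along a corridor; reward 1 on leaving the last cell.

    The corridor is an assumed toy environment standing in for a maze.
    """
    traces = [0.0] * len(values)
    last = len(values) - 1
    for state in range(len(values)):
        terminal = state == last
        reward = 1.0 if terminal else 0.0
        next_state = state if terminal else state + 1
        td_lambda_step(values, traces, state, reward, next_state,
                       alpha, gamma, lam, terminal)
    return values


if __name__ == "__main__":
    print(run_corridor_episode([0.0, 0.0, 0.0], lam=0.9))
    print(run_corridor_episode([0.0, 0.0, 0.0], lam=0.0))
```

With lam=0.0 only the final cell is updated after one episode, whereas with lam=0.9 the earlier cells also receive a share of the credit. This is the practical difference an eligibility trace makes.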

The two multi-agent systems are a traffic simulation and a cellular simulation. The traffic simulation exposes the weaknesses of reinforcement learning that appear when it is applied to a multi-agent setting. In our cellular simulation, we show that it is possible to implement a reinforcement learning algorithm in a continuous state-space.

We conclude that while reinforcement learning shows great promise, its performance suffers when it is extended to the multi-agent case. In particular, the quality of the solutions arrived at by a reinforcement learning system is suboptimal in the multi-agent case. We also show that the algorithm used for the continuous state-space does not achieve optimal performance either.

Technical Reports

[1] Trevor Johnson. An evaluation of how dynamic programming and game theory are applied to liar's dice. Technical Report Honours Project Report, Virtual Reality Special Interest Group, Computer Science Department, Rhodes University, Grahamstown, South Africa, November 2006.

[2] Philip Sterne. Cooperation through reinforcement learning. Technical Report Honours Project Report, Virtual Reality Special Interest Group, Computer Science Department, Rhodes University, Grahamstown, South Africa, November 2002.
