Learning to cooperate in a search mission via policy search

Authors:

  • Martin Daniel

Publish date: 2002-01-01

Report number: FOI-R--0386--SE

Pages: 50

Written in: English

Abstract

Both the danger and the time involved in clearing an area of unexploded ordnance can be reduced by a system of unmanned, autonomous robots, and the system needs less time when several robots cooperate to search the area. The reinforcement learning algorithm GPOMDP is evaluated for the specific case of finding a decision rule that, given a map and the robot's position on it, enables the robot to choose automatically between the possible actions. The actions lead to a near-optimal path through an area in which some parts need to be searched. A neural network is used as a function approximator to store and improve the decision rule, and to select actions according to it. The problem is then expanded to two robots using the same decision rule, distributed in the sense that each robot picks actions according to its own perception of the surroundings, independently of the other robot's actions. To achieve cooperation, the robots are trained to maximise a shared reward equal to the sum of the individual rewards given for the consequences of each robot's actions. When the learnt policy is used to search the largest of the experiment's areas, two robots trained with a shared reward need 70% of the time that one optimal robot would need, while two agents trained with individual rewards need 88%.
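To make the approach concrete, below is a minimal sketch of the GPOMDP gradient estimator (Baxter and Bartlett's policy-gradient algorithm for partially observable problems) that the abstract refers to. It assumes a linear softmax policy over discrete actions rather than the report's neural network, and the environment interface, feature map, and parameter names are illustrative assumptions, not taken from the report.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gpomdp_gradient(env, theta, features, n_actions, T, beta=0.95, rng=None):
    """One-trajectory GPOMDP estimate of the policy-gradient direction.

    theta    : (n_actions, feature_dim) parameters of a softmax policy
    features : maps an observation to a feature vector (assumed interface)
    beta     : eligibility-trace discount, trading bias against variance
    env      : assumed to expose reset() -> obs and step(a) -> (obs, r, done)
    """
    rng = rng or np.random.default_rng()
    z = np.zeros_like(theta)      # eligibility trace of log-policy gradients
    grad = np.zeros_like(theta)   # running gradient estimate
    obs = env.reset()
    for t in range(T):
        phi = features(obs)
        probs = softmax(theta @ phi)
        a = rng.choice(n_actions, p=probs)
        # Gradient of log softmax policy: (one_hot(a) - probs) outer phi
        dlogpi = (np.eye(n_actions)[a] - probs)[:, None] * phi[None, :]
        z = beta * z + dlogpi
        obs, r, done = env.step(a)
        # Incremental average of r_{t+1} * z_t, GPOMDP's gradient estimate
        grad += (r * z - grad) / (t + 1)
        if done:
            break
    return grad
```

In the two-robot setting the abstract describes, the reward `r` fed to this estimator would be the shared reward, i.e. the sum of both robots' individual rewards at each step; since each robot's gradient is then weighted by the team's total return, actions that help the other robot are reinforced, which is what produces the cooperative behaviour reported in the experiments.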