Bounded Policy Iteration for Decentralized POMDPs

Daniel S. Bernstein, Eric A. Hansen, and Shlomo Zilberstein. Bounded Policy Iteration for Decentralized POMDPs. Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI), 1287-1292, Edinburgh, Scotland, 2005.

Abstract

We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs.
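
The objects described above are easy to make concrete. Below is a minimal sketch, in Python with NumPy, of a joint stochastic finite-state controller with a correlation device, evaluated on a tiny randomly generated two-agent Dec-POMDP. Everything here is an illustrative assumption (the model sizes, the random model, and the helper names evaluate and improve_node); it is not the authors' code, only one way to realize the representation the abstract describes.

import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy Dec-POMDP (illustrative numbers): 2 states; 2 agents, each with
# 2 actions and 2 observations.
S, A, O = 2, 2, 2
gamma = 0.9

# P[s, a1, a2, s'] transition; Obs[s', a1, a2, o1, o2] joint observation;
# R[s, a1, a2] shared reward.
P = rng.dirichlet(np.ones(S), size=(S, A, A))
Obs = rng.dirichlet(np.ones(O * O), size=(S, A, A)).reshape(S, A, A, O, O)
R = rng.standard_normal((S, A, A))

# Joint stochastic controller: Q nodes per agent, C correlation signals.
# psi[i][c, q, a]        = P(a  | q, c)          action rule of agent i
# eta[i][c, q, a, o, q'] = P(q' | q, a, o, c)    node-transition rule
# dev[c, c']             = P(c' | c)             correlation device dynamics
Q, C = 2, 2
psi = [rng.dirichlet(np.ones(A), size=(C, Q)) for _ in range(2)]
eta = [rng.dirichlet(np.ones(Q), size=(C, Q, A, O)) for _ in range(2)]
dev = rng.dirichlet(np.ones(C), size=C)

def evaluate(psi, eta, dev, iters=300):
    """Fixed-point iteration on the linear system for V(s, q1, q2, c)."""
    V = np.zeros((S, Q, Q, C))
    for _ in range(iters):
        Vn = np.zeros_like(V)
        for s, q1, q2, c in itertools.product(range(S), range(Q), range(Q), range(C)):
            v = 0.0
            for a1, a2 in itertools.product(range(A), range(A)):
                pa = psi[0][c, q1, a1] * psi[1][c, q2, a2]
                if pa == 0.0:
                    continue
                fut = 0.0
                for s2, o1, o2 in itertools.product(range(S), range(O), range(O)):
                    po = P[s, a1, a2, s2] * Obs[s2, a1, a2, o1, o2]
                    if po == 0.0:
                        continue
                    # expectation over next nodes q1', q2' and next signal c'
                    fut += po * np.einsum('p,r,k,prk->',
                                          eta[0][c, q1, a1, o1],
                                          eta[1][c, q2, a2, o2],
                                          dev[c], V[s2])
                v += pa * (R[s, a1, a2] + gamma * fut)
            Vn[s, q1, q2, c] = v
        V = Vn
    return V

V = evaluate(psi, eta, dev)
print("value of joint controller at (s=0, q1=0, q2=0, c=0):", V[0, 0, 0, 0])

The monotonic-improvement guarantee rests on a per-node linear program: holding the other agent, the correlation device, and the current value function fixed, re-solve one node's action and transition probabilities so that its value rises by at least eps in every state and for every configuration of the other agent's nodes. The sketch below, continuing the code above, implements a simplified variant that reparameterizes one node of agent 1 at a single fixed correlation signal, using scipy.optimize.linprog.

from scipy.optimize import linprog

def improve_node(qh, ch, psi, eta, dev, V):
    """Reparameterize node qh of agent 1 at signal ch; returns the LP's eps."""
    nx, ny = A, A * O * Q   # x[a] = P(a|qh,ch); y[a,o,q'] = x[a] * P(q'|qh,a,o,ch)
    nvar = 1 + nx + ny      # decision vector: [eps, x, y]
    xi = lambda a: 1 + a
    yi = lambda a, o, q: 1 + nx + (a * O + o) * Q + q

    # Probability constraints: sum_a x[a] = 1 and sum_q' y[a,o,q'] = x[a].
    A_eq, b_eq = [], []
    row = np.zeros(nvar); row[1:1 + nx] = 1.0
    A_eq.append(row); b_eq.append(1.0)
    for a, o in itertools.product(range(A), range(O)):
        row = np.zeros(nvar); row[xi(a)] = -1.0
        for q in range(Q):
            row[yi(a, o, q)] = 1.0
        A_eq.append(row); b_eq.append(0.0)

    # Improvement constraints: eps + V_old(s, qh, q2, ch) <= backed-up value,
    # which is linear in (x, y) once agent 2, the device, and V are fixed.
    A_ub, b_ub = [], []
    for s, q2 in itertools.product(range(S), range(Q)):
        row = np.zeros(nvar); row[0] = 1.0
        for a1, a2 in itertools.product(range(A), range(A)):
            pa2 = psi[1][ch, q2, a2]
            row[xi(a1)] -= pa2 * R[s, a1, a2]
            for s2, o1, o2 in itertools.product(range(S), range(O), range(O)):
                po = P[s, a1, a2, s2] * Obs[s2, a1, a2, o1, o2]
                if po == 0.0:
                    continue
                # E[V'] over agent 2's next node and the next signal, per q1'
                ev = np.einsum('r,k,prk->p', eta[1][ch, q2, a2, o2], dev[ch], V[s2])
                for q1n in range(Q):
                    row[yi(a1, o1, q1n)] -= gamma * pa2 * po * ev[q1n]
        A_ub.append(row); b_ub.append(-V[s, qh, q2, ch])

    # Maximize eps; the old parameters are feasible with eps = 0, so the LP
    # always has a solution and the controller's value can never decrease.
    res = linprog(c=-np.eye(nvar)[0], A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(None, None)] + [(0, 1)] * (nx + ny))
    eps, sol = res.x[0], res.x
    if eps > 1e-9:           # adopt the strictly improving parameters
        for a, o in itertools.product(range(A), range(O)):
            mass = sol[xi(a)]
            psi[0][ch, qh, a] = mass
            eta[0][ch, qh, a, o] = (np.array([sol[yi(a, o, q)] for q in range(Q)]) / mass
                                    if mass > 1e-9 else np.full(Q, 1.0 / Q))
    return eps

print("improvement eps for node 0, signal 0:", improve_node(0, 0, psi, eta, dev, V))

A full bounded policy iteration loop would alternate such improvements over agents, nodes, and correlation signals (and similarly reparameterize the correlation device), re-running evaluate after each accepted change. Because eps = 0 is always feasible, no iteration can lower the controller's value for any initial state distribution, which is the guarantee the abstract states.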

Bibtex entry:

@inproceedings{BHZijcai05,
  author    = {Daniel S. Bernstein and Eric A. Hansen and Shlomo Zilberstein},
  title     = {Bounded Policy Iteration for Decentralized {POMDP}s},
  booktitle = {Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence},
  year      = {2005},
  pages     = {1287--1292},
  address   = {Edinburgh, Scotland},
  url       = {http://rbr.cs.umass.edu/shlomo/papers/BHZijcai05.html}
}
