Bounded Policy Iteration for Decentralized POMDPs
Daniel S. Bernstein
Eric A. Hansen
Shlomo Zilberstein
Abstract
We present a bounded policy iteration algorithm for
infinite-horizon decentralized POMDPs. Policies
are represented as joint stochastic finite-state controllers,
which consist of a local controller for each
agent. We also let a joint controller include a correlation
device that allows the agents to correlate
their behavior without exchanging information during
execution, and show that this leads to improved
performance. The algorithm uses a fixed amount
of memory, and each iteration is guaranteed to produce
a controller with value at least as high as the
previous one for all possible initial state distributions.
For the case of a single agent, the algorithm
reduces to Poupart and Boutilier's bounded policy
iteration for POMDPs.
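As an illustration of the policy representation described above, the following is a minimal sketch of a two-agent joint stochastic finite-state controller with a shared correlation device. It is not the authors' implementation and does not perform policy iteration. The class names, the dictionary-based parameterization, the placeholder observation "o", and the convention that the device steps before the agents act are all assumptions made for this example. It only shows how a shared random signal lets agents coordinate their actions without exchanging messages.

```python
import random


def sample(dist, rng):
    """Draw one outcome from a dict mapping outcomes to probabilities."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


class CorrelationDevice:
    """Hypothetical shared signal source; it evolves independently of the agents."""

    def __init__(self, transition, start):
        self.transition = transition  # transition[c] -> {next_c: prob}
        self.state = start

    def step(self, rng):
        self.state = sample(self.transition[self.state], rng)
        return self.state


class LocalController:
    """Hypothetical stochastic finite-state controller for one agent."""

    def __init__(self, action_probs, node_probs, start_node):
        # action_probs[(node, signal)] -> {action: prob}
        # node_probs[(node, action, obs, signal)] -> {next_node: prob}
        self.action_probs = action_probs
        self.node_probs = node_probs
        self.node = start_node

    def act(self, signal, rng):
        return sample(self.action_probs[(self.node, signal)], rng)

    def update(self, action, obs, signal, rng):
        key = (self.node, action, obs, signal)
        self.node = sample(self.node_probs[key], rng)


def make_agent():
    # One-node controller: play "A" on signal 0 and "B" on signal 1.
    action_probs = {(0, 0): {"A": 1.0}, (0, 1): {"B": 1.0}}
    node_probs = {
        (0, a, "o", c): {0: 1.0} for a in ("A", "B") for c in (0, 1)
    }
    return LocalController(action_probs, node_probs, start_node=0)


rng = random.Random(0)
# The device flips a fair coin at every step, whatever its previous state.
device = CorrelationDevice({0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}, start=0)
agents = [make_agent(), make_agent()]

history = []
for _ in range(50):
    signal = device.step(rng)                 # both agents read the same signal
    actions = [agent.act(signal, rng) for agent in agents]
    for agent, action in zip(agents, actions):
        agent.update(action, "o", signal, rng)  # "o" is a dummy observation
    history.append(tuple(actions))
```

In this toy setup the two agents randomize over "A" and "B" yet always choose the same action, because both condition on the same device signal. Two independent stochastic controllers without the device could not guarantee that match.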