History-Based Controller Design and Optimization for Partially Observable MDPs
Akshat Kumar and Shlomo Zilberstein. History-Based Controller Design and Optimization for Partially Observable MDPs. Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), 156-164, Jerusalem, Israel, 2015.
Abstract
Partially observable MDPs provide an elegant framework for sequential decision making. Finite-state controllers (FSCs) are often used to represent policies for infinite-horizon problems as they offer a compact representation, simple-to-execute plans, and an adjustable tradeoff between computational complexity and policy size. We develop novel connections between optimizing FSCs for POMDPs and the dual linear program for MDPs. Building on that, we present a dual mixed integer linear program (MIP) for optimizing FSCs. To assign well-defined meaning to FSC nodes as well as aid in policy search, we show how to associate history-based features with each FSC node. Using this representation, we address another challenging problem, that of iteratively deciding which nodes to add to the FSC to get a better policy. Using an efficient off-the-shelf MIP solver, we show that this new approach can find compact near-optimal FSCs for several large benchmark domains, and is competitive with previous best approaches.
Bibtex entry:
@inproceedings{KZicaps15,
  author    = {Akshat Kumar and Shlomo Zilberstein},
  title     = {History-Based Controller Design and Optimization for Partially Observable MDPs},
  booktitle = {Proceedings of the International Conference on Automated Planning and Scheduling},
  year      = {2015},
  pages     = {156-164},
  address   = {Jerusalem, Israel},
  url       = {http://rbr.cs.umass.edu/shlomo/papers/KZicaps15.html}
}
shlomo@cs.umass.edu