Robust Value Function Approximation Using Bilinear Programming

Marek Petrik and Shlomo Zilberstein. Robust Value Function Approximation Using Bilinear Programming. Proceedings of the Twenty-Third Neural Information Processing Systems Conference (NIPS), 1446-1454, Vancouver, British Columbia, Canada, 2009.

Abstract

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, this approach provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We therefore employ and analyze a common approximate algorithm for bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on a simple benchmark problem.

Bibtex entry:

@inproceedings{PZnips09,
  author	= {Marek Petrik and Shlomo Zilberstein},
  title		= {Robust Value Function Approximation Using Bilinear Programming},
  booktitle     = {Proceedings of the Twenty-Third Neural Information Processing 
                   Systems Conference},
  year		= {2009},
  pages		= {1446-1454},
  address       = {Vancouver, British Columbia, Canada},
  url		= {http://rbr.cs.umass.edu/shlomo/papers/PZnips09.html}
}

shlomo@cs.umass.edu