Multi-Objective POMDPs with Lexicographic Reward Preferences

Kyle Hollins Wray and Shlomo Zilberstein. Multi-Objective POMDPs with Lexicographic Reward Preferences. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI), 1719-1725, Buenos Aires, Argentina, 2015.

Abstract

We propose the Lexicographic Partially Observable Markov Decision Process (LPOMDP), a model that extends POMDPs with lexicographic preferences over multiple value functions. It allows slack, i.e., slightly less-than-optimal values, for higher-priority preferences in order to facilitate improvement in lower-priority value functions. Many real-life situations are naturally captured by LPOMDPs with slack. We consider a semi-autonomous driving scenario in which the time spent on the road is minimized while the time spent driving autonomously is maximized. We propose two solution algorithms for LPOMDPs, Lexicographic Value Iteration (LVI) and Lexicographic Point-Based Value Iteration (LPBVI), and establish convergence results and correctness within strong slack bounds. We test the algorithms using real-world road data provided by OpenStreetMap (OSM) for 10 major cities. Finally, we present GPU-based optimizations for point-based solvers, demonstrating that they enable us to quickly solve vastly larger LPOMDPs and other POMDP variants.
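
To make the lexicographic-preference-with-slack idea concrete, the following is a minimal illustrative sketch for the fully observable (MDP) case; it is not the LVI or LPBVI algorithm from the paper, and all names (lexicographic_value_iteration, P, Rs, deltas) are assumptions introduced here for illustration. Each objective is optimized in priority order, and actions whose value falls more than the slack delta below the optimum are pruned before the next objective is considered.

  # Illustrative sketch only; names and structure are assumptions, not the paper's algorithm.
  import numpy as np

  def lexicographic_value_iteration(P, Rs, deltas, gamma=0.95, iters=200):
      """Lexicographic value iteration with slack on a small, fully observable MDP.

      P      : array of shape (A, S, S), transition probabilities P[a, s, s'].
      Rs     : list of reward arrays, each of shape (S, A), ordered by priority.
      deltas : list of slack values, one per objective.
      Returns a greedy policy as a length-S array of action indices.
      """
      A, S, _ = P.shape
      # Actions still allowed in each state; higher-priority objectives prune this set.
      allowed = [np.ones(A, dtype=bool) for _ in range(S)]

      for R, delta in zip(Rs, deltas):
          V = np.zeros(S)
          for _ in range(iters):
              # Q[a, s] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
              Q = R.T + gamma * (P @ V)
              # Restrict the maximization to actions permitted by higher-priority objectives.
              Q = np.where(np.array(allowed).T, Q, -np.inf)
              V = Q.max(axis=0)
          # Keep only actions whose value is within the slack delta of the best,
          # so lower-priority objectives choose among near-optimal actions only.
          for s in range(S):
              allowed[s] &= Q[:, s] >= V[s] - delta

      # Greedy policy for the lowest-priority objective within the remaining actions.
      return np.array([np.argmax(np.where(allowed[s], Q[:, s], -np.inf)) for s in range(S)])

The key design point this sketch tries to convey is that slack is what prevents the first objective from fully determining the policy: with delta = 0 the lower-priority objectives can only break exact ties, whereas a small positive delta leaves them a set of near-optimal actions to improve over.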

Bibtex entry:

@inproceedings{KZijcai15,
  author    = {Kyle Hollins Wray and Shlomo Zilberstein},
  title     = {Multi-Objective {POMDP}s with Lexicographic Reward Preferences},
  booktitle = {Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI)},
  year      = {2015},
  pages     = {1719--1725},
  address   = {Buenos Aires, Argentina},
  url       = {http://rbr.cs.umass.edu/shlomo/papers/KZijcai15.html}
}
