The LP-update policy for weakly coupled Markov decision processes
1 : Inria Grenoble-Rhône Alpes
Inria Grenoble-Rhône Alpes, Univ. Grenoble-Alpes : InriaGrenoble-Rhône Alpes
2 : Laboratoire d'Informatique de Grenoble
Centre National de la Recherche Scientifique : UMR5217, Université Grenoble Alpes, Institut polytechnique de Grenoble - Grenoble Institute of Technology, Centre National de la Recherche Scientifique
Weakly coupled MDPs generalize the notion of resltess multi-armed bandits, that are known to be PSPACE-hard. In this paper, we show that a relaxed version of the problem can be solved by linear programming. We use the solution of a relaxed problem to build a heuristics, that we call the LP-update policy. We study its performance for a case with N statistically identical weakly coupled MDPs. We prove that, when N goes to infinity, the LP-update policy is asymptotically optimal. We also provide a rank condition that guarantees that the sub-optimality gap is at most O(1/N). We illustrate the good performance of our algorithm in a selection problem with fairness constraints.