ROADEF 2023 : 24ème congrès de la Société Française de Recherche Opérationnelle et d'Aide à la Décision

sciencesconf.org:roadef2023:435672

The LP-update policy for weakly coupled Markov decision processes

Chen Yan 1, 2, @ , Nicolas Gast 1, 2, @ , Bruno Gaujal 1, 2, @

1 : Inria Grenoble-Rhône Alpes

Inria Grenoble-Rhône Alpes, Univ. Grenoble-Alpes : InriaGrenoble-Rhône Alpes

2 : Laboratoire d'Informatique de Grenoble

Centre National de la Recherche Scientifique : UMR5217, Université Grenoble Alpes, Institut polytechnique de Grenoble - Grenoble Institute of Technology, Centre National de la Recherche Scientifique

Weakly coupled MDPs generalize the notion of resltess multi-armed bandits, that are known to be PSPACE-hard. In this paper, we show that a relaxed version of the problem can be solved by linear programming. We use the solution of a relaxed problem to build a heuristics, that we call the LP-update policy. We study its performance for a case with N statistically identical weakly coupled MDPs. We prove that, when N goes to infinity, the LP-update policy is asymptotically optimal. We also provide a rank condition that guarantees that the sub-optimality gap is at most O(1/N). We illustrate the good performance of our algorithm in a selection problem with fairness constraints.

Type :	:	Article
Thématiques	:	Contrôle et Optimisation Stochastique (GT COSMOS)
Mots-Clés	:	Markov decision processes ; stochastic optimization ; linear programming ; restless bandit

Vie privée | Accessibilité