Generalized Nested Rollout Policy Adaptation with Bias Learning
Julien Sentuc 1, Jean-Yves Lucas 2, Tristan Cazenave 3, Farah Ellouze
1 : LAMSADE
Université Paris Dauphine, PSL Research University
2 : EDF Labs
EDF Recherche et Développement
3 : Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision
Université Paris Dauphine-PSL, Centre National de la Recherche Scientifique : UMR7024

Nested Monte Carlo Search (NMCS) is a recursive algorithm that uses lower-level playouts to bias its playouts, memorizing the best sequence found at each level. At level 0, a Monte Carlo simulation is performed: random decisions are made until a terminal state is reached. Building on NMCS, the Nested Rollout Policy Adaptation (NRPA) algorithm was introduced. NRPA combines nested search, memorization of the best sequence of moves found, and online learning of a playout policy from this sequence. Generalized Nested Rollout Policy Adaptation (GNRPA) generalizes the way move probabilities are computed by adding a temperature and a bias. In this work we introduce an extension of GNRPA, namely Bias Learning GNRPA (BLGNRPA), which automatically learns the bias weights. The goal is both to obtain better results on sets of dissimilar instances and to avoid having to set some hyperparameters by hand. The idea is to learn the parameters of the bias along with the policy. Experiments show that BLGNRPA improves on GNRPA for two different optimization problems: the Vehicle Routing Problem and 3D Bin Packing.
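The following is a minimal sketch of the quantities named in the abstract, not the authors' implementation. It assumes the GNRPA form in which a legal move a is chosen with probability proportional to exp(w_a / tau + beta_a), where w are the policy weights, tau the temperature and beta the bias. For bias learning it assumes, purely for illustration, that the bias is linear in hypothetical move features phi(a) with learned parameters theta; the abstract does not state how BLGNRPA parameterizes the bias. Both w and theta are updated by one gradient-ascent step on the log-probability of the move taken from the best sequence. Only a single decision is shown, not the nested recursion or a full playout.

```python
import math
from typing import List, Sequence, Tuple


def gnrpa_probs(w: Sequence[float], beta: Sequence[float], tau: float) -> List[float]:
    """Softmax over legal moves: p_a proportional to exp(w_a / tau + beta_a)."""
    logits = [wa / tau + ba for wa, ba in zip(w, beta)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_bias(features: Sequence[Sequence[float]], theta: Sequence[float]) -> List[float]:
    """Hypothetical learned bias: beta_a = sum_k theta_k * phi_k(a)."""
    return [sum(t * f for t, f in zip(theta, phi)) for phi in features]


def adapt_step(
    w: Sequence[float],
    theta: Sequence[float],
    features: Sequence[Sequence[float]],
    chosen: int,
    tau: float,
    alpha: float,
    alpha_theta: float,
) -> Tuple[List[float], List[float]]:
    """One gradient-ascent step on log p(chosen), for both the policy
    weights w and the (assumed linear) bias parameters theta.
    Returns updated copies; the inputs are left unchanged."""
    p = gnrpa_probs(w, linear_bias(features, theta), tau)

    # d log p_m / d w_b = (delta_bm - p_b) / tau
    new_w = [
        wb + (alpha / tau) * ((1.0 if b == chosen else 0.0) - pb)
        for b, (wb, pb) in enumerate(zip(w, p))
    ]

    # d log p_m / d theta_k = phi_k(m) - sum_b p_b * phi_k(b)
    new_theta = []
    for k, tk in enumerate(theta):
        expected = sum(pb * features[b][k] for b, pb in enumerate(p))
        new_theta.append(tk + alpha_theta * (features[chosen][k] - expected))
    return new_w, new_theta


if __name__ == "__main__":
    # Three legal moves with one hypothetical feature each.
    w, theta = [0.0, 0.0, 0.0], [0.0]
    features = [[1.0], [0.0], [-1.0]]
    print(gnrpa_probs(w, linear_bias(features, theta), tau=1.0))
    w, theta = adapt_step(w, theta, features, chosen=0, tau=1.0, alpha=1.0, alpha_theta=0.1)
    print(gnrpa_probs(w, linear_bias(features, theta), tau=1.0))
```

In this sketch the bias update pushes theta toward features that the best sequence's moves have more of than the current policy expects. That is one simple way the bias could be learned "along with the policy" rather than fixed by hand.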

