ROADEF 2023: 24th Congress of the Société Française de Recherche Opérationnelle et d'Aide à la Décision (French Society for Operations Research and Decision Support)

sciencesconf.org:roadef2023:436462

Generalized Nested Rollout Policy Adaptation with Bias Learning

Julien Sentuc 1, Jean-Yves Lucas 2, Tristan Cazenave 3, Farah Ellouze

1 : LAMSADE

Université Paris Dauphine, PSL Research University

2 : EDF Labs

EDF Recherche et Développement (EDF Research and Development)

3 : Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision (Laboratory for Analysis and Modelling of Systems for Decision Support)

Université Paris Dauphine-PSL, Centre National de la Recherche Scientifique : UMR7024

Nested Monte Carlo Search (NMCS) is a recursive algorithm that uses lower-level playouts to bias its playouts, memorizing the best sequence found at each level. At level 0, it performs a Monte Carlo simulation: random decisions are made until a terminal state is reached. Building on NMCS, the Nested Rollout Policy Adaptation (NRPA) algorithm was introduced. NRPA combines nested search, memorization of the best sequence of moves found, and online learning of a playout policy from this sequence. Generalized Nested Rollout Policy Adaptation (GNRPA) generalizes the way move probabilities are computed by adding a temperature and a bias. In this work we introduce an extension of GNRPA, namely Bias Learning GNRPA (BLGNRPA), which automatically learns the bias weights along with the policy. The goal is both to obtain better results on sets of dissimilar instances and to avoid having to set some hyperparameters by hand. Experiments show that BLGNRPA improves on GNRPA for two different optimization problems: the Vehicle Routing Problem and 3D Bin Packing.
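To make the role of the temperature, the bias, and the learned bias weights more concrete, here is a minimal Python sketch of one adaptation step. In GNRPA, the probability of a legal move i is a softmax of w_i / τ + β_i, where w_i is the policy weight, τ the temperature and β_i the bias. The sketch assumes, as a hypothetical bias model, that β_i is a linear combination of move features f_k(i) with weights θ_k, and it updates both w and θ by a gradient step on the log-probability of the moves of the best sequence. The function names (`gnrpa_probs`, `bias`, `adapt`), the feature representation, and the separate bias learning rate `alpha_bias` are illustrative assumptions, not the authors' published formulation of BLGNRPA.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One decision step of the best sequence: (legal move codes, code of the move played).
Step = Tuple[List[str], str]


def gnrpa_probs(policy: Dict[str, float], beta: Dict[str, float],
                moves: Sequence[str], tau: float) -> List[float]:
    """GNRPA-style probabilities: softmax over legal moves of w_i / tau + beta_i."""
    logits = [policy.get(m, 0.0) / tau + beta[m] for m in moves]
    top = max(logits)  # shift by the max logit for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias(theta: Sequence[float], feats: Sequence[float]) -> float:
    """Hypothetical bias model: beta_i = sum_k theta_k * f_k(i)."""
    return sum(t * f for t, f in zip(theta, feats))


def adapt(policy: Dict[str, float], theta: List[float], best: Sequence[Step],
          features: Dict[str, List[float]], alpha: float, alpha_bias: float,
          tau: float) -> Tuple[Dict[str, float], List[float]]:
    """Adapt the policy weights and the (assumed) bias weights toward `best`.

    Probabilities are computed with the old policy and old theta, while the
    updates are accumulated in copies, as in the NRPA adapt procedure.
    """
    new_policy = dict(policy)
    new_theta = list(theta)
    for moves, chosen in best:
        beta = {m: bias(theta, features[m]) for m in moves}
        probs = gnrpa_probs(policy, beta, moves, tau)
        for m, p in zip(moves, probs):
            # d log p(chosen) / d w_m = (1[m == chosen] - p_m) / tau
            indicator = 1.0 if m == chosen else 0.0
            new_policy[m] = new_policy.get(m, 0.0) + alpha / tau * (indicator - p)
        for k in range(len(theta)):
            # d log p(chosen) / d theta_k = f_k(chosen) - sum_m p_m * f_k(m)
            expected = sum(p * features[m][k] for m, p in zip(moves, probs))
            new_theta[k] += alpha_bias * (features[chosen][k] - expected)
    return new_policy, new_theta
```

For example, with an empty policy, θ = [0.0], three moves whose single feature is minus a distance (`{"a": [-1.0], "b": [-3.0], "c": [-2.0]}`) and a best sequence that played `"b"`, one call to `adapt` with α = 1, `alpha_bias` = 0.1 and τ = 1 raises w_b to 2/3, lowers w_a and w_c to -1/3, and moves θ to -0.1, since the chosen move was farther than the expected one. The nested search levels and the playout sampling are left out to keep the sketch short.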

Type: Article
Topics: Machine learning tools for vehicle routing problems (GT GT2L)
Keywords: Vehicle Routing; 3D Bin Packing; Monte Carlo Search; GNRPA
