Bandit-based Search for Constraint Programming

Manuel Loth, Michèle Sebag, Youssef Hamadi, Christian Schulte, Marc Schoenauer.

[pdf | bibtex]

Constraint Programming (CP) solvers classically explore the solution space using tree-search based heuristics. Monte-Carlo Tree Search (MCTS) is a tree-search method aimed at optimal sequential decision making under uncertainty. At the crossroads of CP and MCTS, this paper presents the Bandit Search for Constraint Programming (BASCOP) algorithm, which adapts MCTS to the specifics of CP search trees. Formally, MCTS simultaneously estimates the average node reward and uses it to bias exploration towards the most promising regions of the tree, borrowing the multi-armed bandit (MAB) decision rule. The two contributions in BASCOP concern: i) a specific reward function, estimating the relative failure depth conditionally to a (variable, value) assignment; ii) a new decision rule, hybridizing the MAB framework with the spirit of local neighborhood search. Specifically, BASCOP guides the CP search in the neighborhood of the previous best solution by exploiting statistical estimates gathered across multiple restarts. BASCOP, using Gecode as the underlying constraint solver, shows significant improvements over the depth-first search baseline on some CP benchmark suites. For hard job-shop scheduling problems, BASCOP matches the results of state-of-the-art, scheduling-specific CP approaches. These results demonstrate the potential of BASCOP as a generic yet robust search method for CP.
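To make the MAB decision rule concrete, here is a minimal sketch of standard UCB1-style child selection, the kind of rule MCTS borrows from the bandit literature. This is a generic illustration, not BASCOP's actual hybrid rule (which the paper combines with local-neighborhood guidance and a failure-depth reward); the `ucb_select` function, the dictionary fields, and the statistics shown are all hypothetical.

```python
import math

def ucb_select(children, c=math.sqrt(2)):
    """Pick the child maximizing mean reward plus an exploration
    bonus that shrinks as that child accumulates visits (UCB1)."""
    total_visits = sum(child["visits"] for child in children)
    def score(child):
        if child["visits"] == 0:
            return float("inf")  # unvisited children are tried first
        mean = child["reward_sum"] / child["visits"]
        bonus = c * math.sqrt(math.log(total_visits) / child["visits"])
        return mean + bonus
    return max(children, key=score)

# Illustrative statistics for three (variable, value) assignments.
children = [
    {"assign": ("x", 0), "visits": 10, "reward_sum": 6.0},
    {"assign": ("x", 1), "visits": 3,  "reward_sum": 2.5},
    {"assign": ("x", 2), "visits": 0,  "reward_sum": 0.0},
]
print(ucb_select(children)["assign"])  # -> ('x', 2): unvisited branch wins
```

In a CP setting, each child would correspond to a (variable, value) assignment at a search node, with reward statistics accumulated across restarts; the exploration constant `c` trades off exploiting high-reward assignments against sampling rarely tried ones.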

In: COmbining COnstraint solving with MIning and LEarning, Bellevue, WA, USA. July, 2013.