Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/88131
Title: LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments
Authors: de Curtò i Díaz, Joaquim
de Zarzà i Cubero, Irene
Roig, Gemma
Cano, Juan Carlos
Manzoni, Pietro
Calafate, Carlos T.
Publication date: 25-Jun-2023
Abstract:
In this paper, we introduce an innovative approach to handling the multi-armed bandit (MAB) problem in non-stationary environments, harnessing the predictive power of large language models (LLMs). With the realization that traditional bandit strategies, including epsilon-greedy and upper confidence bound (UCB), may struggle in the face of dynamic changes, we propose a strategy informed by LLMs that offers dynamic guidance on exploration versus exploitation, contingent on the current state of the bandits. We bring forward a new non-stationary bandit model with fluctuating reward distributions and illustrate how LLMs can be employed to guide the choice of bandit amid this variability. Experimental outcomes illustrate the potential of our LLM-informed strategy, demonstrating its adaptability to the fluctuating nature of the bandit problem, while maintaining competitive performance against conventional strategies. This study provides key insights into the capabilities of LLMs in enhancing decision-making processes in dynamic and uncertain scenarios.
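The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the drift model (a clipped Gaussian random walk on Bernoulli success probabilities), the class and function names, and all parameter values are assumptions chosen for illustration. The `advisor` hook marks the point where an LLM-informed strategy would supply explore/exploit guidance; `rule_based_advisor` is a hypothetical stand-in so that the example runs without any model.

```python
import math
import random
from functools import partial


class DriftingBandit:
    """Bernoulli arms whose success probabilities follow a clipped random walk
    (an assumed drift model, not the one defined in the paper)."""

    def __init__(self, n_arms, drift=0.02, seed=0):
        self.rng = random.Random(seed)
        self.probs = [self.rng.random() for _ in range(n_arms)]
        self.drift = drift

    def pull(self, arm):
        reward = 1.0 if self.rng.random() < self.probs[arm] else 0.0
        # Drift every arm after each pull so the best arm can change over time.
        self.probs = [min(1.0, max(0.0, p + self.rng.gauss(0.0, self.drift)))
                      for p in self.probs]
        return reward


def greedy_arm(values):
    """Index of the arm with the highest estimated value."""
    return max(range(len(values)), key=values.__getitem__)


def epsilon_greedy(counts, values, t, rng, epsilon=0.1):
    """Explore uniformly with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return greedy_arm(values)


def ucb1(counts, values, t, rng):
    """UCB1: try each arm once, then maximise estimate plus confidence bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]))


def advised(counts, values, t, rng, advisor):
    """Explore or exploit according to an external advisor's recommendation."""
    if advisor(counts, values, t) == "explore":
        return rng.randrange(len(values))
    return greedy_arm(values)


def rule_based_advisor(counts, values, t):
    # Hypothetical stand-in for LLM guidance: explore on every 10th step
    # or while any arm has never been tried.
    return "explore" if t % 10 == 0 or 0 in counts else "exploit"


def run(select, n_arms=5, steps=2000, seed=0):
    """Play one episode and return the mean reward per step."""
    bandit = DriftingBandit(n_arms, seed=seed)
    rng = random.Random(seed + 1)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    total = 0.0
    for t in range(1, steps + 1):
        arm = select(counts, values, t, rng)
        reward = bandit.pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # sample average
        total += reward
    return total / steps


if __name__ == "__main__":
    strategies = {
        "epsilon-greedy": epsilon_greedy,
        "UCB1": ucb1,
        "advised (stand-in)": partial(advised, advisor=rule_based_advisor),
    }
    for name, select in strategies.items():
        print(name, round(run(select), 3))
```

Because the value estimates are plain sample averages, old observations never lose weight, which is one reason the conventional strategies can lag behind when the reward distributions move. Replacing `rule_based_advisor` with a function that queries a language model and returns "explore" or "exploit" would be one way to experiment with the idea, under the same assumptions.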
Description: Journal articles
URI: https://doi.org/10.3390/electronics12132814
http://hdl.handle.net/11531/88131
ISSN: 2079-9292
Appears in collections: Articles

Files in this item:
File: 2024210183839127_electronics-12-02814-decurto.pdf
Size: 671.3 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.