LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments

de Curtò i Díaz, Joaquim; de Zarzà i Cubero, Irene; Roig, Gemma; Cano, Juan Carlos; Manzoni, Pietro; Calafate, Carlos T.

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/88131

Full metadata record

DC Field	Value	Language
dc.contributor.author	de Curtò i Díaz, Joaquim	es-ES
dc.contributor.author	de Zarzà i Cubero, Irene	es-ES
dc.contributor.author	Roig, Gemma	es-ES
dc.contributor.author	Cano, Juan Carlos	es-ES
dc.contributor.author	Manzoni, Pietro	es-ES
dc.contributor.author	Calafate, Carlos T.	es-ES
dc.date.accessioned	2024-04-15T07:47:07Z	-
dc.date.available	2024-04-15T07:47:07Z	-
dc.date.issued	2023-06-25	es_ES
dc.identifier.issn	2079-9292	es_ES
dc.identifier.uri	https://doi.org/10.3390/electronics12132814	es_ES
dc.identifier.uri	http://hdl.handle.net/11531/88131	-
dc.description	Journal articles	es_ES
dc.description.abstract	.	es-ES
dc.description.abstract	In this paper, we introduce an innovative approach to handling the multi-armed bandit (MAB) problem in non-stationary environments, harnessing the predictive power of large language models (LLMs). With the realization that traditional bandit strategies, including epsilon-greedy and upper confidence bound (UCB), may struggle in the face of dynamic changes, we propose a strategy informed by LLMs that offers dynamic guidance on exploration versus exploitation, contingent on the current state of the bandits. We bring forward a new non-stationary bandit model with fluctuating reward distributions and illustrate how LLMs can be employed to guide the choice of bandit amid this variability. Experimental outcomes illustrate the potential of our LLM-informed strategy, demonstrating its adaptability to the fluctuating nature of the bandit problem, while maintaining competitive performance against conventional strategies. This study provides key insights into the capabilities of LLMs in enhancing decision-making processes in dynamic and uncertain scenarios.	en-GB
dc.format.mimetype	application/pdf	es_ES
dc.language.iso	en-GB	es_ES
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs Spain	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	es_ES
dc.source	Journal: Electronics, Period: 1, Volume: 12, Issue: 13, First page: 2814, Last page: .	es_ES
dc.title	LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.description.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.rights.holder		es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
dc.keywords	.	es-ES
dc.keywords	multi-armed bandit; non-stationary environments; large language models; AI strategy optimization; GPT-3.5-turbo; QLoRA	en-GB
Appears in collections:	Articles

Files in this item:

File	Description	Size	Format
2024210183839127_electronics-12-02814-decurto.pdf		671.3 kB	Adobe PDF
