Show simple item record

dc.contributor.author      de Curtò i Díaz, Joaquim                        es-ES
dc.contributor.author      de Zarzà i Cubero, Irene                        es-ES
dc.contributor.author      Roig, Gemma                                     es-ES
dc.contributor.author      Cano, Juan Carlos                               es-ES
dc.contributor.author      Manzoni, Pietro                                 es-ES
dc.contributor.author      Calafate, Carlos T.                             es-ES
dc.date.accessioned        2024-04-15T07:47:07Z
dc.date.available          2024-04-15T07:47:07Z
dc.date.issued             2023-06-25                                      es_ES
dc.identifier.issn         2079-9292                                       es_ES
dc.identifier.uri          https://doi.org/10.3390/electronics12132814    es_ES
dc.identifier.uri          http://hdl.handle.net/11531/88131
dc.description             Journal articles                                es_ES
dc.description.abstract                                                    es-ES
dc.description.abstract    In this paper, we introduce an innovative approach to handling the multi-armed bandit (MAB) problem in non-stationary environments, harnessing the predictive power of large language models (LLMs). With the realization that traditional bandit strategies, including epsilon-greedy and upper confidence bound (UCB), may struggle in the face of dynamic changes, we propose a strategy informed by LLMs that offers dynamic guidance on exploration versus exploitation, contingent on the current state of the bandits. We bring forward a new non-stationary bandit model with fluctuating reward distributions and illustrate how LLMs can be employed to guide the choice of bandit amid this variability. Experimental outcomes illustrate the potential of our LLM-informed strategy, demonstrating its adaptability to the fluctuating nature of the bandit problem, while maintaining competitive performance against conventional strategies. This study provides key insights into the capabilities of LLMs in enhancing decision-making processes in dynamic and uncertain scenarios.    en-GB
dc.format.mimetype         application/pdf                                 es_ES
dc.language.iso            en-GB                                           es_ES
dc.rights                  Creative Commons Reconocimiento-NoComercial-SinObraDerivada España    es_ES
dc.rights.uri              http://creativecommons.org/licenses/by-nc-nd/3.0/es/    es_ES
dc.source                  Journal: Electronics, Period: 1, Volume: 12, Issue: 13, First page: 2814, Last page: .    es_ES
dc.title                   LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments    es_ES
dc.type                    info:eu-repo/semantics/article                  es_ES
dc.description.version     info:eu-repo/semantics/publishedVersion         es_ES
dc.rights.holder                                                           es_ES
dc.rights.accessRights     info:eu-repo/semantics/openAccess               es_ES
dc.keywords                                                                es-ES
dc.keywords                multi-armed bandit; non-stationary environments; large language models; AI strategy optimization; GPT-3.5-turbo; QLoRA    en-GB
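
The abstract above outlines the paper's setting: a bandit whose reward distributions drift over time, with an LLM advising on exploration versus exploitation while epsilon-greedy and UCB serve as baselines. This record does not include the method's details, so the Python sketch below only illustrates the general idea under stated assumptions: the sinusoidal drift, the constant-step value update, and llm_advise (a simple change-detection heuristic standing in for an actual GPT-3.5-turbo query) are hypothetical, not the authors' implementation.

import math
import random

N_ARMS, HORIZON = 5, 2000

def true_probs(t):
    # Fluctuating reward distributions: each arm's success probability
    # drifts sinusoidally over time (an assumed form of non-stationarity).
    return [0.5 + 0.4 * math.sin(0.002 * t + k) for k in range(N_ARMS)]

def llm_advise(recent_rewards):
    # Hypothetical stand-in for querying an LLM about explore vs. exploit:
    # if the recent reward average drops, recommend more exploration.
    if len(recent_rewards) < 50:
        return 0.3                                 # explore while evidence is scarce
    old = sum(recent_rewards[-50:-25]) / 25
    new = sum(recent_rewards[-25:]) / 25
    return 0.3 if new < old - 0.05 else 0.05       # re-explore after a drop

def run(policy):
    counts = [0] * N_ARMS
    values = [0.0] * N_ARMS
    rewards = []
    for t in range(1, HORIZON + 1):
        if policy == "ucb":
            # UCB1: optimistic bonus shrinks as an arm is sampled more often.
            arm = max(range(N_ARMS),
                      key=lambda a: float("inf") if counts[a] == 0
                      else values[a] + math.sqrt(2 * math.log(t) / counts[a]))
        else:
            eps = 0.1 if policy == "eps" else llm_advise(rewards)
            arm = (random.randrange(N_ARMS) if random.random() < eps
                   else max(range(N_ARMS), key=lambda a: values[a]))
        r = 1.0 if random.random() < true_probs(t)[arm] else 0.0
        counts[arm] += 1
        # Constant step size so old observations decay under drift.
        values[arm] += 0.1 * (r - values[arm])
        rewards.append(r)
    return sum(rewards)

for p in ("eps", "ucb", "llm"):
    print(p, run(p))

The constant step size (rather than a 1/n sample average) is what lets the value estimates track drifting rewards; epsilon-greedy and UCB1 are the conventional baselines the abstract says the LLM-informed strategy is compared against.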


Files in this item


This item appears in the following collection(s)

  • Artículos
    Published journal articles, book chapters, and conference contributions.


Except where otherwise noted, this item's license is described as Creative Commons Reconocimiento-NoComercial-SinObraDerivada España.