LLMs aplicados al diseño automático de agentes basados en RL

Gil Garcia, Enrique

dc.contributor.advisor	Martín Gallardo, Emilio	es-ES
dc.contributor.author	Gil Garcia, Enrique	es-ES
dc.contributor.other	Universidad Pontificia Comillas, Escuela Técnica Superior de Ingeniería (ICAI)	es_ES
dc.date.accessioned	2025-03-13T22:36:58Z
dc.date.available	2025-03-13T22:36:58Z
dc.date.issued	2025	es_ES
dc.identifier.uri	http://hdl.handle.net/11531/98060
dc.description	Máster Universitario en Big Data	es_ES
dc.description.abstract	This Master's thesis explores the use of Large Language Models (LLMs) to automate and optimize the design of reward functions for Reinforcement Learning (RL) agents. Traditionally, reward configuration requires manual intervention, which makes it laborious, task-specific, and error-prone, and which reduces learning efficiency. The study proposes leveraging the ability of LLMs to interpret natural language and generate code, translating descriptions of tasks and behaviors into executable reward functions. Experiments were carried out in standard Gymnasium environments, where LLMs generated reward functions from textual descriptions. These functions were evaluated against manually designed versions using algorithms such as Proximal Policy Optimization. In some cases, the LLM-generated rewards achieved faster convergence and better policies, demonstrating their potential. However, limitations were also identified, such as variability in the model's responses and sensitivity to prompt wording. These results show that, although LLMs offer a promising route to accelerating RL development, further improvement is still needed in prompt engineering, model alignment, and the incorporation of expert knowledge. The work concludes by proposing hybrid approaches that combine human intuition with the automated generation of LLMs, in order to develop more adaptive, efficient, and scalable RL systems.	es-ES
dc.description.abstract	This thesis investigates the potential of integrating Large Language Models (LLMs) into the design of reward functions for Reinforcement Learning (RL) agents, with a particular focus on automating and enhancing the reward shaping process. Drawing inspiration from the Eureka framework, this work aims to assess whether LLMs can serve as effective co-designers or autonomous creators of reward functions in complex RL environments. Traditional reward engineering is often labor-intensive, domain-specific, and prone to suboptimal guidance, which can hinder learning efficiency and generalization. By leveraging the natural language understanding and generative capabilities of LLMs, we propose a method for translating task goals and behavioral heuristics into executable reward components. To evaluate this approach, we conduct a series of experiments in standard Gymnasium environments, including LunarLander-v2 and BipedalWalker-v3, where LLMs are prompted to generate reward logic based on task descriptions and agent behavior summaries. The performance of LLM-derived reward functions is compared against baseline hand-crafted functions using common RL algorithms such as Proximal Policy Optimization (PPO) or Deep Q-Network (DQN). Our results indicate that in certain task-specific scenarios, LLM-generated reward functions can lead to faster convergence and more robust policies than their manually designed counterparts. Nevertheless, we also identify key limitations, including inconsistency in LLM outputs, sensitivity to prompt phrasing, and challenges in mapping high-level language descriptions to low-level observation spaces. These findings highlight the promise and the current limitations of using LLMs in reward function design. While LLMs show potential as tools for accelerating and democratizing RL development, achieving fully autonomous and reliable reward shaping requires further advances in prompt engineering, model alignment, and the incorporation of domain knowledge. We conclude by outlining future directions for hybrid reward design frameworks that combine human intuition with the generative flexibility of LLMs, setting the stage for more adaptive and scalable reinforcement learning systems.	en-GB
dc.format.mimetype	application/pdf	es_ES
dc.language.iso	es-ES	es_ES
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	es_ES
dc.subject.other	H0Z	es_ES
dc.title	LLMs aplicados al diseño automático de agentes basados en RL	es_ES
dc.type	info:eu-repo/semantics/masterThesis	es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
dc.keywords	Aprendizaje por Refuerzo, Funciones de recompensa, LLM, Gymnasium, Optimización dinámica, Automatización en inteligencia artificial	es-ES
dc.keywords	Reinforcement Learning, Reward Functions, LLM (Large Language Models), Gymnasium, Dynamic Optimization, Artificial Intelligence Automation	en-GB
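
The workflow summarised in the abstract (an LLM turns a task description into executable reward logic, which then replaces the environment's own reward during training) might look roughly like the sketch below. It is an illustration only and is not taken from the thesis: the names `GENERATED_SOURCE`, `load_reward_fn` and `GeneratedRewardEnv`, the `compute_reward` signature, and the observation layout are all hypothetical. The wrapper assumes only an environment whose `step()` returns the Gymnasium five-tuple, so it does not import Gymnasium itself.

```python
import math
from typing import Any, Callable

# Hypothetical text an LLM might return for a lander-style task.
# Assumed observation layout: obs[0:4] = (x, y, vx, vy).
GENERATED_SOURCE = '''
def compute_reward(obs, action, terminated):
    x, y, vx, vy = obs[0], obs[1], obs[2], obs[3]
    distance_penalty = -math.hypot(x, y)
    speed_penalty = -0.5 * math.hypot(vx, vy)
    return distance_penalty + speed_penalty
'''

RewardFn = Callable[[Any, Any, bool], float]


def load_reward_fn(source: str, name: str = "compute_reward") -> RewardFn:
    """Execute generated source and return the function called `name`.

    Running exec() on model output is unsafe outside a sandbox;
    this is a sketch, not a hardened loader.
    """
    namespace: dict = {"math": math}
    exec(source, namespace)
    fn = namespace.get(name)
    if not callable(fn):
        raise ValueError(f"generated code does not define {name}()")
    return fn


class GeneratedRewardEnv:
    """Wrap an env with a Gymnasium-style step() and swap in a new reward."""

    def __init__(self, env: Any, reward_fn: RewardFn):
        self.env = env
        self.reward_fn = reward_fn

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, env_reward, terminated, truncated, info = self.env.step(action)
        info = dict(info)
        # Keep the original reward so both reward designs can be compared.
        info["env_reward"] = env_reward
        reward = float(self.reward_fn(obs, action, terminated))
        return obs, reward, terminated, truncated, info
```

Storing the environment's original reward under `info["env_reward"]` is a design choice made here so that a policy trained on the generated reward can still be scored on the hand-crafted one, which is the kind of comparison the abstract describes.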

Files in this item

Name: TFM - Gil Garcia, Enrique.pdf
Size: 846.3 KB
Format: PDF
Description: Master's thesis (Trabajo Fin de Máster)

Name: AnexoI1.pdf
Size: 15.69 KB
Format: PDF
Description: Authorization

This item appears in the following collection(s)

TFG, TFM (temporales)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States