Offline Conservative Q-Learning for Optimizing Parameters of Retrieval-Augmented Generation with Large Language Models

Ibinarriaga Robles, Jorge

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/104759

Full metadata record

DC field	Value	Language
dc.contributor.advisor	Chaquet, Jacobo	es-ES
dc.contributor.author	Ibinarriaga Robles, Jorge	es-ES
dc.contributor.other	Universidad Pontificia Comillas, Escuela Técnica Superior de Ingeniería (ICAI)	es_ES
dc.date.accessioned	2025-09-26T09:10:54Z	-
dc.date.available	2025-09-26T09:10:54Z	-
dc.date.issued	2026	es_ES
dc.identifier.uri	http://hdl.handle.net/11531/104759	-
dc.description	Bachelor's Degree in Mathematical Engineering and Artificial Intelligence	es_ES
dc.description.abstract	This project explores the dynamic optimization of Retrieval-Augmented Generation (RAG) systems through Reinforcement Learning (RL) methods. It addresses the automatic selection of hyperparameters that directly affect the quality of the generated answers. The problem is modeled as a multi-armed bandit (MAB), a framework in which an agent iteratively learns which configurations (arms) yield better results while balancing exploration and exploitation. Evaluation uses the SQuAD v2 question-answering dataset, with cosine similarity as the performance metric. The results suggest that certain configurations may adapt better to particular sets of questions, highlighting the potential of adaptive approaches in RAG systems.	en-GB
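The abstract frames hyperparameter selection as a multi-armed bandit scored with cosine similarity but does not name the bandit policy, the hyperparameters, or the embedding model. The following is only a minimal sketch under stated assumptions: it uses an epsilon-greedy policy (one common choice, not necessarily the thesis's), a hypothetical arm space (ARMS with top_k and chunk_size), and a hypothetical simulated reward (fake_rag_reward) standing in for running RAG, embedding the generated and reference answers, and comparing them. It does not show the offline Conservative Q-Learning named in the title.

```python
import math
import random
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit over a fixed list of arms (assumed policy)."""

    def __init__(self, arms: List[dict], epsilon: float = 0.1, seed: int = 0):
        self.arms = arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(arms)   # pulls per arm
        self.values = [0.0] * len(arms)  # running mean reward per arm

    def select(self) -> int:
        # Try every arm once so each has a reward estimate.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        return max(range(len(self.arms)), key=lambda i: self.values[i])

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean: Q <- Q + (r - Q) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Hypothetical RAG configurations (arms); the real search space is not given here.
ARMS = [
    {"top_k": 2, "chunk_size": 256},
    {"top_k": 5, "chunk_size": 512},
    {"top_k": 10, "chunk_size": 1024},
]


def fake_rag_reward(arm: int, rng: random.Random) -> float:
    # Hypothetical stand-in for: answer with ARMS[arm], embed the generated and
    # reference answers, and score them with cosine similarity.
    # Arm 1 is deliberately made to produce answers closest to the reference.
    reference = [1.0, 0.0]
    answer = [1.0, [0.9, 0.3, 1.2][arm] + rng.uniform(-0.1, 0.1)]
    return cosine_similarity(answer, reference)


bandit = EpsilonGreedyBandit(ARMS, epsilon=0.1, seed=42)
env_rng = random.Random(7)
for _ in range(500):
    arm = bandit.select()
    bandit.update(arm, fake_rag_reward(arm, env_rng))

best = max(range(len(ARMS)), key=lambda i: bandit.values[i])
print(ARMS[best])  # {'top_k': 5, 'chunk_size': 512}
```

Epsilon-greedy keeps the exploration/exploitation trade-off explicit in a single parameter; UCB or Thompson sampling are drop-in alternatives for select().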
dc.format.mimetype	application/pdf	es_ES
dc.language.iso	en-GB	es_ES
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	es_ES
dc.subject.other	KMI	es_ES
dc.title	Offline Conservative Q-Learning for Optimizing Parameters of Retrieval-Augmented Generation with Large Language Models	es_ES
dc.type	info:eu-repo/semantics/bachelorThesis	es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
dc.keywords	RAG; MAB; RL; LLMs; Question Answering (QA); embeddings; clustering	es-ES
dc.keywords	RAG; MAB; RL; LLMs; Question Answering (QA); embeddings; clustering	en-GB
Appears in collections:	TFG, TFM (temporary)

Files in this item:

File	Description	Size	Format
Anexo_I_firmado.pdf	Authorization	718.02 kB	Adobe PDF
TFG Jorge Ibinarriaga Robles.pdf	Bachelor's thesis (Trabajo Fin de Grado)	7.24 MB	Adobe PDF
