Desarrollo de un sistema de análisis estadístico y predictivo del mercado inmobiliario en Madrid mediante técnicas estadísticas y de machine learning

Ibarra González de la Ballina, Javier

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/106918

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Gascón González, Alberto	es-ES
dc.contributor.author	Ibarra González de la Ballina, Javier	es-ES
dc.contributor.other	Universidad Pontificia Comillas, Escuela Técnica Superior de Ingeniería (ICAI)	es_ES
dc.date.accessioned	2025-11-04T12:58:19Z	-
dc.date.available	2025-11-04T12:58:19Z	-
dc.date.issued	2026	es_ES
dc.identifier.uri	http://hdl.handle.net/11531/106918	-
dc.description	Grado en Ingeniería en Tecnologías de Telecomunicación	es_ES
dc.description.abstract	Este trabajo presenta el desarrollo de un sistema analítico y predictivo para el mercado inmobiliario de Madrid, diseñado para superar las limitaciones de los modelos tradicionales de tasación basados únicamente en datos tabulares. Para lograrlo, el autor utiliza un conjunto de datos extraído de Idealista con más de 10.000 registros, los cuales son depurados antes de inyectarlos en los diferentes modelos algorítmicos: Regresión Lineal Múltiple, Random Forest y XGBoost. La principal innovación de la metodología radica en la hibridación de algoritmos basados en árboles con Inteligencia Artificial Generativa. Específicamente, se emplea un Modelo de Lenguaje de Gran Tamaño (LLM Gemma 4) para procesar el texto libre de los anuncios, extrayendo características cualitativas e intangibles (como el estado "Reformado" o el nivel de exclusividad) y convirtiéndolas en variables estructuradas mediante Procesamiento de Lenguaje Natural (NLP). Los resultados demuestran que el modelo híbrido definitivo (XGBoost + LLM) supera de manera notable el "techo de información" de los modelos clásicos. Este modelo optimizado alcanza un coeficiente de determinación del 73,38% y reduce el Error Absoluto Medio (MAE) hasta los 1.246,26 €/m², mejorando significativamente las predicciones. Adicionalmente, el proyecto implementa valores SHAP para garantizar la transparencia y la interpretabilidad del sistema matemático. Este análisis de interpretabilidad revela empíricamente que, tras la variable de ubicación correspondiente al Barrio de Salamanca, la característica cualitativa de exclusividad extraída artificialmente por el LLM es la segunda variable con mayor impacto positivo en el precio final de la tasación. En conclusión, el estudio confirma que la integración de algoritmos predictivos avanzados con LLMs logra transformar la información desordenada en datos cuantificables, proporcionando tasaciones mucho más transparentes y coherentes para reducir la incertidumbre en el sector inmobiliario.	es-ES
dc.description.abstract	This project presents the development of an analytical and predictive system for the real estate market in Madrid, designed to overcome the limitations of traditional valuation models based solely on tabular data. To achieve this, the author uses a dataset extracted from Idealista with over 10,000 records, which are cleaned before being fed to three models: Multiple Linear Regression, Random Forest, and XGBoost. The main innovation of the methodology lies in the hybridization of tree-based algorithms with Generative Artificial Intelligence. Specifically, a Large Language Model (LLM), Gemma 4, is employed to process the free text of the listings, extracting qualitative and intangible features (such as the "Renovated" status or the level of exclusivity) and converting them into structured variables using Natural Language Processing (NLP). The results demonstrate that the definitive hybrid model (XGBoost + LLM) notably surpasses the "information ceiling" of classical models. This optimized model achieves a coefficient of determination (R²) of 73.38% and reduces the Mean Absolute Error (MAE) to 1,246.26 €/m², significantly improving the predictions. Additionally, the project implements SHAP values to ensure the transparency and interpretability of the system. This interpretability analysis empirically reveals that, after the location variable corresponding to the Salamanca neighborhood, the qualitative exclusivity feature extracted by the LLM is the variable with the second-greatest positive impact on the final valuation price. In conclusion, the study confirms that the integration of advanced predictive algorithms with LLMs succeeds in transforming unstructured information into quantifiable data, providing much more transparent and coherent valuations to reduce uncertainty in the real estate sector.	en-GB
dc.format.mimetype	application/pdf	es_ES
dc.language.iso	es-ES	es_ES
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	es_ES
dc.subject.other	KTT (GITT)	es_ES
dc.title	Desarrollo de un sistema de análisis estadístico y predictivo del mercado inmobiliario en Madrid mediante técnicas estadísticas y de machine learning	es_ES
dc.type	info:eu-repo/semantics/bachelorThesis	es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
dc.keywords	Mercado Inmobiliario, Machine Learning, Procesamiento de Lenguaje Natural (NLP), Modelos de Lenguaje de Gran Tamaño (LLM), XGBoost, Interpretabilidad (SHAP).	es-ES
dc.keywords	Real Estate Market, Machine Learning, Natural Language Processing (NLP), Large Language Models (LLM), XGBoost, Interpretability (SHAP).	en-GB
Appears in collections:	KTT-Trabajos Fin de Grado

Files in this item:

File	Description	Size	Format
TFG - Ibarra González de la Ballina, Javier.pdf	Bachelor's thesis (Trabajo Fin de Grado)	8.8 MB	Adobe PDF
Confirmación de Autoría (Anexo I) - Javier Ibarra.pdf	Authorization	168.72 kB	Adobe PDF
