Development of a data drift detection framework

Carbonero Álvarez, Pablo

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/62853

Title :	Desarrollo de un framework de detección de data drifting (Development of a data drift detection framework)
Author :	Gascón González, Alberto; Velasco Garasa, Miguel Ángel; Carbonero Álvarez, Pablo; Universidad Pontificia Comillas, Escuela Técnica Superior de Ingeniería (ICAI)
Publication date :	2022
Abstract :	Machine learning is a branch of artificial intelligence and software engineering that focuses on using data and algorithms to mimic the way humans act. Models are built that can predict or classify data based on what they have learned. Today these models face a latent problem, data drift: a change in the distribution of the input data that can cause a loss of performance.
This project develops a framework that is capable of effectively detecting data drift and alerting the user of the model when it occurs, so that the user can launch a retraining of the model. The system is intended to prevent the model from losing performance over time. The framework works with quantitative, categorical and binary variables. It also analyzes data drift at both the univariate and the multivariate level: the change in the distribution of each variable is analyzed separately, and the variables are also analyzed as a whole. The system works by comparing the model's input data, through successive data chunks, with the data on which the model was originally trained. Different statistical tests are used depending on the use case at hand, so the framework achieves effective detection in each specific case.
Description :	Bachelor's Degree in Industrial Technologies Engineering (Grado en Ingeniería en Tecnologías Industriales)
URI :	http://hdl.handle.net/11531/62853
Appears in collections:	KTI-Trabajos Fin de Grado
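
The comparison described in the abstract (each incoming data chunk checked against the training data, with a test chosen according to the variable type) can be illustrated with a minimal univariate sketch. It is not the thesis implementation: the abstract does not name the statistical tests used, so the choices here are assumptions. A permutation test is swapped in, using the Kolmogorov-Smirnov distance for quantitative variables and the total variation distance for categorical or binary ones. The function names, the `column_kinds` mapping, the `alpha` threshold and the example columns `age` and `plan` are all hypothetical. The multivariate analysis and the alerting step are not covered.

```python
import random
from bisect import bisect_right
from collections import Counter


def ks_distance(ref, new):
    """Largest gap between the two empirical CDFs (quantitative variables)."""
    ref_sorted, new_sorted = sorted(ref), sorted(new)
    return max(
        abs(bisect_right(ref_sorted, v) / len(ref_sorted)
            - bisect_right(new_sorted, v) / len(new_sorted))
        for v in set(ref_sorted) | set(new_sorted)
    )


def tv_distance(ref, new):
    """Total variation distance between category frequencies (categorical/binary)."""
    ref_freq, new_freq = Counter(ref), Counter(new)
    return 0.5 * sum(
        abs(ref_freq[c] / len(ref) - new_freq[c] / len(new))
        for c in set(ref_freq) | set(new_freq)
    )


def permutation_p_value(ref, new, statistic, n_permutations=200, seed=0):
    """Share of random re-splits of the pooled data whose statistic is
    at least as large as the observed one (assumed test, not from the source)."""
    rng = random.Random(seed)
    observed = statistic(ref, new)
    pooled = list(ref) + list(new)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if statistic(pooled[:len(ref)], pooled[len(ref):]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def detect_drift(reference, chunk, column_kinds, alpha=0.05):
    """Return {column: bool}; True means drift is flagged at level alpha."""
    flags = {}
    for column, kind in column_kinds.items():
        statistic = ks_distance if kind == "quantitative" else tv_distance
        p_value = permutation_p_value(reference[column], chunk[column], statistic)
        flags[column] = p_value < alpha
    return flags


# Synthetic example data (hypothetical columns, not from the thesis).
rng = random.Random(42)
reference = {
    "age": [rng.gauss(40, 5) for _ in range(200)],
    "plan": [rng.choice(["basic", "pro"]) for _ in range(200)],
}
chunk = {
    "age": [rng.gauss(50, 5) for _ in range(200)],  # mean shifted by two standard deviations
    "plan": list(reference["plan"]),                # identical values, so no drift
}

flags = detect_drift(reference, chunk, {"age": "quantitative", "plan": "categorical"})
print(flags)
```

In this example the shifted `age` column should be flagged and the unchanged `plan` column should not. In a monitoring loop, a flagged column would be the point at which the user is alerted and may decide to retrain the model.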

Files in this item:

File	Description	Size	Format
TFG_CarboneroAlvarez, Pablo.pdf	Bachelor's thesis (Trabajo Fin de Grado)	4.25 MB	Adobe PDF
anexo_I(bueno).pdf	Authorization	297.24 kB	Adobe PDF
