Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/87976
Full metadata record
DC Field / Value / Language
dc.contributor.author: de Zarzà i Cubero, Irene (es-ES)
dc.contributor.author: de Curtò i Díaz, Joaquim (es-ES)
dc.contributor.author: Calafate, Carlos T. (es-ES)
dc.date.accessioned: 2024-04-04T14:35:54Z
dc.date.available: 2024-04-04T14:35:54Z
dc.date.issued: 2023-06-14 (es_ES)
dc.identifier.issn: 2079-9292 (es_ES)
dc.identifier.uri: https://doi.org/10.3390/electronics12122674 (es_ES)
dc.description: Journal articles (es_ES)
dc.description.abstract: Imbalanced datasets pose pervasive challenges in numerous machine learning (ML) applications, notably in areas such as fraud detection, where fraudulent cases are vastly outnumbered by legitimate transactions. Conventional ML methods often grapple with such imbalances, resulting in models with suboptimal performance on the minority class. This study undertakes a thorough examination of strategies for optimizing supervised learning algorithms when confronted with imbalanced datasets, emphasizing resampling techniques. Initially, we explore multiple methodologies, encompassing Gaussian Naive Bayes, linear and quadratic discriminant analysis, K-nearest neighbors (K-NN), support vector machines (SVMs), decision trees, and multi-layer perceptron (MLP). We apply these to a four-class spiral dataset, a notoriously demanding non-linear classification problem, to gauge their effectiveness. Subsequently, we leverage the garnered insights for a real-world credit card fraud detection task on a public dataset, where we achieve a compelling accuracy of 99.937%. In this context, we compare and contrast the performances of undersampling, oversampling, and the synthetic minority oversampling technique (SMOTE). Our findings highlight the potency of resampling strategies in augmenting model performance on the minority class; in particular, oversampling techniques achieve the best performance, resulting in an accuracy of 99.928% with a significantly low number of false negatives (21/227,451). (en-GB)
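As a rough illustration of the resampling comparison the abstract describes, the sketch below contrasts undersampling, oversampling, and SMOTE when training an MLP on a heavily imbalanced binary problem. It is not the authors' code: it assumes scikit-learn and imbalanced-learn, uses synthetic data in place of the public credit card dataset, and the class ratio, network size, and sampler settings are illustrative guesses.

```python
# Minimal sketch of the resampling comparison described in the abstract.
# NOT the authors' pipeline; synthetic data stands in for the fraud dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Heavily imbalanced binary problem (~0.2% positives), mimicking fraud detection.
X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.998, 0.002], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

samplers = {
    "none": None,
    "undersampling": RandomUnderSampler(random_state=0),
    "oversampling": RandomOverSampler(random_state=0),
    "SMOTE": SMOTE(random_state=0),
}

for name, sampler in samplers.items():
    # Resample only the training split; the test split keeps the true imbalance.
    if sampler is None:
        X_res, y_res = X_train, y_train
    else:
        X_res, y_res = sampler.fit_resample(X_train, y_train)
    clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=300,
                        random_state=0).fit(X_res, y_res)
    tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
    print(f"{name:>13}: accuracy={(tn + tp) / len(y_test):.5f}  "
          f"false negatives={fn}")
```

On imbalanced data, overall accuracy alone is misleading, which is why the loop also reports false negatives, the metric the abstract singles out (21/227,451 for the oversampled model).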
dc.format.mimetype: application/pdf (es_ES)
dc.language.iso: en-GB (es_ES)
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs Spain (es_ES)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/ (es_ES)
dc.source: Journal: Electronics, Period: 1, Volume: 12, Issue: 12, First page: 2674, Last page: . (es_ES)
dc.title: Optimizing Neural Networks for Imbalanced Data (es_ES)
dc.type: info:eu-repo/semantics/article (es_ES)
dc.description.version: info:eu-repo/semantics/publishedVersion (es_ES)
dc.rights.accessRights: info:eu-repo/semantics/openAccess (es_ES)
dc.keywords: neural networks; imbalanced datasets; resampling techniques; fraud detection; hyperparameter optimization (en-GB)
Appears in collections: Artículos

Files in this item:
File / Size / Format
electronics-12-02674-v2_dezarza_and_decurto.pdf / 1.92 MB / Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.