Raman Spectroscopy Pre-Trained Encoder: A Self-Supervised Learning Approach For Data-Efficient Domain-Independent Spectroscopy Analysis

Eranti, Abhiraam; Tewari, Yogesh; Palacios Hielscher, Rafael; Gupta, Amar

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/109102

Full metadata record

DC field	Value	Language
dc.contributor.author	Eranti, Abhiraam	es-ES
dc.contributor.author	Tewari, Yogesh	es-ES
dc.contributor.author	Palacios Hielscher, Rafael	es-ES
dc.contributor.author	Gupta, Amar	es-ES
dc.date.accessioned	2026-03-12T11:18:17Z	-
dc.date.available	2026-03-12T11:18:17Z	-
dc.date.issued	2026-03-09	es_ES
dc.identifier.issn	2169-3536	es_ES
dc.identifier.uri	https://doi.org/10.1109/ACCESS.2026.3672109	es_ES
dc.identifier.uri	http://hdl.handle.net/11531/109102	-
dc.description	Journal articles	es_ES
dc.description.abstract	Deep-learning methods have boosted the analytical power of Raman spectroscopy, yet they still require large, task-specific, labeled datasets and often fail to transfer across application domains. The study explores pre-trained encoders as a solution. Pre-trained encoders have had a major impact on Natural Language Processing and Computer Vision through their ability to learn transferable representations that can be applied to a variety of datasets, significantly reducing the time and data required to create capable models. This work puts forward a new approach that brings these benefits to Raman Spectroscopy. The proposed approach, RSPTE (Raman Spectroscopy Pre-Trained Encoder), is designed to learn generalizable spectral representations without labels. RSPTE employs a novel domain adaptation strategy that uses unsupervised Barlow Twins decorrelation objectives to learn fundamental spectral patterns from multi-domain Raman Spectroscopy datasets containing samples from medicine, biology, and mineralogy. Transferability is demonstrated by evaluating several models created by fine-tuning RSPTE for different application domains: Medicine (detection of Melanoma and COVID), Biology (Pathogen Identification), and Agriculture. For example, using only 20% of the dataset, models trained with RSPTE achieve accuracies ranging from 50% to 86% (depending on the dataset), whereas without RSPTE the range is 9% to 57%. Using the full dataset, accuracies with RSPTE range from 81% to 97%, compared with 51% to 97% without pretraining. Current methods and state-of-the-art models in Raman Spectroscopy are compared to RSPTE for context, and RSPTE achieves competitive results, particularly when less data are available. These results provide evidence that the proposed RSPTE model can effectively learn and transfer generalizable spectral features across different domains, achieving accurate results with less data and in less time (both data collection time and training time).	en-GB
dc.language.iso	en-GB	es_ES
dc.source	Journal: IEEE Access, Period: 1, Volume: In press, Issue: , First page: 0, Last page: 0	es_ES
dc.subject.other	Instituto de Investigación Tecnológica (IIT)	es_ES
dc.title	Raman Spectroscopy Pre-Trained Encoder: A Self-Supervised Learning Approach For Data-Efficient Domain-Independent Spectroscopy Analysis	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.description.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.rights.holder		es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
dc.keywords	Raman Spectroscopy, self-supervised learning, pre-trained encoder, multi-domain data, clinical diagnostics	en-GB
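The abstract states that RSPTE learns spectral representations without labels using unsupervised Barlow Twins decorrelation objectives. As a hedged illustration only, the following is a minimal NumPy sketch of the standard Barlow Twins loss applied to two augmented views of a batch of spectra; it is not the authors' implementation. The helper `augment_spectrum` and its augmentations (intensity scaling, a linear baseline, Gaussian noise), the random linear projection used as a stand-in encoder, the synthetic spectra, and the weight `lambd = 5e-3` are all assumptions chosen for illustration.

```python
import numpy as np


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray,
                      lambd: float = 5e-3, eps: float = 1e-9) -> float:
    """Barlow Twins loss for two (N, D) batches of embeddings of the same samples."""
    n = z_a.shape[0]
    # Standardize each embedding dimension across the batch (zero mean, unit variance).
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # (D, D) cross-correlation matrix between the two views.
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: diagonal entries are pushed toward 1.
    on_diag = np.sum((1.0 - diag) ** 2)
    # Redundancy-reduction term: off-diagonal entries are pushed toward 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return float(on_diag + lambd * off_diag)


def augment_spectrum(spectrum: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical augmentation: random intensity scaling, linear baseline, Gaussian noise."""
    scale = rng.uniform(0.9, 1.1)
    x = np.linspace(0.0, 1.0, spectrum.shape[-1])
    baseline = rng.uniform(-0.05, 0.05) * x
    noise = rng.normal(0.0, 0.01, size=spectrum.shape)
    return scale * spectrum + baseline + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectra = rng.random((64, 1000))                 # synthetic placeholder spectra, not real data
    w = rng.normal(size=(1000, 32)) / np.sqrt(1000)  # stand-in linear "encoder" (hypothetical)
    view_a = np.stack([augment_spectrum(s, rng) for s in spectra])
    view_b = np.stack([augment_spectrum(s, rng) for s in spectra])
    print(barlow_twins_loss(view_a @ w, view_b @ w))
```

The on-diagonal term rewards each embedding dimension for agreeing across the two views of the same spectrum, while the off-diagonal term, weighted by `lambd`, decorrelates different dimensions so they carry non-redundant information. This sketch only evaluates the loss; in actual pretraining, a trainable encoder would be optimized by gradient descent to minimize it.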
Appears in collections:	Articles

Files in this item:

File	Description	Size	Format
IIT-26-066R.pdf		12.34 MB	Adobe PDF
IIT-26-066R_preview.pdf		3.97 kB	Adobe PDF
