Please use this identifier to cite or link to this item:
http://hdl.handle.net/11531/111097
Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | García Martínez-Echevarría, Victoria | es-ES |
| dc.contributor.author | Palacios Hielscher, Rafael | es-ES |
| dc.contributor.author | López López, Gregorio | es-ES |
| dc.contributor.author | Gupta, Amar | es-ES |
| dc.date.accessioned | 2026-07-02T04:32:29Z | - |
| dc.date.available | 2026-07-02T04:32:29Z | - |
| dc.date.issued | 2026-07-01 | es_ES |
| dc.identifier.issn | 2079-9292 | es_ES |
| dc.identifier.uri | https://doi.org/10.3390/electronics15132846 | es_ES |
| dc.identifier.uri | http://hdl.handle.net/11531/111097 | - |
| dc.description | Journal articles | es_ES |
| dc.description.abstract | Voice phishing, commonly known as vishing, has become one of the fastest-growing threats in social engineering. The rapid advancement and accessibility of AI voice cloning tools have enabled attackers to produce highly convincing synthetic speech at minimal cost, driving a sharp increase in impersonation fraud. Accordingly, automatic detection of synthetic voices could contribute, as one component of a broader defense, to mitigating vishing attacks. This paper studies the automatic detection of AI-generated speech, with a particular focus on how well such detectors generalize beyond their training data to modern, unseen synthesis methods. Two detection approaches are evaluated: a Residual CNN (convolutional neural network) trained as a binary classifier on three different time–frequency representations and a one-class learning strategy with a ResNet-18 backbone, yielding four models in total. Models were trained on the well-known ASVspoof 2019 Logical Access dataset and tested on its standard partitions. Then, models were tested on the SONAR benchmark, which gathers voices generated with state-of-the-art synthesis techniques unseen during training. Experimental results show that, on the modern systems gathered in SONAR, all four configurations fall close to chance. The LFCC one-class detector generalizes comparatively best, but the apparently higher accuracy of some models reflects a tendency to label most speech as spoofed. These findings indicate that the evaluated detectors can provide, at most, a partial security layer against vishing driven by current and emerging speech-synthesis technologies, although continuous model updates are recommended. | en-GB |
| dc.language.iso | en-GB | es_ES |
| dc.source | Journal: Electronics, Period: 1, Volume: online, Issue: 13, First page: 2846, Last page: 0 | es_ES |
| dc.subject.other | Instituto de Investigación Tecnológica (IIT) | es_ES |
| dc.title | The Generalization Gap: Do Audio Deepfake Detectors Actually Protect Against Modern Vishing? | es_ES |
| dc.type | info:eu-repo/semantics/article | es_ES |
| dc.description.version | info:eu-repo/semantics/publishedVersion | es_ES |
| dc.rights.holder | | es_ES |
| dc.rights.accessRights | info:eu-repo/semantics/openAccess | es_ES |
| dc.keywords | AI-generated speech; spoofing detection; residual CNN (convolutional neural network); one-class learning; generalization; vishing | en-GB |
| Appears in collections: | Articles | |
Files in this item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| IIT-26-200R.pdf | | 1.21 MB | Adobe PDF | View/Open |
| IIT-26-200R_preview.pdf | | 3.82 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.