A Comparative Study of Large Language Models for Industrial Cyber-Physical Security

de Curtò i Díaz, Joaquim; de Zarzà i Cubero, Irene; Cano, Juan Carlos; Calafate, Carlos T.

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/110867

Title :	A Comparative Study of Large Language Models for Industrial Cyber-Physical Security
Author :	de Curtò i Díaz, Joaquim; de Zarzà i Cubero, Irene; Cano, Juan Carlos; Calafate, Carlos T.
Publication date :	24-Jun-2026
Abstract :	Intrusion detection in industrial cyber-physical systems is constrained by small labelled-attack corpora and by the subtler signal of physical-process attacks compared with classical IT-network intrusions, motivating renewed interest in foundation-model-based detectors; classical detectors are typically trained per dataset and degrade under the distribution shift that is common in operational technology, where attack repertoires evolve faster than retraining cycles. Two foundation-model families are now plausible candidates: open-source Large Language Models (LLMs) and recent tabular foundation models (TabPFN, TabICL) pre-trained for in-context tabular inference. We compare the two families head-to-head, alongside Random Forest and XGBoost classical anchors, across three established industrial security benchmarks (SWaT, HAI, WUSTL-IIoT-2021) under a controlled multi-seed full-holdout protocol with paired McNemar and cross-seed Mann–Whitney tests. The empirical picture is dataset-dependent rather than universal: tabular foundation models establish a strong, previously unreported baseline that is competitive with or superior to classical anchors on every dataset evaluated, while LLMs are complementary detectors with a specific advantage on schemas that carry process-engineering semantics (such as SWaT’s named sensor channels). A per-class analysis on the WUSTL five-class attack taxonomy shows that the two families have structurally different strengths: tabular methods dominate traffic-rich attacks (Denial-of-Service, Reconnaissance), whereas LLMs are competitive on rare attack types (Backdoor, Command Injection). A confidence-gated cascade that escalates only low-confidence tabular decisions to an LLM exceeds either detector alone at a small query budget, and a leave-one-attack-type-out analysis shows that foundation-model detectors generalise to unseen attack families substantially better than the classical anchors. 
The appropriate detector choice in industrial cyber-physical security is therefore informed by the dataset’s feature schema, the attack-type mix, and the operational cost envelope, rather than by a specific performance metric.
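The confidence-gated cascade mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cascade_predict`, the 0.8 threshold, the toy probabilities, and the stub second-stage detector are all assumptions made for illustration. The idea shown is only the gating step: the tabular model's label is kept when its top class probability is high enough, and the remaining samples are escalated to a second detector (an LLM in the paper's setting).

```python
import numpy as np


def cascade_predict(tabular_proba, second_stage, threshold=0.8):
    """Keep the tabular label when its top class probability reaches
    `threshold`; otherwise escalate the sample to `second_stage`.

    `second_stage` is a hypothetical callable that takes the indices of the
    escalated samples and returns one label per index.
    """
    tabular_proba = np.asarray(tabular_proba, dtype=float)
    labels = tabular_proba.argmax(axis=1)       # first-stage decisions
    confidence = tabular_proba.max(axis=1)      # top-class probability
    escalated = np.flatnonzero(confidence < threshold)
    if escalated.size:
        labels[escalated] = second_stage(escalated)
    return labels, escalated


# Toy data: four samples, two classes (0 = normal, 1 = attack).
proba = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.48, 0.52]])


def stub_second_stage(indices):
    """Stand-in for an LLM call; always answers "attack"."""
    return np.ones(len(indices), dtype=int)


labels, escalated = cascade_predict(proba, stub_second_stage)
query_fraction = escalated.size / len(proba)    # share of samples escalated
```

The share of escalated samples (`query_fraction` here) corresponds to the query budget; raising the threshold sends more samples to the second stage and increases that cost.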
Description :	Journal articles
URI :	https://doi.org/10.3390/electronics15132779 (registering DOI)
ISSN :	2079-9292
Appears in collections:	Articles

Files in this item:

File	Size	Format
electronics-15-02779_deCurto_and_deZarza.pdf	2.88 MB	Adobe PDF
