Stopping LLMs from Going Rogue: A Control Barrier Approach to Text Generation

Silvestre, Joao Pedro; Rodríguez Abella, Álvaro; Tabuada, Paulo

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/110126

Full metadata record

DC field	Value	Language
dc.contributor.author	Silvestre, Joao Pedro	es-ES
dc.contributor.author	Rodríguez Abella, Álvaro	es-ES
dc.contributor.author	Tabuada, Paulo	es-ES
dc.date.accessioned	2026-05-18T14:34:16Z	-
dc.date.available	2026-05-18T14:34:16Z	-
dc.date.issued	2025-12-25	es_ES
dc.identifier.uri	10.1109/CDC57313.2025.11312450	es_ES
dc.description	Book chapters	es_ES
dc.description.abstract	The rapid integration of large language models (LLMs) into our everyday lives has outpaced safety considerations aimed at protecting users from toxic outputs and preventing malicious actors from generating harmful text at scale. As a result, LLMs have been exploited by bots capable of producing vast amounts of harmful and toxic content, enabling users to manipulate online opinions and, in some cases, create dangerous online environments. Our work addresses this issue by developing a framework for designing safety filters that preclude toxic outputs. To achieve this, we leverage Control Barrier Functions (CBFs), which enable the design of closed-loop systems that remain safe. We consider the continuous-time model of an LLM, where tokens are regarded as the state of the model, and prove that, by controlling only the first token, any function satisfying mild assumptions becomes a CBF. Our approach can be used to design LLMs capable of ensuring the safety of their outputs without significantly affecting the original model's behavior.	en-GB
dc.format.mimetype	application/pdf	es_ES
dc.language.iso	en-GB	es_ES
dc.publisher	IEEE (Rio de Janeiro, Brazil)	es_ES
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs Spain	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	es_ES
dc.source	Book: 2025 IEEE 64th Conference on Decision and Control (CDC)	es_ES
dc.title	Stopping LLMs from Going Rogue: A Control Barrier Approach to Text Generation	es_ES
dc.type	info:eu-repo/semantics/bookPart	es_ES
dc.description.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
Appears in collections:	Articles
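The abstract builds on Control Barrier Functions (CBFs) used as safety filters. As a hedged illustration of that general idea only, the sketch below implements the textbook single-constraint CBF filter for a control-affine system dx/dt = f(x) + g(x)u: it keeps the nominal input when Lf h + Lg h·u + alpha·h >= 0 already holds and otherwise projects it onto that half-space. It does not reproduce the authors' continuous-time LLM model or their first-token construction; the toy system (a 2-D single integrator with safe set h(x) = 1 - ||x||^2), the function names `cbf_filter` and `simulate`, and the gain `alpha` are hypothetical choices made for this example.

```python
"""Minimal CBF safety-filter sketch on a toy system (NOT the paper's LLM model)."""
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cbf_filter(u_nom: Vector, h: float, lf_h: float, lg_h: Vector,
               alpha: float = 1.0) -> Vector:
    """Hypothetical helper: closed-form minimizer of the single-constraint QP
        min ||u - u_nom||^2  s.t.  lf_h + lg_h . u + alpha * h >= 0.
    """
    psi = lf_h + dot(lg_h, u_nom) + alpha * h
    if psi >= 0.0:
        # The nominal input already satisfies the CBF condition: leave it alone.
        return list(u_nom)
    norm_sq = dot(lg_h, lg_h)
    if norm_sq == 0.0:
        raise ValueError("Lg h vanishes; the CBF condition cannot be enforced here")
    # Project u_nom onto the boundary of the half-space, where the constraint is tight.
    return [ui - psi * gi / norm_sq for ui, gi in zip(u_nom, lg_h)]


def simulate(steps: int = 1000, dt: float = 0.01, alpha: float = 1.0) -> Vector:
    """Euler-integrate the toy single integrator dx/dt = u under the filter.

    Assumed toy setup: f(x) = 0, g(x) = I, h(x) = 1 - ||x||^2, so
    Lf h = 0 and Lg h = grad h = -2x.
    """
    x = [0.0, 0.0]
    u_nom = [1.0, 0.0]  # hypothetical nominal input that would leave the unit disk
    for _ in range(steps):
        h = 1.0 - dot(x, x)
        lg_h = [-2.0 * xi for xi in x]
        u = cbf_filter(u_nom, h, 0.0, lg_h, alpha)
        x = [xi + dt * ui for xi, ui in zip(x, u)]
    return x


if __name__ == "__main__":
    x_final = simulate()
    print("final state:", x_final, "h:", 1.0 - dot(x_final, x_final))
```

With a nominal input that pushes the state straight out of the unit disk, the filtered trajectory slides toward the boundary x1 = 1 but, for this step size, stays inside the safe set (h remains positive). A closed-form projection replaces a general QP solver here only because a single affine constraint has an explicit minimizer.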

Files in this item:

File	Size	Format
Stopping_LLMs_from_Going_Rogue_unido.pdf	402.68 kB	Adobe PDF
