dc.contributor.author: Silvestre, Joao Pedro [es-ES]
dc.contributor.author: Rodríguez Abella, Álvaro [es-ES]
dc.contributor.author: Tabuada, Paulo [es-ES]
dc.date.accessioned: 2026-05-18T14:34:16Z
dc.date.available: 2026-05-18T14:34:16Z
dc.date.issued: 2025-12-25 [es_ES]
dc.identifier.uri: 10.1109/CDC57313.2025.11312450 [es_ES]
dc.description: Book chapters [es_ES]
dc.description.abstract: . [es-ES]
dc.description.abstract: The rapid integration of large language models (LLMs) into our everyday lives has outpaced safety considerations aimed at protecting users from toxic outputs and preventing malicious actors from generating harmful text at scale. As a result, LLMs have been exploited by bots capable of producing vast amounts of harmful and toxic content, enabling users to manipulate online opinions and, in some cases, create dangerous online environments. Our work addresses this issue by developing a framework for designing safety filters that preclude toxic outputs. To achieve this, we leverage Control Barrier Functions (CBFs), which enable the design of closed-loop systems that remain safe. We consider the continuous-time model of an LLM, where tokens are regarded as the state of the model, and prove that by only controlling the first token, any function satisfying mild assumptions becomes a CBF. Our approach can be utilized to design LLMs capable of ensuring the safety of their outputs without significantly affecting the original model’s behavior. [en-GB]
dc.format.mimetype: application/pdf [es_ES]
dc.language.iso: en-GB [es_ES]
dc.publisher: IEEE (Rio de Janeiro, Brazil) [es_ES]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs Spain [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/ [es_ES]
dc.source: Book: 2025 IEEE 64th Conference on Decision and Control (CDC), First page: ., Last page: . [es_ES]
dc.title: Stopping LLMs from Going Rogue: A Control Barrier Approach to Text Generation [es_ES]
dc.type: info:eu-repo/semantics/bookPart [es_ES]
dc.description.version: info:eu-repo/semantics/publishedVersion [es_ES]
dc.rights.holder: [es_ES]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es_ES]
dc.keywords: . [es-ES]
dc.keywords: . [en-GB]

This item appears in the following Collection(s)

  • Articles
    Published journal articles, book chapters, and conference contributions.

Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-NoDerivs Spain