Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles

de Curtò i Díaz, Joaquim; de Zarzà i Cubero, Irene; Calafate, Carlos T.

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/87977

Full metadata record

DC Field	Value	Language
dc.contributor.author	de Curtò i Díaz, Joaquim	es-ES
dc.contributor.author	de Zarzà i Cubero, Irene	es-ES
dc.contributor.author	Calafate, Carlos T.	es-ES
dc.date.accessioned	2024-04-04T14:48:32Z	-
dc.date.available	2024-04-04T14:48:32Z	-
dc.date.issued	2023-02-08	es_ES
dc.identifier.issn	2504-446X	es_ES
dc.identifier.uri	https://doi.org/10.3390/drones7020114	es_ES
dc.identifier.uri	http://hdl.handle.net/11531/87977	-
dc.description	Journal articles	es_ES
dc.description.abstract	Unmanned Aerial Vehicles (UAVs) are able to provide instantaneous visual cues and a high-level data throughput that could be further leveraged to address complex tasks, such as semantically rich scene understanding. In this work, we build on the use of Large Language Models (LLMs) and Visual Language Models (VLMs), together with a state-of-the-art detection pipeline, to provide thorough zero-shot literary text descriptions of UAV scenes. The generated texts achieve a Gunning fog median grade level in the range of 7–12. Applications of this framework could be found in the filming industry and could enhance user experience in theme parks or in the advertisement sector. We demonstrate a low-cost, highly efficient, state-of-the-art practical implementation of microdrones in a well-controlled and challenging setting, in addition to proposing the use of standardized readability metrics to assess LLM-enhanced descriptions.	en-GB
dc.format.mimetype	application/pdf	es_ES
dc.language.iso	en-GB	es_ES
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs Spain	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	es_ES
dc.source	Journal: Drones, Period: 1, Volume: 7, Issue: 2, First page: 114, Last page: .	es_ES
dc.title	Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.description.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.rights.holder		es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
dc.keywords	scene understanding; large language models; visual language models; CLIP; GPT-3; YOLOv7; UAV	en-GB
Appears in collections:	Articles

Files in this item:

File	Description	Size	Format
2024210181510259_drones-07-00114_decurto.pdf		8.81 MB	Adobe PDF