Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/87977

Title: Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles
Authors: de Curtò i Díaz, Joaquim; de Zarzà i Cubero, Irene; Calafate, Carlos T.
Publication date: 8-Feb-2023
Abstract: Unmanned Aerial Vehicles (UAVs) can provide instantaneous visual cues and a high data throughput that could be further leveraged to address complex tasks such as semantically rich scene understanding. In this work, we build on Large Language Models (LLMs) and Visual Language Models (VLMs), together with a state-of-the-art detection pipeline, to provide thorough zero-shot literary text descriptions of UAV scenes. The generated texts achieve a median Gunning Fog grade level in the range of 7–12. Applications of this framework could be found in the film industry, in theme parks (to enhance user experience), and in the advertising sector. We demonstrate a low-cost, highly efficient, state-of-the-art practical implementation using microdrones in a well-controlled and challenging setting, and we propose the use of standardized readability metrics to assess LLM-enhanced descriptions.
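The abstract reports readability as a Gunning Fog grade level. As a hedged illustration only (this record does not say which tool the authors used), the standard formula is 0.4 × (words per sentence + 100 × complex words / words), where a complex word has three or more syllables. The sketch below assumes a crude vowel-group syllable counter and simple punctuation-based sentence splitting, and it omits the usual refinements (excluding proper nouns, compounds, and words that reach three syllables only through suffixes such as -es, -ed, or -ing), so its scores will differ somewhat from dedicated readability libraries.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a rough heuristic, assumed here)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent 'e' (e.g. "scene"), but not in "-le" endings like "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    # Every word has at least one syllable.
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Gunning Fog index: 0.4 * (words/sentences + 100 * complex_words/words)."""
    # Naive sentence split on terminal punctuation; empty fragments are discarded.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" words are those with three or more (estimated) syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))
```

For example, `gunning_fog("The cat sat on the mat.")` gives roughly 2.4 (six one-syllable words in one sentence), whereas a generated scene description would be scored the same way and compared against the 7–12 range reported above.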
Description: Journal articles
URI: https://doi.org/10.3390/drones7020114, http://hdl.handle.net/11531/87977
ISSN: 2504-446X
Appears in collections: Articles

Files in this item:

File	Description	Size	Format
2024210181510259_drones-07-00114_decurto.pdf		8.81 MB	Adobe PDF