dc.contributor.author: de Curtò i Díaz, Joaquim [es-ES]
dc.contributor.author: de Zarzà i Cubero, Irene [es-ES]
dc.contributor.author: Calafate, Carlos T. [es-ES]
dc.date.accessioned: 2024-04-04T14:48:32Z
dc.date.available: 2024-04-04T14:48:32Z
dc.date.issued: 2023-02-08 [es_ES]
dc.identifier.issn: 2504-446X [es_ES]
dc.identifier.uri: https://doi.org/10.3390/drones7020114 [es_ES]
dc.identifier.uri: http://hdl.handle.net/11531/87977
dc.description: Journal articles [es_ES]
dc.description.abstract: . [es-ES]
dc.description.abstract: Unmanned Aerial Vehicles (UAVs) are able to provide instantaneous visual cues and a high-level data throughput that could be further leveraged to address complex tasks, such as semantically rich scene understanding. In this work, we built on the use of Large Language Models (LLMs) and Visual Language Models (VLMs), together with a state-of-the-art detection pipeline, to provide thorough zero-shot UAV scene literary text descriptions. The generated texts achieve a Gunning Fog median grade level in the range of 7–12. Applications of this framework could be found in the filming industry and could enhance user experience in theme parks or in the advertisement sector. We demonstrate a low-cost, highly efficient, state-of-the-art practical implementation of microdrones in a well-controlled and challenging setting, in addition to proposing the use of standardized readability metrics to assess LLM-enhanced descriptions. [en-GB]
dc.format.mimetype: application/pdf [es_ES]
dc.language.iso: en-GB [es_ES]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs Spain [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/ [es_ES]
dc.source: Journal: Drones, Period: 1, Volume: 7, Issue: 2, First page: 114, Last page: . [es_ES]
dc.title: Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.description.version: info:eu-repo/semantics/publishedVersion [es_ES]
dc.rights.holder: [es_ES]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es_ES]
dc.keywords: . [es-ES]
dc.keywords: scene understanding; large language models; visual language models; CLIP; GPT-3; YOLOv7; UAV [en-GB]


This item appears in the following collection(s)

  • Articles
    Published journal articles, book chapters, and conference contributions.

Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-NoDerivs Spain