Show simple item record

dc.contributor.author: de Zarzà i Cubero, Irene (es-ES)
dc.contributor.author: Liz, Mauro (es-ES)
dc.contributor.author: de Curtò i Díaz, Joaquim (es-ES)
dc.contributor.author: Calafate, Carlos T. (es-ES)
dc.date.accessioned: 2026-04-07T07:39:25Z
dc.date.available: 2026-04-07T07:39:25Z
dc.date.issued: 2026-03-27 (es_ES)
dc.identifier.issn: 2079-9292 (es_ES)
dc.identifier.uri: https://doi.org/10.3390/electronics15071395 (es_ES)
dc.identifier.uri: http://hdl.handle.net/11531/109444
dc.description: Journal articles (es_ES)
dc.description.abstract: The rapid deployment of Large Language Models (LLMs) in multilingual, production-scale systems has made inference-time energy consumption a critical yet systematically under-evaluated dimension of model quality. While accuracy-centric benchmarks dominate current evaluation practice, they fail to capture the energy cost of reasoning, particularly across languages and task complexities where consumption profiles diverge substantially. In this work, we present a comprehensive energy–performance evaluation of five instruction-tuned LLMs, spanning Transformer, Grouped-Query Attention, and State Space Model architectures, across thirteen typologically diverse languages and multiple task difficulty levels under controlled GPU-level energy measurement on NVIDIA H200 hardware. Our analysis encompasses 65 model–language configurations totaling over 5100 individual inference runs, supported by rigorous non-parametric statistical testing (Friedman tests, pairwise Wilcoxon signed-rank tests with Holm correction, and paired Cohen's d effect sizes). We report four principal findings. First, energy consumption varies up to threefold across models under identical workloads (χ² = 49.42, p = 4.78 × 10⁻¹⁰, Friedman test), stratifying into three distinct energy regimes driven by architecture and generation dynamics rather than parameter count. Second, energy expenditure and reasoning performance are only weakly coupled, as confirmed by Spearman rank correlation analysis (r_s = 0.109, p = 0.386). Third, task category and difficulty level introduce substantial, model-dependent variation in both energy demand and performance, with cross-lingual performance variance amplifying at higher difficulty levels. Fourth, language choice acts as a measurable deployment parameter: Romance languages on average achieve lower energy consumption than English across multiple models, while model efficiency rankings shift across languages, yielding language-dependent Pareto-optimal frontiers. We formalize these trade-offs through multi-objective Pareto analysis and introduce a composite AI Energy Score metric that captures reasoning quality per unit of energy. Of the 65 evaluated configurations, only four are Pareto-optimal: three Mistral-7B configurations at the low-energy extreme and one Phi-4-mini-instruct configuration at the high-performance end, while three of the five models are entirely dominated across all language configurations. These findings provide actionable guidelines for energy-aware model selection in multilingual deployments and support the integration of AI Energy Scores as a standard complementary criterion in LLM evaluation frameworks. (en-GB)
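The statistical pipeline named in the abstract (Friedman test across models, pairwise Wilcoxon signed-rank tests with Holm correction, paired Cohen's d) can be sketched as below. This is an illustrative reconstruction only: the energy values are invented placeholders, not the paper's measurements, and the model labels are hypothetical.

```python
# Illustrative sketch of the non-parametric comparison described in the
# abstract. All energy numbers are made up for demonstration; they are NOT
# the paper's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Rows: paired workloads (language/task configurations); columns: 3 models.
energy = np.column_stack([
    rng.normal(100, 5, 30),   # "model A" (placeholder joules)
    rng.normal(160, 5, 30),   # "model B"
    rng.normal(300, 10, 30),  # "model C"
])

# Friedman test: do the models' energy distributions differ overall?
chi2, p = stats.friedmanchisquare(*energy.T)

# Pairwise Wilcoxon signed-rank tests with Holm step-down correction.
pairs = [(0, 1), (0, 2), (1, 2)]
raw_p = [stats.wilcoxon(energy[:, i], energy[:, j]).pvalue for i, j in pairs]
m = len(raw_p)
holm_p = {}
running_max = 0.0
for rank, idx in enumerate(np.argsort(raw_p)):
    adj = min(1.0, (m - rank) * raw_p[idx])
    running_max = max(running_max, adj)  # enforce monotone adjusted p-values
    holm_p[pairs[idx]] = running_max

def paired_cohens_d(x, y):
    """Cohen's d for paired samples: mean difference over its std dev."""
    diff = x - y
    return diff.mean() / diff.std(ddof=1)

d_ab = paired_cohens_d(energy[:, 0], energy[:, 1])
print(f"Friedman chi2={chi2:.2f}, p={p:.3g}, d(A,B)={d_ab:.2f}")
```

With clearly separated energy regimes like these placeholders, the Friedman statistic is large and every Holm-adjusted pairwise p-value stays significant, mirroring the stratification the abstract reports.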
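The multi-objective Pareto filtering over model–language configurations can be sketched as follows. The configurations and scores are invented for illustration, and the simple accuracy-per-joule ratio stands in for the paper's composite AI Energy Score, whose exact formula is not given in this record.

```python
# Hypothetical sketch: Pareto-optimal filtering of (energy, accuracy)
# configurations. Config names and numbers are invented; the
# accuracy/energy ratio is an assumed stand-in for the composite
# AI Energy Score, not the paper's metric.
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str
    energy_j: float   # energy per run, lower is better
    accuracy: float   # reasoning accuracy, higher is better

def pareto_front(configs):
    """Keep configs not dominated by any other: a dominator uses no more
    energy, scores no lower, and is strictly better on at least one axis."""
    front = []
    for c in configs:
        dominated = any(
            o.energy_j <= c.energy_j and o.accuracy >= c.accuracy
            and (o.energy_j < c.energy_j or o.accuracy > c.accuracy)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front

configs = [
    Config("mistral-7b/es", 80.0, 0.62),
    Config("mistral-7b/en", 95.0, 0.64),
    Config("phi-4-mini/en", 210.0, 0.78),
    Config("model-x/en", 230.0, 0.70),   # dominated by phi-4-mini/en
]

front = pareto_front(configs)
energy_score = {c.name: c.accuracy / c.energy_j for c in configs}
print([c.name for c in front])
```

In this toy instance, `model-x/en` is dominated (more energy, lower accuracy than `phi-4-mini/en`) and drops out, while the low-energy and high-performance extremes both survive on the frontier, the same shape of result the abstract reports for its 65 configurations.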
dc.format.mimetype: application/pdf (es_ES)
dc.language.iso: en-GB (es_ES)
dc.rights: Creative Commons Reconocimiento-NoComercial-SinObraDerivada España (es_ES)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/ (es_ES)
dc.source: Journal: Electronics, Period: 1, Volume: 15(7), Issue: 1395, First page: 1, Last page: 44 (es_ES)
dc.title: Energy-Aware Multilingual Evaluation of Large Language Models (es_ES)
dc.type: info:eu-repo/semantics/article (es_ES)
dc.description.version: info:eu-repo/semantics/publishedVersion (es_ES)
dc.rights.accessRights: info:eu-repo/semantics/openAccess (es_ES)
dc.keywords: large language models; energy efficiency; multilingual evaluation; sustainable AI; GPU energy consumption; AI energy scores (en-GB)


This item appears in the following Collection(s)

  • Artículos
    Journal articles, book chapters, and published conference contributions.

