Enhancing keyphrase extraction from long scientific documents using graph embeddings

López López, Álvaro Jesús; Portela González, José

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/100533

Title :	Enhancing keyphrase extraction from long scientific documents using graph embeddings
Author :	López López, Álvaro Jesús; Portela González, José
Publication date :	1-Jul-2025
Abstract :	This study explores the integration of graph neural network (GNN) representations with pre-trained language models (PLMs) to enhance keyphrase extraction (KPE) from lengthy documents. We demonstrate that incorporating graph embeddings into PLMs yields richer semantic representations, especially for long texts. Our approach constructs a co-occurrence graph of the document, which we then embed using a graph convolutional network (GCN) trained for edge prediction. This process captures non-sequential relationships and long-distance dependencies, both of which are often crucial in lengthy documents. We introduce a novel graph-enhanced sequence tagging architecture that combines PLM-based contextual embeddings with GNN-derived representations. Through evaluations on benchmark datasets, our method outperforms state-of-the-art models, showing notable improvements in F1 scores. Beyond performance on standard benchmarks, this approach also holds promise in domains such as legal, medical, and scientific document processing, where efficient handling of long texts is vital. Our findings underscore the potential for GNNs to complement PLMs, helping address both technical and real-world challenges in KPE for long documents.
Description :	Journal articles
URI :	https://doi.org/10.1007/s10489-025-06579-y http://hdl.handle.net/11531/100533
ISSN :	0924-669X
Appears in collections:	Articles
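
The abstract mentions building a co-occurrence graph of the document before embedding it with a GCN. As a rough illustration of that first step only, the sketch below counts how often two words appear within a sliding window of each other. The function name `cooccurrence_graph`, the regex tokenizer, the window size, and the use of raw counts as edge weights are all assumptions made for this example; the record does not state how the authors tokenize, filter, or weight the graph, and the GCN and sequence-tagging stages are not shown.

```python
import re
from collections import Counter


def cooccurrence_graph(text, window=3):
    """Return undirected co-occurrence edges as a Counter.

    Keys are alphabetically sorted (word, word) pairs; values are the
    number of times the two words fall inside the same sliding window.
    `window` is the number of consecutive tokens in one window
    (a hypothetical default, not taken from the paper).
    """
    # Naive tokenizer: lowercase alphabetic runs only (an assumption).
    tokens = re.findall(r"[a-z]+", text.lower())
    edges = Counter()
    for i, left in enumerate(tokens):
        # Pair each token with the following `window - 1` tokens.
        for right in tokens[i + 1 : i + window]:
            if left != right:  # skip self-loops
                edges[tuple(sorted((left, right)))] += 1
    return edges


if __name__ == "__main__":
    sample = "graph embeddings improve keyphrase extraction graph embeddings help"
    for (a, b), weight in sorted(cooccurrence_graph(sample, window=2).items()):
        print(a, b, weight)
```

Because edges depend only on window co-occurrence, a word that recurs far apart in a long document keeps a single node whose neighbours come from every occurrence, which is one way such a graph can expose the long-distance relationships the abstract refers to.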

Files in this item:

File	Size	Format
IIT-25-123R_preview	3.09 kB	Unknown
