The MERIT Dataset: Modelling and efficiently rendering interpretable transcripts

de Rodrigo Tobías, Ignacio; Sánchez Cuadrado, Alberto; Boal Martín-Larrauri, Jaime; López López, Álvaro Jesús

Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/106393

Title :	The MERIT Dataset: Modelling and efficiently rendering interpretable transcripts
Authors :	de Rodrigo Tobías, Ignacio; Sánchez Cuadrado, Alberto; Boal Martín-Larrauri, Jaime; López López, Álvaro Jesús
Publication date :	1-Apr-2026
Abstract :	This paper introduces the MERIT Dataset, a multimodal, fully labeled dataset of school grade reports. Comprising over 400 labels and 33k samples, the MERIT Dataset is a resource for training models in demanding Visually-rich Document Understanding tasks. It contains multimodal features that link patterns in the textual, visual, and layout domains. The MERIT Dataset also includes biases in a controlled way, making it a valuable tool to benchmark biases induced in Language Models. The paper outlines the dataset’s generation pipeline and highlights its main features and patterns in its different domains. We benchmark the dataset for token classification, showing that it poses a significant challenge even for SOTA models.
Description :	Journal articles
URI :	https://doi.org/10.1016/j.patcog.2025.112502
ISSN :	0031-3203
Appears in collections:	Articles

Files in this item:

File	Description	Size	Format
IIT-25-307R_preprint.pdf		5.08 MB	Adobe PDF
IIT-25-307R_preview.pdf		2.77 kB	Adobe PDF
