Please use this identifier to cite or link to this item: http://hdl.handle.net/11531/99668
Title: Characterization of Institutional Texts for an Automated Golden Standard: Enhancing Machine Translation Quality Assessment between English and Spanish
Authors: Romana García, María Luisa
Hernández Pardo, Blanca
Publication date: 6-Jul-2024
Abstract:
The purpose of this paper is to collect a set of features that can contribute to the linguistic characterization of the institutional textual genre. The aim is to describe, as exhaustively as possible, the archetypal text to be produced as the target text in this type of specialized translation. The tools used were Orange Data Mining© and Google Colab (Python code), and the data were obtained through the following processing steps: word clouds and text preprocessing (cleaning, tokenization, normalization, lemmatization and PoS annotation). With these tools, lexical and grammatical frequencies, lexical and document embeddings, cosine distances, hierarchical clustering and a 20-component dimensionality reduction (t-SNE) were extracted. The result is a set of descriptive parameters useful for characterizing model texts for economic translation in institutional domains into the Spanish of Spain: lexical and terminological density, phraseological and terminological lexicalizations, grammatical frequencies and semantic maps. In conclusion, the study provides several quantifiable features that characterize the analyzed register and opens the way for further research that refines these parameters and looks for complementary ones, until a complete and exhaustive picture of the reference model for this genre is obtained.
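The following is a minimal sketch of the kind of pipeline the abstract lists, written in Python because the authors report working in Google Colab. It is not the authors' code or their Orange Data Mining configuration. Several assumptions apply: the function-word list, every function name (preprocess, lexical_density, cosine_distance, average_linkage) and the three sample sentences are hypothetical; lexical density is approximated with a function-word list rather than PoS tags; raw term-frequency vectors stand in for the lexical and document embeddings the study reports; lemmatization and the t-SNE reduction are omitted.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny list of Spanish function words. A real study
# would separate content words from function words with PoS tags instead.
FUNCTION_WORDS = {
    "el", "la", "los", "las", "de", "del", "y", "o", "a", "al", "en",
    "por", "para", "con", "que", "se", "su", "un", "una", "es",
}


def preprocess(text: str) -> list[str]:
    """Cleaning, tokenization and normalization: lowercase letter runs only.

    Lemmatization and PoS annotation are omitted in this sketch.
    """
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def lexical_density(tokens: list[str]) -> float:
    """Share of tokens that are not in the assumed function-word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(docs: dict[str, Counter], n_clusters: int) -> list[set[str]]:
    """Naive agglomerative (hierarchical) clustering with average linkage."""
    clusters = [{name} for name in docs]

    def cluster_distance(c1: set[str], c2: set[str]) -> float:
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(cosine_distance(docs[x], docs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Usage with three invented one-sentence "documents" (not data from the study).
corpus = {
    "informe_a": "El Banco subió los tipos de interés para contener la inflación.",
    "informe_b": "Los tipos de interés fueron subidos por el Banco para frenar la inflación.",
    "aviso_c": "El museo abre una nueva galería en primavera.",
}
bags = {name: Counter(preprocess(text)) for name, text in corpus.items()}
print(lexical_density(preprocess(corpus["informe_a"])))
print(average_linkage(bags, n_clusters=2))
```

With these inputs the two monetary-policy sentences fall into one cluster and the museum notice into the other. Note that "subió" and "subidos" still count as different types here; the lemmatization step listed in the abstract would merge them.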
Description: Conference presentation
URI: http://hdl.handle.net/11531/99668
Appears in collections: Articles

Files in this item:
File                                                      Size      Format
202562416378822_NeTTT Proceedings_Romana-Hernandez.pdf    2.32 MB   Adobe PDF

