Characterization of Institutional Texts for an Automated Golden Standard: Enhancing Machine Translation Quality Assessment between English and Spanish
Abstract
The purpose of this paper is to collect a set of features that can contribute to the linguistic characterization of the institutional textual genre. The aim is to describe as exhaustively as possible the archetypal text to be obtained as a target text in this type of specialized translation. The tools used were Orange Data Mining© and Google Colab (Python code), and the data were obtained through the following processing steps: word clouds and text preprocessing (cleaning, tokenization, normalization, lemmatization, and PoS annotation). With these tools, lexical and grammatical frequencies, word and document embeddings, cosine distances, hierarchical clustering, and 20-component dimensionality reduction (t-SNE) were obtained.
As a result, a series of useful descriptive parameters has been obtained for characterizing model texts for economic translation in institutional domains into European Spanish: lexical and terminological density, phraseological and terminological lexicalizations, grammatical frequencies, and semantic maps. In conclusion, the study provides several quantifiable features that characterize the analyzed register. It also opens the way for further research to refine these parameters and to search for complementary ones until a complete and exhaustive picture of the reference model in this genre is obtained.
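As a rough illustration of two of the measures named above (lexical density and cosine distance between documents), the following Python sketch may be helpful. It is not the authors' pipeline: the function-word list, the sample sentences, and all function names are hypothetical, and plain bag-of-words counts stand in for the embeddings and PoS annotation used in the study.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption);
# a real study would rely on PoS tags to separate content words.
FUNCTION_WORDS = {"the", "of", "and", "to", "in", "a", "is", "for", "on", "by", "with", "that"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def lexical_density(tokens):
    """Share of tokens that are not in the function-word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def cosine_distance(tokens_a, tokens_b):
    """1 - cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Invented sample sentences, used only to exercise the functions.
    doc_a = tokenize("The central bank raised the interest rate to contain inflation.")
    doc_b = tokenize("The interest rate decision of the central bank surprised the markets.")
    print("lexical density A:", round(lexical_density(doc_a), 3))
    print("cosine distance A-B:", round(cosine_distance(doc_a, doc_b), 3))
```

The resulting pairwise distances could then be passed to a hierarchical clustering or t-SNE routine; that step is left out here because it depends on the specific libraries and settings chosen.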
Activity Type
Conference presentation
Keywords
Machine Translation, Golden Standard, Translation Quality Assessment, Specialized Translation, AI Processing.