Quantitative fine-grained human evaluation of machine translation systems: a case study on English to Croatian

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/139898
Item information
Title: Quantitative fine-grained human evaluation of machine translation systems: a case study on English to Croatian
Author(s): Klubička, Filip | Toral, Antonio | Sánchez-Cartagena, Víctor M.
Research group(s) or GITE: Transducens
Centre, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Neural machine translation | Statistical machine translation | Phrase-based machine translation | Factored models | Human evaluation | Error annotation | Multidimensional quality metrics (MQM)
Publication date: 10 Feb 2018
Publisher: Springer Nature
Citation: Machine Translation. 2018, 32: 195-215. https://doi.org/10.1007/s10590-018-9214-x
Abstract: This paper presents a quantitative fine-grained manual evaluation approach to comparing the performance of different machine translation (MT) systems. We build upon the well-established multidimensional quality metrics (MQM) error taxonomy and implement a novel method that assesses whether the differences in performance for MQM error types between different MT systems are statistically significant. We conduct a case study for English-to-Croatian, a language direction that involves translating into a morphologically rich language, for which we compare three MT systems belonging to different paradigms: pure phrase-based, factored phrase-based and neural. First, we design an MQM-compliant error taxonomy tailored to the relevant linguistic phenomena of Slavic languages, which made the annotation process feasible and accurate. Errors in MT outputs were then annotated by two annotators following this taxonomy. Subsequently, we carried out a statistical analysis which showed that the best-performing system (neural) reduces the errors produced by the worst system (pure phrase-based) by more than half (54%). Moreover, we conducted an additional analysis of agreement errors in which we distinguished between short (phrase-level) and long distance (sentence-level) errors. We discovered that phrase-based MT approaches are of limited use for long distance agreement phenomena, for which neural MT was found to be especially effective.
Sponsor(s): This research was partly funded by the ADAPT Centre, which is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund. This research has also received funding from the European Union Seventh Framework Programme FP7/2007-2013 under Grant agreement PIAP-GA-2012-324414 (Abu-MaTran) and the Swiss National Science Foundation Grant 74Z0_160501 (ReLDI).
URI: http://hdl.handle.net/10045/139898
ISSN: 0922-6567 (Print) | 1573-0573 (Online)
DOI: 10.1007/s10590-018-9214-x
Language: eng
Type: info:eu-repo/semantics/article
Rights: © Springer Science+Business Media B.V., part of Springer Nature 2018
Peer reviewed: yes
Publisher's version: https://doi.org/10.1007/s10590-018-9214-x
Appears in collections: INV - TRANSDUCENS - Journal Articles
EU-funded research
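
The abstract describes testing whether differences in MQM error counts between MT systems are statistically significant, and reports a relative error reduction of 54% for the neural system over the pure phrase-based one. The record itself does not say which test was used, so the following is only a minimal sketch under stated assumptions: it applies a Pearson chi-squared test (1 degree of freedom, no continuity correction) to a 2x2 table of error versus non-error tokens for two systems. The function names and all counts are hypothetical and are not taken from the paper.

```python
import math


def chi2_2x2(err_a, tok_a, err_b, tok_b):
    """Pearson chi-squared test on error vs. non-error token counts.

    err_*: tokens annotated with a given error type (hypothetical counts).
    tok_*: total tokens produced by each system.
    Assumes both columns of the table are non-empty, i.e. at least one
    error and at least one non-error token overall.
    Returns (chi2 statistic, p-value for 1 degree of freedom).
    """
    table = [[err_a, tok_a - err_a], [err_b, tok_b - err_b]]
    total = tok_a + tok_b
    row_sums = [tok_a, tok_b]
    col_sums = [err_a + err_b, total - err_a - err_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def relative_error_reduction(err_worse, err_better):
    """Fraction of the worse system's errors removed by the better one."""
    return (err_worse - err_better) / err_worse


if __name__ == "__main__":
    # Hypothetical counts for one error type, for illustration only.
    chi2, p = chi2_2x2(err_a=200, tok_a=2000, err_b=100, tok_b=2000)
    print(f"chi2={chi2:.3f}, p={p:.3g}")
    print(f"relative reduction={relative_error_reduction(200, 100):.0%}")
```

In a setting like the one the abstract describes, such a test would be run once per MQM error type and per pair of systems; correcting for multiple comparisons is left out of this sketch.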

Files in this item:
File | Description | Size | Format
Klubicka_etal_2018_MachineTranslation_final.pdf | Final version (restricted access) | 692.39 kB | Adobe PDF
Klubicka_etal_2018_MachineTranslation_accepted.pdf | Accepted Manuscript (open access) | 359.29 kB | Adobe PDF


All documents in RUA are protected by copyright. Some rights reserved.