Creating the best development corpus for Statistical Machine Translation systems

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/76033
Item information
Title: Creating the best development corpus for Statistical Machine Translation systems
Author(s): Chinea-Rios, Mara | Sanchis-Trilles, Germán | Casacuberta, Francisco
Keywords: Machine Translation
Knowledge area(s): Lenguajes y Sistemas Informáticos (Computer Languages and Systems)
Publication date: 2018
Publisher: European Association for Machine Translation
Citation: Chinea-Rios, Mara; Sanchis-Trilles, Germán; Casacuberta, Francisco. “Creating the best development corpus for Statistical Machine Translation systems”. In: Pérez-Ortiz, Juan Antonio, et al. (Eds.). Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain, pp. 99-108
Abstract: We propose and study three novel approaches to the problem of development set selection in Statistical Machine Translation. We focus on a scenario where a machine translation system is used to translate a specific test set, without further data from the domain at hand. This test set stems from a real application of machine translation, in which the texts of a specific e-commerce site were to be translated. To develop our development-set selection techniques, we first conducted experiments in a controlled scenario, where labelled data from different domains was available, and evaluated the techniques with both classification and translation quality metrics. The best-performing techniques were then evaluated on the e-commerce data at hand, yielding consistent improvements across two language directions.
Sponsor(s): The research leading to these results was partially supported by projects CoMUN-HaT-TIN2015-70924-C2-1-R (MINECO/FEDER) and PROMETEO/2018/004.
URI: http://hdl.handle.net/10045/76033
ISBN: 978-84-09-01901-4
Language: eng
Type: info:eu-repo/semantics/conferenceObject
Rights: © 2018 The authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND.
Peer reviewed: yes
Publisher's version: http://eamt2018.dlsi.ua.es/proceedings-eamt2018.pdf
Appears in collections: EAMT2018 - Proceedings

Files in this item:
File: EAMT2018-Proceedings_12.pdf
Size: 1.64 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.