Iterative Data Augmentation for Neural Machine Translation: a Low Resource Case Study for English–Telugu

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/76090
Item information
Title: Iterative Data Augmentation for Neural Machine Translation: a Low Resource Case Study for English–Telugu
Author(s): Dandapat, Sandipan | Federmann, Christian
Keywords: Machine Translation
Knowledge area(s): Computer Languages and Systems (Lenguajes y Sistemas Informáticos)
Publication date: 2018
Publisher: European Association for Machine Translation
Citation: Dandapat, Sandipan; Federmann, Christian. “Iterative Data Augmentation for Neural Machine Translation: a Low Resource Case Study for English–Telugu”. In: Pérez-Ortiz, Juan Antonio, et al. (Eds.). Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain, pp. 287-292
Abstract: Telugu is the fifteenth most commonly spoken language in the world, with an estimated reach of 75 million people in the Indian subcontinent. At the same time, it is a severely low-resource language. In this paper, we present work on English–Telugu general-domain machine translation (MT) systems built from small amounts of parallel data. The baseline statistical (SMT) and neural (NMT) systems do not yield acceptable translation quality, mostly due to limited resources. However, synthetic parallel data (generated by back translation with an NMT engine) significantly improves translation quality and allows NMT to outperform SMT. We extend back translation and propose a new iterative data augmentation (IDA) method. Filtering of synthetic data and IDA both further boost the translation quality of our final NMT systems, as measured by BLEU scores on all test sets and by state-of-the-art human evaluation.
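The abstract does not spell out the IDA procedure, so the following is only a rough illustration of the general idea it builds on: a generic iterative back-translation loop with a filtering step, not the authors' implementation. The `train` callable, the `length_ratio_ok` filter, its `max_ratio` threshold, and the number of `rounds` are all hypothetical placeholders; the paper's actual filtering criterion and iteration schedule may differ.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]              # (input sentence, output sentence)
Translator = Callable[[str], str]
# Hypothetical: any function that trains a model on pairs and returns a translator.
Trainer = Callable[[List[Pair]], Translator]


def length_ratio_ok(src: str, tgt: str, max_ratio: float = 2.0) -> bool:
    """Hypothetical filter: reject empty pairs or pairs whose token counts
    differ by more than max_ratio. The paper's actual filter may differ."""
    a, b = len(src.split()), len(tgt.split())
    if a == 0 or b == 0:
        return False
    return max(a, b) / min(a, b) <= max_ratio


def iterative_back_translation(
    parallel: List[Pair],   # real (English, Telugu) pairs
    mono_src: List[str],    # monolingual English
    mono_tgt: List[str],    # monolingual Telugu
    train: Trainer,
    rounds: int = 2,        # hypothetical number of iterations
) -> Tuple[Translator, Translator]:
    reverse = [(t, s) for s, t in parallel]
    forward = train(parallel)    # English -> Telugu
    backward = train(reverse)    # Telugu -> English
    for _ in range(rounds):
        # Back-translate monolingual Telugu into synthetic English sources
        # for the forward model, and vice versa for the backward model.
        synth_fwd = [(backward(t), t) for t in mono_tgt]
        synth_bwd = [(forward(s), s) for s in mono_src]
        # Keep only synthetic pairs that pass the (hypothetical) filter.
        synth_fwd = [p for p in synth_fwd if length_ratio_ok(*p)]
        synth_bwd = [p for p in synth_bwd if length_ratio_ok(*p)]
        # Retrain both directions on real plus filtered synthetic data; the
        # improved models produce better synthetic data in the next round.
        forward = train(parallel + synth_fwd)
        backward = train(reverse + synth_bwd)
    return forward, backward
```

In practice `train` would wrap a real NMT toolkit; a toy word-by-word lexicon trainer is enough to exercise the loop. Keeping the real parallel data in every round and only appending filtered synthetic pairs is one common design choice, since it prevents noisy back-translations from crowding out the authentic signal.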
URI: http://hdl.handle.net/10045/76090
ISBN: 978-84-09-01901-4
Language: English (eng)
Type: info:eu-repo/semantics/conferenceObject
Rights: © 2018 The authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND.
Peer reviewed: yes
Publisher's version: http://eamt2018.dlsi.ua.es/proceedings-eamt2018.pdf
Appears in collections: EAMT2018 - Proceedings

Files in this item:
EAMT2018-Proceedings_31.pdf (1.61 MB, Adobe PDF)


This item is licensed under a Creative Commons License.