Creating the best development corpus for Statistical Machine Translation systems

Item information
Title: Creating the best development corpus for Statistical Machine Translation systems
Authors: Chinea-Rios, Mara | Sanchis-Trilles, Germán | Casacuberta, Francisco
Keywords: Machine Translation
Knowledge Area: Computer Languages and Systems
Issue Date: 2018
Publisher: European Association for Machine Translation
Citation: Chinea-Rios, Mara; Sanchis-Trilles, Germán; Casacuberta, Francisco. “Creating the best development corpus for Statistical Machine Translation systems”. In: Pérez-Ortiz, Juan Antonio, et al. (Eds.). Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain, pp. 99-108
Abstract: We propose and study three novel approaches to the problem of development set selection in Statistical Machine Translation. We focus on a scenario where a machine translation system is leveraged for translating a specific test set, without further data from the domain at hand. This test set stems from a real application of machine translation, where the texts of a specific e-commerce site were to be translated. To develop our development-set selection techniques, we first conducted experiments in a controlled scenario, where labelled data from different domains was available, and evaluated the techniques with both classification and translation quality metrics. Then, the best-performing techniques were evaluated on the e-commerce data at hand, yielding consistent improvements across two language directions.
Sponsor: The research leading to these results was partially supported by projects CoMUN-HaT-TIN2015-70924-C2-1-R (MINECO/FEDER) and PROMETEO/2018/004.
ISBN: 978-84-09-01901-4
Language: eng
Type: info:eu-repo/semantics/conferenceObject
Rights: © 2018 The authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND.
Peer Review: yes
Appears in Collections: Congresos - EAMT2018 - Proceedings

Files in This Item:
File: EAMT2018-Proceedings_12.pdf
Size: 1.64 MB
Format: Adobe PDF

This item is licensed under a Creative Commons License.