Creating the best development corpus for Statistical Machine Translation systems
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/76033
Title: | Creating the best development corpus for Statistical Machine Translation systems |
---|---|
Authors: | Chinea-Rios, Mara | Sanchis-Trilles, Germán | Casacuberta, Francisco |
Keywords: | Machine Translation |
Knowledge Area: | Lenguajes y Sistemas Informáticos |
Issue Date: | 2018 |
Publisher: | European Association for Machine Translation |
Citation: | Chinea-Rios, Mara; Sanchis-Trilles, Germán; Casacuberta, Francisco. “Creating the best development corpus for Statistical Machine Translation systems”. In: Pérez-Ortiz, Juan Antonio, et al. (Eds.). Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain, pp. 99-108 |
Abstract: | We propose and study three novel approaches for tackling the problem of development set selection in Statistical Machine Translation. We focus on a scenario where a machine translation system is leveraged for translating a specific test set, without further data from the domain at hand. This test set stems from a real application of machine translation, where the texts of a specific e-commerce site were to be translated. For developing our development-set selection techniques, we first conducted experiments in a controlled scenario, where labelled data from different domains was available, and evaluated the techniques both with classification and translation quality metrics. Then, the best-performing techniques were evaluated on the e-commerce data at hand, yielding consistent improvements across two language directions. |
Sponsor: | The research leading to these results was partially supported by projects CoMUN-HaT-TIN2015-70924-C2-1-R (MINECO/FEDER) and PROMETEO/2018/004. |
URI: | http://hdl.handle.net/10045/76033 |
ISBN: | 978-84-09-01901-4 |
Language: | eng |
Type: | info:eu-repo/semantics/conferenceObject |
Rights: | © 2018 The authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND. |
Peer Review: | yes |
Publisher version: | http://eamt2018.dlsi.ua.es/proceedings-eamt2018.pdf |
Appears in Collections: | EAMT2018 - Proceedings |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
EAMT2018-Proceedings_12.pdf | | 1.64 MB | Adobe PDF | |
This item is licensed under a Creative Commons License