Investigating Backtranslation in Neural Machine Translation

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/76085
Full metadata record
dc.contributor.author: Poncelas, Alberto
dc.contributor.author: Shterionov, Dimitar
dc.contributor.author: Way, Andy
dc.contributor.author: Maillette de Buy Wenniger, Gideon
dc.contributor.author: Passban, Peyman
dc.date.accessioned: 2018-05-31T10:09:17Z
dc.date.available: 2018-05-31T10:09:17Z
dc.date.issued: 2018
dc.identifier.citation: Poncelas, Alberto, et al. “Investigating Backtranslation in Neural Machine Translation”. In: Pérez-Ortiz, Juan Antonio, et al. (Eds.). Proceedings of the 21st Annual Conference of the European Association for Machine Translation: 28-30 May 2018, Universitat d'Alacant, Alacant, Spain, pp. 249-258
dc.identifier.isbn: 978-84-09-01901-4
dc.identifier.uri: http://hdl.handle.net/10045/76085
dc.description.abstract: A prerequisite for training corpus-based machine translation (MT) systems – either Statistical MT (SMT) or Neural MT (NMT) – is the availability of high-quality parallel data. This is arguably more important today than ever before, as NMT has been shown in many studies to outperform SMT, but mostly when large parallel corpora are available; in cases where data is limited, SMT can still outperform NMT. Recently researchers have shown that back-translating monolingual data can be used to create synthetic parallel corpora, which in turn can be used in combination with authentic parallel data to train a high-quality NMT system. Given that large collections of new parallel text become available only quite rarely, back-translation has become the norm when building state-of-the-art NMT systems, especially in resource-poor scenarios. However, we assert that there are many unknown factors regarding the actual effects of back-translated data on the translation capabilities of an NMT model. Accordingly, in this work we investigate how using back-translated data as a training corpus – both as a separate standalone dataset as well as combined with human-generated parallel data – affects the performance of an NMT model. We use incrementally larger amounts of back-translated data to train a range of NMT systems for German-to-English, and analyse the resulting translation performance.
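The back-translation setup the abstract describes can be sketched as follows. This is a toy illustration only: the `backtranslate` function below is a hypothetical stand-in (a hard-coded lookup) for a trained target-to-source NMT model, and the corpus sizes are trivially small.

```python
def backtranslate(target_sentence: str) -> str:
    """Stand-in for a trained English-to-German NMT model (hypothetical)."""
    toy_model = {
        "the cat sat on the mat": "die Katze sass auf der Matte",
        "machine translation is useful": "maschinelle Uebersetzung ist nuetzlich",
    }
    return toy_model[target_sentence]


def build_synthetic_corpus(monolingual_target):
    """Pair each monolingual target sentence with its back-translation,
    yielding (synthetic source, authentic target) training pairs."""
    return [(backtranslate(t), t) for t in monolingual_target]


# Authentic human-generated parallel data (source, target).
authentic = [("guten Morgen", "good morning")]

# Monolingual target-language data to be back-translated.
monolingual = ["the cat sat on the mat", "machine translation is useful"]

# Training corpus = authentic pairs + synthetic pairs, as in the paper's
# combined condition; the standalone condition would train on `synthetic` alone.
synthetic = build_synthetic_corpus(monolingual)
training_corpus = authentic + synthetic
print(len(training_corpus))  # 3 pairs
```

The paper's experiments vary the amount of synthetic data mixed in; in this sketch that corresponds to truncating `synthetic` before concatenation.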
dc.description.sponsorship: This research has been supported by the ADAPT Centre for Digital Content Technology which is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund. This work has also received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 713567.
dc.language: eng
dc.publisher: European Association for Machine Translation
dc.rights: © 2018 The authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND.
dc.subject: Machine Translation
dc.subject.other: Lenguajes y Sistemas Informáticos
dc.title: Investigating Backtranslation in Neural Machine Translation
dc.type: info:eu-repo/semantics/conferenceObject
dc.peerreviewed: yes
dc.relation.publisherversion: http://eamt2018.dlsi.ua.es/proceedings-eamt2018.pdf
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.relation.projectID: info:eu-repo/grantAgreement/EC/H2020/713567
Appears in collections: EAMT2018 - Proceedings
EU-funded research

Files in this item:
EAMT2018-Proceedings_27.pdf — 1.7 MB, Adobe PDF


This item is licensed under a Creative Commons Licence.