Cross-lingual neural fuzzy matching for exploiting target-language monolingual corpora in computer-aided translation

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/140026
Item information
Title: Cross-lingual neural fuzzy matching for exploiting target-language monolingual corpora in computer-aided translation
Author(s): Esplà-Gomis, Miquel | Sánchez-Cartagena, Víctor M. | Pérez-Ortiz, Juan Antonio | Sánchez-Martínez, Felipe
Research group(s) or GITE: Transducens
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Fuzzy match | Computer-aided translation | Monolingual corpora
Publication date: Dec-2022
Publisher: Association for Computational Linguistics
Bibliographic citation: Miquel Esplà-Gomis, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, and Felipe Sánchez-Martínez. 2022. Cross-lingual neural fuzzy matching for exploiting target-language monolingual corpora in computer-aided translation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7532–7543, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.emnlp-main.511
Abstract: Computer-aided translation (CAT) tools based on translation memories (TM) play a prominent role in the translation workflow of professional translators. However, the reduced availability of in-domain TMs, as compared to in-domain monolingual corpora, limits their adoption for a number of translation tasks. In this paper, we introduce a novel neural approach aimed at overcoming this limitation by exploiting not only TMs, but also in-domain target-language (TL) monolingual corpora, and still enabling a similar functionality to that offered by conventional TM-based CAT tools. Our approach relies on cross-lingual sentence embeddings to retrieve translation proposals from TL monolingual corpora, and on a neural model to estimate their post-editing effort. The paper presents an automatic evaluation of these techniques on four language pairs that shows that our approach can successfully exploit monolingual texts in a TM-based CAT environment, increasing the amount of useful translation proposals, and that our neural model for estimating the post-editing effort enables the combination of translation proposals obtained from monolingual corpora and from TMs in the usual way. A human evaluation performed on a single language pair confirms the results of the automatic evaluation and seems to indicate that the translation proposals retrieved with our approach are more useful than what the automatic evaluation shows.
Sponsor(s): This paper is part of the R+D+i project PID2021-127999NB-I00 funded by the Spanish Ministry of Science and Innovation (MCIN), the Spanish Research Agency (AEI/10.13039/501100011033), and the European Regional Development Fund "A way to make Europe". The computational resources used for the experiments were funded by the European Regional Development Fund through project IDIFEDER/2020/003. We thank the European Association for Machine Translation for funding the human evaluation reported in this paper through the EAMT Sponsorship of Activities for 2021.
URI: http://hdl.handle.net/10045/140026
DOI: 10.18653/v1/2022.emnlp-main.511
Language: eng
Type: info:eu-repo/semantics/conferenceObject
Rights: © 2022 Association for Computational Linguistics. Creative Commons 4.0 BY (Attribution) license
Peer reviewed: yes
Publisher's version: https://doi.org/10.18653/v1/2022.emnlp-main.511
Appears in collections: INV - TRANSDUCENS - Comunicaciones a Congresos, Conferencias, etc.

Files in this item:
File: 2022-emnlp-main-511.pdf | Size: 201.5 kB | Format: Adobe PDF


All documents in RUA are protected by copyright. Some rights reserved.