Modelling parallel texts for boosting compression
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/27534
Title: | Modelling parallel texts for boosting compression |
---|---|
Author(s): | Adiego Rodríguez, Joaquín | Martínez Prieto, Miguel Ángel | Hoyos Torío, Javier E. | Sánchez-Martínez, Felipe |
Research group(s) or GITE: | Transducens |
Centre, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | Machine translation | Bitext compression | Parallel texts | Modelling |
Knowledge area(s): | Lenguajes y Sistemas Informáticos |
Publication date: | Mar-2010 |
Publisher: | IEEE |
Bibliographic citation: | ADIEGO, Joaquín, et al. "Modelling parallel texts for boosting compression". In: Data Compression Conference : 24-26 March 2010, Snowbird, Utah : proceedings / ed. by James A. Storer, Michael W. Marcellin. Los Alamitos, CA : IEEE Computer Society, 2010. ISBN 978-1-4244-6425-8, p. 517 |
Abstract: | Bilingual parallel corpora, also known as bitexts, convey the same information in two different languages. This implies that to model a bitext we can take advantage of the translation relationship that exists between the two texts; the text alignment task makes it possible to establish such a translation relationship. A biword is defined as a pair of words, each from a different text, that are mutual translations in the bitext; the use of biwords allows both texts in the bitext to be represented on a single model. Several biword-based schemes have been proposed leading to good compression ratios. Bearing in mind Melamed's affirmation which states that "the translation of a text into another language can be viewed as a detailed annotation of what that text means", we propose a new model for bitexts in agreement with this affirmation, dubbed MAR. The idea is to represent the words in the right text with respect to the preceding word in the left text; thus, a first-order model based on alignment relationships is proposed. |
Sponsor(s): | Work supported by Spanish projects TIN2009-14009-C02-01 and TIN2009-14009-C02-02. Miguel A. Martínez-Prieto is granted by JCyL and ESF. |
URI: | http://hdl.handle.net/10045/27534 |
ISBN: | 978-1-4244-6425-8 |
ISSN: | 1068-0314 |
DOI: | 10.1109/DCC.2010.86 |
Language: | eng |
Type: | info:eu-repo/semantics/conferenceObject |
Rights: | © Copyright 2010 IEEE |
Peer reviewed: | yes |
Publisher's version: | http://dx.doi.org/10.1109/DCC.2010.86 |
Appears in collections: | INV - TRANSDUCENS - Comunicaciones a Congresos, Conferencias, etc. |
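
The abstract's central idea, coding each right-text word with respect to a left-text word it is aligned with, can be illustrated with a small Python sketch. This is not the MAR model itself, whose details are not given in this record. It is a minimal illustration that assumes one-to-one word alignments are already available and conditions each right word on its aligned left word; it compares the empirical cost of the right words under a context-free (order-0) model and under that conditioned model. The toy alignment and the function names `order0_bits` and `conditioned_bits` are hypothetical, and the cost of transmitting the model itself is ignored.

```python
import math
from collections import Counter, defaultdict


def order0_bits(words):
    """Empirical cost, in bits, of coding `words` with an order-0 model."""
    counts = Counter(words)
    total = len(words)
    return -sum(c * math.log2(c / total) for c in counts.values())


def conditioned_bits(pairs):
    """Empirical cost, in bits, of coding each right word given its left word.

    `pairs` is a list of (left_word, right_word) alignments. Right words are
    grouped by their aligned left word and each group is coded separately.
    """
    by_left = defaultdict(list)
    for left, right in pairs:
        by_left[left].append(right)
    return sum(order0_bits(rights) for rights in by_left.values())


# Hypothetical toy alignment (left = Spanish, right = English).
pairs = [
    ("la", "the"), ("casa", "house"), ("la", "the"),
    ("casa", "home"), ("verde", "green"), ("la", "the"),
]

plain = order0_bits([right for _, right in pairs])
cond = conditioned_bits(pairs)
print(f"order-0: {plain:.2f} bits, conditioned on left word: {cond:.2f} bits")
```

In this toy case most right words are fully determined by their aligned left word, so the conditioned model spends bits only where a left word has more than one translation.
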
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
adiego10a.pdf | Revised version (open access) | 35.67 kB | Adobe PDF | |
adiego10a_final.pdf | Final version (restricted access) | 87.15 kB | Adobe PDF | |
All documents in RUA are protected by copyright. Some rights reserved.