Integrating corpus-based and rule-based approaches in an open-source machine translation system
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/27530
Title: | Integrating corpus-based and rule-based approaches in an open-source machine translation system |
---|---|
Author(s): | Sánchez-Martínez, Felipe | Pérez-Ortiz, Juan Antonio | Forcada, Mikel L. |
Research group(s) or GITE: | Transducens |
Centre, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | Machine translation | Corpus-based | Rule-based | Open-source | Apertium |
Knowledge area(s): | Lenguajes y Sistemas Informáticos |
Publication date: | Jan-2007 |
Publisher: | University of Leuven. Centre for Computational Linguistics |
Bibliographic citation: | SÁNCHEZ-MARTÍNEZ, Felipe; PÉREZ-ORTIZ, Juan Antonio; FORCADA, Mikel L. "Integrating corpus-based and rule-based approaches in an open-source machine translation system". In: Proceedings of METIS-II Workshop: New Approaches to Machine Translation, January 11, 2007, Leuven, Belgium, pp. 73-82 |
Abstract: | Most current taxonomies of machine translation (MT) systems start by contrasting rule-based (RB) systems with corpus-based (CB) ones. These two approaches are much more than theoretical boundaries since many working MT systems fall within one of them. However, hybrid MT systems integrating RB and CB approaches are receiving increasing attention. In this paper we show our current research on using CB methods to extend an MT system primarily designed following the RB approach. Specifically, the open-source MT system Apertium is being extended with a set of CB tools to be also released under an open-source license, therefore allowing third parties to freely use or modify them. We present CB extensions for Apertium that make it possible (a) to improve its part-of-speech tagger, (b) to automatically infer the set of transfer rules, and (c) to tackle the problem of the translation of polysemous words. A common feature of these CB methods is the use of unsupervised corpora in the target language of the MT system. The resulting hybrid system preserves most of the advantages of the RB approach while reducing the need for human intervention. |
Sponsor(s): | Work funded by the Spanish Ministry of Science and Technology through project TIC2003-08681-C02-01 and by the Spanish Ministry of Education and Science and the European Social Fund through grant BES-2004-4711. The development of the Apertium MT engine was initially funded by the Spanish Ministry of Industry, Tourism and Commerce through grants FIT-340101-2004-3 and FIT-340001-2005-2. The enhancement of Apertium is being funded by the Generalitat de Catalunya. |
URI: | http://hdl.handle.net/10045/27530 |
Language: | eng |
Type: | info:eu-repo/semantics/conferenceObject |
Peer reviewed: | yes |
Appears in collections: | INV - TRANSDUCENS - Comunicaciones a Congresos, Conferencias, etc. |
Files in this item:
File | Description | Size | Format |
---|---|---|---|
sanchez07a.pdf | | 165.96 kB | Adobe PDF |
All documents in RUA are protected by copyright. Some rights reserved.