Integrating corpus-based and rule-based approaches in an open-source machine translation system

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/27530
Item information
Title: Integrating corpus-based and rule-based approaches in an open-source machine translation system
Author(s): Sánchez-Martínez, Felipe | Pérez-Ortiz, Juan Antonio | Forcada, Mikel L.
Research group(s) or GITE: Transducens
Center, department or service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Machine translation | Corpus-based | Rule-based | Open-source | Apertium
Knowledge area(s): Lenguajes y Sistemas Informáticos
Publication date: Jan-2007
Publisher: University of Leuven. Centre for Computational Linguistics
Bibliographic citation: SÁNCHEZ-MARTÍNEZ, Felipe; PÉREZ-ORTIZ, Juan Antonio; FORCADA, Mikel L. "Integrating corpus-based and rule-based approaches in an open-source machine translation system". In: Proceedings of METIS-II Workshop: New Approaches to Machine Translation, January 11, 2007, Leuven, Belgium, pp. 73-82
Abstract: Most current taxonomies of machine translation (MT) systems start by contrasting rule-based (RB) systems with corpus-based (CB) ones. These two approaches are much more than theoretical boundaries, since many working MT systems fall within one of them. However, hybrid MT systems integrating RB and CB approaches are receiving increasing attention. In this paper we show our current research on using CB methods to extend an MT system primarily designed following the RB approach. Specifically, the open-source MT system Apertium is being extended with a set of CB tools that will also be released under an open-source license, thereby allowing third parties to freely use or modify them. We present CB extensions for Apertium that make it possible (a) to improve its part-of-speech tagger, (b) to automatically infer the set of transfer rules, and (c) to tackle the problem of translating polysemous words. A common feature of these CB methods is the use of unsupervised corpora in the target language of the MT system. The resulting hybrid system preserves most of the advantages of the RB approach while reducing the need for human intervention.
Sponsor(s): Work funded by the Spanish Ministry of Science and Technology through project TIC2003-08681-C02-01 and by the Spanish Ministry of Education and Science and the European Social Fund through grant BES-2004-4711. The development of the Apertium MT engine was initially funded by the Spanish Ministry of Industry, Tourism and Commerce through grants FIT-340101-2004-3 and FIT-340001-2005-2. The enhancement of Apertium is being funded by the Generalitat de Catalunya.
URI: http://hdl.handle.net/10045/27530
Language: eng
Type: info:eu-repo/semantics/conferenceObject
Peer reviewed: yes
Appears in collections: INV - TRANSDUCENS - Comunicaciones a Congresos, Conferencias, etc.

Files in this item:
File: sanchez07a.pdf | Size: 165.96 kB | Format: Adobe PDF


All documents in RUA are protected by copyright. Some rights reserved.