Automatic Discovery of Heterogeneous Machine Learning Pipelines: An Application to Natural Language Processing

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/110740
Item information
Title: Automatic Discovery of Heterogeneous Machine Learning Pipelines: An Application to Natural Language Processing
Authors: Estévez-Velarde, Suilan | Gutiérrez, Yoan | Montoyo, Andres | Almeida-Cruz, Yudivian
Research Group/s: Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: AutoGOAL | Machine learning | Natural Language Processing
Knowledge Area: Lenguajes y Sistemas Informáticos
Issue Date: Dec-2020
Publisher: Association for Computational Linguistics
Citation: Estevez-Velarde, Suilan, et al. “Automatic Discovery of Heterogeneous Machine Learning Pipelines: An Application to Natural Language Processing”. In: COLING 2020, The 28th International Conference on Computational Linguistics: Proceedings of the Conference, December 8-13, 2020 Barcelona, Spain (Online), ISBN 978-1-952148-27-9, pp. 3558-3568
Abstract: This paper presents AutoGOAL, a system for automatic machine learning (AutoML) that uses heterogeneous techniques. In contrast with existing AutoML approaches, our contribution can automatically build machine learning pipelines that combine techniques and algorithms from different frameworks, including shallow classifiers, natural language processing tools, and neural networks. We define the heterogeneous AutoML optimization problem as the search for the best sequence of algorithms that transforms specific input data into the desired output. This provides a novel theoretical and practical approach to AutoML. Our proposal is experimentally evaluated in diverse machine learning problems and compared with alternative approaches, showing that it is competitive with other AutoML alternatives in standard benchmarks. Furthermore, it can be applied to novel scenarios, such as several NLP tasks, where existing alternatives cannot be directly deployed. The system is freely available and includes in-built compatibility with a large number of popular machine learning frameworks, which makes our approach useful for solving practical problems with relative ease and effort.
Sponsor: This research has been supported by a Carolina Foundation grant in agreement with University of Alicante and University of Havana. Moreover, it has also been partially funded by both aforementioned universities, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089).
URI: http://hdl.handle.net/10045/110740
ISBN: 978-1-952148-27-9
Language: eng
Type: info:eu-repo/semantics/conferenceObject
Rights: This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/.
Peer Review: yes
Publisher version: https://www.aclweb.org/anthology/2020.coling-main.317
Appears in Collections: INV - GPLSI - Comunicaciones a Congresos, Conferencias, etc.

Files in This Item:
File: 2020.coling-main.317.pdf | Size: 253.53 kB | Format: Adobe PDF


This item is licensed under a Creative Commons License.