Intelligent Ensembling of Auto-ML System Outputs for Solving Classification Problems

Consuegra-Ayala, Juan Pablo; Gutiérrez, Yoan; Almeida-Cruz, Yudivian; Palomar, Manuel

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/125532

Item information
Title:	Intelligent Ensembling of Auto-ML System Outputs for Solving Classification Problems
Author(s):	Consuegra-Ayala, Juan Pablo | Gutiérrez, Yoan | Almeida-Cruz, Yudivian | Palomar, Manuel
Research group(s) or GITE:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Instituto Universitario de Investigación Informática
Keywords:	Ensemble Methods | Auto-ML | Grammatical Evolution | Supervised Learning
Publication date:	18-Jul-2022
Publisher:	Elsevier
Bibliographic citation:	Information Sciences. 2022, 609: 766-780. https://doi.org/10.1016/j.ins.2022.07.061
Abstract:	Automatic Machine Learning (Auto-ML) tools enable the automatic solution of real-world problems through machine learning techniques. These tools tend to be more time-consuming than standard machine learning libraries; therefore, making full use of all available resources is a valuable feature. This paper presents a two-phase optimization system for solving classification problems. The system is designed to produce more robust classifiers by exploiting the different architectures that are generated while solving classification problems with Auto-ML tools, particularly AutoGOAL. In the first phase, the system follows a probabilistic strategy to find the best combination of algorithms and hyperparameters to generate a collection of base models according to certain diversity criteria; in the second, it follows similar Auto-ML strategies to ensemble those models. The HAHA 2019 challenge corpus and the Adult dataset were used to evaluate the system. The experimental results show that: i) a better solution can be built by ensembling a subset of the already tested models; ii) the performance of ensemble methods depends on the collection of base models used; and iii) ensuring diversity using the double-fault measure produces better results than using the disagreement measure. The source code is available online for the research community.
Sponsor(s):	This research has been partially funded by the University of Alicante and the University of Havana, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22), INTEGER (RTI2018-094649-B-I00) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089). Moreover, it has been backed by the work of both COST Actions: CA19134 - “Distributed Knowledge Graphs” and CA19142 - “Leading Platform for European Citizens, Industries, Academia and Policymakers in Media Accessibility”.
URI:	http://hdl.handle.net/10045/125532
ISSN:	0020-0255 (Print) | 1872-6291 (Online)
DOI:	10.1016/j.ins.2022.07.061
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	© 2022 Elsevier Inc.
Peer reviewed:	yes
Publisher's version:	https://doi.org/10.1016/j.ins.2022.07.061
Appears in collections:	INV - GPLSI - Artículos de Revistas
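
The abstract compares two pairwise diversity measures, double-fault and disagreement. As a minimal sketch, assuming the standard definitions from the ensemble-diversity literature (disagreement: the fraction of samples on which exactly one of two classifiers is correct; double-fault: the fraction on which both are wrong), they could be computed as shown here. The function names and the toy data are hypothetical and are not taken from the paper or from AutoGOAL.

```python
from typing import List, Sequence


def _correct(y_true: Sequence, y_pred: Sequence) -> List[bool]:
    """Per-sample correctness flags for one classifier."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return [t == p for t, p in zip(y_true, y_pred)]


def disagreement(y_true: Sequence, pred_a: Sequence, pred_b: Sequence) -> float:
    """Fraction of samples where exactly one of the two classifiers is correct."""
    a, b = _correct(y_true, pred_a), _correct(y_true, pred_b)
    return sum(x != y for x, y in zip(a, b)) / len(a)


def double_fault(y_true: Sequence, pred_a: Sequence, pred_b: Sequence) -> float:
    """Fraction of samples that both classifiers misclassify."""
    a, b = _correct(y_true, pred_a), _correct(y_true, pred_b)
    return sum((not x) and (not y) for x, y in zip(a, b)) / len(a)


# Toy data (hypothetical): true labels and the outputs of two base models.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
pred_a = [1, 0, 0, 1, 0, 1, 1, 0]
pred_b = [1, 1, 0, 1, 0, 0, 1, 0]

print(disagreement(y_true, pred_a, pred_b))  # 0.25
print(double_fault(y_true, pred_a, pred_b))  # 0.25
```

Under these definitions, a higher disagreement value indicates a more diverse pair, whereas a lower double-fault value indicates that the pair shares fewer errors. How the paper's system aggregates such pairwise values over a collection of base models is not stated in this record.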

Files in this item:
File	Description	Size	Format
Consuegra-Ayala_etal_2022_InformSci_accepted.pdf	24-month embargo (open access: 19 Jul. 2024)	776.39 kB	Adobe PDF
Consuegra-Ayala_etal_2022_InformSci_final.pdf	Final version (restricted access)	1.03 MB	Adobe PDF
