Intelligent Ensembling of Auto-ML System Outputs for Solving Classification Problems
Always use this identifier to cite or link to this item
http://hdl.handle.net/10045/125532
Title: | Intelligent Ensembling of Auto-ML System Outputs for Solving Classification Problems |
---|---|
Authors: | Consuegra-Ayala, Juan Pablo | Gutiérrez, Yoan | Almeida-Cruz, Yudivian | Palomar, Manuel |
Research groups or GITE: | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) |
Center, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Instituto Universitario de Investigación Informática |
Keywords: | Ensemble Methods | Auto-ML | Grammatical Evolution | Supervised Learning |
Publication date: | 18 July 2022 |
Publisher: | Elsevier |
Bibliographic citation: | Information Sciences. 2022, 609: 766-780. https://doi.org/10.1016/j.ins.2022.07.061 |
Abstract: | Automatic Machine Learning (Auto-ML) tools enable the automatic solution of real-world problems through machine learning techniques. These tools tend to be more time-consuming than standard machine learning libraries; therefore, making full use of all the available resources is a valuable feature. This paper presents a two-phase optimization system for solving classification problems. The system is designed to produce more robust classifiers by exploiting the different architectures that are generated while solving classification problems with Auto-ML tools, particularly AutoGOAL. In the first phase, the system follows a probabilistic strategy to find the best combination of algorithms and hyperparameters and generates a collection of base models according to certain diversity criteria; in the second, it follows similar Auto-ML strategies to ensemble those models. The HAHA 2019 challenge corpus and the Adult dataset were used to evaluate the system. The experimental results show that: i) a better solution can be built by ensembling a subset of the already tested models; ii) the performance of ensemble methods depends on the collection of base models used; and iii) ensuring diversity using the double-fault measure produces better results than using the disagreement measure. The source code is available online for the research community. |
Sponsors: | This research has been partially funded by the University of Alicante and the University of Havana, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22), INTEGER (RTI2018-094649-B-I00) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089). Moreover, it has been backed by the work of both COST Actions: CA19134 - “Distributed Knowledge Graphs” and CA19142 - “Leading Platform for European Citizens, Industries, Academia and Policymakers in Media Accessibility”. |
URI: | http://hdl.handle.net/10045/125532 |
ISSN: | 0020-0255 (Print) | 1872-6291 (Online) |
DOI: | 10.1016/j.ins.2022.07.061 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | © 2022 Elsevier Inc. |
Peer reviewed: | yes |
Publisher's version: | https://doi.org/10.1016/j.ins.2022.07.061 |
Appears in collection: | INV - GPLSI - Artículos de Revistas |
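
The abstract compares two pairwise diversity measures, double-fault and disagreement. As a rough illustration only, the sketch below uses their standard textbook definitions: disagreement is the fraction of samples on which exactly one of two classifiers is correct, and double-fault is the fraction that both misclassify. The function names and the toy data are hypothetical, and nothing here is taken from the paper's source code or from AutoGOAL.

```python
from typing import List, Sequence


def _correct(y_true: Sequence, y_pred: Sequence) -> List[bool]:
    """Per-sample correctness of one classifier (hypothetical helper)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    return [t == p for t, p in zip(y_true, y_pred)]


def disagreement(y_true: Sequence, pred_a: Sequence, pred_b: Sequence) -> float:
    """Fraction of samples where exactly one classifier is correct.

    Higher values indicate a more diverse pair.
    """
    a, b = _correct(y_true, pred_a), _correct(y_true, pred_b)
    return sum(x != y for x, y in zip(a, b)) / len(a)


def double_fault(y_true: Sequence, pred_a: Sequence, pred_b: Sequence) -> float:
    """Fraction of samples that both classifiers get wrong.

    Lower values indicate a more useful pair for ensembling.
    """
    a, b = _correct(y_true, pred_a), _correct(y_true, pred_b)
    return sum((not x) and (not y) for x, y in zip(a, b)) / len(a)


# Toy data, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
pred_a = [1, 0, 0, 1, 1, 1]
pred_b = [1, 1, 0, 1, 0, 0]

print(disagreement(y_true, pred_a, pred_b))  # 3 of 6 samples
print(double_fault(y_true, pred_a, pred_b))  # 1 of 6 samples
```

The two measures respond to different things: disagreement rewards any pair whose errors do not coincide, while double-fault penalizes only shared errors, which are the ones a vote between the two models cannot correct.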
Files in this item:
Description | Size | Format |
---|---|---|
Accepted Manuscript (open access) | 776.39 kB | Adobe PDF |
Final version (restricted access) | 1.03 MB | Adobe PDF |
All documents deposited in RUA are protected by copyright. Some rights reserved.