EvoSplit: An Evolutionary Approach to Split a Multi-Label Data Set into Disjoint Subsets

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/113868
Item information
Title: EvoSplit: An Evolutionary Approach to Split a Multi-Label Data Set into Disjoint Subsets
Author(s): Flórez-Revuelta, Francisco
Research group(s) or GITE: Informática Industrial y Redes de Computadores
Center, Department or Service: Universidad de Alicante. Departamento de Tecnología Informática y Computación
Keywords: Multi-label data sets | Supervised learning | Machine learning | Evolutionary computation | Big data applications
Knowledge area(s): Arquitectura y Tecnología de Computadores
Publication date: 22-Mar-2021
Publisher: MDPI
Bibliographic citation: Florez-Revuelta F. EvoSplit: An Evolutionary Approach to Split a Multi-Label Data Set into Disjoint Subsets. Applied Sciences. 2021; 11(6):2823. https://doi.org/10.3390/app11062823
Abstract: This paper presents a new evolutionary approach, EvoSplit, for the distribution of multi-label data sets into disjoint subsets for supervised machine learning. Currently, data set providers divide a data set either randomly or using iterative stratification, a method that aims to maintain the label (or label pair) distribution of the original data set in the different subsets. Following the same aim, this paper first introduces a single-objective evolutionary approach that tries to obtain a split that maximizes the similarity between those distributions independently. Second, a new multi-objective evolutionary algorithm is presented to maximize the similarity considering both distributions (labels and label pairs) simultaneously. Both approaches are validated using well-known multi-label data sets as well as large image data sets currently used in computer vision and machine learning applications. Compared with iterative stratification, EvoSplit improves the splitting of a data set according to several measures: Label Distribution, Label Pair Distribution, Examples Distribution, and the numbers of folds and fold-label pairs with zero positive examples.
URI: http://hdl.handle.net/10045/113868
ISSN: 2076-3417
DOI: 10.3390/app11062823
Language: eng
Type: info:eu-repo/semantics/article
Rights: © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Peer reviewed: yes
Publisher's version: https://doi.org/10.3390/app11062823
Appears in collections: INV - I2RC - Artículos de Revistas
INV - AmI4AHA - Artículos de Revistas

Files in this item:
File: Florez-Revuelta_2021_ApplSci.pdf | Size: 959.84 kB | Format: Adobe PDF


This item is licensed under a Creative Commons License