Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/117579
Item information
Title: Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification
Author(s): Castellanos, Francisco J. | Valero-Mas, Jose J. | Calvo-Zaragoza, Jorge
Research group(s) or GITE: Reconocimiento de Formas e Inteligencia Artificial
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: String Space | Data Reduction | k-Nearest Neighbor | Prototype Generation
Knowledge area(s): Lenguajes y Sistemas Informáticos
Publication date: 2 Sep 2021
Publisher: Springer Nature
Citation: Soft Computing. 2021, 25: 15403-15415. https://doi.org/10.1007/s00500-021-06178-2
Abstract: The k-nearest neighbor (kNN) rule is one of the best-known distance-based classifiers, and is usually associated with high performance and versatility as it requires only the definition of a dissimilarity measure. Nevertheless, kNN is also coupled with low-efficiency levels since, for each new query, the algorithm must carry out an exhaustive search of the training data, and this drawback is much more relevant when considering complex structural representations, such as graphs, trees or strings, owing to the cost of the dissimilarity metrics. This issue has generally been tackled through the use of data reduction (DR) techniques, which reduce the size of the reference set, but the complexity of structural data has historically limited their application in the aforementioned scenarios. A DR algorithm denominated as reduction through homogeneous clusters (RHC) has recently been adapted to string representations but as obtaining the exact median value of a set of string data is known to be computationally difficult, its authors resorted to computing the set-median value. Under the premise that a more exact median value may be beneficial in this context, we, therefore, present a new adaptation of the RHC algorithm for string data, in which an approximate median computation is carried out. The results obtained show significant improvements when compared to those of the set-median version of the algorithm, in terms of both classification performance and reduction rates.
Sponsor(s): Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research work was partially funded by “Programa I+D+i de la Generalitat Valenciana” through grants ACIF/2019/042 and APOSTD/2020/256, the Spanish Ministry through HISPAMUS project TIN2017-86576-R, partially funded by the EU, and the University of Alicante through project GRE19-04.
URI: http://hdl.handle.net/10045/117579
ISSN: 1432-7643 (Print) | 1433-7479 (Online)
DOI: 10.1007/s00500-021-06178-2
Language: eng
Type: info:eu-repo/semantics/article
Rights: © The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Peer reviewed: yes
Publisher's version: https://doi.org/10.1007/s00500-021-06178-2
Appears in collections: INV - GRFIA - Artículos de Revistas
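
The abstract contrasts the set-median of a collection of strings (the member of the collection with the smallest summed distance to all the others) with an approximate median, which may lie outside the collection. The following Python sketch illustrates that distinction only. It assumes unit-cost Levenshtein distance, and the greedy single-edit refinement in `refine_median` is a generic local search chosen for illustration, not the procedure used in the article. All function names are hypothetical.

```python
from typing import Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def total_distance(candidate: str, strings: Sequence[str]) -> int:
    """Sum of distances from the candidate to every string in the set."""
    return sum(edit_distance(candidate, s) for s in strings)


def set_median(strings: Sequence[str]) -> str:
    """Member of the set that minimizes the summed distance to the set."""
    return min(strings, key=lambda c: total_distance(c, strings))


def refine_median(strings: Sequence[str],
                  alphabet: Optional[Sequence[str]] = None,
                  max_rounds: int = 10) -> str:
    """Illustrative approximate median (assumption, not the paper's method):
    start at the set-median and greedily accept single-symbol edits that
    strictly reduce the summed distance."""
    if alphabet is None:
        alphabet = sorted(set("".join(strings)))
    best = set_median(strings)
    best_cost = total_distance(best, strings)
    for _ in range(max_rounds):
        candidates = set()
        for i in range(len(best) + 1):
            for sym in alphabet:
                candidates.add(best[:i] + sym + best[i:])          # insertion
            if i < len(best):
                candidates.add(best[:i] + best[i + 1:])            # deletion
                for sym in alphabet:
                    candidates.add(best[:i] + sym + best[i + 1:])  # substitution
        improved = False
        for cand in sorted(candidates):
            cost = total_distance(cand, strings)
            if cost < best_cost:
                best, best_cost, improved = cand, cost, True
        if not improved:
            break
    return best


if __name__ == "__main__":
    data = ["aab", "aba", "baa"]
    sm = set_median(data)
    am = refine_median(data)
    print(sm, total_distance(sm, data))
    print(am, total_distance(am, data))
```

On the toy set `["aab", "aba", "baa"]`, every member is at distance 2 from the other two, so no member can do better than a summed distance of 4, whereas a string outside the set can reach 3. This is the kind of gap that motivates looking beyond the set-median.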

Files in this item:
File: Castellanos_etal_2021_SoftComput.pdf | Size: 664.17 kB | Format: Adobe PDF


This item is licensed under a Creative Commons License.