Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/117579
Title: | Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification |
---|---|
Authors: | Castellanos, Francisco J.; Valero-Mas, Jose J.; Calvo-Zaragoza, Jorge |
Research group(s) or GITE: | Reconocimiento de Formas e Inteligencia Artificial |
Center, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | String Space; Data Reduction; k-Nearest Neighbor; Prototype Generation |
Knowledge area(s): | Lenguajes y Sistemas Informáticos |
Publication date: | 2 Sep 2021 |
Publisher: | Springer Nature |
Citation: | Soft Computing. 2021, 25: 15403-15415. https://doi.org/10.1007/s00500-021-06178-2 |
Abstract: | The k-nearest neighbor (kNN) rule is one of the best-known distance-based classifiers and is usually associated with high performance and versatility, as it requires only the definition of a dissimilarity measure. Nevertheless, kNN is also inefficient since, for each new query, the algorithm must carry out an exhaustive search of the training data. This drawback is much more relevant when considering complex structural representations, such as graphs, trees or strings, owing to the cost of the dissimilarity metrics. This issue has generally been tackled through the use of data reduction (DR) techniques, which reduce the size of the reference set, but the complexity of structural data has historically limited their application in the aforementioned scenarios. A DR algorithm known as reduction through homogeneous clusters (RHC) has recently been adapted to string representations but, as obtaining the exact median value of a set of string data is known to be computationally difficult, its authors resorted to computing the set-median value. Under the premise that a more exact median value may be beneficial in this context, we present a new adaptation of the RHC algorithm for string data in which an approximate median computation is carried out. The results obtained show significant improvements when compared to those of the set-median version of the algorithm, in terms of both classification performance and reduction rates. |
Sponsors: | Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research work was partially funded by “Programa I+D+i de la Generalitat Valenciana” through grants ACIF/2019/042 and APOSTD/2020/256, the Spanish Ministry through HISPAMUS project TIN2017-86576-R, partially funded by the EU, and the University of Alicante through project GRE19-04. |
URI: | http://hdl.handle.net/10045/117579 |
ISSN: | 1432-7643 (Print); 1433-7479 (Online) |
DOI: | 10.1007/s00500-021-06178-2 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | © The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
Peer reviewed: | yes |
Publisher's version: | https://doi.org/10.1007/s00500-021-06178-2 |
Appears in collections: | INV - GRFIA - Artículos de Revistas |
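
The abstract contrasts the set-median of a group of strings with a more exact (approximate) median and relies on kNN over a dissimilarity measure. As a rough illustration of those two building blocks only, the sketch below assumes Levenshtein edit distance as the dissimilarity; the function names `edit_distance`, `set_median` and `knn_predict` and the toy data are hypothetical. It is not the authors' RHC adaptation and does not implement their approximate median.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def set_median(strings):
    """Return the member of `strings` with the smallest summed distance
    to all members (the set-median is restricted to existing elements)."""
    return min(strings, key=lambda s: sum(edit_distance(s, t) for t in strings))


def knn_predict(query, prototypes, k=1):
    """Majority vote among the k (string, label) prototypes nearest to `query`."""
    nearest = sorted(prototypes, key=lambda p: edit_distance(query, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data (assumed): each homogeneous cluster is replaced by one prototype.
cluster_a = ["abab", "abab", "abba", "bbab"]
cluster_b = ["cdcd", "cdcd", "cddc", "ddcd"]
prototypes = [(set_median(cluster_a), "A"), (set_median(cluster_b), "B")]

print(prototypes)
print(knn_predict("abaa", prototypes))
```

Replacing each cluster by a single prototype is what shrinks the reference set, so each query needs far fewer distance computations. A median that is not restricted to existing elements could in principle sit closer to the cluster as a whole, which is the premise the abstract states.
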
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
Castellanos_etal_2021_SoftComput.pdf | | 664.17 kB | Adobe PDF | |
This item is licensed under a Creative Commons License