Data Reduction in the String Space for Efficient kNN Classification through Space Partitioning

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/106979
Item information
Title: Data Reduction in the String Space for Efficient kNN Classification through Space Partitioning
Authors: Valero-Mas, Jose J. | Castellanos, Francisco J.
Research groups or GITE: Reconocimiento de Formas e Inteligencia Artificial
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: String space | Data reduction | k-Nearest neighbor | Prototype generation
Knowledge areas: Lenguajes y Sistemas Informáticos
Publication date: 12 May 2020
Publisher: MDPI
Bibliographic citation: Valero-Mas JJ, Castellanos FJ. Data Reduction in the String Space for Efficient kNN Classification through Space Partitioning. Applied Sciences. 2020; 10(10):3356. doi:10.3390/app10103356
Abstract: Within the Pattern Recognition field, two representations are generally considered for encoding the data: statistical codifications, which describe elements as feature vectors, and structural representations, which encode elements as high-level symbolic data structures such as strings, trees or graphs. While the vast majority of classifiers are capable of addressing statistical spaces, only some particular methods are suitable for structural representations. The kNN classifier constitutes one of the scarce examples of algorithms capable of tackling both statistical and structural spaces. This method is based on the computation of the dissimilarity between all the samples of the set, which is the main reason for its high versatility, but in turn, for its low efficiency as well. Prototype Generation is one of the possibilities for palliating this issue. These mechanisms generate a reduced version of the initial dataset by performing data transformation and aggregation processes on the initial collection. Nevertheless, these generation processes are quite dependent on the data representation considered, being not generally well defined for structural data. In this work we present the adaptation of the generation-based reduction algorithm Reduction through Homogeneous Clusters to the case of string data. This algorithm performs the reduction by partitioning the space into class-homogeneous clusters for then generating a representative prototype as the median value of each group. Thus, the main issue to tackle is the retrieval of the median element of a set of strings. Our comprehensive experimentation comparatively assesses the performance of this algorithm in both the statistical and the string-based spaces. Results prove the relevance of our approach by showing a competitive compromise between classification rate and data reduction.
Sponsors: This research work was partially funded by “Programa I+D+i de la Generalitat Valenciana” through grant ACIF/2019/042 and the Spanish Ministry through HISPAMUS project TIN2017-86576-R, partially funded by the EU.
URI: http://hdl.handle.net/10045/106979
ISSN: 2076-3417
DOI: 10.3390/app10103356
Language: eng
Type: info:eu-repo/semantics/article
Rights: © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Peer reviewed: yes
Publisher's version: https://doi.org/10.3390/app10103356
Appears in collection: INV - GRFIA - Artículos de Revistas
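
Illustrative sketch: the reduction scheme summarized in the abstract (partition the data into class-homogeneous clusters, then keep one median string per cluster) could look roughly like the Python below. This is not the authors' implementation. It assumes the Levenshtein edit distance as the string dissimilarity, uses the set median (the member string with the smallest summed distance to the others) as the "median element", and splits mixed clusters with a single nearest-seed assignment pass instead of an iterated clustering step. The names `levenshtein`, `set_median`, and `rhc_strings` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (unit insert/delete/substitute costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def set_median(strings):
    """Assumed median: the member with the smallest summed distance to the set."""
    return min(strings, key=lambda s: sum(levenshtein(s, t) for t in strings))


def rhc_strings(samples):
    """Simplified homogeneous-cluster reduction.

    samples: list of (string, label) pairs.
    Returns a reduced list of (prototype, label) pairs.
    """
    prototypes = []
    pending = [list(samples)]
    while pending:
        cluster = pending.pop()
        labels = {label for _, label in cluster}
        if len(labels) == 1:
            # Homogeneous cluster: replace it with a single median prototype.
            members = [s for s, _ in cluster]
            prototypes.append((set_median(members), next(iter(labels))))
            continue
        # Mixed cluster: seed one median per class, assign samples to nearest seed.
        ordered = sorted(labels)
        seeds = [set_median([s for s, l in cluster if l == label])
                 for label in ordered]
        groups = [[] for _ in seeds]
        for s, l in cluster:
            nearest = min(range(len(seeds)),
                          key=lambda i: levenshtein(s, seeds[i]))
            groups[nearest].append((s, l))
        groups = [g for g in groups if g]
        if len(groups) == 1:
            # No split happened (e.g. identical strings with different labels):
            # fall back to splitting by class so the loop always terminates.
            groups = [[(s, l) for s, l in cluster if l == label]
                      for label in ordered]
        pending.extend(groups)
    return prototypes


if __name__ == "__main__":
    data = [("aaaa", "A"), ("aaab", "A"), ("aaba", "A"),
            ("zzzz", "B"), ("zzzy", "B")]
    print(rhc_strings(data))
```

A kNN classifier would then compare a query string against the returned prototypes only, rather than against every original sample, which is where the efficiency gain described in the abstract would come from.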

Files in this item:
File | Description | Size | Format
Valero-Mas_Castellanos_2020_ApplSci.pdf | | 332.84 kB | Adobe PDF


This item is licensed under a Creative Commons license.