Compressed kNN: K-Nearest Neighbors with Data Compression

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/89038
Item information
Title: Compressed kNN: K-Nearest Neighbors with Data Compression
Authors: Salvador, Jaime | Ruiz, Zoila | Garcia-Rodriguez, Jose
Research group(s): Informática Industrial y Redes de Computadores
Center, Department or Service: Universidad de Alicante. Departamento de Tecnología Informática y Computación
Keywords: Classification | KNN | Compression | Categorical data | Feature pre-processing
Knowledge area(s): Arquitectura y Tecnología de Computadores
Publication date: 28-Feb-2019
Publisher: MDPI
Bibliographic citation: Salvador-Meneses J, Ruiz-Chavez Z, Garcia-Rodriguez J. Compressed kNN: K-Nearest Neighbors with Data Compression. Entropy. 2019; 21(3):234. doi:10.3390/e21030234
Abstract: The kNN (k-nearest neighbors) classification algorithm is one of the most widely used non-parametric classification methods; however, its memory consumption grows with the size of the dataset, which makes it impractical for large volumes of data. Variations of this method have been proposed, such as condensed kNN, which divides the training dataset into clusters to be classified; other variations reduce the input dataset before applying the algorithm. This paper presents a variation of the kNN algorithm, of the structure-less NN type, designed to work with categorical data. Categorical data, due to their nature, can be compressed in order to decrease the memory required at classification time. The proposed method adds a prior compression phase and then applies the algorithm directly to the compressed data. This allows the whole dataset to be kept in memory, leading to a considerable reduction in the amount of memory required. Experiments and tests carried out on well-known datasets show a reduction in the volume of information stored in memory while maintaining classification accuracy. They also show a slight decrease in processing time because the information is decompressed on the fly while the algorithm is running.
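As a rough illustration of the approach the abstract describes, categorical codes can be bit-packed into integers and decompressed on the fly during a Hamming-distance kNN query. This sketch is not the paper's actual implementation; the packing scheme, function names, and toy data are all assumptions for illustration only:

```python
import math
from collections import Counter

def pack_row(row, bits_per_feature):
    """Compress one row of categorical codes into a single integer."""
    packed = 0
    for code, bits in zip(row, bits_per_feature):
        packed = (packed << bits) | code
    return packed

def unpack_row(packed, bits_per_feature):
    """Decompress a packed row back into its list of codes (on the fly)."""
    codes = []
    for bits in reversed(bits_per_feature):
        codes.append(packed & ((1 << bits) - 1))
        packed >>= bits
    return codes[::-1]

def knn_classify(query, packed_rows, labels, bits_per_feature, k=3):
    """kNN with Hamming distance; each stored row stays compressed in
    memory and is decompressed only while its distance is computed."""
    dists = []
    for packed, label in zip(packed_rows, labels):
        row = unpack_row(packed, bits_per_feature)
        d = sum(a != b for a, b in zip(query, row))  # Hamming distance
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    # Majority vote among the k nearest neighbors
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

# Toy dataset: 3 categorical features with cardinalities 4, 2, 3,
# so each row needs only 2 + 1 + 2 = 5 bits instead of 3 machine words.
cardinalities = [4, 2, 3]
bits_per_feature = [max(1, math.ceil(math.log2(c))) for c in cardinalities]
X = [[0, 1, 2], [3, 0, 1], [0, 1, 1], [3, 0, 2]]
y = ["a", "b", "a", "b"]
packed = [pack_row(r, bits_per_feature) for r in X]
print(knn_classify([0, 1, 0], packed, y, bits_per_feature, k=3))  # → a
```

The memory saving comes from allocating only as many bits per feature as its cardinality requires, so the full compressed dataset can stay resident in memory while the classifier runs.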
URI: http://hdl.handle.net/10045/89038
ISSN: 1099-4300
DOI: 10.3390/e21030234
Language: English
Type: info:eu-repo/semantics/article
Rights: © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Peer reviewed: yes
Publisher's version: https://doi.org/10.3390/e21030234
Appears in collections: INV - AIA - Artículos de Revistas
INV - I2RC - Artículos de Revistas

Files in this item:
File: 2019_Salvador-Meneses_etal_Entropy.pdf | Size: 868.54 kB | Format: Adobe PDF


This item is licensed under a Creative Commons License