Feature selection based on information theory

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/18362
Item information
Title: Feature selection based on information theory
Authors: Bonev, Boyan
Research Director: Cazorla Quevedo, Miguel Ángel | Escolano Ruiz, Francisco
Center, Department or Service: Universidad de Alicante. Departamento de Ciencia de la Computación e Inteligencia Artificial
Keywords: Feature selection | Information theory | Pattern recognition | Supervised classification | Selección de características | Teoría de la información | Reconocimiento de patrones | Clasificación supervisada
Knowledge Area: Ciencia de la Computación e Inteligencia Artificial
Date Created: 2010
Issue Date: 2010
Date of defense: 29-Jun-2010
Publisher: Universidad de Alicante
Abstract: Along with the improvement of data acquisition techniques and the increasing computational capacity of computers, the dimensionality of data grows ever higher. Pattern recognition methods have to deal with samples consisting of thousands of features, and reducing their dimensionality becomes crucial to make them tractable. Feature selection is a technique for removing irrelevant and noisy features and selecting a subset of features which better describe the samples and yield better classification performance. It is becoming an essential part of most pattern recognition applications.

In this thesis we propose a feature selection method for supervised classification. The main contribution is the efficient use of information theory, which provides a solid theoretical framework for measuring the relation between the classes and the features. Mutual information is considered to be the best measure for this purpose. Traditionally it has been measured for ranking single features, without taking into account the entire set of selected features, because of the computational complexity involved in estimating mutual information. However, in most data sets the features are not independent, and their combination provides much more information about the class than the sum of their individual predictive powers.

Methods based on density estimation can only be used for data sets with a very high number of samples and a low number of features. Due to the curse of dimensionality, in a multi-dimensional feature space the number of samples required for a reliable density estimation is very high. For this reason we analyse estimation methods which bypass density estimation and estimate entropy directly from the set of samples. These methods allow us to efficiently evaluate sets of thousands of features.

For high-dimensional feature sets, another problem is the search order of the feature space. All algorithms with a non-prohibitive computational cost search for a sub-optimal feature set. Greedy algorithms are the fastest and the ones which incur the least overfitting. We show that, from the information-theoretical perspective, a greedy backward selection algorithm conserves the amount of mutual information, even though the resulting feature set is not the minimal one.

We also validate our method on several real-world applications. We apply feature selection to omnidirectional image classification through a novel, appearance-based approach in which we select features from a bank of filters applied to different parts of the image; the context of the task is place recognition for mobile robotics. Another set of experiments is performed on microarrays from gene expression databases, where the classification problem is to predict the disease of a new patient. We present a comparison of the classification performance, and the algorithms we present outperform the existing ones. Finally, we successfully apply feature selection to spectral graph classification. All the features we use are for unattributed graphs, which constitutes a contribution to the field. We also draw interesting conclusions about which spectral features matter most under different experimental conditions. In the context of graph classification we also show how important the precise estimation of mutual information is, and we analyse its impact on the final classification results.
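The greedy backward selection described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: it uses a simple plug-in (histogram) mutual-information estimator on discrete features, whereas the thesis uses entropy estimators that bypass density estimation to scale to thousands of features. All function and variable names here are illustrative.

```python
# Sketch of greedy backward feature selection guided by I(S; C),
# the mutual information between a feature subset S and the class C.
# Plug-in estimator for illustration only; assumes discrete features.
from collections import Counter
from math import log2

def entropy(symbols):
    # Plug-in entropy estimate from a sequence of hashable symbols.
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in Counter(symbols).values())

def mutual_information(samples, classes, subset):
    # I(S; C) = H(S) + H(C) - H(S, C), with each sample projected
    # onto the feature indices in `subset`.
    proj = [tuple(s[i] for i in subset) for s in samples]
    joint = list(zip(proj, classes))
    return entropy(proj) + entropy(classes) - entropy(joint)

def backward_selection(samples, classes, target_size):
    # Greedily drop the feature whose removal loses the least
    # mutual information with the class.
    subset = list(range(len(samples[0])))
    while len(subset) > target_size:
        best = max(subset, key=lambda f: mutual_information(
            samples, classes, [g for g in subset if g != f]))
        subset.remove(best)
    return subset
```

On a toy data set where feature 0 determines the class and the remaining features are noise, the noise features are eliminated first, since removing them leaves I(S; C) unchanged; this is the sense in which backward selection conserves mutual information.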
URI: http://hdl.handle.net/10045/18362
ISBN: 978-84-694-1634-1
Language: eng
Type: info:eu-repo/semantics/doctoralThesis
Appears in Collections: Doctoral theses

Files in This Item:
File: tesis_Ivanov.pdf
Size: 3,63 MB
Format: Adobe PDF

This item is licensed under a Creative Commons License.