Oversampling imbalanced data in the string space

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/72581
Full metadata record
DC Field | Value | Language
dc.contributor | Reconocimiento de Formas e Inteligencia Artificial | es_ES
dc.contributor.author | Castellanos, Francisco J. | -
dc.contributor.author | Valero Mas, José Javier | -
dc.contributor.author | Calvo-Zaragoza, Jorge | -
dc.contributor.author | Rico Juan, Juan Ramón | -
dc.contributor.other | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | es_ES
dc.identifier.citation | Pattern Recognition Letters. 2018, 103: 32-38. doi:10.1016/j.patrec.2018.01.003 | es_ES
dc.identifier.issn | 0167-8655 (Print) | -
dc.identifier.issn | 1872-7344 (Online) | -
dc.description.abstract | Imbalanced data is a typical problem in the supervised classification field, which occurs when the different classes are not equally represented. This fact typically results in the classifier biasing its performance towards the class representing the majority of the elements. Many methods have been proposed to alleviate this scenario, yet all of them assume that data is represented as feature vectors. In this paper we propose a strategy to balance a dataset whose samples are encoded as strings. Our approach is based on adapting the well-known Synthetic Minority Over-sampling Technique (SMOTE) algorithm to the string space. More precisely, data generation is achieved with an iterative approach to create artificial strings within the segment between two given samples of the training set. Results with several datasets and imbalance ratios show that the proposed strategy properly deals with the problem in all cases considered. | es_ES
dc.description.sponsorship | This work was partially supported by the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R supported by EU FEDER funds), the Universidad de Alicante through the FPU program (UAFPU2014–5883) and grant GRE-16-04. | es_ES
dc.rights | © 2018 Elsevier B.V. | es_ES
dc.subject | Class imbalance problem | es_ES
dc.subject | String space | es_ES
dc.subject.other | Lenguajes y Sistemas Informáticos | es_ES
dc.title | Oversampling imbalanced data in the string space | es_ES
Appears in Collections: INV - GRFIA - Artículos de Revistas
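
The abstract in the record above describes generating artificial strings within the segment between two training samples. The following Python sketch illustrates one way such an interpolation could work. It is not the authors' algorithm: it assumes the Levenshtein edit distance as the string metric and applies a random fraction of an optimal edit script, and every function name (levenshtein, edit_script, interpolate, smote_strings) is hypothetical.

```python
import random

def _table(a, b):
    """Levenshtein dynamic-programming table for strings a and b."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d

def levenshtein(a, b):
    """Edit distance between a and b (assumed metric, not taken from the paper)."""
    return _table(a, b)[len(a)][len(b)]

def edit_script(a, b):
    """One optimal edit script from a to b, ordered from right to left."""
    d, i, j, ops = _table(a, b), len(a), len(b), []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1                      # characters match, no edit
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.append(("sub", i - 1, b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, None))
            i -= 1
        else:
            ops.append(("ins", i, b[j - 1]))
            j -= 1
    return ops

def interpolate(a, b, gap, rng=random):
    """Apply a fraction `gap` (0..1) of the edit script, moving a towards b."""
    ops = edit_script(a, b)
    chosen = sorted(rng.sample(range(len(ops)), round(gap * len(ops))))
    s = list(a)
    for idx in chosen:                               # right-to-left keeps positions valid
        kind, pos, ch = ops[idx]
        if kind == "sub":
            s[pos] = ch
        elif kind == "del":
            del s[pos]
        else:
            s.insert(pos, ch)
    return "".join(s)

def smote_strings(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: interpolate each seed towards a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = minority[:i] + minority[i + 1:]     # requires at least two samples
        neighbours = sorted(others, key=lambda s: levenshtein(minority[i], s))[:k]
        synthetic.append(interpolate(minority[i], rng.choice(neighbours), rng.random(), rng))
    return synthetic
```

For instance, smote_strings(["abcd", "abce", "abde", "xbcd"], 5) returns five synthetic strings. Under the stated assumptions, each one lies on a shortest edit path between a minority sample and one of its nearest neighbours.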

Files in This Item:
File | Description | Size | Format
2018_Castellanos_etal_PatternRecognLett_final.pdf | Final version (restricted access) | 716.98 kB | Adobe PDF
2018_Castellanos_etal_PatternRecognLett_accepted.pdf | 24-month embargo (open access: 6 Jan. 2020) | 313.42 kB | Adobe PDF

Items in RUA are protected by copyright, with all rights reserved, unless otherwise indicated.