Oversampling imbalanced data in the string space

Castellanos, Francisco J.; Valero-Mas, Jose J.; Calvo-Zaragoza, Jorge; Rico-Juan, Juan Ramón

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/72581

Full metadata record

DC Field	Value	Language
dc.contributor	Reconocimiento de Formas e Inteligencia Artificial	es_ES
dc.contributor.author	Castellanos, Francisco J.	-
dc.contributor.author	Valero-Mas, Jose J.	-
dc.contributor.author	Calvo-Zaragoza, Jorge	-
dc.contributor.author	Rico-Juan, Juan Ramón	-
dc.contributor.other	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos	es_ES
dc.date.accessioned	2018-01-17T11:38:03Z	-
dc.date.available	2018-01-17T11:38:03Z	-
dc.date.issued	2018-02-01	-
dc.identifier.citation	Pattern Recognition Letters. 2018, 103: 32-38. doi:10.1016/j.patrec.2018.01.003	es_ES
dc.identifier.issn	0167-8655 (Print)	-
dc.identifier.issn	1872-7344 (Online)	-
dc.identifier.uri	http://hdl.handle.net/10045/72581	-
dc.description.abstract	Imbalanced data is a typical problem in the supervised classification field, which occurs when the different classes are not equally represented. This fact typically results in the classifier biasing its performance towards the class representing the majority of the elements. Many methods have been proposed to alleviate this scenario, yet all of them assume that data is represented as feature vectors. In this paper we propose a strategy to balance a dataset whose samples are encoded as strings. Our approach is based on adapting the well-known Synthetic Minority Over-sampling Technique (SMOTE) algorithm to the string space. More precisely, data generation is achieved with an iterative approach to create artificial strings within the segment between two given samples of the training set. Results with several datasets and imbalance ratios show that the proposed strategy properly deals with the problem in all cases considered.	es_ES
dc.description.sponsorship	This work was partially supported by the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R, supported by EU FEDER funds), the Universidad de Alicante through the FPU program (UAFPU2014–5883) and grant GRE-16-04.	es_ES
dc.language	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.rights	© 2018 Elsevier B.V.	es_ES
dc.subject	Class imbalance problem	es_ES
dc.subject	Oversampling	es_ES
dc.subject	String space	es_ES
dc.subject	SMOTE	es_ES
dc.subject.other	Lenguajes y Sistemas Informáticos	es_ES
dc.title	Oversampling imbalanced data in the string space	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.peerreviewed	yes	es_ES
dc.identifier.doi	10.1016/j.patrec.2018.01.003	-
dc.relation.publisherversion	http://dx.doi.org/10.1016/j.patrec.2018.01.003	es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2013-48152-C2-1-R	-
Appears in collections:	INV - GRFIA - Artículos de Revistas
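
The abstract describes generating artificial strings that lie within the segment between two training samples. The Python sketch below illustrates one possible reading of that idea and is not the authors' iterative procedure: it assumes the Levenshtein edit distance as the string metric and treats a string s as lying on the segment between a and b when d(a, s) + d(s, b) = d(a, b). The function names (align, levenshtein, interpolate, oversample) and the parameters k and seed are hypothetical and chosen only for this illustration.

```python
import random


def align(a, b):
    """Return one optimal Levenshtein alignment of a and b as a list of
    (char_from_a, char_from_b) pairs; None marks a gap on that side."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # match or substitute
    # Backtrace from the bottom-right corner of the table.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def levenshtein(a, b):
    """Edit distance: number of non-matching columns in an optimal alignment."""
    return sum(1 for x, y in align(a, b) if x != y)


def interpolate(a, b, alpha, rng):
    """Apply a fraction alpha (between 0 and 1) of the edit operations that
    turn a into b, which yields a string on the segment between a and b."""
    pairs = align(a, b)
    edits = [idx for idx, (x, y) in enumerate(pairs) if x != y]
    chosen = set(rng.sample(edits, round(alpha * len(edits))))
    out = []
    for idx, (x, y) in enumerate(pairs):
        c = y if idx in chosen else x  # take b's side only for applied edits
        if c is not None:
            out.append(c)
    return "".join(out)


def oversample(minority, n_new, k=3, seed=0):
    """SMOTE-like generation (assumption): pair a random minority sample with
    one of its k nearest minority neighbours and interpolate between them.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda s: levenshtein(base, s))[:k]
        mate = rng.choice(neighbours)
        synthetic.append(interpolate(base, mate, rng.random(), rng))
    return synthetic
```

With alpha = 0 the sketch returns the first string unchanged and with alpha = 1 it returns the second; intermediate values apply a random subset of the edit operations, so different calls can produce different strings on the same segment.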

Files in this item:

File	Description	Size	Format
2018_Castellanos_etal_PatternRecognLett_final.pdf	Final version (restricted access)	716.98 kB	Adobe PDF
2018_Castellanos_etal_PatternRecognLett_accepted.pdf	Accepted Manuscript (open access)	313.42 kB	Adobe PDF
