A Data Analytics Methodology to Visually Analyze the Impact of Bias and Rebalancing

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/134700
Item information
Title: A Data Analytics Methodology to Visually Analyze the Impact of Bias and Rebalancing
Author(s): Lavalle, Ana | Maté, Alejandro | Trujillo, Juan | Teruel, Miguel A.
Research group(s) or GITE: Lucentia
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Data Analytics | Data Bias | Data Visualization | Model-driven development | Requirements Engineering | Artificial Intelligence
Publication date: 24-May-2023
Publisher: IEEE
Bibliographic citation: IEEE Access. 2023, 11: 56691-56702. https://doi.org/10.1109/ACCESS.2023.3279732
Abstract: Data Analytics has become a key component of many business processes that influence several aspects of our daily life. Any misinterpretation or flaw in the outputs of Data Analytics can cause significant damage, especially when dealing with one of the most often overlooked issues, namely the unaware use of biased data. When data bias goes unnoticed, it warps the meaning of data, having a devastating effect on Data Analytics results. Although it is widely argued that the most common way to deal with data bias is to rebalance biased datasets, rebalancing is not a neutral transformation: it leads to several potentially undesired side effects that will probably harm the result of data analyses. Therefore, in order to analyze the underlying bias in datasets, in this work we present (i) a comprehensive methodology based on visualization techniques, which assists users in defining their analytical requirements to detect and visually represent data bias automatically, helping them to find out whether or not it is appropriate to artificially rebalance their dataset; (ii) a novel metamodel for visually representing bias; (iii) a motivating real-world running example used to analyze the impact of bias in Data Analytics; and (iv) an assessment of the improvements introduced by our proposal through a complete real-world case study using a Fire Department Calls for Service dataset, thus demonstrating that rebalancing datasets is not always the best option. It is crucial to study the context in which decisions are going to be made. Moreover, it is also important to perform a pre-analysis in order to understand the nature of the datasets and how biased they are.
Sponsor(s): This work has been co-funded by the AETHER-UA project (PID2020-112540RB-C43), funded by the Spanish Ministry of Science and Innovation, and the BALLADEER project (PROMETEO/2021/088), funded by the Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital (Generalitat Valenciana).
URI: http://hdl.handle.net/10045/134700
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3279732
Language: eng
Type: info:eu-repo/semantics/article
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Peer reviewed: yes
Publisher version: https://doi.org/10.1109/ACCESS.2023.3279732
Appears in collections: INV - LUCENTIA - Artículos de Revistas

Files in this item:
File: Lavalle_etal_2023_IEEEAccess.pdf | Size: 1.89 MB | Format: Adobe PDF

