A multifaceted approach to detect gender biases in Natural Language Generation

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/146242
Item information
Title: A multifaceted approach to detect gender biases in Natural Language Generation
Authors: Consuegra-Ayala, Juan Pablo | Martínez-Murillo, Iván | Lloret, Elena | Moreda, Paloma | Palomar, Manuel
Research Group/s: Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Instituto Universitario de Investigación Informática
Keywords: Natural Language Generation | Gender Bias | Common Sense Resources
Issue Date: 22-Aug-2024
Publisher: Elsevier
Citation: Knowledge-Based Systems. 2024, 303: 112367. https://doi.org/10.1016/j.knosys.2024.112367
Abstract: Recent advances in generative models have skyrocketed the popularity of conversational chatbots and have revolutionized the way we interact with artificial intelligence. At the same time, research has shown that machine learning models can unconsciously reflect and amplify human biases. This is particularly dangerous for generative models given the huge popularity of such technologies. Specifically, a fundamental source of bias of such technologies is the resources on which the models are trained. To address this issue, this paper proposes a methodology to analyze intrinsic gender bias in Natural Language Generation (NLG). Some works already propose metrics and approaches to measure bias in the Natural Language Processing field. However, there is a lack of standard methodology to measure gender bias in NLG. Therefore, adapting the Bias Score approach, our proposal involves three sequential stages applied to individual texts to detect intrinsic gender bias in NLG effectively. Those steps are as follows: (i) word scoring; (ii) word filtering; and (iii) generative-word analysis. This methodology is applied to recent datasets and pre-trained models widely used for the generation of text with common sense. In particular, this paper analyzes the potential gender bias in CommonGen and C2 Gen datasets and the SimpleNLG and T5 models. The results show the ability of the proposed methodology to detect gender bias in word distributions, presenting a strong correlation with the words typically associated with a specific gender. Results indicate that both tested datasets are intrinsically gender-biased, and therefore, tested models fine-tuned with those datasets are as well.
Sponsor: The research work has been partially funded by the University of Alicante and the University of Havana, and it is part of the R&D projects: “CORTEX: Conscious Text Generation” (PID2021-123956OB-I00), funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”; “CLEAR.TEXT: Enhancing the modernization public sector organizations by deploying Natural Language Processing to make their digital content CLEARER to those with cognitive disabilities” (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and “European Union NextGenerationEU/PRTR”; TRIVIAL (PID2021-122263OB-C22) and SOCIALTRUST (PDC2022-133146-C22), funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by ERDF A way of making Europe, by the European Union or by the European Union NextGenerationEU/PRTR. Also, the VIVES: “Pla de Tecnologies de la Llengua per al valencià” project (2022/TL22/00215334) from the Projecte Estratègic per a la Recuperació i Transformació Econòmica (PERTE). At regional level, this research has been funded by the project “NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation with grant reference (CIPROM/2021/21)” by the Generalitat Valenciana. Moreover, it has been also partially funded by the European Commission ICT COST Actions “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231), “Distributed Knowledge Graphs” (CA19134) and “Leading Platform for European Citizens, Industries, Academia, and Policymakers in Media Accessibility” (CA19142).
URI: http://hdl.handle.net/10045/146242
ISSN: 0950-7051 (Print) | 1872-7409 (Online)
DOI: 10.1016/j.knosys.2024.112367
Language: eng
Type: info:eu-repo/semantics/article
Rights: © 2024 Elsevier B.V.
Peer Review: yes
Publisher version: https://doi.org/10.1016/j.knosys.2024.112367
Appears in Collections: INV - GPLSI - Artículos de Revistas

Files in This Item:
File | Description | Size | Format
Consuegra-Ayala_etal_2024_Knowl-BasedSyst_accepted.pdf | Embargo 24 months (open access: 23 Aug. 2026) | 3.71 MB | Adobe PDF
Consuegra-Ayala_etal_2024_Knowl-BasedSyst_final.pdf | Final version (restricted access) | 1.08 MB | Adobe PDF


Items in RUA are protected by copyright, with all rights reserved, unless otherwise indicated.