To Write or Not to Write as a Machine? That’s the Question
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/151504
Title: | To Write or Not to Write as a Machine? That’s the Question |
---|---|
Authors: | Sepúlveda-Torres, Robiert | Martínez-Murillo, Iván | Saquete Boró, Estela | Lloret, Elena | Palomar, Manuel |
Research Group/s: | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) |
Center, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | Multi-task Learning | Multilingual | Natural Language Processing | Large Language Models | AI-Generated Content |
Issue Date: | 30-Jan-2025 |
Publisher: | IEEE |
Citation: | IEEE Transactions on Big Data. 2025. https://doi.org/10.1109/TBDATA.2025.3536938 |
Abstract: | Given that tools such as ChatGPT or Gemini can generate text much as a human would, reliable detectors of AI-generated content (AIGC) are vital to combat the misuse of those tools and its negative consequences. Most research on AIGC detection has focused on English, often overlooking other languages that also have tools capable of generating human-like text, as is the case for Spanish. This paper proposes a novel multilingual, multi-task approach for detecting machine- vs. human-generated text. The first task classifies whether a text was written by a machine or by a human, which is the research objective of this paper. The second task consists of detecting the language of the text. To evaluate our approach, this study is framed within the scope of the AuTexTification shared task, and we have also collected a different dataset in Spanish. The experiments carried out in Spanish and English show that our approach is very competitive with the state of the art and generalizes better, and is thus able to detect AI-generated text in multiple domains. |
Sponsor: | The research work is part of the R&D projects: “CORTEX: Conscious Text Generation” (PID2021-123956OB-I00), funded by MCIN/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”; “CLEAR.TEXT: Enhancing the modernization public sector organizations by deploying Natural Language Processing to make their digital content CLEARER to those with cognitive disabilities” (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and “European Union NextGenerationEU/PRTR”; “COOLANG.TRIVIAL: Technological Resources for Intelligent Viral AnaLysis” (PID2021-122263OB-C22), funded by MCIN/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”; “SOCIALFAIRNESS.SOCIALTRUST: Assessing trustworthiness in digital media” (PDC2022-133146-C22), funded by MCIN/AEI/10.13039/501100011033/ and by the “European Union NextGenerationEU/PRTR”. At the regional level, this research has been funded by the project “NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation” (grant reference CIPROM/2021/21) of the Generalitat Valenciana. |
URI: | http://hdl.handle.net/10045/151504 |
ISSN: | 2332-7790 |
DOI: | 10.1109/TBDATA.2025.3536938 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ |
Peer Review: | yes |
Publisher version: | https://doi.org/10.1109/TBDATA.2025.3536938 |
Appears in Collections: | INV - GPLSI - Artículos de Revistas |
Files in This Item:
Size | Format |
---|---|
2.06 MB | Adobe PDF |
Items in RUA are protected by copyright, with all rights reserved, unless otherwise indicated.