Team GPLSI at AuTexTification Shared Task: Determining the Authorship of a Text

Martínez-Murillo, Iván; Sepúlveda-Torres, Robiert; Saquete Boró, Estela; Lloret, Elena; Palomar, Manuel

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/137840

Item information
Title:	Team GPLSI at AuTexTification Shared Task: Determining the Authorship of a Text
Author(s):	Martínez-Murillo, Iván | Sepúlveda-Torres, Robiert | Saquete Boró, Estela | Lloret, Elena | Palomar, Manuel
Research group(s) or GITE:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords:	Human Language Technologies | Transformers | Fine-tuning | Multilinguality | Ensemble classification | Transfer Learning
Publication date:	26-Sep-2023
Publisher:	CEUR
Bibliographic citation:	Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), Jaén, Spain, September 26, 2023. CEUR Workshop Proceedings, Vol-3496
Abstract:	AuTexTification is a shared task within the IberLEF workshop that aims to determine whether a text has been generated by an Artificial Intelligence (AI) or a human. The objective of this paper is to report the participation and results of the GPLSI team from the University of Alicante (Spain) in subtask 1: Human or Generated of the AuTexTification challenge for the English and Spanish languages. We propose and experiment with different approaches based on Transfer Learning; Ensemble Learning; fine-tuning existing language models, such as RoBERTa or RemBERT; or relying on linguistic features. Our best models for both languages were trained through Transfer Learning techniques, obtaining the 6th and 8th positions in the English and Spanish versions of this subtask, respectively. Results obtained in the Spanish version were close to those of the top-performing team.
Sponsor(s):	This research work is part of the R&D projects “CORTEX: Conscious Text Generation” (PID2021-123956OB-I00) and “TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP” (PID2021-122263OB-C22), both funded by MCIN/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”, and “CLEAR.TEXT: Enhancing the modernization public sector organizations by deploying Natural Language Processing to make their digital content CLEARER to those with cognitive disabilities” (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and “European Union NextGenerationEU/PRTR”. Moreover, it has also been partially funded by the Generalitat Valenciana through the project “NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation” (grant reference CIPROM/2021/21), and by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231).
URI:	http://hdl.handle.net/10045/137840
ISSN:	1613-0073
Language:	eng
Type:	info:eu-repo/semantics/conferenceObject
Rights:	© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Peer reviewed:	yes
Publisher's version:	https://ceur-ws.org/Vol-3496/
Appears in collections:	INV - GPLSI - Comunicaciones a Congresos, Conferencias, etc.

Files in this item:
File	Description	Size	Format
Martinez-Murillo_etal_IberLEF2023.pdf		1.18 MB	Adobe PDF
