A semi-automatic annotation methodology that combines Summarization and Human-In-The-Loop to create disinformation detection resources

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/135267
Item information
Title: A semi-automatic annotation methodology that combines Summarization and Human-In-The-Loop to create disinformation detection resources
Author(s): Bonet-Jover, Alba | Sepúlveda-Torres, Robiert | Saquete Boró, Estela | Martínez-Barco, Patricio
Research group(s) or GITE: Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Natural Language Processing | Semi-automatic annotation | Disinformation detection | Summarization | Dataset construction | Human-in-the-loop Artificial Intelligence
Publication date: 16-Jun-2023
Publisher: Elsevier
Bibliographic citation: Knowledge-Based Systems. 2023, 275: 110723. https://doi.org/10.1016/j.knosys.2023.110723
Abstract: Early detection of disinformation is one of the most challenging big-scale problems facing present day society. This is why the application of technologies such as Artificial Intelligence and Natural Language Processing is necessary. The vast majority of Artificial Intelligence approaches require annotated data, and generating these resources is very expensive. This proposal aims to improve the efficiency of the annotation process with a two-level semi-automatic annotation methodology. The first level extracts relevant information through summarization techniques. The second applies a Human-in-the-Loop strategy whereby the labels are pre-annotated by the machine, corrected by the human and reused by the machine to retrain the automatic annotator. After evaluating the system, the average annotation time per news item is reduced by 50%. In addition, a set of experiments on the semi-automatically annotated dataset that is generated are performed so as to demonstrate the effectiveness of the proposal. Although the dataset is annotated in terms of unreliable content, it is applied to the veracity detection task with very promising results (0.95 accuracy in reliability detection and 0.78 in veracity detection).
Sponsor(s): This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). Also funded by Generalitat Valenciana through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177.
URI: http://hdl.handle.net/10045/135267
ISSN: 0950-7051 (Print) | 1872-7409 (Online)
DOI: 10.1016/j.knosys.2023.110723
Language: eng
Type: info:eu-repo/semantics/article
Rights: © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Peer reviewed: yes
Publisher's version: https://doi.org/10.1016/j.knosys.2023.110723
Appears in collections: INV - GPLSI - Artículos de Revistas
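
The two-level methodology described in the abstract (summarize first, then pre-annotate, correct and retrain) can be illustrated with a minimal sketch. This is not the authors' implementation: the frequency-based `summarize` function, the `KeywordAnnotator` class, the `human_review` callback and the label names are all hypothetical stand-ins chosen so that the example runs with the Python standard library only.

```python
import re
from collections import Counter


def summarize(text, k=2):
    """Level 1 (stand-in): keep the k sentences with the highest word-frequency score."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r"\w+", sentences[i].lower())),
    )
    keep = sorted(ranked[:k])  # restore the original sentence order
    return " ".join(sentences[i] for i in keep)


class KeywordAnnotator:
    """Hypothetical automatic annotator: scores a text with per-label word counts."""

    def __init__(self):
        self.counts = {}  # label -> Counter of words seen with that label

    def train(self, examples):
        # Rebuild the word counts from scratch on every call.
        self.counts = {}
        for text, label in examples:
            words = re.findall(r"\w+", text.lower())
            self.counts.setdefault(label, Counter()).update(words)

    def predict(self, text, default="reliable"):
        words = re.findall(r"\w+", text.lower())
        scores = {label: sum(c[w] for w in words) for label, c in self.counts.items()}
        if not scores or max(scores.values()) == 0:
            return default  # no evidence for any label
        return max(scores, key=scores.get)


def annotate_with_human_in_the_loop(news_items, human_review, seed_examples):
    """Level 2: pre-annotate, let the human correct, retrain on the corrected labels."""
    annotator = KeywordAnnotator()
    training = list(seed_examples)
    annotator.train(training)
    annotated = []
    for item in news_items:
        summary = summarize(item)
        proposed = annotator.predict(summary)    # machine pre-annotation
        final = human_review(summary, proposed)  # human accepts or corrects
        training.append((summary, final))
        annotator.train(training)                # corrections are reused to retrain
        annotated.append((summary, proposed, final))
    return annotated
```

In this sketch `human_review` is any callable that receives the summary and the proposed label and returns the final label; in a real annotation tool it would be an interactive step. Retraining after every item is a simplification, and batching the corrections would be an equally plausible design.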

Files in this item:
File | Description | Size | Format
Bonet-Jover_etal_2023_Knowledge-BasedSyst.pdf | | 1.14 MB | Adobe PDF


This item is licensed under a Creative Commons License