A semi-automatic annotation methodology that combines Summarization and Human-In-The-Loop to create disinformation detection resources

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/135267
Item information
Title: A semi-automatic annotation methodology that combines Summarization and Human-In-The-Loop to create disinformation detection resources
Authors: Bonet-Jover, Alba | Sepúlveda-Torres, Robiert | Saquete Boró, Estela | Martínez-Barco, Patricio
Research Group/s: Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Natural Language Processing | Semi-automatic annotation | Disinformation detection | Summarization | Dataset construction | Human-in-the-loop Artificial Intelligence
Issue Date: 16-Jun-2023
Publisher: Elsevier
Citation: Knowledge-Based Systems. 2023, 275: 110723. https://doi.org/10.1016/j.knosys.2023.110723
Abstract: Early detection of disinformation is one of the most challenging large-scale problems facing present-day society. This is why the application of technologies such as Artificial Intelligence and Natural Language Processing is necessary. The vast majority of Artificial Intelligence approaches require annotated data, and generating these resources is very expensive. This proposal aims to improve the efficiency of the annotation process with a two-level semi-automatic annotation methodology. The first level extracts relevant information through summarization techniques. The second applies a Human-in-the-Loop strategy whereby the labels are pre-annotated by the machine, corrected by the human and reused by the machine to retrain the automatic annotator. After evaluating the system, the average annotation time per news item is reduced by 50%. In addition, a set of experiments is performed on the generated semi-automatically annotated dataset to demonstrate the effectiveness of the proposal. Although the dataset is annotated in terms of unreliable content, it is applied to the veracity detection task with very promising results (0.95 accuracy in reliability detection and 0.78 in veracity detection).
Sponsor: This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). Also funded by Generalitat Valenciana through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177.
URI: http://hdl.handle.net/10045/135267
ISSN: 0950-7051 (Print) | 1872-7409 (Online)
DOI: 10.1016/j.knosys.2023.110723
Language: eng
Type: info:eu-repo/semantics/article
Rights: © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Peer Review: yes
Publisher version: https://doi.org/10.1016/j.knosys.2023.110723
Appears in Collections: INV - GPLSI - Artículos de Revistas
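
The two-level loop described in the abstract (summarize, pre-annotate, let a human correct, retrain) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `summarize`, `ToyAnnotator`, `annotate_batch` and `oracle`, the word-count voting model, the "first sentences" summarizer and the `reliable`/`unreliable` label set are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def summarize(text, max_sentences=2):
    """Level 1 stand-in (assumption): keep the first few sentences as the relevant part."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return sentences[:max_sentences]


class ToyAnnotator:
    """Hypothetical automatic annotator that votes with word/label co-occurrence counts."""

    def __init__(self):
        self.word_label_counts = defaultdict(Counter)

    def train(self, examples):
        # Retrain from scratch on every human-corrected example seen so far.
        self.word_label_counts.clear()
        for sentence, label in examples:
            for word in sentence.lower().split():
                self.word_label_counts[word][label] += 1

    def predict(self, sentence, default="reliable"):
        votes = Counter()
        for word in sentence.lower().split():
            votes.update(self.word_label_counts.get(word, Counter()))
        return votes.most_common(1)[0][0] if votes else default


def annotate_batch(annotator, news_items, human_review, corrected):
    """Level 2: machine pre-annotates, human corrects, corrections feed retraining."""
    for text in news_items:
        for sentence in summarize(text):
            proposed = annotator.predict(sentence)      # machine pre-annotation
            final = human_review(sentence, proposed)    # human accepts or corrects
            corrected.append((sentence, final))
    annotator.train(corrected)                          # reuse corrections to retrain
    return corrected


def oracle(sentence, proposed):
    """Simulated human reviewer (assumption): flags sentences mentioning 'miracle'."""
    return "unreliable" if "miracle" in sentence.lower() else "reliable"


annotator = ToyAnnotator()
before = annotator.predict("Miracle diet announced")
corrected = annotate_batch(
    annotator,
    ["Miracle cure shocks doctors. Experts were not consulted."],
    oracle,
    [],
)
after = annotator.predict("Miracle diet announced")
```

In this toy run the untrained annotator falls back to its default label, while after one corrected batch it reuses the human's correction on a new sentence. A real system would replace both stand-ins with an actual summarizer and a trained classifier.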

Files in This Item:
File: Bonet-Jover_etal_2023_Knowledge-BasedSyst.pdf | Size: 1.14 MB | Format: Adobe PDF


This item is licensed under a Creative Commons License.