A semi-automatic annotation methodology that combines Summarization and Human-In-The-Loop to create disinformation detection resources
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/135267
Title: | A semi-automatic annotation methodology that combines Summarization and Human-In-The-Loop to create disinformation detection resources |
---|---|
Author(s): | Bonet-Jover, Alba | Sepúlveda-Torres, Robiert | Saquete Boró, Estela | Martínez-Barco, Patricio |
Research group(s) or GITE: | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) |
Center, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | Natural Language Processing | Semi-automatic annotation | Disinformation detection | Summarization | Dataset construction | Human-in-the-loop Artificial Intelligence |
Publication date: | 16-Jun-2023 |
Publisher: | Elsevier |
Citation: | Knowledge-Based Systems. 2023, 275: 110723. https://doi.org/10.1016/j.knosys.2023.110723 |
Abstract: | Early detection of disinformation is one of the most challenging large-scale problems facing present-day society, which is why technologies such as Artificial Intelligence and Natural Language Processing need to be applied. Most Artificial Intelligence approaches require annotated data, and generating these resources is very expensive. This proposal aims to improve the efficiency of the annotation process with a two-level semi-automatic annotation methodology. The first level extracts relevant information through summarization techniques. The second applies a Human-in-the-Loop strategy in which the labels are pre-annotated by the machine, corrected by the human and reused by the machine to retrain the automatic annotator. Evaluation of the system shows that the average annotation time per news item is reduced by 50%. In addition, a set of experiments is performed on the resulting semi-automatically annotated dataset to demonstrate the effectiveness of the proposal. Although the dataset is annotated in terms of unreliable content, it is applied to the veracity detection task with very promising results (0.95 accuracy in reliability detection and 0.78 in veracity detection). |
Sponsor(s): | This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). It is also funded by the Generalitat Valenciana through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21) and the grant ACIF/2020/177. |
URI: | http://hdl.handle.net/10045/135267 |
ISSN: | 0950-7051 (Print) | 1872-7409 (Online) |
DOI: | 10.1016/j.knosys.2023.110723 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
Peer reviewed: | yes |
Publisher's version: | https://doi.org/10.1016/j.knosys.2023.110723 |
Appears in collections: | INV - GPLSI - Artículos de Revistas |
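
The two-level workflow described in the abstract (summarize first, then pre-annotate, correct and retrain) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the frequency-based extractive summarizer, the keyword-count annotator, the simulated human oracle, the toy news items and every name used (`extractive_summary`, `KeywordAnnotator`, `hitl_annotate`) are hypothetical stand-ins, chosen only to show how human-corrected labels flow back into retraining the automatic annotator.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-cased word tokens (simplistic, for illustration only)."""
    return [w.lower() for w in re.findall(r"\w+", text)]


def extractive_summary(text, k=2):
    """Level 1 (assumed technique): keep the k sentences whose words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freqs = Counter(tokens(text))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -sum(freqs[t] for t in tokens(sentences[i])))
    keep = sorted(ranked[:k])  # restore the original sentence order
    return " ".join(sentences[i] for i in keep)


class KeywordAnnotator:
    """Hypothetical stand-in for the automatic annotator: per-label word counts."""

    def __init__(self):
        self.counts = {}

    def fit(self, examples):
        # Retrain from scratch on every (text, label) pair corrected so far.
        self.counts = {}
        for text, label in examples:
            self.counts.setdefault(label, Counter()).update(tokens(text))

    def predict(self, text, default):
        if not self.counts:  # cold start: nothing learned yet
            return default
        toks = tokens(text)
        return max(sorted(self.counts),
                   key=lambda lab: sum(self.counts[lab][t] for t in toks))


def hitl_annotate(batches, human_correct, default_label):
    """Level 2: machine pre-annotates, human corrects, machine retrains."""
    annotator = KeywordAnnotator()
    gold, corrections = [], 0
    for batch in batches:
        for text in batch:
            summary = extractive_summary(text)                    # level 1
            proposed = annotator.predict(summary, default_label)  # pre-annotation
            final = human_correct(text, proposed)                 # human review
            corrections += int(final != proposed)
            gold.append((summary, final))
        annotator.fit(gold)  # reuse the corrected labels to retrain
    return gold, corrections


# Usage with a simulated human (an oracle dict) on made-up toy items.
news = {
    "Scientists confirm the vaccine trial results. The study was peer reviewed.": "reliable",
    "Shocking secret cure they hide from you! Doctors hate it.": "unreliable",
    "The ministry published the official report. Figures were verified.": "reliable",
    "Secret plot revealed! They hide the truth from you.": "unreliable",
}
items = list(news)
gold, corrections = hitl_annotate([items[:2], items[2:]],
                                  lambda text, proposed: news[text], "reliable")
print(len(gold), corrections)
```

Retraining once per batch rather than after every item keeps the loop simple while still reusing the corrected labels; in this toy run only the second item of the first batch needs a human correction, after which the retrained annotator proposes the right labels. A real system would replace both stand-ins with its own summarizer and classifier.
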
Files in this item:
File | Description | Size | Format |
---|---|---|---|
Bonet-Jover_etal_2023_Knowledge-BasedSyst.pdf | | 1.14 MB | Adobe PDF |
This item is licensed under a Creative Commons License