HeadlineStanceChecker: Exploiting summarization to detect headline disinformation

Sepúlveda-Torres, Robiert; Vicente, Marta; Saquete Boró, Estela; Lloret, Elena; Palomar, Manuel

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/118249

Item information
Title:	HeadlineStanceChecker: Exploiting summarization to detect headline disinformation
Author(s):	Sepúlveda-Torres, Robiert | Vicente, Marta | Saquete Boró, Estela | Lloret, Elena | Palomar, Manuel
Research group(s) or GITE:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Instituto Universitario de Investigación Informática
Keywords:	Natural Language Processing | Fake news | Misleading headlines | Stance detection | Applied computing | Document management and text processing | Semantic summarization
Knowledge area(s):	Lenguajes y Sistemas Informáticos
Publication date:	27-Sep-2021
Publisher:	Elsevier
Bibliographic citation:	Journal of Web Semantics. 2021, 71: 100660. https://doi.org/10.1016/j.websem.2021.100660
Abstract:	The headline of a news article is designed to succinctly summarize its content, providing the reader with a clear understanding of the news item. Unfortunately, in the post-truth era, headlines are more focused on attracting the reader’s attention for ideological or commercial reasons, thus leading to mis- or disinformation through false or distorted headlines. One way of combating this, although a challenging task, is to determine the relation between the headline and the body text in order to establish the stance. Hence, to contribute to the detection of mis- and disinformation, this paper proposes an approach, HeadlineStanceChecker, that determines the stance of a headline with respect to the body text with which it is associated. The novelty rests on a two-stage classification architecture that uses summarization techniques to shape the input for both classifiers instead of directly passing the full news body text, thereby reducing the amount of information to be processed while keeping important information. Specifically, summarization is done through Positional Language Models that leverage semantic resources to identify salient information in the body text, which is then compared to its corresponding headline. The results obtained show that our approach achieves 94.31% accuracy for the overall classification and the best FNC-1 relative score compared with the state of the art. It is especially remarkable that the system, which uses only the relevant information provided by the automatic summaries instead of the whole text, is able to classify the different stance categories with very competitive results, especially for the discuss stance between the headline and the news body text. It can be concluded that using automatic extractive summaries as input to our approach, together with the two-stage architecture, is an appropriate solution to the problem.
Sponsor(s):	This research work has been partially funded by Generalitat Valenciana through project “SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” with grant reference PROMETEU/2018/089, by the Spanish Government through project RTI2018-094653-B-C22: “Modelang: Modeling the behavior of digital entities by Human Language Technologies”, as well as by the project RTI2018-094649-B-I00: “INTEGER - Intelligent Text Generation”. This paper is also based upon work from COST Action CA18231 “Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation”.
URI:	http://hdl.handle.net/10045/118249
ISSN:	1570-8268
DOI:	10.1016/j.websem.2021.100660
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer reviewed:	yes
Publisher's version:	https://doi.org/10.1016/j.websem.2021.100660
Appears in collections:	INV - GPLSI - Artículos de Revistas
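
Illustrative sketch (not part of the repository record): the abstract describes a two-stage classifier that receives a summary of the body text rather than the full text, evaluated with the FNC-1 relative score. The Python below is a minimal sketch of that flow under stated assumptions. The abstract does not say what each stage decides; the sketch assumes stage one separates related from unrelated pairs and stage two assigns agree, disagree, or discuss, which matches the FNC-1 label set. The function names (`lead_summary`, `classify_headline`, `fnc1_relative_score`) are hypothetical, and `lead_summary` is a trivial stand-in that keeps the leading sentences; it is not the Positional Language Model summarizer used in the paper. The scoring weights follow the public FNC-1 convention: 0.25 for getting related versus unrelated right, plus 0.75 for the exact stance on related pairs.

```python
import re
from typing import Callable, Sequence

RELATED_STANCES = ("agree", "disagree", "discuss")


def lead_summary(body: str, max_sentences: int = 2) -> str:
    """Stand-in extractive summarizer: keep the first few sentences.

    Assumption: this replaces the paper's summarizer only to make the
    sketch runnable.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", body.strip()) if s]
    return " ".join(sentences[:max_sentences])


def classify_headline(
    headline: str,
    body: str,
    is_related: Callable[[str, str], bool],
    stance_of: Callable[[str, str], str],
    summarizer: Callable[[str], str] = lead_summary,
) -> str:
    """Two-stage decision; both stages see the summary, not the full body."""
    summary = summarizer(body)
    # Stage 1 (assumed): is the headline related to the summarized body?
    if not is_related(headline, summary):
        return "unrelated"
    # Stage 2 (assumed): which related stance applies?
    return stance_of(headline, summary)


def fnc1_relative_score(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """FNC-1 weighted score divided by the best achievable score."""
    score = 0.0
    best = 0.0
    for g, p in zip(gold, predicted):
        best += 0.25 if g == "unrelated" else 1.0
        if g == "unrelated":
            if p == "unrelated":
                score += 0.25
        elif p in RELATED_STANCES:
            score += 0.25  # relatedness detected correctly
            if g == p:
                score += 0.75  # exact stance also correct
    return score / best if best else 0.0
```

In practice `is_related` and `stance_of` would be trained classifiers; any callable with the same signature can be plugged in, which keeps the summarization step and the two decisions independent of one another.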

Files in this item:
File	Description	Size	Format
Sepulveda-Torres_etal_2021_JWebSemantics.pdf		783.9 kB	Adobe PDF

This item is licensed under a Creative Commons License