HeadlineStanceChecker: Exploiting summarization to detect headline disinformation

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/118249
Item information
Title: HeadlineStanceChecker: Exploiting summarization to detect headline disinformation
Authors: Sepúlveda-Torres, Robiert | Vicente, Marta | Saquete Boró, Estela | Lloret, Elena | Palomar, Manuel
Research Group/s: Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Instituto Universitario de Investigación Informática
Keywords: Natural Language Processing | Fake news | Misleading headlines | Stance detection | Applied computing | Document management and text processing | Semantic summarization
Knowledge Area: Lenguajes y Sistemas Informáticos
Issue Date: 27-Sep-2021
Publisher: Elsevier
Citation: Journal of Web Semantics. 2021, 71: 100660. https://doi.org/10.1016/j.websem.2021.100660
Abstract: The headline of a news article is designed to succinctly summarize its content, providing the reader with a clear understanding of the news item. Unfortunately, in the post-truth era, headlines are more focused on attracting the reader's attention for ideological or commercial reasons, thus leading to mis- or disinformation through false or distorted headlines. One way of combating this, although challenging, is to determine the relation between the headline and the body text in order to establish the stance. Hence, to contribute to the detection of mis- and disinformation, this paper proposes an approach, HeadlineStanceChecker, that determines the stance of a headline with respect to the body text to which it is associated. The novelty rests on a two-stage classification architecture that uses summarization techniques to shape the input for both classifiers instead of directly passing the full news body text, thereby reducing the amount of information to be processed while keeping the important information. Specifically, summarization is done through Positional Language Models leveraging semantic resources to identify salient information in the body text, which is then compared to its corresponding headline. The results show that our approach achieves 94.31% accuracy for the overall classification and the best FNC-1 relative score compared with the state of the art. It is particularly remarkable that the system, which uses only the relevant information provided by the automatic summaries instead of the whole text, classifies the different stance categories with very competitive results, especially the discuss stance between the headline and the news body text. It can be concluded that using automatic extractive summaries as input to our approach, together with the two-stage architecture, is an appropriate solution to the problem.
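The two-stage design and the FNC-1 evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stage classifiers are hypothetical callables (`is_related`, `stance_of`) assumed to receive the headline and an extractive summary of the body, and the Positional Language Model summarization step is not reproduced. The scoring assumes the weighting of the public FNC-1 scorer: 0.25 for the related/unrelated decision plus a further 0.75 for the exact stance of a related pair, with the relative score taken as a fraction of the best achievable score.

```python
from typing import Callable, Sequence

RELATED = ("agree", "disagree", "discuss")

# Hypothetical stage signatures (assumption): each stage sees the headline
# and a summary of the body text rather than the full body.
RelatednessClassifier = Callable[[str, str], bool]
StanceClassifier = Callable[[str, str], str]


def two_stage_predict(
    headline: str,
    summary: str,
    is_related: RelatednessClassifier,
    stance_of: StanceClassifier,
) -> str:
    """Stage 1 separates related from unrelated pairs; only related pairs
    reach stage 2, which chooses agree, disagree or discuss."""
    if not is_related(headline, summary):
        return "unrelated"
    stance = stance_of(headline, summary)
    if stance not in RELATED:
        raise ValueError(f"stage 2 must return one of {RELATED}, got {stance!r}")
    return stance


def fnc1_score(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Weighted score: 0.25 for the related/unrelated decision, plus 0.75
    more when a related pair also gets its exact stance."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    score = 0.0
    for g, p in zip(gold, predicted):
        if g == p:
            score += 0.25
            if g != "unrelated":
                score += 0.50
        if g in RELATED and p in RELATED:
            score += 0.25
    return score


def fnc1_relative_score(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Score as a fraction of the maximum achievable on the same gold labels."""
    best = fnc1_score(gold, gold)
    return fnc1_score(gold, predicted) / best if best else 0.0


if __name__ == "__main__":
    # Toy stand-ins for the two trained classifiers (illustration only).
    overlap = lambda h, s: bool(set(h.lower().split()) & set(s.lower().split()))
    always_discuss = lambda h, s: "discuss"
    print(two_stage_predict("Mayor resigns", "The mayor resigns today.",
                            overlap, always_discuss))

    gold = ["agree", "discuss", "unrelated", "disagree"]
    pred = ["agree", "agree", "unrelated", "unrelated"]
    print(fnc1_relative_score(gold, pred))
```

Routing only related pairs to the second classifier lets each stage specialise: the first needs a coarse notion of topical overlap, while the second must distinguish the finer agree, disagree and discuss stances.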
Sponsor: This research work has been partially funded by Generalitat Valenciana through project "SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible" with grant reference PROMETEU/2018/089, by the Spanish Government through project RTI2018-094653-B-C22: "Modelang: Modeling the behavior of digital entities by Human Language Technologies", and by project RTI2018-094649-B-I00: "INTEGER - Intelligent Text Generation". This paper is also based upon work from COST Action CA18231 "Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation".
URI: http://hdl.handle.net/10045/118249
ISSN: 1570-8268
DOI: 10.1016/j.websem.2021.100660
Language: eng
Type: info:eu-repo/semantics/article
Rights: © 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer Review: yes
Publisher version: https://doi.org/10.1016/j.websem.2021.100660
Appears in Collections: INV - GPLSI - Artículos de Revistas (Journal Articles)

Files in This Item:
File | Size | Format
Sepulveda-Torres_etal_2021_JWebSemantics.pdf | 783.9 kB | Adobe PDF


This item is licensed under a Creative Commons License.