To what extent does content selection affect surface realization in the context of headline generation?

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/112661
Item information
Title: To what extent does content selection affect surface realization in the context of headline generation?
Author(s): Barros, Cristina | Vicente, Marta | Lloret, Elena
Research group(s) or GITE: Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords: Natural language generation | Headline generation | Positional language models | Factored language models | Content selection | Abstractive summarization
Knowledge area(s): Lenguajes y Sistemas Informáticos
Publication date: May 2021
Publisher: Elsevier
Bibliographic citation: Computer Speech & Language. 2021, 67: 101179. https://doi.org/10.1016/j.csl.2020.101179
Abstract: Headline generation is a task where the most important information of a news article is condensed and embodied into a single short sentence. This task is normally addressed by summarization techniques, ideally combining extractive and abstractive methods together with sentence compression or fusion techniques. Although Natural Language Generation (NLG) techniques have not been directly exploited for headline generation, they may provide better mechanisms than summarization techniques to paraphrase the information of a text. Therefore, this paper analyzes and evaluates the effectiveness of NLG techniques for generating headlines. In NLG, content selection and surface realization are equally important: there is no point in generating text without knowing the topic. Given this premise, we take HanaNLG, a hybrid surface realization approach, as a basis, and we analyze the effect on the generated text when different content selection strategies are integrated at the macroplanning stage. The experiments conducted show that, despite not using any sophisticated summarization method, the proposed approach provided the following benefits: i) it generated a coherent, linguistically structured headline; ii) it obtained results on standard datasets (i.e., DUC 2003 and DUC 2004) that were comparable to several competitive systems, in terms of the content of the generated headline; and iii) the headlines generated by the whole approach (PLM-HanaNLG) were preferred by human assessors over those generated by the best performing system in DUC 2003.
Sponsor(s): This research work has been partially funded by Generalitat Valenciana through project “SIIA: Tecnologias del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” with grant reference PROMETEU/2018/089, by the Spanish Government through project RTI2018-094653-B-C22: “Modelang: Modeling the behavior of digital entities by Human Language Technologies”, as well as by the project RTI2018-094649-B-I00: “INTEGER - Intelligent Text Generation”. In addition, this paper is based upon work from COST Action CA18231 “Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation”.
URI: http://hdl.handle.net/10045/112661
ISSN: 0885-2308 (Print) | 1095-8363 (Online)
DOI: 10.1016/j.csl.2020.101179
Language: eng
Type: info:eu-repo/semantics/article
Rights: © 2020 Elsevier Ltd.
Peer reviewed: yes
Publisher's version: https://doi.org/10.1016/j.csl.2020.101179
Appears in collections: INV - GPLSI - Artículos de Revistas

Files in this item:
File | Description | Size | Format
Barros_etal_2021_CompSpeechLang_final.pdf | Final version (restricted access) | 1.11 MB | Adobe PDF
Barros_etal_2021_CompSpeechLang_accepted.pdf | Accepted manuscript (open access) | 1.16 MB | Adobe PDF


All documents in RUA are protected by copyright. Some rights reserved.