To what extent does content selection affect surface realization in the context of headline generation?
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/112661
Title: | To what extent does content selection affect surface realization in the context of headline generation? |
---|---|
Author(s): | Barros, Cristina; Vicente, Marta; Lloret, Elena |
Research group(s) or GITE: | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) |
Center, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | Natural language generation; Headline generation; Positional language models; Factored language models; Content selection; Abstractive summarization |
Knowledge area(s): | Lenguajes y Sistemas Informáticos |
Publication date: | May 2021 |
Publisher: | Elsevier |
Bibliographic citation: | Computer Speech & Language. 2021, 67: 101179. https://doi.org/10.1016/j.csl.2020.101179 |
Abstract: | Headline generation is a task where the most important information of a news article is condensed and embodied into a single short sentence. This task is normally addressed by summarization techniques, ideally combining extractive and abstractive methods together with sentence compression or fusion techniques. Although Natural Language Generation (NLG) techniques have not been directly exploited for headline generation, they may provide better mechanisms than summarization techniques to paraphrase the information of a text. Therefore, this paper analyzes and evaluates the effectiveness of NLG techniques for generating headlines. In NLG, both content selection and surface realization are equally important: there is no point in generating text without knowing the topic. Considering this premise, we take HanaNLG, a hybrid surface realization approach, as a basis, and we analyze the effect on the generated text when different content selection strategies are integrated at the macroplanning stage. The experiments conducted show that, despite not using any sophisticated summarization method, the proposed approach provided the following benefits: i) it generated a coherent, linguistically structured headline; ii) it obtained results on standard datasets (i.e., DUC 2003 and DUC 2004) that were comparable to several competitive systems, in terms of the content of the generated headline; and iii) the headlines generated by the whole approach (PLM-HanaNLG) were preferred by human assessors over those generated by the best performing system in DUC 2003. |
Sponsor(s): | This research work has been partially funded by Generalitat Valenciana through project “SIIA: Tecnologias del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” with grant reference PROMETEU/2018/089, by the Spanish Government through project RTI2018-094653-B-C22: “Modelang: Modeling the behavior of digital entities by Human Language Technologies”, as well as by the project RTI2018-094649-B-I00: “INTEGER - Intelligent Text Generation”. This paper is also based upon work from COST Action CA18231 “Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation”. |
URI: | http://hdl.handle.net/10045/112661 |
ISSN: | 0885-2308 (Print); 1095-8363 (Online) |
DOI: | 10.1016/j.csl.2020.101179 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | © 2020 Elsevier Ltd. |
Peer reviewed: | yes |
Publisher's version: | https://doi.org/10.1016/j.csl.2020.101179 |
Appears in collections: | INV - GPLSI - Artículos de Revistas |
Files in this item:
File | Description | Size | Format |
---|---|---|---|
Barros_etal_2021_CompSpeechLang_final.pdf | Final version (restricted access) | 1.11 MB | Adobe PDF |
Barros_etal_2021_CompSpeechLang_accepted.pdf | Accepted manuscript (open access) | 1.16 MB | Adobe PDF |
All documents in RUA are protected by copyright. Some rights reserved.