Moreda, Paloma, Suárez Cueto, Armando, Lloret, Elena, Saquete Boró, Estela, Moreno, Isabel
From Sentences to Documents: Extending Abstract Meaning Representation for Understanding Documents
Procesamiento del Lenguaje Natural. 2018, 60: 61-68. doi:10.26342/2018-60-7
URI: http://hdl.handle.net/10045/74616
DOI: 10.26342/2018-60-7
ISSN: 1135-5948
Abstract: 
The overabundance of information and its heterogeneity require new ways to access, process and generate knowledge according to the user's needs. Defining an appropriate formalism for representing textual information, one that allows machines to perform language understanding and generation, will be crucial for achieving these tasks. Abstract Meaning Representation (AMR) is foreseen as a standard knowledge representation that can capture the information encoded in a sentence at various linguistic levels. However, its scope is limited to a single sentence, and it does not benefit from additional semantic information that could help in generating different types of texts. Therefore, the aim of this paper is to address this limitation by proposing and outlining a method that extends the information provided by AMR and uses it to represent entire documents. Based on our proposal, we will determine a unique, invariant and independent standard text representation, called the canonical representation. From it, through a transformational process, we will obtain different text variants appropriate to the users' needs.
Spanish abstract (in English translation): The overabundance of information and its heterogeneity require new ways of accessing, processing and generating knowledge according to the user's needs. Defining a suitable formalism for representing textual information, one that enables computers to understand and generate language, is therefore crucial to achieving this task. Abstract Meaning Representation (AMR) is a standard knowledge representation that can capture the information encoded in a sentence at several linguistic levels. However, its scope is limited to a single sentence, and it does not benefit from additional semantic information that could help generate different types of texts. In this article we propose a method that extends the information provided by AMR and uses it to represent complete documents. Based on our proposal, we will define a unique, invariant and independent standard text representation, called the canonical representation. From it, through a transformation process, we will obtain different text variants appropriate to the users' needs.
Keywords: AMR, Documents, Canonical representation, User, Documentos, Representación canónica, Usuario
Sociedad Española para el Procesamiento del Lenguaje Natural
info:eu-repo/semantics/article