Leveraging Machine Learning to Explain the Nature of Written Genres
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/112916
Title: | Leveraging Machine Learning to Explain the Nature of Written Genres |
---|---|
Author(s): | Vicente, Marta | Miró Maestre, María | Lloret, Elena | Suárez Cueto, Armando |
Research group(s) or GITE: | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) |
Center, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | Applied computing | Communicative objectives | Discourse analysis | Genre characterization | Human language technologies | Natural language processing |
Knowledge area(s): | Lenguajes y Sistemas Informáticos |
Publication date: | 3-Feb-2021 |
Publisher: | IEEE |
Bibliographic citation: | IEEE Access. 2021, 9: 24705-24726. https://doi.org/10.1109/ACCESS.2021.3056927 |
Abstract: | The analysis of discourse and the study of what characterizes it in terms of communicative objectives is essential to most tasks of Natural Language Processing. Consequently, research on textual genres as expressions of such objectives presents an opportunity to enhance both automatic techniques and resources. To conduct an investigation of this kind, it is necessary to have a good understanding of what defines and distinguishes each textual genre. This research presents a data-driven approach to discover and analyze patterns in several textual genres with the aim of identifying and quantifying the differences between them, considering how language is employed and meaning expressed in each particular case. To identify and analyze patterns within genres, a set of linguistic features is first defined, extracted and computed by using several Natural Language Processing tools. Specifically, the analysis is performed over a corpus of documents (containing news, tales and reviews) gathered from different sources to ensure a heterogeneous representation. Once the feature dataset has been generated, machine learning techniques are used to ascertain how and to what extent each of the features should be present in a document depending on its genre. The results show that the set of features defined is relevant for characterizing the different genres. Furthermore, the findings allow us to perform a qualitative analysis of such features, so that their usefulness and suitability are corroborated. The results of the research can benefit natural language discourse processing tasks, which are useful both for understanding and generating language. |
Sponsor(s): | This work was supported in part by the Ministry of Science and Innovation of Spain for the project “Integer: Intelligent Text Generation” under Grant RTI2018-094649-B-I00, and in part by the Generalitat Valenciana through project “SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” under Grant PROMETEU/2018/089. |
URI: | http://hdl.handle.net/10045/112916 |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2021.3056927 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ |
Peer reviewed: | yes |
Publisher's version: | https://doi.org/10.1109/ACCESS.2021.3056927 |
Appears in collections: | INV - GPLSI - Artículos de Revistas |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
Vicente_etal_2021_IEEEAccess.pdf | | 3.42 MB | Adobe PDF | |
All documents in RUA are protected by copyright. Some rights reserved.