Leveraging Machine Learning to Explain the Nature of Written Genres
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/112916
Title: | Leveraging Machine Learning to Explain the Nature of Written Genres |
---|---|
Authors: | Vicente, Marta | Miró Maestre, María | Lloret, Elena | Suárez Cueto, Armando |
Research Group/s: | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) |
Center, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | Applied computing | Communicative objectives | Discourse analysis | Genre characterization | Human language technologies | Natural language processing |
Knowledge Area: | Lenguajes y Sistemas Informáticos |
Issue Date: | 3-Feb-2021 |
Publisher: | IEEE |
Citation: | IEEE Access. 2021, 9: 24705-24726. https://doi.org/10.1109/ACCESS.2021.3056927 |
Abstract: | The analysis of discourse and the study of what characterizes it in terms of communicative objectives is essential to most tasks of Natural Language Processing. Consequently, research on textual genres as expressions of such objectives presents an opportunity to enhance both automatic techniques and resources. To conduct an investigation of this kind, it is necessary to have a good understanding of what defines and distinguishes each textual genre. This research presents a data-driven approach to discover and analyze patterns in several textual genres with the aim of identifying and quantifying the differences between them, considering how language is employed and meaning expressed in each particular case. To identify and analyze patterns within genres, a set of linguistic features is first defined, extracted and computed by using several Natural Language Processing tools. Specifically, the analysis is performed over a corpus of documents (news, tales and reviews) gathered from different sources to ensure a heterogeneous representation. Once the feature dataset has been generated, machine learning techniques are used to ascertain how and to what extent each of the features should be present in a document depending on its genre. The results show that the set of features defined is relevant for characterizing the different genres. Furthermore, the findings allow us to perform a qualitative analysis of such features, so that their usefulness and suitability are corroborated. The results of the research can benefit natural language discourse processing tasks, which are useful both for understanding and for generating language. |
Sponsor: | This work was supported in part by the Ministry of Science and Innovation of Spain for the project "Integer: Intelligent Text Generation" under Grant RTI2018-094649-B-I00, and in part by the Generalitat Valenciana through the project "SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible" (Human language technologies for an inclusive, egalitarian and accessible society) under Grant PROMETEU/2018/089. |
URI: | http://hdl.handle.net/10045/112916 |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2021.3056927 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ |
Peer Review: | Yes |
Publisher version: | https://doi.org/10.1109/ACCESS.2021.3056927 |
Appears in Collections: | INV - GPLSI - Artículos de Revistas |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Vicente_etal_2021_IEEEAccess.pdf | | 3.42 MB | Adobe PDF |