Leveraging Machine Learning to Explain the Nature of Written Genres

Vicente, Marta; Miró Maestre, María; Lloret, Elena; Suárez Cueto, Armando

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/112916

Item information
Title:	Leveraging Machine Learning to Explain the Nature of Written Genres
Authors:	Vicente, Marta | Miró Maestre, María | Lloret, Elena | Suárez Cueto, Armando
Research Group/s:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords:	Applied computing | Communicative objectives | Discourse analysis | Genre characterization | Human language technologies | Natural language processing
Knowledge Area:	Lenguajes y Sistemas Informáticos (Languages and Computer Systems)
Issue Date:	3-Feb-2021
Publisher:	IEEE
Citation:	IEEE Access. 2021, 9: 24705-24726. https://doi.org/10.1109/ACCESS.2021.3056927
Abstract:	The analysis of discourse and the study of what characterizes it in terms of communicative objectives is essential to most tasks of Natural Language Processing. Consequently, research on textual genres as expressions of such objectives presents an opportunity to enhance both automatic techniques and resources. To conduct an investigation of this kind, it is necessary to have a good understanding of what defines and distinguishes each textual genre. This research presents a data-driven approach to discover and analyze patterns in several textual genres with the aim of identifying and quantifying the differences between them, considering how language is employed and meaning expressed in each particular case. To identify and analyze patterns within genres, a set of linguistic features is first defined, extracted and computed by using several Natural Language Processing tools. Specifically, the analysis is performed over a corpus of documents (news, tales and reviews) gathered from different sources to ensure a heterogeneous representation. Once the feature dataset has been generated, machine learning techniques are used to ascertain how and to what extent each of the features should be present in a document depending on its genre. The results show that the set of features defined is relevant for characterizing the different genres. Furthermore, the findings allow us to perform a qualitative analysis of such features, so that their usefulness and suitability are corroborated. The results of the research can benefit natural language discourse processing tasks, which are useful both for understanding and generating language.
Sponsor:	This work was supported in part by the Ministry of Science and Innovation of Spain for the project “Integer: Intelligent Text Generation” under Grant RTI2018-094649-B-I00, and in part by the Generalitat Valenciana through the project “SIIA: Tecnologias del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” (Human language technologies for an inclusive, egalitarian, and accessible society) under Grant PROMETEU/2018/089.
URI:	http://hdl.handle.net/10045/112916
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3056927
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Peer Review:	Yes
Publisher version:	https://doi.org/10.1109/ACCESS.2021.3056927
Appears in Collections:	INV - GPLSI - Artículos de Revistas (Journal Articles)
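The abstract describes a feature-based pipeline: linguistic features are extracted from each document, and machine learning is then used to relate those features to genre. As a rough illustration only, the Python sketch below assumes four hypothetical surface features (mean sentence length, type-token ratio, first-person pronoun rate, quotation marks per sentence) and a nearest-centroid classifier over standardized features. Neither the features nor the classifier are those used by the authors, and the function names `extract_features`, `fit` and `classify` are made up for this example.

```python
import re
from collections import defaultdict
from math import sqrt

# Hypothetical first-person pronoun list, used here as a crude proxy for subjectivity.
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}


def extract_features(text: str) -> list[float]:
    """Compute a few illustrative surface features (NOT the paper's feature set)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    n_sent = len(sentences) or 1
    n_tok = len(tokens) or 1
    return [
        len(tokens) / n_sent,                            # mean sentence length in tokens
        len(set(tokens)) / n_tok,                        # type-token ratio (lexical variety)
        sum(t in FIRST_PERSON for t in tokens) / n_tok,  # first-person pronoun rate
        text.count('"') / n_sent,                        # quotation marks per sentence
    ]


def fit(docs: list[tuple[str, str]]):
    """Standardize features over the corpus and compute one centroid per genre."""
    X = [extract_features(text) for text, _ in docs]
    n_feat = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(n_feat)]
    # Population standard deviation; fall back to 1.0 for constant features.
    stds = [
        sqrt(sum((row[j] - means[j]) ** 2 for row in X) / len(X)) or 1.0
        for j in range(n_feat)
    ]
    groups = defaultdict(list)
    for row, (_, genre) in zip(X, docs):
        groups[genre].append([(v - m) / s for v, m, s in zip(row, means, stds)])
    # Each centroid is the mean standardized feature vector of one genre.
    centroids = {
        genre: [sum(col) / len(rows) for col in zip(*rows)]
        for genre, rows in groups.items()
    }
    return means, stds, centroids


def classify(text: str, model) -> str:
    """Return the genre whose centroid is nearest (Euclidean) in standardized space."""
    means, stds, centroids = model
    z = [(v - m) / s for v, m, s in zip(extract_features(text), means, stds)]

    def dist(genre: str) -> float:
        return sqrt(sum((a - b) ** 2 for a, b in zip(z, centroids[genre])))

    return min(centroids, key=dist)
```

The per-genre centroids returned by `fit` give a simple profile of how far above or below the corpus average each feature lies in each genre, which loosely mirrors the question of "to what extent each of the features should be present". With only a handful of toy documents these profiles are not meaningful; a real study would need a large, varied corpus and a proper held-out evaluation.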

Files in This Item:
File	Description	Size	Format
Vicente_etal_2021_IEEEAccess.pdf		3.42 MB	Adobe PDF
