Leveraging Machine Learning to Explain the Nature of Written Genres

Vicente, Marta; Miró Maestre, María; Lloret, Elena; Suárez Cueto, Armando

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/112916

Item information
Title:	Leveraging Machine Learning to Explain the Nature of Written Genres
Author(s):	Vicente, Marta; Miró Maestre, María; Lloret, Elena; Suárez Cueto, Armando
Research group(s) or GITE:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords:	Applied computing; Communicative objectives; Discourse analysis; Genre characterization; Human language technologies; Natural language processing
Knowledge area(s):	Lenguajes y Sistemas Informáticos
Publication date:	3 Feb 2021
Publisher:	IEEE
Bibliographic citation:	IEEE Access. 2021, 9: 24705-24726. https://doi.org/10.1109/ACCESS.2021.3056927
Abstract:	The analysis of discourse and the study of what characterizes it in terms of communicative objectives is essential to most tasks of Natural Language Processing. Consequently, research on textual genres as expressions of such objectives presents an opportunity to enhance both automatic techniques and resources. To conduct an investigation of this kind, it is necessary to have a good understanding of what defines and distinguishes each textual genre. This research presents a data-driven approach to discover and analyze patterns in several textual genres with the aim of identifying and quantifying the differences between them, considering how language is employed and meaning expressed in each particular case. To identify and analyze patterns within genres, a set of linguistic features is first defined, extracted and computed by using several Natural Language Processing tools. Specifically, the analysis is performed over a corpus of documents (containing news, tales and reviews) gathered from different sources to ensure a heterogeneous representation. Once the feature dataset has been generated, machine learning techniques are used to ascertain how and to what extent each of the features should be present in a document depending on its genre. The results show that the set of features defined is relevant for characterizing the different genres. Furthermore, the findings allow us to perform a qualitative analysis of such features, so that their usefulness and suitability are corroborated. The results of the research can benefit natural language discourse processing tasks, which are useful both for understanding and generating language.
Sponsor(s):	This work was supported in part by the Ministry of Science and Innovation of Spain for the project "Integer: Intelligent Text Generation" under Grant RTI2018-094649-B-I00, and in part by the Generalitat Valenciana through project "SIIA: Tecnologias del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible" under Grant PROMETEU/2018/089.
URI:	http://hdl.handle.net/10045/112916
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3056927
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Peer reviewed:	yes
Publisher version:	https://doi.org/10.1109/ACCESS.2021.3056927
Appears in collections:	INV - GPLSI - Artículos de Revistas

Files in this item:
File	Description	Size	Format
Vicente_etal_2021_IEEEAccess.pdf		3.42 MB	Adobe PDF
