Ensembles for clinical entity extraction

Weegar, Rebecka; Pérez Ramírez, Alicia; Dalianis, Hercules; Gojenola Galletebeitia, Koldo; Casillas Rubio, Arantza; Oronoz Anchordoqui, Maite

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/74611

Item information
Title:	Ensembles for clinical entity extraction
Other Titles:	Agrupaciones para la extracción de entidades clínicas
Authors:	Weegar, Rebecka | Pérez Ramírez, Alicia | Dalianis, Hercules | Gojenola Galletebeitia, Koldo | Casillas Rubio, Arantza | Oronoz Anchordoqui, Maite
Keywords:	Clinical entity recognition | Ensembles | Swedish | Spanish | Reconocimiento de entidades médicas | Agrupaciones | Sueco | Castellano
Knowledge Area:	Computer Languages and Systems (Lenguajes y Sistemas Informáticos)
Issue Date:	Mar-2018
Publisher:	Sociedad Española para el Procesamiento del Lenguaje Natural
Citation:	Procesamiento del Lenguaje Natural. 2018, 60: 13-20. doi:10.26342/2018-60-1
Abstract:	Health records are a valuable source of clinical knowledge, and Natural Language Processing techniques have been applied to the text in health records for a number of applications. A common first step in clinical text processing is clinical entity recognition: identifying, for example, drugs, disorders, and body parts in clinical text. However, most of this work has focused on records in English. This work therefore aims to improve clinical entity recognition for languages other than English by applying the same methods, specifically ensemble methods, to two different languages. Models were created for Spanish and Swedish health records using SVM, Perceptron, and CRF and four different feature sets, including unsupervised features. Finally, the models were combined in ensembles, with weighted voting applied according to the models' individual F-scores. In conclusion, the ensembles improved the overall performance for Spanish and the precision for Swedish. For possible further improvements, more sophisticated ensemble methods should be investigated.
Sponsor:	This work has been partially funded by the Spanish ministry (PROSAMED: TIN2016-77820-C3-1-R, TADEEP: TIN2015-70214-P), the Basque Government (DETEAMI: 2014111003), the University of the Basque Country UPV-EHU (MOV17/14) and the Nordic Center of Excellence in Health-Related e-Sciences (NIASC).
URI:	http://hdl.handle.net/10045/74611
ISSN:	1135-5948
DOI:	10.26342/2018-60-1
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	© Sociedad Española para el Procesamiento del Lenguaje Natural
Peer Review:	yes
Publisher version:	https://doi.org/10.26342/2018-60-1
Appears in Collections:	Procesamiento del Lenguaje Natural - Nº 60 (2018)
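
The abstract states that the models were combined by weighted voting according to their individual F-scores. A minimal sketch of that idea follows, assuming token-level voting in which each model's vote for a label counts with a weight equal to that model's F-score. The record does not say whether the paper votes per token or per entity, or how ties are resolved, so those choices, the function name `weighted_vote`, the label names, and all numeric values here are illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def weighted_vote(predictions, f_scores):
    """Combine per-token labels from several models by weighted voting.

    predictions: dict mapping model name -> list of labels, one per token
    f_scores:    dict mapping model name -> F-score used as the vote weight
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must label the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        totals = defaultdict(float)
        for model, labels in predictions.items():
            # Each model adds its F-score to the label it predicts.
            totals[labels[i]] += f_scores[model]
        # Pick the label with the largest summed weight
        # (on a tie, the label seen first wins; an assumption).
        combined.append(max(totals, key=totals.get))
    return combined


# Hypothetical predictions and F-scores, not taken from the paper.
predictions = {
    "svm": ["O", "Drug", "O", "Disorder"],
    "perceptron": ["O", "Drug", "Body_part", "O"],
    "crf": ["O", "O", "Body_part", "Disorder"],
}
f_scores = {"svm": 0.70, "perceptron": 0.65, "crf": 0.75}

ensemble = weighted_vote(predictions, f_scores)
print(ensemble)
```

With these made-up weights, two agreeing models outvote the third on every token. A single model can still override two others when its F-score exceeds the sum of theirs, which is the main difference from plain majority voting.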

Files in This Item:

File	Description	Size	Format
PLN_60_01.pdf		713,38 kB	Adobe PDF
