The aid of machine learning to overcome the classification of real health discharge reports written in Spanish

Pérez Ramírez, Alicia; Casillas Rubio, Arantza; Gojenola Galletebeitia, Koldo; Oronoz Anchordoqui, Maite; Aguirre, Nerea; Amillano, Estibaliz

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/40027

Item information
Title:	The aid of machine learning to overcome the classification of real health discharge reports written in Spanish
Other Titles:	Aportaciones de las técnicas de aprendizaje automático a la clasificación de partes de alta hospitalarios reales en castellano
Authors:	Pérez Ramírez, Alicia \| Casillas Rubio, Arantza \| Gojenola Galletebeitia, Koldo \| Oronoz Anchordoqui, Maite \| Aguirre, Nerea \| Amillano, Estibaliz
Keywords:	Natural language processing \| Biomedicine \| Machine learning \| Procesamiento del lenguaje natural \| Biomedicina \| Aprendizaje automático
Knowledge Area:	Computer Languages and Systems
Issue Date:	Sep-2014
Publisher:	Sociedad Española para el Procesamiento del Lenguaje Natural
Citation:	Procesamiento del Lenguaje Natural. 2014, 53: 77-84
Abstract:	Hospitals attached to the Spanish Ministry of Health currently use the International Classification of Diseases, 9th Revision, Clinical Modification (ICD9-CM) to classify health discharge records. At present, this work is done manually by experts. This paper tackles the automatic classification of real discharge records written in Spanish following the ICD9-CM standard. The challenge is that the discharge records are written in spontaneous language. We explore several machine learning techniques to deal with the classification problem. Random Forest proved the most competitive, achieving an F-measure of 0.876. \| La red de hospitales que configuran el sistema español de sanidad utiliza la Clasificación Internacional de Enfermedades Modificación Clínica (ICD9-CM) para codificar partes de alta hospitalaria. Hoy en día, este trabajo lo realizan a mano los expertos. Este artículo aborda la problemática de clasificar automáticamente partes reales de alta hospitalaria escritos en español teniendo en cuenta el estándar ICD9-CM. El desafío radica en que los partes hospitalarios están escritos con lenguaje espontáneo. Hemos experimentado con varios sistemas de aprendizaje automático para solventar este problema de clasificación. El algoritmo Random Forest es el más competitivo de los probados, obtiene un F-measure de 0.876.
Sponsor:	This work was partially supported by the European Commission (SEP-210087649), the Spanish Ministry of Science and Innovation (TIN2012-38584-C06-02) and the Industry of the Basque Government (IT344-10).
URI:	http://hdl.handle.net/10045/40027
ISSN:	1135-5948
Language:	eng
Type:	info:eu-repo/semantics/article
Peer Review:	yes
Publisher version:	http://journal.sepln.org/sepln/ojs/ojs/index.php/pln
Appears in Collections:	Procesamiento del Lenguaje Natural - Nº 53 (2014)

Files in This Item:
File	Description	Size	Format
PLN_53_08.pdf		753,96 kB	Adobe PDF
