OntoLM: Integrating Knowledge Bases and Language Models for classification in the medical domain
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/142188
Title: | OntoLM: Integrating Knowledge Bases and Language Models for classification in the medical domain |
---|---|
Other Titles: | OntoLM: Integrando bases de conocimiento y modelos de lenguaje para clasificación en dominio médico |
Authors: | Yáñez Romero, Fabio | Montoyo, Andres | Muñoz, Rafael | Gutiérrez, Yoan | Suárez Cueto, Armando |
Research Group/s: | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) |
Center, Department or Service: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos |
Keywords: | External Knowledge | Ontologies | Large Language Models | Graph Neural Networks | Conocimiento Externo | Ontologías | Grandes Modelos de lenguaje | Redes Neuronales de Grafos |
Issue Date: | Mar-2024 |
Publisher: | Sociedad Española para el Procesamiento del Lenguaje Natural |
Citation: | Procesamiento del Lenguaje Natural. 2024, 72: 137-148. https://doi.org/10.26342/2024-72-10 |
Abstract: | Large language models have shown impressive performance in Natural Language Processing tasks, but their black-box nature makes it difficult both to explain the model’s decisions and to integrate semantic knowledge. There has been a growing interest in combining external knowledge sources with language models to address these drawbacks. This paper proposes OntoLM, a novel architecture that combines an ontology with a pre-trained language model to classify biomedical entities in text. The approach involves constructing and processing graphs from ontologies and then using a graph neural network to contextualize each entity. Next, the outputs of the language model and the graph neural network are combined in a final classifier. Results show that OntoLM improves the classification of entities in medical texts using a set of categories obtained from the Unified Medical Language System. Using ontology graphs and graph neural networks, we can create more traceable natural language processing architectures. | Los grandes modelos de lenguaje han mostrado un rendimiento impresionante en tareas de Procesamiento del Lenguaje Natural, pero su condición de caja negra hace difícil explicar las decisiones del modelo e integrar conocimiento semántico. Existe un interés creciente en combinar fuentes de conocimiento externas con LLMs para solventar estos inconvenientes. En este artículo, proponemos OntoLM, una arquitectura novedosa que combina una ontología con un modelo de lenguaje pre-entrenado para clasificar entidades biomédicas en texto. El enfoque propuesto consiste en construir y procesar grafos provenientes de una ontología utilizando una red neuronal de grafos para contextualizar cada entidad. A continuación, combinamos los resultados del modelo de lenguaje y la red neuronal de grafos en un clasificador final. Los resultados muestran que OntoLM mejora la clasificación de entidades en textos médicos utilizando un conjunto de categorías obtenidas de Unified Medical Language System. Utilizando grafos de ontologías y redes neuronales de grafos podemos crear arquitecturas de procesamiento de lenguaje natural más rastreables. |
Sponsor: | This research has been funded by the University of Alicante, the Spanish Ministry of Science and Innovation, the Generalitat Valenciana, and the European Regional Development Fund (ERDF). At the national level, the following projects were awarded: Coolang (PID2021-122263OB-C22); CORTEX (PID2021-123956OB-I00); CLEARTEXT (TED2021-130707B-I00); and SOCIALTRUST (PDC2022-133146-C22), funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by "ERDF A way of making Europe", by the European Union, or by the European Union NextGenerationEU/PRTR. At the regional level, the Generalitat Valenciana (Conselleria d’Educacio, Investigacio, Cultura i Esport) granted funding for NL4DISMIS (CIPROM/2021/21). |
URI: | http://hdl.handle.net/10045/142188 |
ISSN: | 1135-5948 |
DOI: | 10.26342/2024-72-10 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | © Sociedad Española para el Procesamiento del Lenguaje Natural. Distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license |
Peer Review: | yes |
Publisher version: | https://doi.org/10.26342/2024-72-10 |
Appears in Collections: | INV - GPLSI - Artículos de Revistas |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
| | | 950,32 kB | Adobe PDF | |