Integrating advanced vision-language models for context recognition in risks assessment
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/150006
Title: | Integrating advanced vision-language models for context recognition in risks assessment |
---|---|
Authors: | Rodriguez-Juan, Javier | Ortiz Pérez, David | Garcia-Rodriguez, Jose | Tomás, David | Nalepa, Grzegorz J. |
Research Group/s: | Arquitecturas Inteligentes Aplicadas (AIA) | Procesamiento del Lenguaje y Sistemas de Información (GPLSI) |
Center, Department or Service: | Universidad de Alicante. Departamento de Tecnología Informática y Computación | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Instituto Universitario de Investigación Informática |
Keywords: | Deep learning | Risks assessment | Object detection | Action recognition | Transformers |
Issue Date: | 7-Dec-2024 |
Publisher: | Elsevier |
Citation: | Neurocomputing. 2025, 618: 129131. https://doi.org/10.1016/j.neucom.2024.129131 |
Abstract: | This study proposes an open-environment, multi-label human risk classification framework, capable of identifying possible risks to which individuals appearing in input video data are exposed. The framework consists of an ensemble of models covering object detection, action recognition, context understanding and text classification tasks. Each model is evaluated separately in the context of home environments, with the overall framework performing well in each evaluation after fine-tuning. The models were evaluated using a combination of several datasets, including Charades, ETRI-Activity3D, and custom video question answering and risk datasets. This study exploits the ability of large language models to interpret semantic visual features combined with textual input in order to understand the context in which the person is placed. The framework’s ability to output multiple risks and its cross-domain capabilities make it a powerful tool that can enhance current risk management systems in a variety of scenarios, such as homes, construction sites and industry. |
Sponsor: | We would like to thank the CIAICO/2022/132 Consolidated group project “AI4Health”, funded by the Valencian government, and the International Center for Aging Research (ICAR) funded project “IASISTEM”. This work has also been supported by a Valencian government grant for PhD studies, CIACIF/2022/175, and a research initiation grant from the University of Alicante, AII23-12. Finally, we would like to thank the University Institute for Computer Research at the University of Alicante for its support. |
URI: | http://hdl.handle.net/10045/150006 |
ISSN: | 0925-2312 (Print) | 1872-8286 (Online) |
DOI: | 10.1016/j.neucom.2024.129131 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | © 2024 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
Peer Review: | yes |
Publisher version: | https://doi.org/10.1016/j.neucom.2024.129131 |
Appears in Collections: | INV - AIA - Artículos de Revistas | INV - GPLSI - Artículos de Revistas |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
 | | 1.28 MB | Adobe PDF | |
Items in RUA are protected by copyright, with all rights reserved, unless otherwise indicated.