Indoor Risks Assessment Using Video Captioning

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/135404
Item information
Title: Indoor Risks Assessment Using Video Captioning
Authors: Rodríguez Juan, Javier
Research Director: Garcia-Rodriguez, Jose | Tomás, David
Center, Department or Service: Universidad de Alicante. Departamento de Tecnología Informática y Computación
Keywords: Video Captioning | Deep Learning | Indoor Environment | Research Engineering
Issue Date: 22-Jun-2023
Date of defense: 14-Jun-2023
Abstract: The progress of automatic scene analysis techniques for homes and the development of ambient assisted living systems are vital to helping people who require special care in their daily lives, such as the elderly or visually impaired individuals. In this bachelor’s thesis we study the most promising techniques in the field of Video Captioning and scene analysis and, using the knowledge acquired during that study, propose a Deep Learning pipeline that performs Risks Assessment on input videos. This pipeline could potentially be applied to build systems that assist the aforementioned people. We also propose different evaluation architectures to test each stage of the Risks Assessment pipeline in order to observe its effectiveness and limitations. For the analysis of home scenes we employ SwinBERT, a powerful and recent Video Captioning model, complemented with YOLOv7, a model aimed at the Object Recognition task. In addition, we apply various lexical transformations and linguistic models to maximize the semantic similarity between the generated descriptions and detected objects, on the one hand, and the annotations provided by the datasets used, on the other. This approach allows us to achieve matches that are more accurate from a human perspective. The experiments rely mainly on Charades, a large-scale dataset created for visual analysis while preserving the naturalness and spontaneity of household and daily activities.
URI: http://hdl.handle.net/10045/135404
Language: eng
Type: info:eu-repo/semantics/bachelorThesis
Rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License
Appears in Collections: Grado en Ingeniería Informática - Trabajos Fin de Grado

Items in RUA are protected by copyright, with all rights reserved, unless otherwise indicated.