Exploiting the Relationship Between Visual and Textual Features in Social Networks for Image Classification with Zero-Shot Deep Learning

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/118582
Item information
Title: Exploiting the Relationship Between Visual and Textual Features in Social Networks for Image Classification with Zero-Shot Deep Learning
Author(s): Lucas, Luis | Tomás, David | Garcia-Rodriguez, Jose
Research group(s): Procesamiento del Lenguaje y Sistemas de Información (GPLSI) | Arquitecturas Inteligentes Aplicadas (AIA)
Center, Department or Service: Universidad de Alicante. Departamento de Tecnología Informática y Computación | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Instituto Universitario de Investigación Informática
Keywords: Multimodal classification | CLIP | Zero-shot classification | Unsupervised machine learning | Social media
Knowledge area(s): Arquitectura y Tecnología de Computadores | Lenguajes y Sistemas Informáticos
Publication date: 23-Sep-2021
Publisher: Springer, Cham
Bibliographic citation: Lucas L., Tomás D., Garcia-Rodriguez J. (2022) Exploiting the Relationship Between Visual and Textual Features in Social Networks for Image Classification with Zero-Shot Deep Learning. In: Sanjurjo González H., Pastor López I., García Bringas P., Quintián H., Corchado E. (eds) 16th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2021). SOCO 2021. Advances in Intelligent Systems and Computing, vol 1401. Springer, Cham. https://doi.org/10.1007/978-3-030-87869-6_35
Abstract: One of the main issues in unsupervised machine learning is the cost of processing and extracting useful information from large datasets. In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture in multimodal environments (image and text) from social media. For this purpose, we used the InstaNY100K dataset and proposed a validation approach based on sampling techniques. Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part, and then adding the associated texts as support. The results show that pre-trained neural networks such as CLIP can be successfully applied to image classification with little fine-tuning, and that considering the texts associated with the images can improve accuracy depending on the goal, which points to a promising research direction.
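For illustration, a minimal sketch of the zero-shot, image-only setup described in the abstract, assuming the Hugging Face transformers implementation of CLIP; the checkpoint name, prompt template, example labels, and file name instagram_post.jpg are illustrative assumptions and are not taken from the paper, which uses the full Places label set, the InstaNY100K dataset, and an ensemble over image and text:

# Minimal zero-shot image classification sketch with CLIP (visual part only).
# Assumption: openai/clip-vit-base-patch32 checkpoint and three example
# Places-style labels; not the paper's actual configuration.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["restaurant", "beach", "museum"]   # hypothetical subset of Places labels
prompts = [f"a photo of a {label}" for label in labels]

image = Image.open("instagram_post.jpg")     # hypothetical InstaNY100K image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into a probability distribution over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))

In the multimodal setting described in the abstract, the text accompanying each post would additionally be used as support for the image prediction; the exact combination scheme is detailed in the publication.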
Sponsor(s): This work was funded by the University of Alicante UAPOSTCOVID19-10 grant for the project “Collecting and publishing open data for the revival of the tourism sector post-COVID-19”.
URI: http://hdl.handle.net/10045/118582
ISBN: 978-3-030-87868-9 | 978-3-030-87869-6
DOI: 10.1007/978-3-030-87869-6_35
Language: English
Type: info:eu-repo/semantics/conferenceObject
Rights: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
Peer reviewed: yes
Publisher version: https://doi.org/10.1007/978-3-030-87869-6_35
Appears in collections: INV - GPLSI - Communications to Congresses, Conferences, etc.
INV - AIA - Communications to Congresses, Conferences, etc.

Files in this item:
File | Description | Size | Format
Lucas_etal_2021_SOCO_final.pdf | Final version (restricted access) | 698.19 kB | Adobe PDF
Lucas_etal_2021_SOCO_preprint.pdf | Preprint (open access) | 774.4 kB | Adobe PDF


All documents in RUA are protected by copyright. Some rights reserved.