A Large Visual, Qualitative, and Quantitative Dataset for Web Intelligence Applications

Mejia-Escobar, Christian; Cazorla, Miguel; Martinez-Martin, Ester

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/138004

Item information
Title:	A Large Visual, Qualitative, and Quantitative Dataset for Web Intelligence Applications
Author(s):	Mejia-Escobar, Christian \| Cazorla, Miguel \| Martinez-Martin, Ester
Research group(s) or GITE:	Robótica y Visión Tridimensional (RoViT)
Center, Department or Service:	Universidad de Alicante. Departamento de Ciencia de la Computación e Inteligencia Artificial \| Universidad de Alicante. Instituto Universitario de Investigación Informática
Keywords:	Web intelligence applications \| Dataset
Publication date:	10-Oct-2023
Publisher:	Hindawi
Citation:	Computational Intelligence and Neuroscience. 2023, Article ID 1094823. https://doi.org/10.1155/2023/1094823
Abstract:	The Web is the communication platform and source of information par excellence. The volume and complexity of its content have grown enormously, with organizing, retrieving, and cleaning Web information becoming a challenge for traditional techniques. Web intelligence is a novel research area to improve Web-based services and applications using artificial intelligence and automatic learning algorithms, for which a large amount of Web-related data are essential. Current datasets are, however, limited and do not combine visual representation and attributes of Web pages. Our work provides a large dataset of 49,438 Web pages, composed of webshots, along with qualitative and quantitative attributes. This dataset covers all the countries in the world and a wide range of topics, such as art, entertainment, economics, business, education, government, news, media, science, and the environment, addressing different cultural characteristics and varied design preferences. We use this dataset to develop three Web Intelligence applications: knowledge extraction on Web design using statistical analysis, recognition of error Web pages using a customized convolutional neural network (CNN) to eliminate invalid pages, and Web categorization based solely on screenshots using a CNN with transfer learning to assist search engines, indexers, and Web directories.
Sponsor(s):	This work has been funded by the grant awarded by the Central University of Ecuador through budget certification No. 34 of March 25, 2022 for the development of the research project with code: DOCT-DI-2020-37.
URI:	http://hdl.handle.net/10045/138004
ISSN:	1687-5265 (Print) \| 1687-5273 (Online)
DOI:	10.1155/2023/1094823
Language:	eng
Type:	info:eu-repo/semantics/article
Rights:	© 2023 Christian Mejia-Escobar et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Peer reviewed:	yes
Publisher's version:	https://doi.org/10.1155/2023/1094823
Appears in collections:	INV - RoViT - Artículos de Revistas

Files in this item:
File	Description	Size	Format
Mejia-Escobar_etal_2023_ComputIntelligNeurosci.pdf		2.59 MB	Adobe PDF
