A Large Visual, Qualitative, and Quantitative Dataset for Web Intelligence Applications
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/138004
Title: | A Large Visual, Qualitative, and Quantitative Dataset for Web Intelligence Applications |
---|---|
Author(s): | Mejia-Escobar, Christian | Cazorla, Miguel | Martinez-Martin, Ester |
Research group(s) or GITE: | Robótica y Visión Tridimensional (RoViT) |
Center, Department or Service: | Universidad de Alicante. Departamento de Ciencia de la Computación e Inteligencia Artificial | Universidad de Alicante. Instituto Universitario de Investigación Informática |
Keywords: | Web intelligence applications | Dataset |
Publication date: | 10-Oct-2023 |
Publisher: | Hindawi |
Citation: | Computational Intelligence and Neuroscience. 2023, Article ID 1094823. https://doi.org/10.1155/2023/1094823 |
Abstract: | The Web is the communication platform and source of information par excellence. The volume and complexity of its content have grown enormously, with organizing, retrieving, and cleaning Web information becoming a challenge for traditional techniques. Web intelligence is a novel research area to improve Web-based services and applications using artificial intelligence and automatic learning algorithms, for which a large amount of Web-related data are essential. Current datasets are, however, limited and do not combine visual representation and attributes of Web pages. Our work provides a large dataset of 49,438 Web pages, composed of webshots, along with qualitative and quantitative attributes. This dataset covers all the countries in the world and a wide range of topics, such as art, entertainment, economics, business, education, government, news, media, science, and the environment, addressing different cultural characteristics and varied design preferences. We use this dataset to develop three Web Intelligence applications: knowledge extraction on Web design using statistical analysis, recognition of error Web pages using a customized convolutional neural network (CNN) to eliminate invalid pages, and Web categorization based solely on screenshots using a CNN with transfer learning to assist search engines, indexers, and Web directories. |
Sponsor(s): | This work has been funded by the grant awarded by the Central University of Ecuador through budget certification No. 34 of March 25, 2022 for the development of the research project with code: DOCT-DI-2020-37. |
URI: | http://hdl.handle.net/10045/138004 |
ISSN: | 1687-5265 (Print) | 1687-5273 (Online) |
DOI: | 10.1155/2023/1094823 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Rights: | © 2023 Christian Mejia-Escobar et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Peer reviewed: | yes |
Publisher's version: | https://doi.org/10.1155/2023/1094823 |
Appears in collections: | INV - RoViT - Artículos de Revistas |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
Mejia-Escobar_etal_2023_ComputIntelligNeurosci.pdf | | 2.59 MB | Adobe PDF | |
All documents in RUA are protected by copyright. Some rights reserved.