Model-Driven Development of Web APIs to Access Integrated Tabular Open Data

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/110402
Item information
Title: Model-Driven Development of Web APIs to Access Integrated Tabular Open Data
Authors: González Mora, César | Tomás, David | Garrigós, Irene | Zubcoff, Jose | Mazón, Jose-Norberto
Research Group/s: Web and Knowledge (WaKe) | Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Departamento de Ciencias del Mar y Biología Aplicada
Keywords: Data integration | Join | Union | Open data | Data access | Web APIs | Word embeddings
Knowledge Area: Lenguajes y Sistemas Informáticos | Estadística e Investigación Operativa
Issue Date: 6-Nov-2020
Publisher: IEEE
Citation: IEEE Access. 2020, 8: 202669-202686. https://doi.org/10.1109/ACCESS.2020.3036462
Abstract: More and more governments around the world are publishing tabular open data, mainly in formats such as CSV or XLS(X). These datasets are mostly published individually, i.e., each publisher exposes its data on the Web without considering potential relationships with other datasets (its own or those of other publishers). As a result, reusing several open datasets together is not a trivial task and requires mechanisms that allow data consumers (such as software developers or data scientists) to integrate and access tabular open data published on the Web. In this paper, we propose a model-driven approach to automatically generate Web APIs that homogeneously access multiple integrated tabular open datasets. This work focuses on data that can be integrated by means of join and union operations. As a first step, our approach detects unionable and joinable tabular open data by using a table similarity measure based on word embeddings. Then, an APIfication process is developed to create APIs that access the previously integrated datasets through a single endpoint. A running example is presented throughout the article, together with a set of performance evaluation experiments that show the feasibility of our approach.
Sponsor: This work was supported by the National Foundation for Research, Technology and Development of the Spanish Ministry of Economy, Industry and Competitiveness under Project TIN2016-78103-C2-2-R and Project RTI2018-094653-B-C22. The work of César González-Mora was supported by a contract for predoctoral training with the Generalitat Valenciana and the European Social Fund under Grant ACIF/2019/044.
URI: http://hdl.handle.net/10045/110402
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3036462
Language: eng
Type: info:eu-repo/semantics/article
Rights: This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Peer Review: yes
Publisher version: https://doi.org/10.1109/ACCESS.2020.3036462
Appears in Collections: INV - GPLSI - Artículos de Revistas | INV - WaKe - Artículos de Revistas

Files in This Item:
File: Gonzalez-Mora_etal_2020_IEEEAccess.pdf | Size: 2.51 MB | Format: Adobe PDF


This item is licensed under a Creative Commons License.