Boosting Perturbation-Based Iterative Algorithms to Compute the Median String

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/141854
Item information
Title: Boosting Perturbation-Based Iterative Algorithms to Compute the Median String
Authors: Mirabal, Pedro | Abreu Salas, José Ignacio | Seco, Diego | Pedreira, Óscar | Chávez, Edgar
Research Group/s: Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos | Universidad de Alicante. Instituto Universitario de Investigación Informática
Keywords: Approximate median string | Algorithm initialization | Half space proximal neighbors
Issue Date: 23-Dec-2021
Publisher: IEEE
Citation: IEEE Access. 2021, 9: 169299-169308. https://doi.org/10.1109/ACCESS.2021.3137767
Abstract: The most competitive heuristics for computing the median string are perturbation-based iterative algorithms. Because the problem is NP-hard under many formulations, computing the exact solution is not affordable. This work addresses the heuristic algorithms that solve this problem, emphasizing their initialization and the policy used to order the candidate edit operations. Both factors weigh significantly on the solution. The choice of initial string influences the algorithm's speed of convergence, as does the criterion used to select the modification made in each iteration. To obtain the initial string, we use the median of a subset of the original dataset; to obtain this subset, we apply the Half Space Proximal (HSP) test to the median of the dataset. This test provides sufficient diversity among the members of the subset while still fulfilling the centrality criterion. We also analyze the stop condition of the algorithm, improving its performance without substantially degrading the quality of the solution. To analyze the results of our experiments, we measured the execution time of each proposed modification of the algorithms, the number of edit distances computed, and the quality of the solution obtained. These experiments empirically validate our proposal.
Sponsor: This work was supported in part by the Comisión Nacional de Investigación Científica y Tecnológica - Programa de Formación de Capital Humano Avanzado (CONICYT-PCHA)/Doctorado Nacional/2014-63140074 through the Ph.D. Scholarship, in part by the European Union's Horizon 2020 under the Marie Sklodowska-Curie Grant 690941, in part by the Millennium Institute for Foundational Research on Data (IMFD), and in part by the FONDECYT-CONICYT under Grant 1170497. The work of Óscar Pedreira was supported in part by the Xunta de Galicia/FEDER-UE refs under Grant CSI ED431G/01 and Grant GRC: ED431C 2017/58, in part by the Office of the Vice President for Research and Postgraduate Studies of the Universidad Católica de Temuco, VIPUCT Project 2020EM-PS-08, and in part by the FEQUIP 2019-INRN-03 of the Universidad Católica de Temuco.
URI: http://hdl.handle.net/10045/141854
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3137767
Language: eng
Type: info:eu-repo/semantics/article
Rights: This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Peer Review: yes
Publisher version: https://doi.org/10.1109/ACCESS.2021.3137767
Appears in Collections: INV - GPLSI - Artículos de Revistas | Research funded by the EU
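
The initialization described in the abstract (take the median of the dataset, apply the HSP test around it, and use the median of the resulting subset as the starting string) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes unit-cost Levenshtein distance, uses the set median (the dataset member with the smallest total distance) wherever the abstract says "median", and the function names edit_distance, set_median, hsp_neighbors and initial_string are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed with a two-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def set_median(strings: Sequence[str]) -> str:
    """Member of `strings` with the smallest sum of distances to all members."""
    return min(strings, key=lambda s: sum(edit_distance(s, t) for t in strings))


def hsp_neighbors(center: str, strings: Sequence[str]) -> List[str]:
    """Half Space Proximal test around `center`.

    Repeatedly pick the candidate nearest to `center`, then discard every
    candidate that is strictly closer to the picked string than to `center`.
    """
    candidates = [s for s in strings if s != center]
    selected: List[str] = []
    while candidates:
        nearest = min(candidates, key=lambda s: edit_distance(center, s))
        selected.append(nearest)
        # Keep only candidates on the `center` side of the half space.
        # `nearest` itself is dropped here because its distance to itself is 0.
        candidates = [s for s in candidates
                      if edit_distance(s, center) <= edit_distance(s, nearest)]
    return selected


def initial_string(strings: Sequence[str]) -> str:
    """Assumed initialization: set median of the HSP neighbors of the set median."""
    center = set_median(strings)
    subset = hsp_neighbors(center, strings)
    # Assumption: fall back to the center when every string equals it.
    return set_median(subset) if subset else center
```

For example, with the strings "a", "ab", "abc" and "abcd", the HSP test around "ab" keeps "a" and "abc" and discards "abcd", which lies closer to "abc" than to "ab". The string returned by initial_string would then seed the perturbation-based iterative search, which this sketch does not cover.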

Files in This Item:
File | Size | Format
Mirabal_etal_2021_IEEEAccess.pdf | 1.02 MB | Adobe PDF


Items in RUA are protected by copyright, with all rights reserved, unless otherwise indicated.