Candela, Gustavo; Sáez Fernández, María Dolores; Escobar Esteban, María Pilar; Marco Such, Manuel. A benchmark of Spanish language datasets for computationally driven research. Journal of Information Science. 2023, 49(6): 1451-1461. https://doi.org/10.1177/01655515211060530

URI: http://hdl.handle.net/10045/120141
DOI: 10.1177/01655515211060530
ISSN: 0165-5515 (Print)

Abstract: In the domain of Galleries, Libraries, Archives and Museums (GLAM) institutions, creative and innovative tools and methodologies for content delivery and user engagement have recently gained international attention. New methods have been proposed to publish digital collections as datasets amenable to computational use. Standardised benchmarks can be useful to broaden the scope of machine-actionable collections and to promote cultural and linguistic diversity. In this article, we propose a methodology to select datasets for computationally driven research applied to Spanish text corpora. This work seeks to encourage Spanish and Latin American institutions to publish machine-actionable collections based on best practices and avoiding common mistakes.

Keywords: Collections as data, Data quality metrics, Digital libraries, GLAM labs

SAGE Publications
info:eu-repo/semantics/article