Application of Information Retrieval Techniques to Document Filtered Set Generation for External Plagiarism Detection

Micol Ponce, Daniel; Ferrández Escámez, Óscar; Muñoz, Rafael

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/85176

Item information
Title:	Application of Information Retrieval Techniques to Document Filtered Set Generation for External Plagiarism Detection
Other Titles:	Aplicación de Técnicas de Recuperación de Información a la Generación de Conjuntos Filtrados de Documentos para la Detección de Plagios Externos
Authors:	Micol Ponce, Daniel | Ferrández Escámez, Óscar | Muñoz, Rafael
Research Group/s:	Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Center, Department or Service:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Keywords:	Filtered Set | Information Retrieval | Plagiarism Detection
Knowledge Area:	Lenguajes y Sistemas Informáticos
Issue Date:	Oct-2010
Publisher:	Sociedad Española para el Procesamiento del Lenguaje Natural
Citation:	Micol, Daniel; Ferrández, Óscar; Muñoz, Rafael. “Application of Information Retrieval Techniques to Document Filtered Set Generation for External Plagiarism Detection”. Procesamiento del Lenguaje Natural. N. 45 (2010). ISSN 1135-5948
Abstract:	In this paper we present an approach to generating filtered sets of documents using information retrieval techniques. The approach is presented in the context of external plagiarism detection, although the techniques described are applicable to any kind of document or query. Producing filtered sets, and hence limiting the problem's search space, can yield a substantial performance improvement and is used today in many real-world applications, such as web search engines. In plagiarism detection, the database of documents against which the suspicious candidate must be compared is potentially large, so applying filtered set generation techniques is highly advisable.
Sponsor:	This research has been partially funded by the Spanish Ministry of Science and Innovation (grant TIN2009-13391-C04-01), the Conselleria d'Educació of the Spanish Generalitat Valenciana (grants PROMETEO/2009/119 and ACOMP/2010/286), and the University of Alicante post-doctoral fellowship program funded by Fundación CajaMurcia.
URI:	http://hdl.handle.net/10045/85176
ISSN:	1135-5948
Language:	eng
Type:	info:eu-repo/semantics/article
Peer Review:	yes
Appears in Collections:	Procesamiento del Lenguaje Natural - Nº 45 (2010) | INV - GPLSI - Artículos de Revistas

Files in This Item:
File	Description	Size	Format
PLN_45_277-280.pdf		596,35 kB	Adobe PDF