Performance analysis of the FDTD method applied to holographic volume gratings: multi-core CPU versus GPU computing

Please use this identifier to cite or link to this item: http://hdl.handle.net/10045/26220
Item information
Title: Performance analysis of the FDTD method applied to holographic volume gratings: multi-core CPU versus GPU computing
Authors: Francés, Jorge | Bleda, Sergio | Neipp, Cristian | Márquez, Andrés | Pascual, Inmaculada | Beléndez, Augusto
Research Group/s: Holografía y Procesado Óptico
Center, Department or Service: Universidad de Alicante. Departamento de Física, Ingeniería de Sistemas y Teoría de la Señal | Universidad de Alicante. Departamento de Óptica, Farmacología y Anatomía | Universidad de Alicante. Instituto Universitario de Física Aplicada a las Ciencias y las Tecnologías
Keywords: CUDA | GPU Computing | Gratings | Holography | OpenMP | SSE | SIMD | Speed up
Knowledge Area: Applied Physics | Optics
Date Created: 11-Apr-2011
Issue Date: 1-Mar-2013
Publisher: Elsevier
Citation: FRANCÉS MONLLOR, Jorge, et al. "Performance analysis of the FDTD method applied to holographic volume gratings: multi-core CPU versus GPU computing". Computer Physics Communications. Vol. 184, No. 3 (2013). ISSN 0010-4655, pp. 469-479
Abstract: The finite-difference time-domain (FDTD) method allows the electromagnetic field distribution to be analyzed as a function of time and space. The method is applied to analyze the near-field distribution of holographic volume gratings (HVGs) at optical wavelengths. This application usually requires the simulation of wide areas, which implies more memory and processing time. In this work, we propose a specific implementation of the FDTD method including several add-ons for the precise simulation of optical diffractive elements. Values in the near-field region are computed for a grating illuminated by a plane wave at different angles of incidence, with absorbing boundaries included. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, for CPU and GPU, in order to analyze the improvement offered by the new NVIDIA Fermi GPU architecture over a highly tuned multi-core CPU as a function of the simulation size. In particular, the optimized CPU implementation takes advantage of the arithmetic and data-transfer streaming SIMD (single instruction, multiple data) extensions (SSE), included explicitly in the code, and of multi-threading by means of OpenMP directives. Good agreement is found between the results of FDTD and the MM, thus validating our methodology. Moreover, the performance of the GPU is compared to the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wider range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations.
Sponsor: This work was supported by the “Ministerio de Economía y Competitividad” of Spain under projects FIS2011-29803-C02-01, FIS2011-29803-C02-02 and by the “Generalitat Valenciana” of Spain under projects PROMETEO/2011/021, ISIC/2012/013, and GV/2012/099.
URI: http://hdl.handle.net/10045/26220
ISSN: 0010-4655 (Print) | 1879-2944 (Online)
DOI: 10.1016/j.cpc.2012.09.025
Language: eng
Type: info:eu-repo/semantics/article
Peer Review: yes
Publisher version: http://dx.doi.org/10.1016/j.cpc.2012.09.025
Appears in Collections: INV - GHPO - Artículos de Revistas | INV - Acústica Aplicada - Artículos de Revistas

Files in This Item:
File | Description | Size | Format
CPC_v184_n3_p469_2013.pdf | Final version (restricted access) | 1.39 MB | Adobe PDF


Items in RUA are protected by copyright, with all rights reserved, unless otherwise indicated.