Performance analysis of the FDTD method applied to holographic volume gratings: multi-core CPU versus GPU computing
Please use this identifier to cite or link to this item:
http://hdl.handle.net/10045/26220
Title: | Performance analysis of the FDTD method applied to holographic volume gratings: multi-core CPU versus GPU computing |
---|---|
Authors: | Francés, Jorge | Bleda, Sergio | Neipp, Cristian | Márquez, Andrés | Pascual, Inmaculada | Beléndez, Augusto |
Research Group/s: | Holografía y Procesado Óptico |
Center, Department or Service: | Universidad de Alicante. Departamento de Física, Ingeniería de Sistemas y Teoría de la Señal | Universidad de Alicante. Departamento de Óptica, Farmacología y Anatomía | Universidad de Alicante. Instituto Universitario de Física Aplicada a las Ciencias y las Tecnologías |
Keywords: | CUDA | GPU Computing | Gratings | Holography | OpenMP | SSE | SIMD | Speed up |
Knowledge Area: | Applied Physics | Optics |
Date Created: | 11-Apr-2011 |
Issue Date: | 1-Mar-2013 |
Publisher: | Elsevier |
Citation: | FRANCÉS MONLLOR, Jorge, et al. "Performance analysis of the FDTD method applied to holographic volume gratings: multi-core CPU versus GPU computing". Computer Physics Communications. Vol. 184, No. 3 (2013). ISSN 0010-4655, pp. 469-479 |
Abstract: | The finite-difference time-domain (FDTD) method allows the electromagnetic field distribution to be analyzed as a function of time and space. The method is applied to analyze holographic volume gratings (HVGs) for the near-field distribution at optical wavelengths. This application usually requires the simulation of wide areas, which implies more memory and processing time. In this work, we propose a specific implementation of the FDTD method including several add-ons for a precise simulation of optical diffractive elements. Values in the near-field region are computed considering the illumination of the grating by a plane wave at different angles of incidence, with absorbing boundaries included as well. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, for both CPU and GPU, in order to analyze the improvement of using the new NVIDIA Fermi GPU architecture versus a highly tuned multi-core CPU as a function of the simulation size. In particular, the optimized CPU implementation takes advantage of the arithmetic and data transfer streaming SIMD (single instruction multiple data) extensions (SSE) included explicitly in the code and also of multi-threading by means of OpenMP directives. Good agreement is found between the results of the FDTD and MM methods, thus validating our methodology. Moreover, the performance of the GPU is compared to the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wider range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations. |
Sponsor: | This work was supported by the “Ministerio de Economía y Competitividad” of Spain under projects FIS2011-29803-C02-01 and FIS2011-29803-C02-02, and by the “Generalitat Valenciana” of Spain under projects PROMETEO/2011/021, ISIC/2012/013, and GV/2012/099. |
URI: | http://hdl.handle.net/10045/26220 |
ISSN: | 0010-4655 (Print) | 1879-2944 (Online) |
DOI: | 10.1016/j.cpc.2012.09.025 |
Language: | eng |
Type: | info:eu-repo/semantics/article |
Peer Review: | yes |
Publisher version: | http://dx.doi.org/10.1016/j.cpc.2012.09.025 |
Appears in Collections: | INV - GHPO - Artículos de Revistas | INV - Acústica Aplicada - Artículos de Revistas |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
CPC_v184_n3_p469_2013.pdf | Final version (restricted access) | 1.39 MB | Adobe PDF | |
Items in RUA are protected by copyright, with all rights reserved, unless otherwise indicated.