

In brief, the FDTD method operates upon symmetric Maxwell’s equations. Numerical experiments described in this work have been conducted on three different architectures: Intel’s Knights Landing and Skylake, and ARM’s Cavium ThunderX2.įinally, we make a systematic study of its performance and apply it to a challenging problem consisting in the simulation, in a high performance computing (HPC) cluster, of the exposure of a human phantom to a statistically random EM environment. Moreover, the results presented in this paper provide a quantitative estimate of the bandwidth threshold as a function of computational workload on different shared- as well as distributed-memory systems.

A noteworthy finding of this analysis shows that novel GPU-like processor architectures such as Intel’s Knights Landing help to alleviate memory bandwidth issues at zero cost of re-programming tasks opposed to native GPU architectures, which require considerable programming effort. This work presents an exhaustive analysis of the effects of different memory bandwidths on the scalability and speed of a parallel FDTD algorithm with focus on: (i) memory-to-CPU for single-node performance of shared-memory parallelization with OpenMP and (ii) node-to-node distributed-memory parallelization with MPI. Therefore, while FDTD will scales almost linearly in multi-node distributed-memory clusters, its speeding-up saturates quickly inside each single-node shared-memory machine.
NUANCE OMNIPAGE ULTIMATE 19.1 DOWNLOAD FULL
This is the main performance-limiting factor, since it hinders making use of the CPU at full speed. As all CPU cores on a computing node share a common addressable memory, this data movement creates a bottleneck in shared-memory access. Computationally, this means that all the unknowns must be moved between the RAM and the CPU at each time iteration. However, this explicitness also implies that making the next snapshot of the system requires the processing of all the electromagnetic field unknowns in the entire domain. Additionally, FDTD features good cache locality, which allows taking advantage of SIMD parallelization implemented in SSE and AVX instruction sets. In FDTD, the field unknowns are ordered spatially and are updated with their closest neighbors.Īs a consequence of this explicitness, the algorithm can be easily parallelized to exploit the benefits of shared- as well as distributed-memory architectures. In essence, the FDTD method is an explicit marching-on-in-time algorithm based on the staggered space-time discretization of Maxwell’s curl equations. Among the broad family of different numerical techniques found in CEM, the finite-difference time-domain (FDTD) method is one of the most widely employed, being applied in many fields, including bioelectromagnetics, photonics, electromagnetic compatibility (EMC). Computational electromagnetics (CEM) has become an essential discipline which allows the analysis of large and complex engineering problems.
