Three Dirac operators on two architectures with one piece of code and no hassle
Published on:
May 29, 2019
Abstract
A simple minded approach to implement three discretizations of the Dirac operator (staggered, Wilson, Brillouin) on two architectures (KNL and core-i7) is presented. The idea is to use a high-level compiler along with OpenMP parallelization and SIMD pragmas, but to stay away from cache-line optimization and/or assembly-tuning. The implementation is for $N_v$ right-hand-sides, and this extra index is used to fill the SIMD pipeline. On one KNL node single precision performance figures for $N_c=3$, $N_v=12$ read 475 Gflop/s, 345 Gflop/s, and 790 Gflop/s for the three discretization schemes, respectively.
DOI: https://doi.org/10.22323/1.334.0033
How to cite
Metadata are provided both in
article format (very
similar to INSPIRE)
as this helps creating very compact bibliographies which
can be beneficial to authors and readers, and in
proceeding format which
is more detailed and complete.