Fig 1.
PIC simulation phases.
Fig 2.
Particle distribution into separate memory buffers.
Fig 3.
Odd-even arrangement of memory buffers.
Fig 4.
The pipelined execution architecture created using the Intel FPGA synthesizer.
(CC: Clock Cycle, R: Read-phase, W: Write-phase, Ex: Execute-phase).
Fig 5.
The initial design report, showing a high initiation interval (II) caused by memory dependencies.
Fig 6.
The optimized design report after applying several optimization techniques.
Fig 7.
Loop unrolling optimization technique utilized in the proposed implementation.
Table 1.
Execution times (in nanoseconds) per particle measured for four different algorithms [47] (A: first algorithm, B: second algorithm, C: third algorithm, and D: fourth algorithm) on the DE5 FPGA, GTX 580 GPU, GTX Titan Black, and GTX Titan X.
Table 2.
Approximate energy consumed per particle in nanojoules (nJ) for various computation platforms and algorithms (A: first algorithm, B: second algorithm, C: third algorithm, and D: fourth algorithm).