Fig 1.
Temporal locality with four LLC configurations.
Temporal locality of lbm, libquantum and omnetpp, represented as the time-interval probability between consecutive accesses (100 in this figure is the forget threshold), with four different LLC configurations: sizes from 16 MB to 32 MB and associativity from 1 to 8. The number of off-chip accesses is included, showing the ability of the LLC to filter cache misses.
Fig 2.
Example of two groups identified by the HMM in the mcf benchmark.
The figures represent the off-chip accessed lines through time and the address intervals (red lines) that our prefetcher identifies and may use to trigger DRAM-cache prefetches. These groups appear simultaneously and are identified, isolated and grouped by our HMM proposal, so prefetches can be individualized for each group.
Fig 3.
Analysis of spatial locality in the mcf benchmark.
Spatial locality is modeled using off-chip OPKC, so when OPKC increases, the hit ratio in the external cache becomes critical. (a) shows how, starting at cycle 1100 million, OPKC increases in all LLC configurations. (b) shows how, starting at cycle 1100 million, our HMM proposal achieves a very good hit ratio when off-chip cache pressure increases, clearly outperforming the G/DC prefetch technique in this scenario.
Fig 4.
Frequency analysis with four LLC configurations.
In this figure we use off-chip OPKC to characterize the types of misses related to LLC size and associativity. lbm OPKC is independent of the LLC configuration; libquantum misses decrease with LLC size, so they are capacity misses; omnetpp misses diminish when associativity increases, so they are conflict misses; and, finally, milc exhibits both types.
Fig 5.
Proposed virtual address (VA) based architecture for off-chip prefetching. The use of VAs allows the prefetcher to exploit all the locality information, at the cost of increased memory and energy use to store tag and ASID information. The number of VA-to-PA translations is greatly reduced thanks to the positive effect of the cache hierarchy in filtering off-chip accesses. Off-chip prefetchers move data/instructions in advance from NVM-RAM to the DRAM cache.
Fig 6.
Symbolic representation of a Hidden Markov Model (HMM).
Fig 7.
Example of spatial locality in astar.
Algorithmic complexity in astar, with multiple simultaneous groups that are isolated and identified by our HMM proposal. In (c) the different groups are represented by colors.
Fig 8.
Main areas of spatial locality in astar.
Example of four groups identified in astar by the HMM. This information is fed to the prefetcher to obtain intervals of addresses with a high probability of future use.
Fig 9.
Main areas of spatial locality in astar with recognized intervals.
Based on the group identification, the HMM obtains address intervals with a high probability of use, shown in this figure between the red lines.
Fig 10.
Schematic description of the on-chip implementation of the prefetcher. The LLC miss address (Q) is used to identify a group and generate the prefetch address interval, or to create a new group based on the nearest existing group. This implementation of the HMM allows precise identification of the simultaneous off-chip memory groups accessed by the different processes running on a multicore chip.
Table 1.
Evaluation parameters.
Table 2.
Benchmarks.
Fig 11.
Hit ratio and overhead of Base, HMM, G/DC and G/AC in a single-core architecture.
The hit ratio of the Base experiment is represented by a horizontal line (0% to 6.3% across all benchmarks). The last plot is the geometric mean over all benchmarks.
Fig 12.
Hit ratio and overhead of Base, HMM, G/DC and G/AC in a 9-core architecture.
The hit ratio of the Base experiment is shown in the box. Each mix consists of nine benchmarks.
Fig 13.
Hit ratio and overhead of Base, HMM and G/DC in a 16-core architecture and in a multiprogrammed 4-core architecture.
The hit ratio of the Base experiment is shown in the box.