Slide 6 of 24
First we show the low-end case. This chart shows the load bandwidth of a conventional Wintel PC with a Pentium Pro processor and the Intel 440FX chipset.
The x-axis varies the working-set size. It shows how well the memory hierarchy supports temporal locality, i.e., the benefit of cache hits when recently accessed data is reused.
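A minimal C sketch of the kind of kernel that can produce such a chart follows. It assumes 8-byte `double` loads and a POSIX `clock_gettime`. The function names are hypothetical, and this is not the benchmark that produced the chart. Because every repetition re-reads the same `n` elements, a working set small enough to fit in a cache is served from that cache after the first pass. The `stride` parameter corresponds to the access-pattern axis and is 1 for contiguous loads.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: load every `stride`-th double of a[0..n), `reps` times.
   Each repetition re-reads the same n elements, so once the working set
   (n * sizeof(double) bytes) fits in a cache, later passes hit in that cache. */
double sum_working_set(const double *a, size_t n, size_t stride, int reps)
{
    assert(stride > 0);
    double sum = 0.0;
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i += stride)
            sum += a[i];
    return sum;
}

/* Hypothetical helper: times sum_working_set and returns the load bandwidth
   in MB/s (10^6 bytes per second), counting only the bytes actually loaded.
   The sum is stored in *sink so the loop's result is used and not optimized away. */
double load_bandwidth_mbs(const double *a, size_t n, size_t stride, int reps,
                          double *sink)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sink = sum_working_set(a, n, stride, reps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    size_t loads_per_pass = (n + stride - 1) / stride;
    double bytes = (double)loads_per_pass * (double)reps * sizeof(double);
    return bytes / secs / 1e6;
}
```

To trace the x-axis, one would call `load_bandwidth_mbs` on arrays of increasing size (for example, doubling from a few KB to well past the L2 size) while keeping `n * reps` roughly constant, so each point runs long enough to be timed reliably.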
The colored stripes mark the working-set sizes at which the characteristic performance of each cache level and of DRAM appears. The red stripe shows the L1 cache read performance, which lies at about 600 MB/s; in practice this means that a load takes about 3 clock cycles on a 200 MHz Pentium Pro. The blue stripe shows the L2 cache performance at about 450 MB/s, and the orange stripe marks main-memory performance at about 180 MB/s.
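The step from a bandwidth figure to cycles per load is plain arithmetic: cycles per load = clock rate × bytes per load / bandwidth. The sketch below makes it explicit. It assumes 8-byte (64-bit) loads, which is the load width at which the 600 MB/s figure works out to about 3 cycles; the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: cycles per load = clock rate / load rate,
   where load rate = bandwidth / bytes per load.
   Bandwidth is in MB/s and the clock in MHz, so the 10^6 factors cancel. */
double cycles_per_load(double bandwidth_mbs, double clock_mhz,
                       double bytes_per_load)
{
    assert(bandwidth_mbs > 0.0 && bytes_per_load > 0.0);
    return clock_mhz * bytes_per_load / bandwidth_mbs;
}
```

At 200 MHz with 8-byte loads, this gives roughly 2.7 cycles for the L1 figure (600 MB/s), 3.6 cycles for L2 (450 MB/s), and 8.9 cycles for main memory (180 MB/s).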
The y-axis varies the access pattern: contiguous blocks and strided accesses with even and odd strides. The stride parameter shows how well the caches and external stream logic exploit spatial locality through read-ahead and other bandwidth-improving mechanisms.
Toward the small-stride end of the access-pattern axis, performance rises along a slope. Its steepness indicates how much bandwidth improves for contiguous loads and for loads with small strides.
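A back-of-the-envelope model helps explain why the slope sits at the small-stride end. It computes what fraction of each fetched cache line a strided load stream actually uses. The model assumes 32-byte cache lines (the Pentium Pro's line size) and 8-byte elements. It also assumes aligned data, no prefetching, and that every touched line is fetched in full, so it ignores the read-ahead and stream logic whose effect the measurement shows. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: idealized fraction of the bytes fetched from the next
   memory level that a strided load stream actually uses.
   - stride shorter than a line: one element used per stride_bytes fetched
   - stride of a line or more:   one element used per whole line fetched */
double line_utilization(size_t stride_elems, size_t elem_bytes,
                        size_t line_bytes)
{
    assert(stride_elems > 0 && elem_bytes > 0 && elem_bytes <= line_bytes);
    size_t stride_bytes = stride_elems * elem_bytes;
    size_t fetched = stride_bytes < line_bytes ? stride_bytes : line_bytes;
    return (double)elem_bytes / (double)fetched;
}
```

With 8-byte elements and 32-byte lines, utilization falls from 1 at stride 1 to 0.5 at stride 2 and stays at 0.25 from stride 4 onward. This matches the qualitative shape of a slope over the first few strides followed by a flat region.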
A selection of even, odd, and prime strides makes it possible to detect performance gains and losses caused by a banked memory system.
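A simple interleaving model shows why even, odd, and prime strides can behave differently. The sketch assumes `banks` word-interleaved memory banks, where word `w` lives in bank `w mod banks`. This is a hypothetical organization; the chipset's actual memory layout is not described here. A stride of `s` words then cycles through `banks / gcd(s, banks)` distinct banks, and fewer banks means more bank conflicts. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Greatest common divisor (Euclid's algorithm). */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: number of distinct banks touched by the word
   addresses 0, s, 2s, ... when word w is stored in bank (w mod banks). */
size_t banks_touched(size_t stride_words, size_t banks)
{
    assert(stride_words > 0 && banks > 0);
    return banks / gcd_size(stride_words, banks);
}
```

With 8 banks, any odd stride (1, 3, 5, 7, ...) spreads accesses over all 8 banks, stride 2 or 6 uses 4, stride 4 uses 2, and stride 8 serializes on a single bank. Prime strides are useful for the same reason when the bank count is not a power of two.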