Local Load Access: SGI Origin
[Chart: load bandwidth vs. access pattern (stride between 64-bit words)]
For comparison, the high-end case:
This chart shows the load bandwidth of an SGI Origin whose processor runs at about the same clock frequency as the Pentium Pro.
The picture is much sharper than for the Pentium Pro.
The L1 cache works much better: it delivers one double word (8 bytes) per clock cycle, whereas the Pentium Pro needs about 3 cycles per load in practice. The red stripe marks a bandwidth of 1600 MB/s. The L2 cache also performs remarkably well, at about 1 GB/s.
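As a back-of-the-envelope check, cache throughput converts to bandwidth as bytes per load times clock frequency divided by cycles per load. The sketch below assumes a clock of 200 MHz, the value at which one 8-byte load per cycle gives exactly the 1600 MB/s of the red stripe; the actual clock rates are not stated here, and `load_bandwidth_mb_s` is a hypothetical helper for illustration.

```python
WORD_BYTES = 8  # one 64-bit double word per load

def load_bandwidth_mb_s(clock_mhz: float, cycles_per_load: float) -> float:
    """Load bandwidth in MB/s (1 MB = 10**6 bytes) if one 64-bit load
    completes every `cycles_per_load` cycles at `clock_mhz` MHz."""
    return WORD_BYTES * clock_mhz / cycles_per_load

# Assumption: 200 MHz, chosen because 8 bytes/cycle then equals the 1600 MB/s stripe.
CLOCK_MHZ = 200

print(f"Origin L1 (1 cycle/load):       {load_bandwidth_mb_s(CLOCK_MHZ, 1):.0f} MB/s")
print(f"Pentium Pro L1 (3 cycles/load): {load_bandwidth_mb_s(CLOCK_MHZ, 3):.0f} MB/s")

# Invert the formula: cycles per load implied by ~1000 MB/s from the Origin L2.
l2_cycles_per_load = WORD_BYTES * CLOCK_MHZ / 1000
print(f"Origin L2 at ~1000 MB/s: {l2_cycles_per_load:.1f} cycles per load")
```

With these inputs it prints 1600 MB/s for the Origin L1, about 533 MB/s for the Pentium Pro L1, and 1.6 cycles per load for the Origin L2.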
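Main-memory load bandwidth is nearly twice that of the Pentium Pro (about 100% higher) for contiguous blocks but lower for strided access. This can be explained by the larger cache lines of the MIPS processor: each strided load pulls in a whole line, and most of that line goes unused.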
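The cache-line explanation can be made concrete with a simple model: once the stride reaches a full cache line, every load fetches a whole line but uses only 8 bytes of it, so useful bandwidth drops by the ratio of word size to line size. This sketch assumes 32-byte lines for the Pentium Pro and 128-byte lines for the Origin, normalizes the Pentium Pro's contiguous bandwidth to 1.0, and sets the Origin's to 2.0 per the "nearly 100% faster" figure. It ignores latency, prefetching and DRAM page effects, so it shows only the trend, not measured values; the function names are hypothetical.

```python
WORD_BYTES = 8  # one 64-bit word per load

def useful_fraction(stride_words: int, line_bytes: int) -> float:
    """Fraction of the bytes moved from memory that the program actually uses,
    for one load every `stride_words` 64-bit words. Assumes every touched line
    is fetched in full; ignores prefetching and line alignment."""
    stride_bytes = stride_words * WORD_BYTES
    return WORD_BYTES / min(stride_bytes, line_bytes)

def effective_bw(line_fill_bw: float, stride_words: int, line_bytes: int) -> float:
    """Useful load bandwidth, assuming lines arrive at a fixed rate `line_fill_bw`."""
    return line_fill_bw * useful_fraction(stride_words, line_bytes)

# Assumptions (not measured values): line sizes and normalized contiguous bandwidths.
PPRO_LINE, ORIGIN_LINE = 32, 128
PPRO_BW, ORIGIN_BW = 1.0, 2.0

print("stride  PentiumPro  Origin")
for s in (1, 2, 4, 8, 16, 32):
    print(f"{s:6d}  {effective_bw(PPRO_BW, s, PPRO_LINE):10.3f}  "
          f"{effective_bw(ORIGIN_BW, s, ORIGIN_LINE):6.3f}")
```

In this model the two curves meet at a stride of 8 words (64 bytes). Beyond that the Origin transfers 128 bytes per useful word versus 32 on the Pentium Pro, and its twofold raw advantage no longer compensates for the fourfold larger line.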