The DEC 8400 figure gives a comprehensive picture of the memory hierarchy, with bandwidth data for every level. The horizontal plateaus at 700 MByte/s, 120 MByte/s and 28 MByte/s show the memory system performance sustained for different working-set sizes.
Maximum memory performance for loads is approximately 1100 MByte/s for small working sets that fit entirely into the L1 cache.
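A load-bandwidth curve of this kind can be approximated with a small micro-benchmark that repeatedly sums a buffer of a chosen working-set size. The following is a minimal sketch, not the benchmark used for the figure: the function names `sum_words` and `load_bandwidth`, the use of `clock()` for timing, and 8-byte `double` words are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Sum every stride-th word of a[0..n_words). The volatile qualifier forces
   each load to be issued, so the compiler cannot remove the loop. */
double sum_words(const volatile double *a, size_t n_words, size_t stride)
{
    double s = 0.0;
    for (size_t i = 0; i < n_words; i += stride)
        s += a[i];
    return s;
}

/* Hypothetical helper: useful load bandwidth in MByte/s (10^6 bytes/s) for
   a working set of the given size, read with the given stride (in words).
   Returns 0.0 for invalid arguments, on allocation failure, or when the
   run is too short for clock() to resolve. */
double load_bandwidth(size_t working_set_bytes, size_t stride, int repeats)
{
    size_t n_words = working_set_bytes / sizeof(double);
    if (n_words == 0 || stride == 0 || repeats <= 0)
        return 0.0;

    double *a = malloc(n_words * sizeof *a);
    if (a == NULL)
        return 0.0;
    for (size_t i = 0; i < n_words; i++)
        a[i] = 1.0;               /* also touches every page before timing */

    volatile double sink = 0.0;   /* keeps the results observable */
    clock_t start = clock();
    for (int r = 0; r < repeats; r++)
        sink += sum_words(a, n_words, stride);
    clock_t stop = clock();
    (void)sink;
    free(a);

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;

    /* Only the words actually loaded count as useful bytes. */
    size_t touched = (n_words + stride - 1) / stride;
    return (double)touched * sizeof(double) * repeats / seconds / 1e6;
}
```

Calling `load_bandwidth` for a range of working-set sizes (for example from a few KByte up to several times the largest cache) with stride 1 should give one plateau per level of the hierarchy. The exact values depend on the machine, the compiler and the timer resolution.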
An application may experience a bandwidth of 1100 MByte/s out of L1 cache and 750 MByte/s out of L2 cache for large strides, even if these bandwidths cannot be measured with a micro-benchmark. For loads out of L3 cache, we observe the peak of 600 MByte/s for contiguous accesses only, while strided accesses drop to 120 MByte/s. This behavior is caused by the large cache lines at that level and by the read-ahead logic of the L2 cache, which consumes load bandwidth unnecessarily by read-allocating the whole cache line although only a single word is used.
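The effect of read-allocating whole cache lines can be illustrated with a simple model: if every miss transfers a full line but the program uses only one word per stride, the useful share of the transferred bytes shrinks as the stride grows. The sketch below is an assumption-laden illustration, not a description of the DEC 8400 hardware: the function names are hypothetical, the line size is a free parameter (the text above does not state it), and effects such as prefetching or bank conflicts are ignored.

```c
#include <stddef.h>

/* Fraction of the transferred bytes that the program actually uses when it
   loads one word every stride_bytes and whole cache lines are always
   fetched. Assumes word_bytes <= line_bytes; returns 0.0 for invalid
   arguments (zero sizes, or a stride smaller than one word). */
double useful_fraction(size_t stride_bytes, size_t word_bytes,
                       size_t line_bytes)
{
    if (word_bytes == 0 || line_bytes == 0 || stride_bytes < word_bytes)
        return 0.0;

    /* For strides below the line size, several used words share one line,
       so each word costs stride_bytes of traffic. For larger strides,
       each word costs one full line. */
    size_t moved_per_word =
        stride_bytes < line_bytes ? stride_bytes : line_bytes;
    return (double)word_bytes / (double)moved_per_word;
}

/* Useful bandwidth seen by the program, given the raw rate at which the
   memory level can deliver cache lines (in MByte/s). */
double useful_bandwidth(double line_fill_mbytes_per_s, size_t stride_bytes,
                        size_t word_bytes, size_t line_bytes)
{
    return line_fill_mbytes_per_s *
           useful_fraction(stride_bytes, word_bytes, line_bytes);
}
```

As an example with purely illustrative parameters, 8-byte words and an assumed 64-byte line give a useful fraction of 1/8 for any stride of 64 bytes or more. The model is not fitted to the figures quoted above; it only shows the direction and rough size of the loss.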