Slide 12 of 24
To reveal the differences between high-end and low-end systems, we separately compare the copy bandwidth in a multiprocessing scenario where 1, 2, 4, or 8 processors copy data in memory.
For small working sets that fit in the caches, the performance remains the same, as the measurements show.
More interesting are the differences for large working sets in main memory.
We use not only a simple copy, as in McCalpin's ‘STREAM Benchmark’, but also measure a gather copy stream, in which the processors read strided data and store it contiguously.
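
The two access patterns can be sketched as follows. This is only an illustration of the index pattern, not the benchmark code used for the measurements: the function names, the `stride` parameter, and the tiny array sizes are assumptions made for the example, and pure Python is far too slow to measure memory bandwidth.

```python
def simple_copy(src, dst, n):
    """Contiguous copy: dst[i] = src[i], the pattern of the STREAM 'Copy' kernel."""
    for i in range(n):
        dst[i] = src[i]


def gather_copy(src, dst, n, stride):
    """Gather copy: read every stride-th element of src, store contiguously in dst.

    'stride' is a hypothetical parameter; src must hold at least
    (n - 1) * stride + 1 elements.
    """
    for i in range(n):
        dst[i] = src[i * stride]


# Small demonstration with illustrative sizes.
n, stride = 4, 3
src = list(range(n * stride))   # 0, 1, ..., 11
dst = [0] * n
gather_copy(src, dst, n, stride)
print(dst)  # [0, 3, 6, 9]
```

In the gather case only one element is used out of every `stride` elements read, so for large strides much of each cache line fetched from main memory is not used by the copy, while the stores remain contiguous.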