Slide 24 of 24
Notes:
We can conclude that the Extended Copy Transfer Model is applicable to MPP nodes, NUMA and SMP systems, as well as to clusters of PCs. In each case it reveals most of the architectural differences among the measured memory subsystems.
On low-end SMPs the benefits of symmetric multiprocessing end abruptly as soon as the working set exceeds the caches located near the microprocessors. We observe less than half the performance on two-processor PCs.
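As a rough illustration of how such a drop can be probed, one can time a contiguous block copy for growing working-set sizes and watch where the bandwidth falls off. The sketch below is not the measurement code behind these results; the names `copy_bandwidth_mb_s` and `sweep` and the chosen sizes are hypothetical. Note that the working set is twice the given size (source plus destination), and that interpreter overhead makes the smallest sizes look slower than the hardware really is.

```python
import time


def copy_bandwidth_mb_s(size_bytes, repeats=20):
    """Return the best observed copy bandwidth in MB/s for one block size.

    Hypothetical helper: copies a contiguous block of `size_bytes`
    from one buffer to another and keeps the fastest of `repeats` runs.
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # contiguous block copy of the whole buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading on coarse timers
    return size_bytes / best / 1e6


def sweep(sizes):
    """Measure each block size and return (size_bytes, MB/s) pairs."""
    return [(size, copy_bandwidth_mb_s(size)) for size in sizes]


if __name__ == "__main__":
    # Assumed sizes: 64 KiB up to 16 MiB, quadrupling each step.
    for size, bandwidth in sweep([(64 * 1024) << i for i in range(0, 9, 2)]):
        print(f"{size // 1024:8d} KiB  {bandwidth:10.1f} MB/s")
```

Running two such sweeps at the same time, one per processor, would be one way to see whether the second processor still gains anything once the blocks no longer fit in cache.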
We have heard that things will improve with the BX chipset, but we have not yet verified this.
Fast communication puts high demands on the memory system:
High-end MPP nodes deliver excellent performance for remote transfers. Low-end systems interconnected by Gigabit networks peak only near the bandwidth of the I/O bus, and only for simple contiguous block transfers. For more complex remote accesses, performance collapses.
Coming back to the optimal design of a compute node for a cluster of PCs, we conclude that adding more P6 processors without reinforcing the memory system is questionable.