Subset Barrier Synchronization
on a Private-Memory Parallel System
Anja Feldmann Thomas Gross David O'Hallaron Thomas M. Stricker
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract
A global barrier synchronizes all processors in a parallel system.
This paper investigates algorithms that allow disjoint subsets of
processors to synchronize independently and in parallel. The user
model of a subset barrier is straight forward; a processor that parti-
cipates in a subset barrier needs to know only the name of the barrier
and the number of participating processors. This paper identifies
two general communication models for private-memory parallel
systems: the bounded buffer broadcast model and the anonymous
destination message passing model and presents algorithms for bar-
rier synchronization in the terms of these models. The models
are detailed enough to allow meaningful cost estimates for their
primitives, yet independent of a specific architecture and can be
supported efficiently by a modern private-memory parallel system.
The anonymous destination message passing model is the most at-
tractive. The time complexity to synchronize over a uni-directional
ring of N processors is O(log N) for common cases, and O(sqrt(N))
in the worst case. The algorithms have been implemented on iWarp,
a private-memory parallel system and are now in daily use. The
paper concludes with timing measurements obtained on a 64-node
system.
Note: Reprint from the proceedings of the ACM Symposium on Parallel
Algorithms and Architectures, SPAA92 June 29 - July 1, 1992, San
Diego, California.