Bruck, Jehoshua and Ho, Ching-Tien and Kipnis, Shlomo and Upfal, Eli and Weathersby, Derrick (1997) Efficient algorithms for all-to-all communications in multiport message-passing systems. IEEE Transactions on Parallel and Distributed Systems, 8 (11). pp. 1143-1156. ISSN 1045-9219. http://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds97
- Published Version
See Usage Policy.
Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds97
We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully connected message-passing system, in which the performance of any point-to-point communication is independent of the sender-receiver pair. We also assume that each processor has k ≥ 1 ports, through which it can send and receive k messages in every communication round. The complexity measures we use are independent of the particular system topology and are based on the communication start-up time, and on the communication bandwidth. In the index operation among n processors, initially, each processor has n blocks of data, and the goal is to exchange the ith block of processor j with the jth block of processor i. We present a class of index algorithms that is designed for all values of n and that features a trade-off between the communication start-up time and the data transfer time. This class of algorithms includes two special cases: an algorithm that is optimal with respect to the measure of the start-up time, and an algorithm that is optimal with respect to the measure of the data transfer time. We also present experimental results featuring the performance tuneability of our index algorithms on the IBM SP-1 parallel system. In the concatenation operation, among n processors, initially, each processor has one block of data, and the goal is to concatenate the n blocks of data from the n processors, and to make the concatenation result known to all the processors. We present a concatenation algorithm that is optimal, for most values of n, in the number of communication rounds and in the amount of data transferred.
|Additional Information:||© Copyright 1997 IEEE. Reprinted with permission. Manuscript received 6 Apr. 1994; revised 27 Apr. 1997. We thank Robert Cypher for his help in deriving Lemma C.1. Jehoshua Bruck was supported in part by U.S. National Science Foundation Young Investigator Award CCR-9457811, by the Sloan Research Fellowship, and by DARPA and BMDO through an agreement with NASA/OSAT.|
|Subject Keywords:||All-to-all broadcast; all-to-all personalized communication; complete exchange; concatenation operation; distributed-memory system; index operation; message-passing system; multiscatter/gather; parallel system; communication complexity; message passing; parallel algorithms; all-to-all communications; communication bandwidth; communication start-up; complexity measures; concatenation; index; message-passing systems ; multiport; point-to-point communication; sender-receiver pair|
|Usage Policy:||No commercial reproduction, distribution, display or performance rights in this work are provided.|
|Deposited By:||Kristin Buxton|
|Deposited On:||12 Nov 2008 22:34|
|Last Modified:||26 Dec 2012 10:30|
Repository Staff Only: item control page