CaltechAUTHORS
  A Caltech Library Service

On the design and implementation of broadcast and global combine operations using the postal model

Bruck, Jehoshua and De Coster, Luc and Dewulf, Natalie and Ho, Ching-Tien and Lauwereins, Rudy (1996) On the design and implementation of broadcast and global combine operations using the postal model. IEEE Transactions on Parallel and Distributed Systems, 7 (3). pp. 256-265. ISSN 1045-9219. http://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds96

[img]
Preview
PDF - Published Version
See Usage Policy.

1171Kb

Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds96

Abstract

There are a number of models that were proposed in recent years for message passing parallel systems. Examples are the postal model and its generalization the LogP model. In the postal model a parameter λ is used to model the communication latency of the message-passing system. Each node during each round can send a fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r will incur a latency of hand will arrive at the receiving node at round r + λ - 1. Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global combine operation. Those practical issues include, for example, 1) techniques for measurement of the value of λ on a given machine, 2) creating efficient broadcast algorithms that get the latency hand the number of nodes n as parameters and 3) creating efficient global combine algorithms for parallel machines with λ which is not an integer. We propose solutions that address those practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning, for example, a properly tuned broadcast improves the known implementation by more than 20%.


Item Type:Article
Related URLs:
URLURL TypeDescription
http://dx.doi.org/10.1109/71.491579DOIUNSPECIFIED
http://ieeexplore.ieee.org/xpls/abs_all.jsp?isnumber=10561&arnumber=491579PublisherUNSPECIFIED
Additional Information:© Copyright 1996 IEEE. Reprinted with permission. Manuscript received Dec. 12, 1994; revised Apr. 1, 1995. The work on the Intel Delta parallel system was facilitated by a joint study with the Caltech Concurrent Supercomputing Facility. We thank Paul Messina, Director of the CCSF, for his support and help. We also thank Bob Cypher, Alex Ho, and Eric Leu for various helpful discussions. Jehoshua Bruck was supported in part by the National Science Foundation Young Investigator Award CCR-9457811; by the Sloan Research Fellowship; by a grant from the IBM Almaden Research Center, San Jose, California; and by a grant from the AT&T Foundation. Luc De Coster was supported by the Belgian National Fund for Scientific Research.
Funders:
Funding AgencyGrant Number
National Science FoundationCCR-9457811
Alfred P. Sloan FoundationUNSPECIFIED
IBM Almaden Research Center, San Jose, CA.UNSPECIFIED
AT&T FoundationUNSPECIFIED
Belgian National Fund for Scientific ResearchUNSPECIFIED
Subject Keywords:Broadcast; global combine; postal model; complete graph; collective communication; message passing; parallel algorithms; LogP model; broadcast operations; communication latency; global combine operations; postal model
Record Number:CaltechAUTHORS:BRUieeetpds96
Persistent URL:http://resolver.caltech.edu/CaltechAUTHORS:BRUieeetpds96
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:12347
Collection:CaltechAUTHORS
Deposited By: Kristin Buxton
Deposited On:12 Nov 2008 22:57
Last Modified:26 Dec 2012 10:30

Repository Staff Only: item control page