Bruck, Jehoshua and Cypher, Robert and Ho, Ching-Tien (1993) Fault-tolerant meshes and hypercubes with minimal numbers of spares. IEEE Transactions on Computers, 42 (9). pp. 1089-1104. ISSN 0018-9340 http://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc93a
- Published Version
See Usage Policy.
Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc93a
Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Two- and three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-and-conquer algorithms requiring global communication. However, even a single faulty processor or communication link can seriously affect the performance of these machines. This paper presents several techniques for tolerating faults in d-dimensional mesh and hypercube architectures. Our approach consists of adding spare processors and communication links so that the resulting architecture will contain a fault-free mesh or hypercube in the presence of faults. We optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor. For example, when the desired architecture is a d-dimensional mesh and k = 1, we present a fault-tolerant architecture that has the same maximum degree as the desired architecture (namely, 2d) and has only one spare processor. We also present efficient layouts for fault-tolerant two- and three-dimensional meshes, and show how multiplexers and buses can be used to reduce the degree of fault-tolerant architectures. Finally, we give constructions for fault-tolerant tori, eight-connected meshes, and hexagonal meshes.
|Additional Information:||© Copyright 1993 IEEE. Reprinted with permission. Manuscript received July 2, 1991; revised November 12, 1991, and August 28, 1992. This work is based on “Fault-Tolerant Meshes with Minimal Numbers of Spares," by J. Bruck, R. Cypher, and C. T. Ho, which appeared in the Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing, Dallas, TX, Dec. 2-5, 1991, pp. 288-295, and “Efficient Fault-Tolerant Mesh and Hypercube Architectures,” which appeared in the Proceedings of the 22nd Annual International Symposium on Fault-Tolerant Computing, Boston, MA, July 8-10, 1992, pp. 162-169. ©1991, 1992 IEEE. The authors would like to thank the anonymous referees for their helpful comments and suggestions|
|Subject Keywords:||fault tolerant computing; hypercube networks; performance evaluation; buses; d-dimensional mesh; fault-tolerant architecture; fault-tolerant meshes; hexagonal meshes; hypercubes; multiplexers; tori|
|Usage Policy:||No commercial reproduction, distribution, display or performance rights in this work are provided.|
|Deposited By:||Kristin Buxton|
|Deposited On:||02 Oct 2008 02:17|
|Last Modified:||26 Dec 2012 10:17|
Repository Staff Only: item control page