CaltechAUTHORS
  A Caltech Library Service

Fault-tolerant meshes and hypercubes with minimal numbers of spares

Bruck, Jehoshua and Cypher, Robert and Ho, Ching-Tien (1993) Fault-tolerant meshes and hypercubes with minimal numbers of spares. IEEE Transactions on Computers, 42 (9). pp. 1089-1104. ISSN 0018-9340. doi:10.1109/12.241598. https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc93a

[img]
Preview
PDF - Published Version
See Usage Policy.

1MB

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc93a

Abstract

Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Two- and three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-and-conquer algorithms requiring global communication. However, even a single faulty processor or communication link can seriously affect the performance of these machines. This paper presents several techniques for tolerating faults in d-dimensional mesh and hypercube architectures. Our approach consists of adding spare processors and communication links so that the resulting architecture will contain a fault-free mesh or hypercube in the presence of faults. We optimize the cost of the fault-tolerant architecture by adding exactly k spare processors (while tolerating up to k processor and/or link faults) and minimizing the maximum number of links per processor. For example, when the desired architecture is a d-dimensional mesh and k = 1, we present a fault-tolerant architecture that has the same maximum degree as the desired architecture (namely, 2d) and has only one spare processor. We also present efficient layouts for fault-tolerant two- and three-dimensional meshes, and show how multiplexers and buses can be used to reduce the degree of fault-tolerant architectures. Finally, we give constructions for fault-tolerant tori, eight-connected meshes, and hexagonal meshes.


Item Type:Article
Related URLs:
URLURL TypeDescription
http://dx.doi.org/10.1109/12.241598DOIUNSPECIFIED
http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=241598PublisherUNSPECIFIED
ORCID:
AuthorORCID
Bruck, Jehoshua0000-0001-8474-0812
Additional Information:© Copyright 1993 IEEE. Reprinted with permission. Manuscript received July 2, 1991; revised November 12, 1991, and August 28, 1992. This work is based on “Fault-Tolerant Meshes with Minimal Numbers of Spares," by J. Bruck, R. Cypher, and C. T. Ho, which appeared in the Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing, Dallas, TX, Dec. 2-5, 1991, pp. 288-295, and “Efficient Fault-Tolerant Mesh and Hypercube Architectures,” which appeared in the Proceedings of the 22nd Annual International Symposium on Fault-Tolerant Computing, Boston, MA, July 8-10, 1992, pp. 162-169. ©1991, 1992 IEEE. The authors would like to thank the anonymous referees for their helpful comments and suggestions
Subject Keywords:fault tolerant computing; hypercube networks; performance evaluation; buses; d-dimensional mesh; fault-tolerant architecture; fault-tolerant meshes; hexagonal meshes; hypercubes; multiplexers; tori
Issue or Number:9
DOI:10.1109/12.241598
Record Number:CaltechAUTHORS:BRUieeetc93a
Persistent URL:https://resolver.caltech.edu/CaltechAUTHORS:BRUieeetc93a
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:11607
Collection:CaltechAUTHORS
Deposited By:INVALID USER
Deposited On:02 Oct 2008 02:17
Last Modified:08 Nov 2021 22:01

Repository Staff Only: item control page