CaltechAUTHORS
  A Caltech Library Service

A Leader Election Protocol for Fault Recovery in Asynchronous Fully-Connected Networks

Franceschetti, Massimo and Bruck, Jehoshua (1998) A Leader Election Protocol for Fault Recovery in Asynchronous Fully-Connected Networks. California Institute of Technology. (Unpublished) http://resolver.caltech.edu/CaltechPARADISE:1998.ETR024

Full text: PDF (1974Kb), Postscript (690Kb). See Usage Policy.

Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechPARADISE:1998.ETR024

Abstract

We introduce a new algorithm for consistent failure detection in asynchronous systems. Informally, consistent failure detection requires the processes in a distributed system to distinguish between two populations: a fault-free population and a faulty one. The major contribution of this paper is to combine ideas from group membership and leader election to obtain an election protocol for a fault manager whose convergence is delayed until all processes have established a new consistent view of the connectivity of the network. In our algorithm, a group of processes agrees upon the failed population of the system and then gives a unique leader, called the fault manager, the ability to execute distributed tasks in a centralized way. This research and the new perspective that we propose are driven by the study of an actual system, the Caltech RAIN (Reliable Array of Independent Nodes), on which our protocol has been implemented to perform fault recovery in distributed checkpointing. Other potential applications include fault-tolerant distributed database services and fault-tolerant distributed web servers.


Item Type: Report or Paper (Technical Report)
Related URLs: http://www.paradise.caltech.edu/papers/etr024.ps (URL Type: Publisher; Description: UNSPECIFIED)
Group: Parallel and Distributed Systems Group
Record Number: CaltechPARADISE:1998.ETR024
Persistent URL: http://resolver.caltech.edu/CaltechPARADISE:1998.ETR024
Usage Policy: You are granted permission for individual, educational, research and non-commercial reproduction, distribution, display and performance of this work in any format.
ID Code: 26050
Collection: CaltechPARADISE
Deposited By: Imported from CaltechPARADISE
Deposited On: 03 Sep 2002
Last Modified: 26 Dec 2012 13:52
