
Analysis of checkpointing schemes for multiprocessor systems

Ziv, Avi and Bruck, Jehoshua (1994) Analysis of checkpointing schemes for multiprocessor systems. In: Symposium on Reliable Distributed Systems, 13th, Dana Point, CA, 25-27 October 1994. IEEE, Piscataway, NJ, pp. 52-61. ISBN 0818665750 http://resolver.caltech.edu/CaltechAUTHORS:ZIVreldis94

Abstract

Parallel computing systems provide hardware redundancy that helps achieve low-cost fault tolerance: a task is duplicated on more than one processor, and the states of the processors are compared at checkpoints. This paper proposes a novel technique, based on a Markov Reward Model (MRM), for analyzing the performance of checkpointing schemes with task duplication. We show how this technique can be used to derive the average execution time of a task and other important parameters related to the performance of checkpointing schemes. Our analytical results closely match the values we obtained from a simulation program. We compare the average task execution time and total work of four checkpointing schemes, and show that, in general, increasing the number of processors reduces the average execution time but increases the total work done by the processors. However, when the times required to perform different operations differ greatly, these results can change.
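
As a rough illustration of the kind of comparison the abstract describes (an analytical average execution time checked against a simulation), the following is a minimal sketch of a toy model and is not the paper's Markov Reward Model or any of its four schemes. Everything in it is an assumption made for illustration: the task is split into n equal intervals of length t, two processors run each interval in duplicate, each processor suffers faults at a constant rate lam during useful work only, a checkpoint with state comparison costs c, and any mismatch causes both processors to roll back and repeat the interval. The function and parameter names are hypothetical.

```python
import math
import random


def expected_time(n, t, c, lam):
    """Closed-form mean execution time for the toy model (an assumption,
    not a result taken from the paper)."""
    # Probability that both copies finish one interval without a fault.
    q = math.exp(-2.0 * lam * t)
    # Each interval needs a geometric number of attempts with mean 1/q,
    # and every attempt costs the interval length plus the checkpoint.
    return n * (t + c) / q


def simulate_once(n, t, c, lam, rng):
    """Simulate one run of the duplicated task; return elapsed time."""
    q = math.exp(-2.0 * lam * t)
    elapsed = 0.0
    for _ in range(n):
        while True:
            elapsed += t + c          # execute the interval, then compare states
            if rng.random() < q:      # states match: checkpoint is accepted
                break                 # otherwise roll back and repeat
    return elapsed


def simulate_mean(n, t, c, lam, runs=20000, seed=0):
    """Average elapsed time over many simulated runs."""
    rng = random.Random(seed)
    return sum(simulate_once(n, t, c, lam, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    params = dict(n=10, t=1.0, c=0.1, lam=0.05)
    print("analytical:", expected_time(**params))
    print("simulated: ", simulate_mean(**params))
```

In this toy model the total work done by the two processors is simply twice the execution time; schemes that use more processors or different recovery operations would change both quantities, which is the trade-off the abstract refers to.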


Item Type: Book Section
Additional Information: © 1994 IEEE. Reprinted with Permission. Publication Date: 25-27 Oct. 1994. This research was partially supported by the IBM Almaden Research Center, San Jose, California, and partially supported by NSF Young Investigator Award CCR-9457811.
Subject Keywords: Markov processes; multiprocessing systems; performance evaluation; Markov reward model; checkpointing schemes; hardware redundancy; low cost fault-tolerance; multiprocessor systems; parallel computing systems; simulation program
Record Number: CaltechAUTHORS:ZIVreldis94
Persistent URL: http://resolver.caltech.edu/CaltechAUTHORS:ZIVreldis94
Alternative URL: http://dx.doi.org/10.1109/RELDIS.1994.336909
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 9886
Collection: CaltechAUTHORS
Deposited By: Kristin Buxton
Deposited On: 26 Mar 2008
Last Modified: 26 Dec 2012 09:53
