Ziv, Avi and Bruck, Jehoshua (1996) An on-line algorithm for checkpoint placement. In: International Symposium on Software Reliability Engineering, 7th, White Plains, NY, 30 October-2 November 1996. IEEE, Piscataway, NJ, pp. 274-283. ISBN 0818677074. https://resolver.caltech.edu/CaltechAUTHORS:ZIVissre96
Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:ZIVissre96
Abstract
Checkpointing is a common technique for reducing the time to recover from faults in computer systems. By saving intermediate program states in reliable storage, checkpointing reduces the processing time lost to faults. The length of the intervals between checkpoints affects program execution time: long intervals lead to long re-processing times, while overly frequent checkpointing leads to high checkpointing overhead. In this paper we present an on-line algorithm for the placement of checkpoints. The algorithm uses on-line knowledge of the current cost of a checkpoint when deciding whether to place one. We show how the execution time of a program using this algorithm can be analyzed. The total execution-time overhead under the proposed algorithm is smaller than the overhead with fixed intervals. Although the proposed algorithm uses only on-line knowledge of the checkpointing cost, its behavior is close to that of the off-line optimal algorithm, which uses complete knowledge of the checkpointing cost.
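The interval trade-off described in the abstract can be illustrated with a small sketch. It does not reproduce the algorithm from the paper, which the abstract does not specify. It assumes a standard first-order model (often attributed to Young) in which faults arrive at a constant rate and a fault loses, on average, half an interval of work. The names `overhead_rate`, `best_fixed_interval`, and `should_checkpoint` are hypothetical, and the decision rule in `should_checkpoint` is only one plausible way to use the current checkpoint cost on-line.

```python
import math


def overhead_rate(interval, checkpoint_cost, failure_rate):
    """First-order overhead per unit of useful work with fixed intervals.

    checkpoint_cost / interval  : time spent writing checkpoints
    failure_rate * interval / 2 : expected re-processing time after faults
    (Assumed model, not taken from the paper.)
    """
    return checkpoint_cost / interval + failure_rate * interval / 2.0


def best_fixed_interval(checkpoint_cost, failure_rate):
    """Interval that minimizes overhead_rate under the model above."""
    return math.sqrt(2.0 * checkpoint_cost / failure_rate)


def should_checkpoint(elapsed, current_cost, failure_rate):
    """Hypothetical on-line rule: checkpoint once the time since the last
    checkpoint reaches the interval that would be best for the cost
    observed right now. Illustrative only, not the authors' rule."""
    return elapsed >= best_fixed_interval(current_cost, failure_rate)


if __name__ == "__main__":
    cost, rate = 2.0, 0.01  # assumed example values
    best = best_fixed_interval(cost, rate)
    for interval in (best / 4, best, best * 4):
        print(interval, overhead_rate(interval, cost, rate))
```

Under this model, intervals both shorter and longer than the minimizing one give a higher overhead, which matches the abstract's description. A rule such as `should_checkpoint` defers a checkpoint while the observed cost is high and takes it sooner while the cost is low.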
| Item Type: | Book Section |
|---|---|
| Additional Information: | © 1996 IEEE. Reprinted with Permission. The research reported in this paper was supported in part by the NSF Young Investigator Award CCR-9457811, by the Sloan Research Fellowship, and by a grant from the IBM Almaden Research Center, San Jose, California. |
| Subject Keywords: | software fault tolerance; system recovery; checkpoint interval length; checkpoint placement; checkpointing cost; checkpointing overhead; computer systems; fault recovery time; intermediate program state saving; online algorithm; processing time loss reduction; program execution time; reprocessing time |
| DOI: | 10.1109/ISSRE.1996.558869 |
| Record Number: | CaltechAUTHORS:ZIVissre96 |
| Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:ZIVissre96 |
| Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 9887 |
| Collection: | CaltechAUTHORS |
| Deposited By: | INVALID USER |
| Deposited On: | 26 Mar 2008 |
| Last Modified: | 08 Nov 2021 21:02 |