CaltechAUTHORS
  A Caltech Library Service

An on-line algorithm for checkpoint placement

Ziv, Avi and Bruck, Jehoshua (1997) An on-line algorithm for checkpoint placement. IEEE Transactions on Computers, 46 (9). pp. 976-985. ISSN 0018-9340. doi:10.1109/12.620479. https://resolver.caltech.edu/CaltechAUTHORS:ZIVieeetc97b

Abstract

Checkpointing enables us to reduce the time to recover from a fault by saving intermediate states of the program in reliable storage. The length of the intervals between checkpoints affects the execution time of programs. On one hand, long intervals lead to long reprocessing time, while, on the other hand, too frequent checkpointing leads to high checkpointing overhead. In this paper, we present an on-line algorithm for placement of checkpoints. The algorithm uses knowledge of the current cost of a checkpoint when it decides whether or not to place a checkpoint. The total execution-time overhead when the proposed algorithm is used is smaller than the overhead when fixed intervals are used. Although the proposed algorithm uses only on-line knowledge about the cost of checkpointing, its behavior is close to that of the off-line optimal algorithm, which uses complete knowledge of the checkpointing cost.
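
The trade-off between long and short checkpoint intervals described in the abstract can be illustrated with a minimal sketch. It is not the algorithm from the paper. It assumes a textbook model that the abstract does not state: faults arrive as a Poisson process with rate fault_rate, a fault restarts the current interval from its beginning, recovery itself takes no time, and every checkpoint has the same fixed cost. The function name and all parameter values are hypothetical.

```python
import math


def expected_overhead(interval, checkpoint_cost, fault_rate):
    """Relative overhead of one fixed checkpoint interval.

    Assumed model (not taken from the paper): Poisson faults, restart
    from the start of the interval on a fault, zero recovery time.
    Under that model the expected time to finish a segment of length x
    is (exp(fault_rate * x) - 1) / fault_rate.
    """
    segment = interval + checkpoint_cost  # useful work plus one checkpoint
    expected_time = math.expm1(fault_rate * segment) / fault_rate
    return expected_time / interval - 1.0


if __name__ == "__main__":
    # Hypothetical values: checkpoint cost 1.0, fault rate 0.001 per time unit.
    for interval in (1, 10, 45, 100, 1000):
        print(interval, round(expected_overhead(interval, 1.0, 1e-3), 4))
```

Under these assumptions, very short intervals are dominated by checkpointing cost and very long intervals by reprocessing after faults, so the overhead is lowest at an intermediate interval length. The abstract's point is that a fixed interval cannot adapt when the checkpoint cost changes during execution, which this fixed-cost sketch does not model.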


Item Type: Article
Related URLs:
  URL: https://doi.org/10.1109/12.620479
  URL Type: DOI
  Description: UNSPECIFIED
ORCID:
  Author: Bruck, Jehoshua
  ORCID: 0000-0001-8474-0812
Additional Information: © 1997 IEEE. Reprinted with Permission. The research reported in this paper was supported in part by U.S. National Science Foundation Young Investigator Award CCR-9457811, by the Sloan Research Fellowship, and by DARPA and BMDO through an agreement with NASA/OSAT.
Subject Keywords: Fault-tolerant computing; checkpointing; on-line algorithm; performance optimization; program diagnostics; software fault tolerance; system recovery; checkpoint placement; fixed intervals; intermediate states
Issue or Number: 9
DOI: 10.1109/12.620479
Record Number: CaltechAUTHORS:ZIVieeetc97b
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:ZIVieeetc97b
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 9928
Collection: CaltechAUTHORS
Deposited By: INVALID USER
Deposited On: 27 Mar 2008
Last Modified: 08 Nov 2021 21:03
