Departmental Technical Reports (CS)

Opportunistic Checkpoint Intervals to Improve System Performance

Sarala Arunagiri, The University of Texas at El Paso
John T. Daly
Patricia J. Teller, The University of Texas at El Paso
Seetharami Seelam
Ron A. Oldfield
Maria Ruiz Varela
Rolf Riesen

Publication Date

4-2008

Comments

Technical Report: UTEP-CS-08-24

Abstract

The massive scale of current and next-generation massively parallel processing (MPP) systems presents significant challenges related to fault tolerance. For applications that perform periodic checkpoints, the choice of the checkpoint interval, the period between checkpoints, can have a significant impact on the execution time of the application. Finding the optimal checkpoint interval, the one that minimizes the wall-clock execution time, has been a subject of research over the last decade. In an environment where concurrent applications compete for access to network and storage resources, contention at these shared resources needs to be factored into the choice of checkpoint intervals, in addition to application execution times. In this paper, we perform analytical modeling of a complementary performance metric: the aggregate number of checkpoint I/O operations. We then show the existence of, and characterize, a range of checkpoint intervals that have the potential to improve both application and system performance.

Included in

Computer Engineering Commons
