DMTCP: Distributed MultiThreaded CheckPointing
DMTCP (Distributed MultiThreaded CheckPointing) is a tool for transparently checkpointing the state of an arbitrary group of programs spread across many machines and connected by sockets. It operates directly on the user's binary executable, without requiring any modification to either the binary or the operating system. Applications supported by DMTCP include OpenMPI, MATLAB, Python, Perl, and many other programming and shell scripting languages. With TightVNC, it can also checkpoint and restart X Window System applications, as long as they do not use extensions (e.g., no OpenGL, no video).
Fri Oct 1 17:49:16 2010
http://dmtcp.sourceforge.net/