37 | 40 | Unexpected failures and outages will continue to affect the operation of cyber infrastructures such as Amazon EC2 and network infrastructures such as GENI. For many applications running on these infrastructures, including long-running scientific jobs and networked system emulations, failure recovery means re-running the application from the beginning, which loses the partial work already done and wastes system resources. It is therefore desirable for the infrastructure to provide an efficient, application-transparent failure recovery capability that takes live "snapshots" of the infrastructure for later recovery or replay. |