
Introduction

This page gives an example procedure for how a site provider might respond to an Emergency Stop request from the GMOC. Each site can define its own procedure, but every site must have one in place.

Example Procedure for Site Operator Response to Emergency Stop Request

Future enhancement: Optionally authenticate the Emergency Stop request by verifying that a GMOC ticket exists before acknowledging the request.

  1. Within one hour of an Emergency Stop request from the GMOC, acknowledge to them that you are responding to the request.
  2. Identify the cause of the problem.
    • The GMOC will provide you with as much information as they can share, which may include exactly what needs to be shut down.
    • If the GMOC doesn't specify what to shut down, you may shut down the entire GENI sliver at your site.
  3. Attempt to isolate the problem using the aggregate-specific documentation.
  4. Notify the GMOC that you have taken action to isolate the problematic traffic.
    • Share any details with the GMOC that may be relevant to helping them fix the problem at other sites.
    • Continue to help the GMOC identify and document the problem.
    • Work with the GMOC to ensure that the owner of the slice has been notified, and if the action that you took affected other experimenters, let the GMOC know which parties you believe may be affected.
  5. Once the GMOC notifies you that it is OK to restore the resources to normal operations, do so, if there is anything that can be restored. Notify the GMOC when you are finished.
    • If the action taken was a sliver shutdown, then there is no action to be taken for this step.
    • Let the GMOC know which pieces have been restored.
    • Work with the GMOC to make sure previously affected experimenters are notified.
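
A site that wants to track its response against the steps above could model them as an ordered checklist. The following is only a minimal sketch under stated assumptions: the names `EmergencyStop`, `Step`, `advance`, and `needs_restore` are hypothetical and not part of any GENI or GMOC tooling. The only values taken from the procedure are the one-hour acknowledgment window and the rule that a sliver shutdown leaves nothing to restore.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum

# Step 1 of the procedure: acknowledge within one hour of the request.
ACK_DEADLINE = timedelta(hours=1)


class Step(Enum):
    # Hypothetical labels for the five numbered steps, in order.
    RECEIVED = 0
    ACKNOWLEDGED = 1
    CAUSE_IDENTIFIED = 2
    ISOLATED = 3
    GMOC_NOTIFIED = 4
    RESTORED = 5


@dataclass
class EmergencyStop:
    received_at: datetime
    sliver_shut_down: bool = False  # True if isolation meant shutting the sliver down
    step: Step = Step.RECEIVED
    log: list = field(default_factory=list)

    def ack_due_by(self) -> datetime:
        """Latest time at which the acknowledgment is still on schedule."""
        return self.received_at + ACK_DEADLINE

    def advance(self, to: Step, when: datetime, note: str = "") -> None:
        """Move to the next step; steps may not be skipped or repeated."""
        if to.value != self.step.value + 1:
            raise ValueError(f"cannot go from {self.step.name} to {to.name}")
        if to is Step.ACKNOWLEDGED and when > self.ack_due_by():
            note = (note + " (late acknowledgment)").strip()
        self.step = to
        self.log.append((when, to.name, note))

    def needs_restore(self) -> bool:
        """After a sliver shutdown there is nothing to restore in step 5."""
        return not self.sliver_shut_down
```

For example, a request received at 17:00 would be created with `EmergencyStop(received_at=datetime(2013, 3, 1, 17, 0))`, and `ack_due_by()` would then report 18:00 as the acknowledgment deadline. The log entries could later be shared with the GMOC when documenting the incident.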

Emergency Stop Links

GMOC Documents

GMOC Emergency Stop Workflow

Aggregate-specific Procedures

Last modified on 03/01/13 17:16:41