Changes between Initial Version and Version 1 of GENIOperationsTrial/ScheduledMaintenance


Timestamp:
06/18/15 08:11:39 (9 years ago)
Author:
lnevers@bbn.com
Comment:

--

  • GENIOperationsTrial/ScheduledMaintenance

    v1 v1  
     1[[PageOutline(1-2)]]
     2
     3= OPS-005-A Scheduled Site Maintenance =
     4
     5This procedure describes how to handle a GENI Scheduled Site Maintenance. A Scheduled Maintenance may be requested by the Rack team or by operations (e.g. for a service upgrade). This type of request may also be submitted by a site contact for local maintenance activities.
     6
     7Regardless of the source of the reported event, a ticket must be written to track completion of the scheduled maintenance. The ticket must copy the issue reporter and the GENI experimenters at geni-users@googlegroups.com. Before writing the ticket, GMOC should check the GMOC operations calendars (http://gmoc.grnoc.iu.edu/gmoc/index/support/gmoc-operations-calendars.html) and:
     8 - Look for upcoming GENI events (e.g. conferences and tutorials) and note the times.
     9 - Check for other upcoming GENI maintenance and note the times.
     10
     11GMOC may have to contact the requester to make them aware of other upcoming GENI events or maintenance that may interfere with their maintenance request.
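
The calendar check above amounts to an interval-overlap test. The following is a minimal sketch, assuming the calendar entries have been copied by hand into a list; the `CalendarEntry` type, the `find_conflicts` helper, and the sample entries are hypothetical and are not part of any GMOC tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CalendarEntry:
    """One calendar entry (hypothetical shape, entered by hand)."""
    title: str
    start: datetime
    end: datetime


def find_conflicts(request_start, request_end, entries):
    """Return the entries whose time window overlaps the requested window."""
    if request_end <= request_start:
        raise ValueError("maintenance window must end after it starts")
    # Two half-open intervals [a, b) and [c, d) overlap when a < d and c < b.
    return [e for e in entries if request_start < e.end and e.start < request_end]


# Illustrative data only; these are not real GENI events.
calendar = [
    CalendarEntry("Example tutorial", datetime(2015, 7, 1, 9), datetime(2015, 7, 1, 17)),
    CalendarEntry("Example rack upgrade", datetime(2015, 7, 2, 8), datetime(2015, 7, 2, 10)),
]
conflicts = find_conflicts(datetime(2015, 7, 1, 16), datetime(2015, 7, 1, 18), calendar)
for entry in conflicts:
    print(f"Conflict: {entry.title} ({entry.start} to {entry.end})")
```

Any entry returned here is a candidate to raise with the requester before the ticket is written.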
     12
     13= 1. Issue Reported =
     14
     15GMOC gathers technical details about the Scheduled Maintenance including:
     16 - Requester Organization
     17 - Requester Name
     18 - Requester email
     19 - GENI aggregates impacted, and the services within each aggregate that will be directly affected.
     20 - Estimated duration.
     21 - Individuals/organizations needed to complete the maintenance.
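
The details listed above could be captured in a simple record before the ticket is created. This is only a sketch: the `MaintenanceRequest` class, its field names, and the sample values are assumptions for illustration and do not reflect the actual GMOC ticketing schema.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class MaintenanceRequest:
    """Details gathered for a scheduled maintenance (hypothetical field names)."""
    requester_organization: str
    requester_name: str
    requester_email: str
    impacted_aggregates: dict  # aggregate name -> list of affected services
    estimated_duration: timedelta
    participants: list = field(default_factory=list)  # people/organizations needed

    def missing_fields(self):
        """Return the names of the fields that are still empty."""
        values = {
            "requester_organization": self.requester_organization,
            "requester_name": self.requester_name,
            "requester_email": self.requester_email,
            "impacted_aggregates": self.impacted_aggregates,
            "estimated_duration": self.estimated_duration,
            "participants": self.participants,
        }
        # Empty strings, empty containers, and a zero duration all count as missing.
        return [name for name, value in values.items() if not value]


# Illustrative values only.
request = MaintenanceRequest(
    requester_organization="Example University",
    requester_name="Example Requester",
    requester_email="requester@example.org",
    impacted_aggregates={},
    estimated_duration=timedelta(hours=2),
    participants=["Example site contact"],
)
print(request.missing_fields())
```

A non-empty result indicates which details still need to be collected from the requester.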
     22
     23GMOC classifies the priority of the request based on its urgency. The usual classification is `Normal`: a routine installation/provisioning or reporter-initiated maintenance.
     24
     25== 1.1 GENI Event Type Prioritization ==
     26
     27GMOC normally classifies a scheduled maintenance as `Normal` priority, which usually means a routine installation/provisioning or reporter-initiated maintenance. There may be exceptions when a scheduled maintenance impacts high-priority resources like AL2S or services like the SCS. In those cases the priority selection should consider the event type and the impact on the GENI environment. The following Event Type prioritization can be referenced to increase the priority of a maintenance:
     28
     29||'''Priority'''||'''#'''||'''Event type'''                ||'''Dispatch'''||'''Procedure'''||
     30|| Critical     || 1     ||Emergency Stop and LLR          || GMOC         || GMOC Emergency Stop and LLR procedures exist ||
     31|| Critical     || 2     ||Security Event ''[*]''          || GMOC         || <<Insert OPS procedure link>> ||
     32|| Critical     || 3     ||GENI !Clearinghouse/Portal Event        || GMOC         || <<Insert OPS procedure link>> ||
     33|| Critical     || 4     ||Stitching Computation Service   || GMOC         || GMOC I2 SCS Procedure  exists        ||
     34|| Critical     || 5     ||AL2S Aggregate Event            || GMOC         || GMOC OESS Procedures exist           ||
     35|| Critical     || 6     ||GENI Stitching Event            || GMOC         || [wiki:LuisaSandbox/GENIOperationsTrial/GENINetworkStitching OPS-003-B: Network Stitching Experiment Debugging Procedure] ||
     36|| High         || 7     ||WiMAX Multicast VLANs Events    || GMOC         || <<Insert OPS procedure link>> ||
     37|| High         || 8     ||Regional (AM+Switches+links)    || Rack or Site Group || <<Insert OPS procedures links>> ||
     38|| High         || 9     ||Site Events reported by site contact|| GMOC         ||   <<Insert OPS procedures links>> ||
     39|| High         || 10     ||Site Events(AM+Switches+links reported by experimenters or tools) ''[**]''|| Rack or Site Group || <<Insert OPS procedures links>> ||
     40|| High         || 11     ||Experimenter Tools Events(Portal, jacks, omni..)|| Tools Contacts ||  ||
     41|| High         || 12    ||Monitoring Infrastructure Events|| UKY Monitoring team || <<Insert GENI Monitoring procedures>> ||
     42
     43 ''[*] Security Events start as Critical and may be re-prioritized upon investigation. [[BR]]''
     44 ''[**] Some `Site Events` may affect multiple sites (ExoSM) or non-GENI functions (!CloudLab, Emulab, Apt). These events require no special GMOC action and should be assigned to the team that owns the resources.''
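
The table above can be read as a lookup from event type to priority. The sketch below, with the hypothetical names `EVENT_TYPES` and `suggested_priority`, starts from `Normal` and suggests the highest priority among the event types a maintenance impacts. It is only a starting point; the final classification remains a GMOC judgment based on the actual impact.

```python
# Event types from the prioritization table, keyed by the "#" column.
EVENT_TYPES = {
    1: ("Critical", "Emergency Stop and LLR"),
    2: ("Critical", "Security Event"),
    3: ("Critical", "GENI Clearinghouse/Portal Event"),
    4: ("Critical", "Stitching Computation Service"),
    5: ("Critical", "AL2S Aggregate Event"),
    6: ("Critical", "GENI Stitching Event"),
    7: ("High", "WiMAX Multicast VLANs Events"),
    8: ("High", "Regional (AM+Switches+links)"),
    9: ("High", "Site Events reported by site contact"),
    10: ("High", "Site Events (AM+Switches+links reported by experimenters or tools)"),
    11: ("High", "Experimenter Tools Events (Portal, jacks, omni..)"),
    12: ("High", "Monitoring Infrastructure Events"),
}

# Ordering used to compare priorities (an assumption: Normal < High < Critical).
PRIORITY_RANK = {"Normal": 0, "High": 1, "Critical": 2}


def suggested_priority(impacted_event_numbers=()):
    """Suggest a priority: Normal unless an impacted event type ranks higher."""
    priority = "Normal"
    for number in impacted_event_numbers:
        candidate, _event_type = EVENT_TYPES[number]
        if PRIORITY_RANK[candidate] > PRIORITY_RANK[priority]:
            priority = candidate
    return priority


print(suggested_priority())         # routine maintenance
print(suggested_priority([12, 9]))  # impacts monitoring and a site
print(suggested_priority([5]))      # impacts AL2S
```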
     45
     46== 1.2 Create Ticket ==
     47
     48The GMOC ticketing system is used to capture maintenance information. Creating the ticket sends an email notification to the original requester and to the GENI experimenter list geni-users@googlegroups.com. GMOC will follow up to verify completion of the maintenance task. Subsequent updates and interactions between GMOC and the reporter will also generate notifications to the issue reporter. GMOC may postpone a ticket if the scheduled maintenance does not take place or if it cannot be completed in the time allocated.
     49
     50= 2. Investigate and Identify Response =
     51
     52GENI Operations only tracks GENI scheduled maintenance and verifies that the maintenance has been completed.
     53
     54== 2.1 Investigate the Problem ==
     55
     56None
     57
     58== 2.2 Identify Potential Response ==
     59
     60None
     61
     62= 3. GMOC Response =
     63
     64A Scheduled Maintenance ticket is usually dispatched to the organization responsible for the resource, which is usually also the organization requesting the maintenance. The GMOC verifies that the maintenance is completed and closes the ticket when the scheduled maintenance is done. If a scheduled maintenance cannot be completed in the scheduled time, the GMOC updates the ticket to capture the new scheduled time.
     65
     66== 3.1 Implement Response ==
     67
     68The GMOC executes the steps outlined and verifies the completion of the scheduled event. The maintenance may take a few iterations, and GMOC may have to postpone the ticket.
     69
     70== 3.2 Procedure Updates ==
     71
     72When the instructions in a procedure are found to be missing symptoms, required actions, or potential impact, the GMOC must provide feedback so that the procedure can be improved for future use.
     73
     74= 4. Resolution =
     75
     76For a scheduled event, the GMOC coordinates with the person who originally scheduled the event to make sure that it was completed successfully. A scheduled event ticket may also be postponed, in which case it remains open until the next scheduled time.
     77
     78== 4.1 Document Resolution and Close Ticket ==
     79
     80GMOC captures maintenance completion in the ticket and closes the ticket. If the maintenance does not fully address its goals or introduces a new problem, a new ticket may be created to track the remaining issue.
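
The ticket lifecycle described in this procedure (open, possibly postponed one or more times, then closed) can be sketched as a small state holder. The `MaintenanceTicket` class, its status names, and the sample data are hypothetical and do not describe the real GMOC ticketing system.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MaintenanceTicket:
    """Minimal model of a scheduled maintenance ticket (hypothetical)."""
    summary: str
    scheduled_start: datetime
    status: str = "open"  # one of "open", "postponed", "closed"
    history: list = field(default_factory=list)

    def postpone(self, new_start, reason):
        """Record a new scheduled time; the ticket stays open until then."""
        if self.status == "closed":
            raise ValueError("cannot postpone a closed ticket")
        self.history.append(
            f"postponed from {self.scheduled_start:%Y-%m-%d %H:%M} "
            f"to {new_start:%Y-%m-%d %H:%M}: {reason}"
        )
        self.scheduled_start = new_start
        self.status = "postponed"

    def close(self, resolution):
        """Capture the maintenance completion and close the ticket."""
        if self.status == "closed":
            raise ValueError("ticket is already closed")
        self.history.append(f"completed: {resolution}")
        self.status = "closed"


# Illustrative data only.
ticket = MaintenanceTicket("Example rack maintenance", datetime(2015, 7, 2, 8, 0))
ticket.postpone(datetime(2015, 7, 9, 8, 0), "not completed in the allocated time")
ticket.close("maintenance verified with the requester")
for line in ticket.history:
    print(line)
```

If the maintenance leaves a remaining issue, that issue would be tracked in a separate new ticket rather than by reopening this one.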
     81