Version 1 (modified by lnevers@bbn.com, 9 years ago)

OPS-005-A Scheduled Site Maintenance

This procedure describes how to handle a GENI Scheduled Site Maintenance. A Scheduled Maintenance may be requested by the Rack team or by operations (e.g. for a service upgrade). A site contact may also submit this type of request for local maintenance activities.

Regardless of the source of the request, a ticket must be written to track completion of the scheduled maintenance. The ticket must copy the issue reporter and the GENI experimenters at geni-users@googlegroups.com. Before writing the ticket, GMOC should check the GMOC operations calendars (http://gmoc.grnoc.iu.edu/gmoc/index/support/gmoc-operations-calendars.html) and:

  • Look for upcoming GENI events (e.g. conferences and tutorials) and note the times.
  • Check for other upcoming GENI maintenance and note the times.

GMOC may have to contact the requester to make them aware of other upcoming GENI events or maintenance that may interfere with their maintenance request.
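The calendar check above amounts to an interval-overlap test. The following is a minimal sketch of that idea, assuming calendar entries have already been copied into a simple list; the `CalendarEntry` structure, the `find_conflicts` helper, and the sample entries are all hypothetical and are not part of any GMOC tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CalendarEntry:
    """One entry copied from the operations calendar (hypothetical structure)."""
    title: str
    start: datetime
    end: datetime


def find_conflicts(request_start, request_end, calendar):
    """Return the calendar entries whose time window overlaps the request."""
    # Two half-open intervals [a, b) and [c, d) overlap when a < d and c < b.
    return [e for e in calendar if request_start < e.end and e.start < request_end]


# Illustrative data only; these are not real GENI events.
calendar = [
    CalendarEntry("Example tutorial", datetime(2024, 3, 4, 9), datetime(2024, 3, 4, 17)),
    CalendarEntry("Example rack upgrade", datetime(2024, 3, 6, 8), datetime(2024, 3, 6, 10)),
]
conflicts = find_conflicts(datetime(2024, 3, 4, 16), datetime(2024, 3, 4, 18), calendar)
conflict_titles = [e.title for e in conflicts]
```

Any entry returned here would be a reason to contact the requester before the ticket is written. A request that starts exactly when an event ends is not treated as a conflict in this sketch.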

1. Issue Reported

GMOC gathers technical details about the Scheduled Maintenance including:

  • Requester Organization
  • Requester Name
  • Requester email
  • GENI aggregates impacted, and the services within each aggregate that will be directly affected.
  • Estimated duration.
  • Individuals/organizations needed to complete the maintenance.
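The details listed above could be held in a small record so that missing items are easy to spot before a ticket is created. This is only a sketch: the `MaintenanceRequest` class, its field names, and the `missing_details` helper are assumptions made for illustration and do not reflect the GMOC ticketing system's actual schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MaintenanceRequest:
    """Details gathered for a scheduled maintenance (hypothetical field names)."""
    requester_organization: str = ""
    requester_name: str = ""
    requester_email: str = ""
    # Maps each impacted aggregate to the services within it that are affected.
    impacted_aggregates: dict = field(default_factory=dict)
    estimated_duration_hours: float = 0.0
    # Individuals or organizations needed to complete the maintenance.
    required_parties: list = field(default_factory=list)


def missing_details(request):
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(request) if not getattr(request, f.name)]


# Illustrative values only.
request = MaintenanceRequest(
    requester_organization="Example Org",
    requester_name="A. Requester",
    requester_email="requester@example.org",
)
still_needed = missing_details(request)
```

Here `still_needed` lists the aggregates, duration, and required parties, which are the items that would have to be collected from the requester next.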

GMOC classifies the priority of the request based on its urgency. Normal: usually a routine installation/provisioning or reporter-initiated maintenance.

1.1 GENI Event Type Prioritization

GMOC normally classifies a scheduled maintenance as Normal priority, which usually covers a routine installation/provisioning or reporter-initiated maintenance. There may be priority exceptions when a scheduled maintenance impacts high-priority resources like AL2S or services like the SCS. In those cases the priority selection should consider the event type and the impact on the GENI environment. The following Event Type prioritization can be referenced to increase the priority of a maintenance:

Priority | # | Event type | Dispatch | Procedure
Critical | 1 | Emergency Stop and LLR | GMOC | GMOC Emergency Stop and LLR procedures exist
Critical | 2 | Security Event [*] | GMOC | <<Insert OPS procedure link>>
Critical | 3 | GENI Clearinghouse/Portal Event | GMOC | <<Insert OPS procedure link>>
Critical | 4 | Stitching Computation Service | GMOC | GMOC I2 SCS Procedure exists
Critical | 5 | AL2S Aggregate Event | GMOC | GMOC OESS Procedures exist
Critical | 6 | GENI Stitching Event | GMOC | OPS-003-B: Network Stitching Experiment Debugging Procedure?
High | 7 | WiMAX Multicast VLANs Events | GMOC | <<Insert OPS procedure link>>
High | 8 | Regional (AM+Switches+links) | Rack or Site Group | <<Insert OPS procedures links>>
High | 9 | Site Events reported by site contact | GMOC | <<Insert OPS procedures links>>
High | 10 | Site Events (AM+Switches+links reported by experimenters or tools) [**] | Rack or Site Group | <<Insert OPS procedures links>>
High | 11 | Experimenter Tools Events (Portal, jacks, omni..) | Tools Contacts |
High | 12 | Monitoring Infrastructure Events | UKY Monitoring team | <<Insert GENI Monitoring procedures

[*] Security Events start as Critical and may be re-prioritized upon investigation.
[**] Some Site Events may affect multiple sites (ExoSM) or non-GENI functions (CloudLab, Emulab, Apt). These events require no special GMOC action and should be assigned to the team that owns the resources.
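One possible reading of the table is a lookup that raises a maintenance from Normal to the priority of the highest-ranked event type it touches. The sketch below assumes exactly that rule; the procedure itself only says the table "can be referenced", so the final choice stays with GMOC. The dictionary keys are shortened labels taken from the table, and the `maintenance_priority` function is hypothetical.

```python
# Event type -> (priority, rank), transcribed from the prioritization table.
EVENT_TYPE_PRIORITY = {
    "Emergency Stop and LLR": ("Critical", 1),
    "Security Event": ("Critical", 2),
    "GENI Clearinghouse/Portal Event": ("Critical", 3),
    "Stitching Computation Service": ("Critical", 4),
    "AL2S Aggregate Event": ("Critical", 5),
    "GENI Stitching Event": ("Critical", 6),
    "WiMAX Multicast VLANs Events": ("High", 7),
    "Regional (AM+Switches+links)": ("High", 8),
    "Site Events reported by site contact": ("High", 9),
    "Site Events reported by experimenters or tools": ("High", 10),
    "Experimenter Tools Events": ("High", 11),
    "Monitoring Infrastructure Events": ("High", 12),
}


def maintenance_priority(impacted_event_types):
    """Suggest a priority for a maintenance (assumed rule, not GMOC policy).

    Returns "Normal" when no listed event type is impacted; otherwise the
    priority of the impacted event type with the lowest rank number.
    """
    matches = [EVENT_TYPE_PRIORITY[t] for t in impacted_event_types
               if t in EVENT_TYPE_PRIORITY]
    if not matches:
        return "Normal"
    return min(matches, key=lambda entry: entry[1])[0]


suggested = maintenance_priority(
    ["Monitoring Infrastructure Events", "AL2S Aggregate Event"]
)
```

In this example the AL2S entry (rank 5) outranks the monitoring entry (rank 12), so the suggestion is Critical.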

1.2 Create Ticket

The GMOC ticketing system is used to capture maintenance information. Creating the ticket results in an email notification to the original requester and to the GENI experimenter list, geni-users@googlegroups.com. GMOC will follow up to verify completion of the maintenance task. Subsequent updates and interactions between GMOC and the reporter will also generate notifications to the issue reporter. GMOC may postpone a ticket if the scheduled maintenance does not take place or if it cannot be completed in the time allocated.
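The notification behavior described above can be summarized as a small rule. This is an illustrative sketch only: the `notification_recipients` function and the action names "create" and "update" are hypothetical, and the assumption that updates go to the reporter alone follows the wording of this section rather than any confirmed ticketing-system behavior.

```python
# Experimenter list address as given in this procedure.
EXPERIMENTER_LIST = "geni-users@googlegroups.com"


def notification_recipients(action, requester_email):
    """Return the addresses notified for a ticket action (assumed rule)."""
    if action == "create":
        # Ticket creation notifies the requester and the experimenter list.
        return [requester_email, EXPERIMENTER_LIST]
    if action == "update":
        # Later updates are described as notifying the issue reporter.
        return [requester_email]
    raise ValueError(f"unknown ticket action: {action!r}")


on_create = notification_recipients("create", "requester@example.org")
on_update = notification_recipients("update", "requester@example.org")
```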

2. Investigate and Identify Response

GENI Operations only tracks GENI scheduled maintenance and verifies that the maintenance has been completed.

2.1 Investigate the Problem

None

2.2 Identify Potential Response

None

3. GMOC Response

A Scheduled Maintenance ticket is usually dispatched to the organization responsible for the resource, which is usually also the organization requesting the maintenance. The GMOC verifies that the maintenance is completed and closes the ticket when the scheduled maintenance is done. If a scheduled maintenance cannot be completed in the scheduled time, the GMOC updates the ticket to capture the new scheduled time.

3.1 Implement Response

The GMOC executes the steps outlined and verifies the completion of the scheduled event. The maintenance may take a few iterations, and GMOC may have to postpone the ticket.
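The postpone-and-retry cycle described here can be pictured as a small ticket lifecycle. The sketch below is a hypothetical model, not the GMOC ticketing system: the `MaintenanceTicket` class, its status names ("open", "postponed", "closed"), and its methods are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MaintenanceTicket:
    """Minimal model of a scheduled maintenance ticket (hypothetical)."""
    scheduled_start: datetime
    status: str = "open"
    # Earlier scheduled times, kept so each iteration stays visible.
    previous_schedules: list = field(default_factory=list)

    def postpone(self, new_start):
        """Record a new scheduled time when the maintenance did not complete."""
        if self.status == "closed":
            raise ValueError("a closed ticket cannot be postponed")
        self.previous_schedules.append(self.scheduled_start)
        self.scheduled_start = new_start
        self.status = "postponed"

    def close(self):
        """Close the ticket once completion has been verified."""
        self.status = "closed"


# Illustrative dates only.
ticket = MaintenanceTicket(scheduled_start=datetime(2024, 3, 6, 8))
ticket.postpone(datetime(2024, 3, 13, 8))  # first attempt did not complete
ticket.close()                             # second attempt verified as done
```

Keeping the earlier scheduled times in the ticket makes it clear how many iterations a maintenance took before it was closed.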

3.2 Procedure Updates

When instructions in a procedure are found to be missing symptoms, required actions, or potential impact, the GMOC must provide feedback to enhance the procedure for future use.

4. Resolution

For a scheduled event, the GMOC coordinates with the person who originally scheduled the event to make sure that it was completed successfully. Scheduled event tickets may also be postponed and remain open until the next scheduled time.

4.1 Document Resolution and Close Ticket

GMOC captures maintenance completion in the ticket and closes the ticket. If the maintenance does not fully address its goals or introduces a new problem, a new ticket may be created to track the remaining issue.