OPS-005-A Scheduled Site Maintenance
This procedure describes how to handle a GENI Scheduled Site Maintenance. A Scheduled Maintenance may be requested by the Rack team or by operations (e.g. for a service upgrade). This type of request may also be submitted by a site contact for local maintenance activities.
Regardless of the source of the request, a ticket must be written to track completion of the scheduled maintenance. The ticket must copy the issue reporter and the GENI experimenters at geni-users@googlegroups.com. Before writing the ticket, GMOC should check the GMOC operations calendars (http://gmoc.grnoc.iu.edu/gmoc/index/support/gmoc-operations-calendars.html) and:
- Look for upcoming GENI events (e.g. conferences and tutorials) and note the times.
- Check for other upcoming GENI maintenance and note the times.
GMOC may have to contact the requester to make them aware of other upcoming GENI events or maintenance that may interfere with their maintenance request.
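The calendar check above amounts to an interval-overlap test. The following is a minimal sketch of that idea, not a GMOC tool: the `CalendarEntry` shape, the `find_conflicts` helper, and the sample dates are all hypothetical, and the entries are assumed to have been copied by hand from the operations calendars.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CalendarEntry:
    """One calendar entry (hypothetical shape, not a GMOC data format)."""
    title: str
    start: datetime
    end: datetime


def find_conflicts(window_start, window_end, entries):
    """Return the entries whose time range overlaps the requested window."""
    # Two half-open intervals [a, b) and [c, d) overlap when a < d and c < b.
    return [e for e in entries if window_start < e.end and e.start < window_end]


# Sample data invented purely for illustration.
calendar = [
    CalendarEntry("Tutorial", datetime(2015, 3, 10, 9), datetime(2015, 3, 10, 17)),
    CalendarEntry("Other maintenance", datetime(2015, 3, 12, 6), datetime(2015, 3, 12, 8)),
]

# A requested maintenance window that runs into the end of the tutorial.
conflicts = find_conflicts(datetime(2015, 3, 10, 16), datetime(2015, 3, 10, 18), calendar)
print([c.title for c in conflicts])
```

Any entry returned by such a check would be a reason to contact the requester before the ticket is written.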
1. Issue Reported
GMOC gathers technical details about the Scheduled Maintenance including:
- Requester Organization
- Requester Name
- Requester email
- GENI aggregates impacted, and the services within each aggregate that will be directly affected.
- Estimated duration.
- Individuals/organizations needed to complete the maintenance.
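One way to make sure none of the details listed above is forgotten is to record them in a small structure and ask it which items are still empty. This is only a sketch under stated assumptions: the class name, the field names, and the choice of hours as the duration unit are illustrative and do not come from the GMOC ticketing system.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MaintenanceRequest:
    """Details gathered at intake (field names are hypothetical)."""
    requester_organization: str = ""
    requester_name: str = ""
    requester_email: str = ""
    # Maps each impacted aggregate to the services directly affected in it.
    affected_aggregates: dict = field(default_factory=dict)
    # Unit is an assumption; the procedure only asks for an estimated duration.
    estimated_duration_hours: float = 0.0
    # Individuals or organizations needed to complete the maintenance.
    required_parties: list = field(default_factory=list)

    def missing_fields(self):
        """Names of fields that are still empty and need follow-up."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values invented for illustration.
request = MaintenanceRequest(
    requester_organization="Example University",
    requester_name="A. Contact",
    requester_email="contact@example.edu",
    affected_aggregates={"example-rack": ["compute"]},
)
print(request.missing_fields())
```

Here the duration and the required parties have not been filled in yet, so those are the two names reported for follow-up with the requester.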
GMOC classifies the problem priority based on its urgency:
Normal: Usually a routine installation/provisioning or reporter-initiated maintenance.
1.1 GENI Event Type Prioritization
GMOC normally classifies a scheduled maintenance as Normal priority, which is usually a routine installation/provisioning or reporter-initiated maintenance. There may be priority exceptions when a scheduled maintenance impacts high-priority resources like AL2S or services like the SCS. In those cases, the priority selection should consider the event type and the impact on the GENI environment. The following Event Type prioritization can be referenced to increase the priority of a maintenance:
Priority | # | Event type | Dispatch | Procedure |
Critical | 1 | Emergency Stop and LLR | GMOC | GMOC Emergency Stop and LLR procedures exist |
Critical | 2 | Security Event [*] | GMOC | <<Insert OPS procedure link>> |
Critical | 3 | GENI Clearinghouse/Portal Event | GMOC | <<Insert OPS procedure link>> |
Critical | 4 | Stitching Computation Service | GMOC | GMOC I2 SCS Procedure exists |
Critical | 5 | AL2S Aggregate Event | GMOC | GMOC OESS Procedures exist |
Critical | 6 | GENI Stitching Event | GMOC | OPS-003-B: Network Stitching Experiment Debugging Procedure |
High | 7 | WiMAX Multicast VLANs Events | GMOC | <<Insert OPS procedure link>> |
High | 8 | Regional (AM+Switches+links) | Rack or Site Group | <<Insert OPS procedures links>> |
High | 9 | Site Events reported by site contact | GMOC | <<Insert OPS procedures links>> |
High | 10 | Site Events (AM+Switches+links reported by experimenters or tools) [**] | Rack or Site Group | <<Insert OPS procedures links>> |
High | 11 | Experimenter Tools Events (Portal, jacks, omni..) | Tools Contacts | |
High | 12 | Monitoring Infrastructure Events | UKY Monitoring team | <<Insert GENI Monitoring procedures>> |
[*] Security Events start as Critical and may be re-prioritized upon investigation.
[**] Some Site Events may affect multiple sites (ExoSM) or non-GENI functions (CloudLab, Emulab, Apt). These events require no special GMOC action and should be assigned to the team that owns the resources.
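The prioritization table can be read as a lookup from event type number to priority. The sketch below encodes only the Priority and # columns of the table. Taking the highest priority among the impacted event types is an assumption made for illustration; the procedure itself leaves the final choice to GMOC judgment about event type and impact. The name `maintenance_priority` is hypothetical.

```python
# Priority per event type number, copied from the table above.
PRIORITY_BY_EVENT_NUMBER = {
    1: "Critical", 2: "Critical", 3: "Critical",
    4: "Critical", 5: "Critical", 6: "Critical",
    7: "High", 8: "High", 9: "High",
    10: "High", 11: "High", 12: "High",
}

# Ordering used to compare priorities; Normal is the default for maintenance.
_RANK = {"Normal": 0, "High": 1, "Critical": 2}


def maintenance_priority(impacted_event_numbers):
    """Start at Normal and raise to the highest priority among impacted event types.

    Assumption: "highest wins" is an illustrative rule, not a GMOC policy.
    """
    best = "Normal"
    for number in impacted_event_numbers:
        candidate = PRIORITY_BY_EVENT_NUMBER.get(number, "Normal")
        if _RANK[candidate] > _RANK[best]:
            best = candidate
    return best


print(maintenance_priority([]))      # routine maintenance, nothing from the table impacted
print(maintenance_priority([9]))     # a site event reported by the site contact
print(maintenance_priority([5, 9]))  # also touches the AL2S aggregate
```

A result above Normal would only be a starting point for the exception handling described in this section, to be confirmed against the actual impact on the GENI environment.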
1.2 Create Ticket
The GMOC ticketing system is used to capture maintenance information. Creating the ticket sends an email notification to the original requester and to the GENI experimenter list geni-users@googlegroups.com. GMOC will follow up to verify completion of the maintenance task. Subsequent updates and interactions between GMOC and the reporter will also generate notifications to the issue reporter. GMOC may postpone a ticket if the scheduled maintenance does not take place or if it cannot be completed in the time allocated.
2. Investigate and Identify Response
GENI Operations only tracks GENI scheduled maintenance and verifies that the maintenance has been completed.
2.1 Investigate the Problem
None
2.2 Identify Potential Response
None
3. GMOC Response
A Scheduled Maintenance ticket is usually dispatched to the organization responsible for the resource, which is usually also the organization requesting the maintenance. The GMOC verifies that the maintenance is completed and closes the ticket when the scheduled maintenance is done. If a scheduled maintenance cannot be completed in the scheduled time, the GMOC updates the ticket to capture the new scheduled time.
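The ticket handling described here has two outcomes: the ticket is closed once completion is verified, or it is updated with a new scheduled time. The sketch below models that as a tiny state holder. It is illustrative only: the `MaintenanceTicket` class, its status names, and its methods are hypothetical and do not reflect the real GMOC ticketing system.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MaintenanceTicket:
    """Minimal stand-in for a scheduled maintenance ticket (hypothetical)."""
    summary: str
    scheduled_start: datetime
    status: str = "open"  # assumed states: "open", "postponed", "closed"
    history: list = field(default_factory=list)

    def postpone(self, new_start):
        """Record a new scheduled time when the maintenance did not complete."""
        if self.status == "closed":
            raise ValueError("a closed ticket cannot be postponed")
        self.history.append(
            f"postponed from {self.scheduled_start.isoformat()} to {new_start.isoformat()}"
        )
        self.scheduled_start = new_start
        self.status = "postponed"

    def close(self, note):
        """Close the ticket once completion has been verified."""
        self.history.append(f"completed: {note}")
        self.status = "closed"


# Sample values invented for illustration.
ticket = MaintenanceTicket("Switch firmware upgrade", datetime(2015, 3, 10, 6))
ticket.postpone(datetime(2015, 3, 17, 6))   # could not finish in the first window
ticket.close("verified with the requester")
print(ticket.status, len(ticket.history))
```

Keeping the postponement in the ticket history, rather than opening a new ticket, matches the procedure's approach of leaving the ticket open until the next scheduled time.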
3.1 Implement Response
The GMOC executes the steps outlined and verifies the completion of the scheduled event. The maintenance may take a few iterations, and GMOC may have to postpone the ticket.
3.2 Procedure Updates
When the instructions in a procedure are found to omit symptoms, required actions, or potential impact, the GMOC must provide feedback so that the procedure can be improved for future use.
4. Resolution
For a scheduled event, the GMOC coordinates with the person who originally scheduled the event to make sure that it was completed successfully. Scheduled event tickets may also be postponed and remain open until the next scheduled time.
4.1 Document Resolution and Close Ticket
GMOC captures maintenance completion in the ticket and closes the ticket. If the maintenance does not fully address its goals or introduces a new problem, a new ticket may be created to track the remaining issue.