Version 2 (modified by adaadwil@indiana.edu, 3 years ago)

GENI Stitching Computation Service

The GENI SCS (Stitching Computation Service) runs on geni-scs.net.internet2.edu and allows experimenters to reserve GENI resources across multiple domains.

Issues can be received through monitoring systems or through reports from users.

Monitoring

The SCS host at Internet2 is monitored directly by GlobalNOC Nagios. GMOC techs will see alarms on the host geni-scs.net.internet2.edu in Alertmon.

GENI Type Prioritization

  • SCS failure: Critical issue. During an outage, all stitching is unavailable.
  • Site failure: High priority. Prevents one or more sites from setting up layer2 GENI stitching connections.
  • ExoGENI to ExoGENI failure: High priority. Prevents ExoGENI to ExoGENI connections that use ExoGENI stitching.
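
The prioritization above amounts to a simple lookup from failure type to priority label. The following is a minimal sketch of that lookup, not part of any GMOC tooling; the names `FailureType`, `PRIORITY_BY_FAILURE`, and `priority_for` are hypothetical and used only for illustration.

```python
from enum import Enum


class FailureType(Enum):
    """Failure categories from the prioritization list (names are illustrative)."""
    SCS_FAILURE = "SCS failure"
    SITE_FAILURE = "Site failure"
    EXO_TO_EXO_FAILURE = "ExoGENI to ExoGENI failure"


# Priority labels exactly as stated in the list above.
PRIORITY_BY_FAILURE = {
    FailureType.SCS_FAILURE: "Critical",
    FailureType.SITE_FAILURE: "High",
    FailureType.EXO_TO_EXO_FAILURE: "High",
}


def priority_for(failure: FailureType) -> str:
    """Return the priority label for a given failure type."""
    return PRIORITY_BY_FAILURE[failure]
```

For example, `priority_for(FailureType.SCS_FAILURE)` returns `"Critical"`.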

Observed Alarm: defined as GMOC proactively responding to an active alarm

GMOC will need to create a ticket and record the following:

  • Gather Alarm information
    • Host Name
    • Service
    • Time stamps
    • Logs

The ticket will be sent to Internet2 for troubleshooting and resolution.

Reported Issue

GMOC will need to create a ticket and record as much of the following information as possible:

  • Initial reporter's contact information. Verify it in the GMOC DB.
    • Name
    • Organization
    • GENI Site Name
    • Phone Number
    • Email Address
  • When did this start?
  • Symptoms and Impact to GENI
  • Criticality of Issue (priority for expected response time)
    • Single-experimenter issues default to Elevated - Priority 3
    • Tutorials, reservations, and classes default to High - Priority 2
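
The information gathered for a reported issue, together with the default criticality rule, can be pictured as a small record. This is a minimal sketch under stated assumptions: the `ReportedIssue` class, its field names, the `is_group_event` flag, and `default_priority` are all hypothetical and do not describe the actual GMOC ticketing system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ReportedIssue:
    """Fields mirror the checklist above; all names are illustrative."""
    reporter_name: str
    organization: str
    geni_site_name: str
    phone_number: str
    email_address: str
    started_at: str            # free-form answer to "When did this start?"
    symptoms_and_impact: str
    # Assumed flag: True when the issue affects a tutorial, reservation, or class.
    is_group_event: bool = False


def default_priority(issue: ReportedIssue) -> Tuple[str, int]:
    """Return the default (label, priority number) for a reported issue."""
    if issue.is_group_event:
        return ("High", 2)       # tutorials, reservations, classes
    return ("Elevated", 3)       # single-experimenter issues
```

These are defaults only; the sketch does not model a technician raising or lowering the priority after review.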

Note: While GMOC operates 24x7, GENI and its partners operate on a normal business-hours model. This means that no after-hours support or responses should be anticipated from other GENI members.

GMOC will create a ticket and begin the triage steps:

  • GMOC determines whether the site supports stitching. This is verified on the GMOC wiki page http://groups.geni.net/geni/wiki/GeniNetworkStitchingSites
  • GMOC then works with the sites to verify that both endpoints are supported.
    • For EXO-EXO endpoints, SCS cannot be used.
  • Escalate to the appropriate rack teams (ExoGENI, InstaGENI)
  • A request ticket is sent to Internet2 for investigation.
    • Ticket priority is set by the criticality of the issue (single user vs. tutorial)
  • Follow up with the parties on the next business day, and keep following up until the issue has been resolved.
    • Update Ticket
    • Send Notification to the community
    • Determine if after action report is needed
    • Close ticket
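
The first two triage checks above (are both endpoints stitching-capable, and can SCS be used at all) can be sketched as two small predicates. This is an illustration only: the function names, the `stitching_sites` set, and the rack-type strings are assumptions. In practice the list of stitching sites is read from the wiki page named in the triage steps, not from code.

```python
from typing import Set


def both_endpoints_supported(endpoint_a: str, endpoint_b: str,
                             stitching_sites: Set[str]) -> bool:
    """True when both endpoints appear in the set of stitching-capable sites."""
    return endpoint_a in stitching_sites and endpoint_b in stitching_sites


def scs_can_be_used(rack_a: str, rack_b: str) -> bool:
    """SCS cannot be used when both endpoints are ExoGENI racks."""
    return not (rack_a == "ExoGENI" and rack_b == "ExoGENI")


# Hypothetical site names, for illustration only.
sites = {"site-a", "site-b"}
supported = both_endpoints_supported("site-a", "site-b", sites)
usable = scs_can_be_used("ExoGENI", "InstaGENI")
```

The sketch deliberately stops at the checks; escalation, ticket routing to Internet2, and follow-up remain manual steps as listed above.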