
Version 18 (modified by sblais@bbn.com, 5 years ago)


CHK-003 GENI Network Stitching Checks

This procedure defines how GENI Network Stitching status can be verified. Checks to be verified include:

  • Status of the Stitching Computation Service (SCS) Server
  • Status of the GENI Stitching Sites Paths

1.0 GENI SCS - Server Check

1.1 Goals of SCS - Server Check

The SCS is a service that provides stitching path information for known sites. This section defines how to verify that the SCS server is up and providing path information for the known stitching sites. All information used is found in the GENI Monitoring System and is also available as alerts in the GPO Nagios System. There are currently 2 SCS Server Checks, which are run for:

  • scs-geni - The Internet2 Production SCS server
  • scs-geni-test - The Test SCS, which is used for testing purposes by the GPO, InstaGENI, CloudLab and iMinds teams.

1.2 Steps for SCS - Server Check

  1. Login to GPO Nagios System
  2. Select Host Groups->Summary in the left-hand navigation bar. This brings you to the Host Groups Summary page.
    (Screenshot: Nagios Host Groups Summary)
  3. Find GENI Stitching Computation Services and click its link in the Host Group column to see details. This brings you to the Service Overview For Host Group page.
    (Screenshot: Nagios Service Overview For Host Group SCS)
  4. The resulting page shows the SCS systems being monitored. There are currently 2 instances being monitored: the Internet2 SCS and the Test SCS at MAX.
  5. Review the Internet2 SCS scs-geni service status, which can be UP, DOWN or PENDING.
  6. Click on the scs-geni link in the Host column. This brings you to the Service Status Details For Host scs-geni page.
    (Screenshot: Service Status Details For Host scs-geni, part 1)
  7. Look for a service named gpo:is_available and confirm that the status is UP.
    (Screenshot: Service Status Details For Host scs-geni, part 2)
  8. Go back to the Service Overview For Host Group page reached in Step 3. Review the Test SCS scs-geni-test service status, which can be UP, DOWN or PENDING.
  9. Click on the scs-geni-test link in the Host column. This brings you to the Service Status Details For Host scs-geni-test page.
  10. Look for a service named gpo:is_available and confirm that the status is UP.

Note: the host statuses themselves are expected to always be PENDING; only the gpo:is_available service status determines the result of this check.
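The decision made in steps 7 and 10 can be sketched as below. This is a minimal illustration, not part of the GPO tooling: it assumes a hypothetical snapshot of what the Nagios pages show, represented as a Python dictionary (Nagios does not export data in this shape). The sketch reads only the gpo:is_available service and ignores the host status, which is expected to stay PENDING.

```python
# Hypothetical snapshot of the Nagios pages: each host maps to its host
# status and to the status of each of its services.
SCS_HOSTS = ("scs-geni", "scs-geni-test")
SERVICE = "gpo:is_available"


def scs_server_check(snapshot: dict) -> dict:
    """Return, per SCS host, True if its gpo:is_available service is UP.

    The host status is deliberately ignored: it is expected to stay PENDING.
    A host or service missing from the snapshot is treated as a failure.
    """
    results = {}
    for host in SCS_HOSTS:
        services = snapshot.get(host, {}).get("services", {})
        results[host] = services.get(SERVICE) == "UP"
    return results


example = {
    "scs-geni": {"host_status": "PENDING",
                 "services": {"gpo:is_available": "UP"}},
    "scs-geni-test": {"host_status": "PENDING",
                      "services": {"gpo:is_available": "DOWN"}},
}
print(scs_server_check(example))  # {'scs-geni': True, 'scs-geni-test': False}
```

Treating a missing host or service as a failure keeps the sketch conservative: an empty page never counts as a pass.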

1.3 SCS - Server Check - Pass Criteria

If step 7 shows UP, then the Production Internet2 SCS check passes. A value of UP means that the SCS service is running and answering a listaggregates request. The monitoring system also verifies that the list of aggregates returned includes the expected sites.

If step 10 shows UP, then the Test SCS check passes. A value of UP means that the SCS service is running and answering a listaggregates request. The monitoring system also verifies that the list of aggregates returned includes the expected sites.
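The UP/DOWN rule described above can be sketched as a pure function. This is an illustration only: it does not contact the SCS, it assumes the listaggregates response has already been reduced to a list of aggregate identifiers (or None when the SCS did not answer), and the site names used are hypothetical.

```python
from typing import Iterable, Optional


def evaluate_listaggregates(response: Optional[Iterable[str]],
                            expected_sites: Iterable[str]) -> str:
    """Turn one listaggregates result into an UP/DOWN status.

    response is None when the SCS did not answer at all; otherwise it holds
    the aggregate identifiers the SCS returned.
    """
    if response is None:
        return "DOWN"  # SCS is not responding
    missing = set(expected_sites) - set(response)
    # Extra aggregates are tolerated; only missing expected sites fail.
    return "DOWN" if missing else "UP"


expected = ["site-a", "site-b"]  # hypothetical site identifiers
print(evaluate_listaggregates(["site-a", "site-b", "site-c"], expected))  # UP
print(evaluate_listaggregates(["site-a"], expected))  # DOWN
print(evaluate_listaggregates(None, expected))  # DOWN
```

The sketch reads "includes the expected sites" as a superset test. If the monitoring system actually requires an exact match, comparing the two sets for equality would be the stricter variant.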

1.4 SCS - Server Check - Fail Criteria and Escalation

If step 7 shows DOWN, then the Production SCS check fails. A value of DOWN means that the SCS system is either not responding or is responding with the wrong list of site aggregates.

Escalation: SCS status issues should be escalated to GMOC.

If step 10 shows DOWN, then the Test SCS check fails. A value of DOWN means that the Test SCS system is either not responding or is responding with the wrong list of site aggregates.

Escalation: Test SCS issues should be escalated to the MAX Development team.
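The pass/fail and escalation rules for the two SCS instances can be summarized in a small lookup, sketched below. The host names and contacts come from this page; the function itself is a hypothetical helper. Because the criteria only define UP and DOWN for the gpo:is_available service, any other value is rejected rather than silently mapped.

```python
from typing import Optional

# Escalation contacts from section 1.4: production SCS issues go to GMOC,
# Test SCS issues go to the MAX Development team.
ESCALATION = {
    "scs-geni": "GMOC",
    "scs-geni-test": "MAX Development team",
}


def escalation_target(host: str, status: str) -> Optional[str]:
    """Return who to escalate to for a gpo:is_available status, or None."""
    if status == "UP":
        return None  # check passes, nothing to escalate
    if status == "DOWN":
        return ESCALATION[host]  # check fails
    # The criteria above only define UP and DOWN for this service.
    raise ValueError(f"status {status!r} is not covered by the criteria")


print(escalation_target("scs-geni", "DOWN"))       # GMOC
print(escalation_target("scs-geni-test", "DOWN"))  # MAX Development team
print(escalation_target("scs-geni", "UP"))         # None
```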

2.0 GENI Stitching Sites - Path Check

2.1 Goals of Stitching Sites - Path Check

The Stitching Sites check verifies that the SCS knows paths for all combinations of aggregates that support stitching. This check does not verify that the suggested paths can actually exchange traffic, only that they are known to the SCS. This check runs 3 times a day, at 1 am, 9 am and 5 pm (Eastern time).
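The idea of "all path combinations" can be sketched as enumerating every pair of stitching aggregates and reporting the pairs for which the SCS knows no path. This is an illustration under stated assumptions, not the monitoring system's code: paths are assumed to be undirected (one check per pair), and the aggregate names and the set of known paths are hypothetical inputs.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple


def missing_paths(stitching_aggregates: Iterable[str],
                  known_paths: Set[FrozenSet[str]]) -> List[Tuple[str, str]]:
    """List aggregate pairs for which the SCS reports no known path.

    Paths are treated as undirected, so each pair is checked once.
    """
    aggregates = sorted(set(stitching_aggregates))
    return [(a, b) for a, b in combinations(aggregates, 2)
            if frozenset((a, b)) not in known_paths]


# Hypothetical aggregates and the paths the SCS is assumed to know.
aggs = ["site-a", "site-b", "site-c"]
known = {frozenset({"site-a", "site-b"}), frozenset({"site-b", "site-c"})}
print(missing_paths(aggs, known))  # [('site-a', 'site-c')]
```

With n stitching aggregates this yields n(n-1)/2 pairs, so the number of path results grows quickly as sites are added.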

2.2 Steps for Stitching Sites - Path Check

BLOCKING: This check is not working yet due to the problem tracked by http://trac.gpolab.bbn.com/ops-monitoring/ticket/284

Nagios alerts are available for stitching paths checks. Alerts can be found as follows:

  1. Login to GPO Nagios System
  2. Select Host Groups->Summary in the left hand navigation bar.
  3. Look for GENI SCS Path Availability -> Service Status Summary for the SCS check. This provides an overall status.
  4. Status can be UP, DOWN or PENDING. If some results are DOWN or PENDING, select each to see the details.
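Steps 3 and 4 amount to reducing the per-path results to one overall status and a list of entries to open for details. The sketch below assumes the results have been collected into a dictionary keyed by hypothetical entry names; it does not query Nagios, and this check is currently blocked as noted above.

```python
from typing import Dict, List, Tuple


def summarize_path_statuses(statuses: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return an overall PASS/FAIL and the entries to open in step 4.

    statuses maps a path-check entry name to UP, DOWN or PENDING.
    """
    to_inspect = sorted(name for name, s in statuses.items() if s != "UP")
    # No results at all is treated as a failure rather than a vacuous pass.
    overall = "PASS" if statuses and not to_inspect else "FAIL"
    return overall, to_inspect


print(summarize_path_statuses({"site-a/site-b": "UP",
                               "site-a/site-c": "PENDING",
                               "site-b/site-c": "DOWN"}))
# ('FAIL', ['site-a/site-c', 'site-b/site-c'])
```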

2.3 Stitching Sites - Path Check - Pass Criteria

If step 3 shows all results as UP, this check passes and you can skip step 4.

2.4 Stitching Sites - Path Check - Fail Criteria and Escalation

If step 3 shows any status as DOWN or PENDING, proceed to step 4 to collect details about the failing paths.

Escalation: SCS path issues should be escalated to GMOC.
