CHK-003 GENI Network Stitching Checks
This procedure defines how GENI Network Stitching status can be verified. Checks to be verified include:
- Status of the Stitching Computation Service (SCS)
- Status of the GENI Stitching Sites
1.0 GENI SCS Check
1.1 Goals of SCS Check
The SCS is a service that provides stitching path information for known sites. This section defines how to verify that the SCS is up and providing path information for the known stitching sites. All information used is found in the GENI Monitoring System and is available via alerts to the GPO Nagios System. Checks are run for:
- scs-geni - The Internet2 Production SCS server
- scs-geni-test - The Test SCS, which is used for testing purposes by the GPO, InstaGENI, CloudLab and iMind teams.
1.2 Steps for SCS Check
1. Log in to the GPO Nagios System.
2. Select "Host Groups->Summary" in the left-hand navigation bar.
3. Look for "GENI Stitching Computation Services", and select the "Host Status Summary" for the SCS check to see details.
4. The resulting page shows the SCS systems being monitored. There are currently 2 instances being monitored: the Internet2 SCS and the Test SCS at MAX.
5. Review the Internet2 SCS scs-geni status, which can be UP or DOWN.
6. Review the Test SCS scs-geni-test status, which can be UP or DOWN.
1.3 SCS Check - Pass Criteria
If step 5 shows UP, then the Production Internet2 SCS check passes. A value of UP means that the SCS service is running and answering a listaggregates request. The monitoring system also verifies that the list of aggregates returned includes the expected sites.
If step 6 shows UP, then the Test SCS check passes. A value of UP means that the SCS service is running and answering a listaggregates request. The monitoring system also verifies that the list of aggregates returned includes the expected sites.
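The UP criterion above can be sketched as a small function. This is an illustrative sketch only, not the monitoring system's actual code: the function name scs_status, the placeholder site names, and the convention of passing None when the service gave no reply are all assumptions made for this example.

```python
from typing import Iterable, Optional


def scs_status(returned: Optional[Iterable[str]], expected: Iterable[str]) -> str:
    """Return "UP" or "DOWN" for one SCS instance.

    returned: aggregates from a listaggregates reply, or None if there
              was no reply (hypothetical convention for this sketch)
    expected: aggregates the monitoring system expects to see
    """
    if returned is None:
        # The service did not answer the request at all.
        return "DOWN"
    missing = set(expected) - set(returned)
    # UP only when every expected site is present in the reply.
    return "UP" if not missing else "DOWN"


# Placeholder site names, for illustration only.
EXPECTED = ["site-a", "site-b", "site-c"]

print(scs_status(["site-a", "site-b", "site-c", "site-d"], EXPECTED))
print(scs_status(["site-a"], EXPECTED))
print(scs_status(None, EXPECTED))
```

The sketch treats extra aggregates in the reply as acceptable, since the criterion is stated as the returned list including the expected sites.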
1.4 SCS Check - Fail Criteria and Escalation
If step 5 shows DOWN, then the Production SCS check fails. A value of DOWN means that the SCS system is either not responding or responding with the wrong list of site aggregates.
Escalation: SCS issues should be escalated to GMOC.
If step 6 shows DOWN, then the Test SCS check fails. A value of DOWN means that the Test SCS system is either not responding or responding with the wrong list of site aggregates.
Escalation: Test SCS issues should be escalated to the MAX Development team.
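The escalation rules above amount to a lookup from host name to contact. A minimal sketch, assuming the statuses have already been read from Nagios into a dictionary (the escalations function and the dictionary layout are hypothetical; only the host names and contacts come from this procedure):

```python
from typing import Dict

# Escalation contacts as stated in this procedure, keyed by Nagios host name.
ESCALATION = {
    "scs-geni": "GMOC",
    "scs-geni-test": "MAX Development team",
}


def escalations(statuses: Dict[str, str]) -> Dict[str, str]:
    """Map each DOWN host to the team that should be contacted."""
    return {
        host: ESCALATION[host]
        for host, status in statuses.items()
        if status == "DOWN" and host in ESCALATION
    }


# Example: production SCS is UP, test SCS is DOWN.
print(escalations({"scs-geni": "UP", "scs-geni-test": "DOWN"}))
```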
2.0 GENI Stitching Sites Check
2.1 Goals of Stitching Sites Check
The Stitching Sites check verifies that the SCS knows all path combinations between all aggregates that support stitching. This check does not actually verify that the paths are currently able to carry traffic.
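To make "all path combinations" concrete, the sketch below enumerates every pair of stitching aggregates and reports the pairs for which no path is known. It is a hedged illustration, not the monitoring system's implementation: the function missing_paths, the placeholder aggregate names, and the assumption that a path between A and B also counts as a path between B and A are all made up for this example.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def missing_paths(
    aggregates: Iterable[str],
    known_paths: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return aggregate pairs for which no path is known.

    Paths are treated as undirected (an assumption of this sketch).
    """
    known = {frozenset(path) for path in known_paths}
    return [
        pair
        for pair in combinations(sorted(aggregates), 2)
        if frozenset(pair) not in known
    ]


# Placeholder aggregate names, for illustration only.
sites = ["agg-a", "agg-b", "agg-c"]
paths = [("agg-a", "agg-b"), ("agg-c", "agg-a")]
print(missing_paths(sites, paths))
```

An empty result would mean every pair has a known path. As noted above, this says nothing about whether traffic can actually flow over those paths.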
2.2 Steps for Stitching Sites Check
Nagios alerts are available for failed checks of stitching paths. Alerts can be found as follows:
1. Log in to the GPO Nagios System.
2. Select "Host Groups->Summary" in the left-hand navigation bar.
3. Look for "GENI Stitching Computation Services" and select the "Service Status Summary" for the SCS check.
4. The resulting page shows the scs-geni status, which can be UP or DOWN.
2.3 Stitching Sites Check - Pass Criteria
If status is ???, then there are no path checks failing.
2.4 Stitching Sites Check - Fail Criteria and Escalation
If ??? shows status as pending, then the path check has failed.
Escalation: SCS path issues should be escalated to GMOC.