CHK-005: GENI Network Connectivity OpenFlow Checks
Connectivity through the GENI network is continuously monitored via sets of pings between resources on many GENI aggregates. These resources have been allocated in the same way experimenters would allocate resources. OpenFlow controllers are also used as part of this monitoring experiment to direct the ping traffic flow across the network. The GENI Network Connectivity OpenFlow Checks procedure defines the steps to verify that connectivity is achieved throughout the network.
BLOCKING: This check is not working yet due to the problem tracked by http://trac.gpolab.bbn.com/ops-monitoring/ticket/306
1.0 GENI Network Connectivity OpenFlow Check
1.1 Goals of Network Connectivity OpenFlow Check
The goal of this check is to ensure that connectivity between resources on GENI aggregates is working as expected across the GENI network.
1.2 Steps for Network Connectivity OpenFlow Check
- Log onto the GPO alerting system.
- Select Service Group->Summary in the left pane.
- In the "GENI data plane connectivity checks" group row, check for the presence of CRITICAL or PENDING services under the "Service Status Summary" column.
- Click on the OK link under the "Service Status Summary" column, which will bring you to the "Service Status Details" for all the services in OK state.
- Sort the services by the "Last Check" column (click on the orange up (ascending) arrow). Make sure that the time stamps are all within the last 15 minutes or so.
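The time stamp comparison in the last step can be sketched in code. This is only an illustration, assuming the "Last Check" values have already been copied out of the alerting system into a dictionary. The names `MAX_AGE`, `sort_by_last_check`, and `stale_services` are hypothetical and are not part of the alerting system.

```python
from datetime import datetime, timedelta

# Assumption: 15 minutes is taken literally here; the procedure says
# "15 minutes or so", which leaves some room for operator judgment.
MAX_AGE = timedelta(minutes=15)


def sort_by_last_check(last_checks):
    """Return (service, last_check) pairs, oldest first (ascending sort)."""
    return sorted(last_checks.items(), key=lambda item: item[1])


def stale_services(last_checks, now, max_age=MAX_AGE):
    """Return the names of services checked longer ago than max_age, oldest first."""
    return [
        name
        for name, checked in sort_by_last_check(last_checks)
        if now - checked > max_age
    ]


if __name__ == "__main__":
    now = datetime(2015, 1, 1, 12, 0, 0)
    # Hypothetical service names and time stamps, for illustration only.
    example = {
        "ping-site-a": now - timedelta(minutes=5),
        "ping-site-b": now - timedelta(minutes=40),
    }
    print(stale_services(example, now))
```

Sorting oldest first mirrors the ascending sort in the web interface: if the first entry is recent enough, all the others are too.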
1.3 Network Connectivity OpenFlow Check - Pass Criteria
This check passes if there are no CRITICAL or PENDING services in step 3 of the Steps above, AND if the time stamps of the OK services are recent (within the last 15 minutes or so) in step 5 of the Steps above.
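The pass criteria amount to a simple rule, sketched below. The function name `check_passes` and the `(state, last_check)` tuple layout are assumptions made for illustration; the procedure itself is carried out by hand in the alerting system's web interface.

```python
from datetime import datetime, timedelta


def check_passes(services, now, max_age=timedelta(minutes=15)):
    """Apply the pass criteria to a list of (state, last_check) tuples.

    Assumption: `state` is the status string shown in the summary, and
    `last_check` is a datetime, or None for a service never checked.
    States other than OK, CRITICAL, and PENDING are not covered by the
    pass criteria and are ignored in this sketch.
    """
    for state, last_check in services:
        # Any CRITICAL or PENDING service fails the check outright.
        if state in ("CRITICAL", "PENDING"):
            return False
        # An OK service only counts if its time stamp is recent.
        if state == "OK" and now - last_check > max_age:
            return False
    return True
```

For example, a list containing only OK services that were checked three minutes ago passes, while the same list with one OK service last checked an hour ago does not.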
1.4 Network Connectivity OpenFlow Check - Fail Criteria and Escalation
If there are CRITICAL services in step 3 above:
- Click on the CRITICAL link under the "Service Status Summary" column, which will bring you to the "Service Status Details" for all the services in CRITICAL state.
- Sort the services by the "Last Check" column (click on the orange up (ascending) arrow). Check whether the time stamps are all within the last 15 minutes or so. If the time stamps:
- are within the accepted range, the services are indeed in a CRITICAL state.
- are not within the accepted range, something is amiss in the monitoring system and is preventing timely status updates.
If there are PENDING services in step 3 above:
- Click on the PENDING link under the "Service Status Summary" column, which will bring you to the "Service Status Details" for all the services in PENDING state.
A PENDING state means that the monitoring system has never reported on the availability status of a particular aggregate.
Escalation: If there are services in a CRITICAL state: Report to ??? GMOC team - gmoc@grnoc.iu.edu
Escalation: If there are services in a PENDING state: Report to UKY team - ???
Escalation: If there are services with stale time stamps: Report to UKY team - ???
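The fail criteria and escalation rules can be summarized as a small triage routine. This is a sketch under stated assumptions: the `triage` function, the `ESCALATION` table, and the `(name, state, last_check)` layout are hypothetical, and only the team names come from this page, since the contact details are still unresolved above.

```python
from datetime import datetime, timedelta

# Team names as given in the escalation lines; contact details are
# deliberately omitted because the procedure leaves them open.
ESCALATION = {
    "critical": "GMOC team",
    "pending": "UKY team",
    "stale": "UKY team",
}


def triage(services, now, max_age=timedelta(minutes=15)):
    """Return (name, finding, team) for every service that needs a report.

    Assumption: `services` is a list of (name, state, last_check) tuples,
    where last_check is a datetime or None if the service was never checked.
    """
    reports = []
    for name, state, last_check in services:
        if state == "PENDING":
            finding = "pending"
        elif last_check is None or now - last_check > max_age:
            # A stale time stamp points at the monitoring system itself,
            # so it takes precedence over the reported state.
            finding = "stale"
        elif state == "CRITICAL":
            finding = "critical"
        else:
            continue  # recent and not CRITICAL or PENDING: nothing to report
        reports.append((name, finding, ESCALATION[finding]))
    return reports
```

Checking staleness before the CRITICAL state follows the fail criteria: a CRITICAL service is only treated as truly CRITICAL when its time stamp is within the accepted range.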