
Version 15 (modified by lnevers@bbn.com, 5 years ago)


CHK-005: GENI Network Connectivity OpenFlow Checks

Connectivity through the GENI network is continuously monitored via sets of pings between resources on many GENI aggregates. These resources are allocated the same way experimenters allocate resources. OpenFlow controllers are also used in this monitoring experiment to steer the ping traffic across the network. The GENI Network Connectivity OpenFlow Checks procedure defines the steps for verifying that connectivity is achieved throughout the network.

BLOCKING: This check is not working yet due to the problem tracked by http://trac.gpolab.bbn.com/ops-monitoring/ticket/306

1.0 GENI Network Connectivity OpenFlow Check

1.1 Goals of Network Connectivity OpenFlow Check

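The goal of this check is to ensure that the monitored ping traffic, steered by the OpenFlow controllers, reaches its destinations across the GENI network and that the monitoring results are up to date.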

1.2 Steps for Network Connectivity OpenFlow Check

  1. Log in to the GPO alerting system. (??? Does GMOC get accounts, or do we make Nagios read-only? ???)
  2. Select Service Group->Summary in the left pane.
    [Screenshot: Nagios - Service Groups Summary]
  3. In the "GENI data plane connectivity checks" group row, check for any CRITICAL or PENDING services under the "Service Status Summary" column.
  4. Click on the OK link under the "Service Status Summary" column, which will bring you to the "Service Status Details" for all the services in OK state.
    [Screenshot: Nagios - Connectivity Service Groups Details, OK]
  5. Sort the services by the "Last Check" column (click the orange up arrow for ascending order, which lists the oldest checks first). Make sure that all time stamps are within roughly the last 15 minutes.
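For operators with shell access to the Nagios server, steps 3 to 5 can also be approximated offline. The following is a minimal Python sketch, not part of the official procedure. It assumes Nagios Core's `status.dat` file is readable (its location depends on the installation) and that the text passed in has already been limited to the services of the "GENI data plane connectivity checks" group, since, to our knowledge, `status.dat` does not record service group membership. The function names are hypothetical; the field names (`current_state`, `has_been_checked`, `last_check`, `service_description`) are the ones Nagios Core writes in `servicestatus` blocks.

```python
import time

# Nagios Core encodes a service's state as an integer in status.dat.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
# Assumed reading of "within the last 15 minutes or so".
STALE_AFTER_SECONDS = 15 * 60


def parse_service_status(text):
    """Return one dict per 'servicestatus { ... }' block in status.dat text."""
    services = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "servicestatus {":
            current = {}
        elif line == "}" and current is not None:
            services.append(current)
            current = None
        elif current is not None and "=" in line:
            # Split on the first '=' only; plugin output may contain more.
            key, _, value = line.partition("=")
            current[key] = value
    return services


def state_of(svc):
    """Map a parsed block to the state name shown in the web interface."""
    if svc.get("has_been_checked") == "0":
        return "PENDING"  # never checked yet
    return STATE_NAMES.get(int(svc.get("current_state", "3")), "UNKNOWN")


def summarize(services, now=None):
    """Count services per state (step 3) and list stale OK services (step 5)."""
    now = time.time() if now is None else now
    counts = {}
    for svc in services:
        state = state_of(svc)
        counts[state] = counts.get(state, 0) + 1
    ok = [s for s in services if state_of(s) == "OK"]
    # Oldest first, matching the ascending "Last Check" sort in step 5.
    ok.sort(key=lambda s: int(s.get("last_check", "0")))
    stale = [
        s.get("service_description", "?")
        for s in ok
        if now - int(s.get("last_check", "0")) > STALE_AFTER_SECONDS
    ]
    return counts, stale
```

A typical call would read the file and pass its contents to `parse_service_status`, then hand the result to `summarize`; a non-empty CRITICAL or PENDING count, or a non-empty stale list, corresponds to the failure cases described below.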

1.3 Network Connectivity OpenFlow Check - Pass Criteria

This check passes if step 3 shows no CRITICAL or PENDING services AND step 5 shows that the time stamps of all OK services are recent.
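The pass criteria can be written as a small predicate. This is a minimal sketch over a hypothetical `ServiceStatus` record (one row of the "Service Status Details" view), assuming "recent" means at most 15 minutes old:

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed threshold for a "recent" time stamp.
FRESH_WINDOW_SECONDS = 15 * 60


@dataclass
class ServiceStatus:
    """Hypothetical record for one row of the Service Status Details view."""
    name: str
    state: str                    # "OK", "WARNING", "CRITICAL", "UNKNOWN" or "PENDING"
    last_check: Optional[float]   # epoch seconds; None if never checked


def check_passes(services: Iterable[ServiceStatus], now: float) -> bool:
    """Apply the pass criteria; WARNING/UNKNOWN are not covered by them."""
    services = list(services)
    # Step 3: no CRITICAL or PENDING services in the group.
    if any(s.state in ("CRITICAL", "PENDING") for s in services):
        return False
    # Step 5: every OK service was checked within the window.
    return all(
        s.last_check is not None and now - s.last_check <= FRESH_WINDOW_SECONDS
        for s in services
        if s.state == "OK"
    )
```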

1.4 Network Connectivity OpenFlow Check - Fail Criteria and Escalation

If there are CRITICAL services in step 3 above:

  1. Click on the CRITICAL link under the "Service Status Summary" column, which will bring you to the "Service Status Details" for all the services in CRITICAL state.
    [Screenshot: Nagios - Connectivity Service Groups Details, CRITICAL]
  2. Sort the services by the "Last Check" column (click the orange up arrow for ascending order, which lists the oldest checks first). Make sure that all time stamps are within roughly the last 15 minutes. If the time stamps:
    • are within the accepted range, the services are genuinely in CRITICAL state.
    • are not within the accepted range, a problem in the monitoring system is preventing timely status updates.
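The time stamp test in step 2 splits the CRITICAL services into two groups that are escalated differently. A minimal sketch, assuming each row is a hypothetical (service name, last check in epoch seconds) pair and the same 15-minute window:

```python
from typing import Dict, List, Tuple

# Assumed "accepted range" for time stamps.
FRESH_WINDOW_SECONDS = 15 * 60


def triage_critical(critical: List[Tuple[str, float]], now: float) -> Dict[str, List[str]]:
    """Separate genuine CRITICAL services from ones with stale time stamps."""
    result: Dict[str, List[str]] = {"genuine": [], "stale": []}
    # Ascending sort by last check, as in the web interface.
    for name, last_check in sorted(critical, key=lambda item: item[1]):
        if now - last_check <= FRESH_WINDOW_SECONDS:
            result["genuine"].append(name)   # service really is CRITICAL
        else:
            result["stale"].append(name)     # monitoring is not updating
    return result
```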

If there are PENDING services in step 3 above:

  1. Click on the PENDING link under the "Service Status Summary" column, which will bring you to the "Service Status Details" for all the services in PENDING state.

A PENDING state means that the monitoring system has not yet reported any status for that particular service.

Escalation: If any services are in CRITICAL state: report to ??? GMOC team - gmoc@grnoc.iu.edu
Escalation: If any services are in PENDING state: report to UKY team - ???
Escalation: If any services have stale time stamps: report to UKY team - ???
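The escalation rules map each kind of finding to a contact. The sketch below encodes that mapping with the contacts copied verbatim from this page, including the unresolved "???" placeholders; the function name and its boolean inputs are hypothetical.

```python
from typing import List, Tuple

# Contacts copied verbatim from the escalation rules; "???" marks open items.
CONTACTS = {
    "CRITICAL": "??? GMOC team - gmoc@grnoc.iu.edu",
    "PENDING": "UKY team - ???",
    "STALE": "UKY team - ???",
}


def escalation_targets(has_critical: bool, has_pending: bool, has_stale: bool) -> List[Tuple[str, str]]:
    """Return (finding, contact) pairs for every finding that needs escalation."""
    findings = [("CRITICAL", has_critical), ("PENDING", has_pending), ("STALE", has_stale)]
    return [(name, CONTACTS[name]) for name, present in findings if present]
```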
