Version 2 (modified by chaos@bbn.com, 7 years ago)

Detailed test plan for IG-ADM-6: Control Network Disconnection Test

This page is GPO's working page for performing IG-ADM-6. It is public for informational purposes, but it is not an official status report. See GENIRacksHome/InstageniRacks/AcceptanceTestStatus for the current status of InstaGENI acceptance tests.

Last substantive edit of this page: 2013-02-28

Page format

  • The status chart summarizes the state of this test
  • The high-level description from test plan contains text copied exactly from the public test plan and acceptance criteria pages.
  • The steps contain things I will actually do/verify:
    • Steps may be composed of related substeps where I find this useful for clarity
    • Each step is either a preparatory step (identified by "(prep)") or a verification step (the default):
      • Preparatory steps are just things we have to do. They're not tests of the rack, but are prerequisites for subsequent verification steps
      • Verification steps are steps in which we will actually look at rack output and make sure it is as expected. They contain a Using: block, which lists the steps to run the verification, and an Expect: block which lists what outcome is expected for the test to pass.

Status of test

Step | State | Date completed | Open Tickets | Closed Tickets/Comments
1A   |       |                |              |
1B   |       |                |              |
1C   |       |                |              |

High-level description from test plan

In this test, we disconnect parts of the rack control network or its dependencies to test partial rack functionality in an outage situation.

Procedure

  • Simulate an outage of emulab.net by inserting a firewall rule on the GPO router blocking the rack from reaching it. Verify that an administrator can still access the rack, that rack monitoring to GMOC continues through the outage, and that some experimenter operations still succeed.

Criteria to verify as part of this test

  • V.09. When the rack control network is partially down or the rack vendor's home site is inaccessible from the rack, it is still possible to access the primary control network device and server for recovery. All devices/networks which must be operational in order for the control network switch and primary server to be reachable, are documented. (C.3.b)
  • VII.14. A site administrator can locate information about the network reachability of all rack infrastructure which should live on the control network, and can get alerts when any rack infrastructure control IP becomes unavailable from the rack server host, or when the rack server host cannot reach the commodity internet. (D.6.c)

Step 1: prevent 155.98.33.0/24 from sending traffic to the InstaGENI rack

Step 1A: baseline before simulating control network outage

Overview of Step 1A

Run all checks before performing the disconnection test, so that anything already down is not mistaken for an effect of the simulated outage:

  • Attempt to SSH to boss.instageni.gpolab.bbn.com:
    ssh boss.instageni.gpolab.bbn.com
    
    • If SSH is successful, attempt to sudo on boss:
      sudo -v
      
    • If SSH is successful, ping ops.emulab.net to verify connectivity:
      ping ops.emulab.net
      
  • Attempt to SSH to gpolab.control-nodes.geniracks.net:
    ssh gpolab.control-nodes.geniracks.net
    
    • If SSH is successful, attempt to sudo on control:
      sudo -v
    
  • Attempt to SSH to foam.instageni.gpolab.bbn.com:
    ssh foam.instageni.gpolab.bbn.com
    
    • If SSH is successful, attempt to sudo on foam:
      sudo -v
    
  • Browse to https://instageni.gpolab.bbn.com/ and attempt to log in as chaos
    • If successful, enter red dot mode
  • Browse to http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3 and ensure that GPO Nagios is available
    • If successful, enumerate any InstaGENI-relevant errors currently outstanding in GPO Nagios
  • Run omni getversion and listresources against the BBN rack ProtoGENI AM:
    omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am getversion
    omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am listresources -o
    
  • Run omni getversion and listresources against the BBN rack FOAM AM:
    omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
    omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 listresources -o
    
  • Verify that http://monitor.gpolab.bbn.com/connectivity/campus.html currently shows a successful connection from argos to instageni-vlan1750.bbn.dataplane.geni.net
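The same checks are repeated in Steps 1B and 1C, so it can help to record one result line per check each time they are run. The following is a minimal sketch rather than part of the original procedure: the function names `run_check` and `baseline_checks`, the check names, and the "NAME PASS/FAIL" log format are illustrative assumptions. It covers only the non-interactive command-line checks (the sudo, web UI, Nagios, and connectivity-page checks still need a person), it assumes key-based SSH access, and it assumes omni exits non-zero on failure, which should be confirmed before relying on it.

```shell
#!/bin/sh
# run_check NAME COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND quietly and prints "NAME PASS" if it
# exits 0, or "NAME FAIL" otherwise.
run_check() {
    name=$1
    shift
    if "$@" </dev/null >/dev/null 2>&1; then
        echo "$name PASS"
    else
        echo "$name FAIL"
    fi
}

# Non-interactive subset of the checks listed in this step.
# BatchMode makes ssh fail instead of prompting for a password.
baseline_checks() {
    ssh_opts="-o BatchMode=yes -o ConnectTimeout=10"
    run_check ssh-boss    ssh $ssh_opts boss.instageni.gpolab.bbn.com true
    run_check ping-ops    ssh $ssh_opts boss.instageni.gpolab.bbn.com ping -c 3 ops.emulab.net
    run_check ssh-control ssh $ssh_opts gpolab.control-nodes.geniracks.net true
    run_check ssh-foam    ssh $ssh_opts foam.instageni.gpolab.bbn.com true
    # Assumption: omni returns a non-zero exit status when the call fails.
    run_check pg-getversion   omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am getversion
    run_check foam-getversion omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
}
```

Calling `baseline_checks > baseline.log` before the outage would then give a file that later runs can be compared against.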

Step 1B: simulate a network problem

Overview of Step 1B

Using:

  • On maple (lab subnet router), do:
    conf t
      ip access-list standard gst-4544-ig-adm-6
        deny 155.98.33.0 0.0.0.255
        permit any
      exit
      int gi 0/1.830
        ip access-group gst-4544-ig-adm-6 out
      exit
    exit
    
  • Wait 5 minutes
  • Attempt to SSH to boss.instageni.gpolab.bbn.com:
    ssh boss.instageni.gpolab.bbn.com
    
    • If SSH is successful, attempt to sudo on boss:
      sudo -v
      
    • If SSH is successful, ping ops.emulab.net to verify connectivity:
      ping ops.emulab.net
      
  • Attempt to SSH to gpolab.control-nodes.geniracks.net:
    ssh gpolab.control-nodes.geniracks.net
    
    • If SSH is successful, attempt to sudo on control:
      sudo -v
    
  • Attempt to SSH to foam.instageni.gpolab.bbn.com:
    ssh foam.instageni.gpolab.bbn.com
    
    • If SSH is successful, attempt to sudo on foam:
      sudo -v
    
  • Browse to https://instageni.gpolab.bbn.com/ and attempt to log in as chaos
    • If successful, enter red dot mode
  • Browse to http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3 and ensure that GPO Nagios is available
    • If successful, enumerate any InstaGENI-relevant errors currently outstanding in GPO Nagios
  • Run omni getversion and listresources against the BBN rack ProtoGENI AM:
    omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am getversion
    omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am listresources -o
    
  • Run omni getversion and listresources against the BBN rack FOAM AM:
    omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
    omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 listresources -o
    
  • Verify that http://monitor.gpolab.bbn.com/connectivity/campus.html currently shows a successful connection from argos to instageni-vlan1750.bbn.dataplane.geni.net
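For readers less familiar with Cisco wildcard masks: in the access list applied at the start of this step, `deny 155.98.33.0 0.0.0.255` matches any source address whose first three octets are 155.98.33, which is the 155.98.33.0/24 range named in the Step 1 title. A 1 bit in the wildcard means "ignore this bit". The sketch below illustrates that matching rule in shell; the function name `acl_matches` is hypothetical and is not part of IOS or of the test procedure.

```shell
# acl_matches ADDR BASE WILDCARD
# Exits 0 if ADDR matches BASE under a Cisco-style wildcard mask,
# 1 if it does not, and 2 if the arguments are not three dotted quads.
acl_matches() {
    old_ifs=$IFS
    IFS=.
    # Split the three dotted quads into twelve positional parameters:
    # $1-$4 address octets, $5-$8 base octets, $9-${12} wildcard octets.
    set -- $1 $2 $3
    IFS=$old_ifs
    [ "$#" -eq 12 ] || return 2
    # An octet matches when all bits NOT covered by the wildcard are equal.
    [ $(( (($1 ^ $5) & ~$9) & 255 )) -eq 0 ] &&
    [ $(( (($2 ^ $6) & ~${10}) & 255 )) -eq 0 ] &&
    [ $(( (($3 ^ $7) & ~${11}) & 255 )) -eq 0 ] &&
    [ $(( (($4 ^ $8) & ~${12}) & 255 )) -eq 0 ]
}
```

For example, `acl_matches 155.98.33.17 155.98.33.0 0.0.0.255` succeeds, while the same call with 155.98.34.17 fails because the third octet differs and the wildcard does not cover it.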

Expect:

  • Site admins should be able to log in to boss, control, and foam, and perform sudo operations
  • The Emulab web UI should be available, and red dot mode should work
  • Monitoring of the rack should be available (though it may show some errors due to the disconnection)
  • Rack ProtoGENI and FOAM aggregates should respond to getversion and listresources
  • Existing experiments should continue to run, and should be able to respond to dataplane traffic
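If the results of the command-line checks are recorded as one "NAME PASS" or "NAME FAIL" line per check, the expectations above can be evaluated mechanically. This is only a sketch under that assumption: the log format, the check names, and the function name `required_failures` are hypothetical. Which checks are required to pass during the outage is a judgment for the tester; a check that depends on emulab.net, such as the ping to ops.emulab.net, would presumably be left off the required list.

```shell
# required_failures LOGFILE NAME...
# Prints each required check NAME that is not recorded as "NAME PASS"
# in LOGFILE. Exits 0 if every required check passed, 1 otherwise.
required_failures() {
    log=$1
    shift
    rc=0
    for name in "$@"; do
        # -x matches the whole line, -F treats the pattern as a fixed string.
        if ! grep -qxF -- "$name PASS" "$log"; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}
```

A call such as `required_failures outage.log ssh-boss ssh-control ssh-foam` would print nothing and exit 0 when all three logins succeeded during the outage.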

Step 1C: undo the changes

Overview of Step 1C

Using:

  • On maple (lab subnet router), do:
    conf t
      int gi 0/1.830
        no ip access-group gst-4544-ig-adm-6 out
      exit
      no ip access-list standard gst-4544-ig-adm-6
    exit
    
  • Wait 5 minutes
  • Attempt to SSH to boss.instageni.gpolab.bbn.com:
    ssh boss.instageni.gpolab.bbn.com
    
    • If SSH is successful, attempt to sudo on boss:
      sudo -v
      
    • If SSH is successful, ping ops.emulab.net to verify connectivity:
      ping ops.emulab.net
      
  • Attempt to SSH to gpolab.control-nodes.geniracks.net:
    ssh gpolab.control-nodes.geniracks.net
    
    • If SSH is successful, attempt to sudo on control:
      sudo -v
    
  • Attempt to SSH to foam.instageni.gpolab.bbn.com:
    ssh foam.instageni.gpolab.bbn.com
    
    • If SSH is successful, attempt to sudo on foam:
      sudo -v
    
  • Browse to https://instageni.gpolab.bbn.com/ and attempt to log in as chaos
    • If successful, enter red dot mode
  • Browse to http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3 and ensure that GPO Nagios is available
    • If successful, enumerate any InstaGENI-relevant errors currently outstanding in GPO Nagios
  • Run omni getversion and listresources against the BBN rack ProtoGENI AM:
    omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am getversion
    omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am listresources -o
    
  • Run omni getversion and listresources against the BBN rack FOAM AM:
    omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
    omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 listresources -o
    
  • Verify that http://monitor.gpolab.bbn.com/connectivity/campus.html currently shows a successful connection from argos to instageni-vlan1750.bbn.dataplane.geni.net

Expect:

  • Site admins should be able to log in to boss, control, and foam, and perform sudo operations
  • The Emulab web UI should be available, and red dot mode should work
  • Monitoring of the rack should be available (errors caused by the disconnection may take some time to clear)
  • Rack ProtoGENI and FOAM aggregates should respond to getversion and listresources
  • Existing experiments should continue to run, and should be able to respond to dataplane traffic
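Once the access list has been removed, the checks would be expected to return to their Step 1A state. Assuming the results from each run were saved as "NAME PASS" or "NAME FAIL" lines (a hypothetical format, not something the original procedure prescribes), a short awk-based sketch can list any check whose status differs between two runs. The function name `changed_checks` is illustrative.

```shell
# changed_checks BEFORE_LOG AFTER_LOG
# Prints "NAME: OLD -> NEW" for every check present in both logs whose
# status differs. Assumes BEFORE_LOG is non-empty, because the NR == FNR
# idiom cannot tell the two files apart when the first one has no lines.
changed_checks() {
    awk '
        NR == FNR { before[$1] = $2; next }      # first file: remember status
        ($1 in before) && before[$1] != $2 {     # second file: report changes
            print $1 ": " before[$1] " -> " $2
        }
    ' "$1" "$2"
}
```

Running `changed_checks baseline.log restored.log` and getting no output would suggest the rack has returned to its baseline state for the checks that were recorded.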