Version 5 (modified by chaos@bbn.com, 12 years ago)

Detailed test plan for EG-ADM-6: Control Network Disconnection Test

This page is GPO's working page for performing EG-ADM-6. It is public for informational purposes, but it is not an official status report. See GENIRacksHome/ExogeniRacks/AcceptanceTestStatus for the current status of ExoGENI acceptance tests.

Last substantive edit of this page: 2012-08-29

Page format

  • The status chart summarizes the state of this test
  • The high-level description from test plan contains text copied exactly from the public test plan and acceptance criteria pages.
  • The steps contain things I will actually do/verify:
    • Steps may be composed of related substeps where I find this useful for clarity
    • Each step is either a preparatory step (identified by "(prep)") or a verification step (the default):
      • Preparatory steps are just things we have to do. They're not tests of the rack, but are prerequisites for subsequent verification steps
      • Verification steps are steps in which we will actually look at rack output and make sure it is as expected. They contain a Using: block, which lists the steps to run the verification, and an Expect: block which lists what outcome is expected for the test to pass.

Status of test

Step | State | Date completed | Open Tickets | Closed Tickets/Comments
1A   | Pass  | 2012-08-29     |              |
1B   | Pass  | 2012-08-29     |              |
1C   | Pass  | 2012-08-29     |              |

High-level description from test plan

In this test, we disconnect parts of the rack control network or its dependencies to test partial rack functionality in an outage situation.

Procedure

  • Simulate an outage of geni.renci.org by inserting a firewall rule on the GPO router blocking the rack from reaching it. Verify that an administrator can still access the rack, that rack monitoring to GMOC continues through the outage, and that some experimenter operations still succeed.
  • Simulate an outage of each of the rack head node and management switch by disabling their respective interfaces on the GPO's control network switch. Verify that GPO, ExoGENI, and GMOC monitoring all see the outage.

Criteria to verify as part of this test

  • V.09. When the rack control network is partially down or the rack vendor's home site is inaccessible from the rack, it is still possible to access the primary control network device and server for recovery. All devices/networks which must be operational in order for the control network switch and primary server to be reachable, are documented. (C.3.b)
  • VII.14. A site administrator can locate information about the network reachability of all rack infrastructure which should live on the control network, and can get alerts when any rack infrastructure control IP becomes unavailable from the rack server host, or when the rack server host cannot reach the commodity internet. (D.6.c)

Step 1: prevent 152.54.0.0/16 from sending traffic to the ExoGENI rack

Step 1A: baseline before simulating control network outage

Overview of Step 1A

Run all checks before performing the disconnection test, so that if something is already down, we won't be confused:

  • Attempt to SSH to bbn-hn.exogeni.gpolab.bbn.com:
    ssh bbn-hn.exogeni.gpolab.bbn.com
    
  • If SSH is successful, attempt to sudo on bbn-hn:
    sudo -v
    
  • Browse to https://bbn-hn.exogeni.net/rack_bbn/ and attempt to log in to nagios
  • If successful, enumerate any errors currently outstanding in rack nagios
  • Browse to http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3 and ensure that GPO nagios is available
  • If successful, enumerate any exogeni-relevant errors currently outstanding in GPO nagios
  • Run omni getversion and listresources against the BBN rack ORCA AM:
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc getversion
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc listresources
    
  • Run omni getversion and listresources against the BBN rack FOAM AM:
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 getversion
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 listresources
    
  • Verify that http://monitor.gpolab.bbn.com/connectivity/campus.html currently shows a successful connection from argos to exogeni-vlan1750.bbn.dataplane.geni.net
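
Because the same omni checks are repeated in Steps 1B and 1C, it can help to generate the command lines from one list of aggregate URLs. The following is a minimal Python sketch, not the procedure actually used for this test. The names AGGREGATES, OPERATIONS, and omni_commands are hypothetical, and the sketch only builds the command lines listed above rather than running them.

```python
# Hypothetical helper: build the omni command lines used in the checks above.
# The URLs are the ones listed in this step; nothing is executed here.
AGGREGATES = {
    "ORCA": "https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc",
    "FOAM": "https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1",
}
OPERATIONS = ("getversion", "listresources")


def omni_commands(aggregates=AGGREGATES, operations=OPERATIONS):
    """Return one argument list per (aggregate, operation) pair."""
    return [
        ["omni", "-a", url, op]
        for url in aggregates.values()
        for op in operations
    ]


if __name__ == "__main__":
    for argv in omni_commands():
        print(" ".join(argv))
```

On a host where omni is installed and configured, each argument list could be handed to subprocess.run; that part is left out here because it depends on local credentials.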

Results of testing step 1A on 2012-08-29

  • Login to bbn-hn was successful
  • I noticed that sudo -v doesn't ask for my password. I think I was using sudo whoami for the previous tests; in any case, sudo -l did ask for a password. I want to ask about that configuration, but that's not part of this test.
  • Login to nagios was successful. Outstanding errors:
    • critical:
      • Updates needed on bbn-w1, bbn-w2, bbn-w3
      • 8052.bbn.xo reports that interfaces: 31-36,45-46 are down
    • warning:
      • Updates needed on bbn-hn
      • 8052.bbn.xo reports that interfaces: 10-15,41-44 are the "wrong speed" (running at 1Gbps, expected 100Mbps)
  • GPO nagios is available. Current errors are:
    TANGO-openflow4.stanford.edu/gmoc_monitoring
    ganel.gpolab.bbn.com/cpu_idle
    gardil.gpolab.bbn.com/cpu_idle
    ilian.gpolab.bbn.com/ping_192.1.248.250
    
    none of which pertain to the ExoGENI rack.
  • ORCA getversion output:
    jericho,[~],16:23(0)$ omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc getversion
    INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
    INFO:omni:Using control framework pg
    INFO:omni:AM URN: unspecified_AM_URN (url: https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc) has version:
    INFO:omni:{   'code': {   'geni_code': 0},
        'geni_api': 2,
        'value': {   'geni_ad_rspec_versions': [   {   'extensions': [   'http://hpn.east.isi.edu/rspec/ext/stitch/0.1/stitch-schema.xsd',
                                                                         'http://www.protogeni.net/resources/rspec/ext/emulab/1/ptop_extension.xsd'],
                                                       'namespace': 'http://www.geni.net/resources/rspec/3',
                                                       'schema': 'http://www.geni.net/resources/rspec/3/ad.xsd',
                                                       'type': 'GENI',
                                                       'version': '3'}],
                     'geni_api': 2,
                     'geni_api_versions': {   '1': 'https://geni.renci.org:11443/orca/xmlrpc/geniV1',
                                              '2': 'https://geni.renci.org:11443/orca/xmlrpc/geni'},
                     'geni_request_rspec_versions': [   {   'extensions': [   'http://www.protogeni.net/resources/rspec/ext/shared-vlan/1',
                                                                              'http://www.geni.net/resources/rspec/ext/postBootScript/1'],
                                                            'namespace': 'http://www.geni.net/resources/rspec/3',
                                                            'schema': 'http://www.geni.net/resources/rspec/3/request.xsd',
                                                            'type': 'GENI',
                                                            'version': '3'}],
                     'orca_version': 'ORCA Camano: v.3.1-extended.build-4724'}}
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed getversion:
    
      Options as run:
                    aggregate: https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc
                    configfile: /home/chaos/omni/omni_pgeni
                    framework: pg
                    native: True
    
      Args: getversion
    
      Result Summary: 
    Got version for 1 out of 1 aggregates
     
    INFO:omni: ============================================================
    
  • ORCA listresources succeeded, and I saved the output to:
    rspec-bbn-hn-exogeni-gpolab-bbn-com-11443-orca.20120829.080807.xml
    
  • FOAM getversion:
    jericho,[~],08:08(0)$ omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 getversion
    INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
    INFO:omni:Using control framework pg
    INFO:omni:AM URN: unspecified_AM_URN (url: https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1) has version:
    INFO:omni:{   'ad_rspec_versions': [   {   'extensions': [   'http://www.geni.net/resources/rspec/ext/openflow/3'],
                                     'namespace': 'http://www.geni.net/resources/rspec/3',
                                     'schema': 'http://www.geni.net/resources/rspec/3/ad.xsd',
                                     'type': 'GENI',
                                     'version': '3'}],
        'foam_version': '0.8.2',
        'geni_api': 1,
        'request_rspec_versions': [   {   'extensions': [   'http://www.geni.net/resources/rspec/ext/openflow/3',
                                                            'http://www.geni.net/resources/rspec/ext/openflow/4',
                                                            'http://www.geni.net/resources/rspec/ext/flowvisor/1'],
                                          'namespace': 'http://www.geni.net/resources/rspec/3',
                                          'schema': 'http://www.geni.net/resources/rspec/3/request.xsd',
                                          'type': 'GENI',
                                          'version': '3'}],
        'site_info': {   }}
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed getversion:
    
      Options as run:
                    aggregate: https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1
                    configfile: /home/chaos/omni/omni_pgeni
                    framework: pg
                    native: True
    
      Args: getversion
    
      Result Summary: 
    Got version for 1 out of 1 aggregates
     
    INFO:omni: ============================================================
    
  • FOAM listresources succeeded, and I saved the output to:
    rspec-bbn-hn-exogeni-gpolab-bbn-com-3626-foam-gapi.20120829.080927.xml
    
  • The GPO lab to ExoGENI test dataplane connection is currently up
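
The getversion results above are printed as Python-style dictionaries, so they can be checked mechanically. The sketch below assumes the dictionary text has been copied out of the omni log by hand; it uses abbreviated copies of the two results above, and the function name check_getversion is hypothetical. The ORCA result (GENI API 2) carries a 'code' entry while the FOAM result (GENI API 1) does not, so a missing 'code' is treated as acceptable.

```python
import ast

# Abbreviated copies of the getversion results shown above.
ORCA_RESULT = "{'code': {'geni_code': 0}, 'geni_api': 2, 'value': {'geni_api': 2}}"
FOAM_RESULT = "{'foam_version': '0.8.2', 'geni_api': 1, 'site_info': {}}"


def check_getversion(text):
    """Parse a getversion dictionary and return its geni_api value.

    Raises ValueError if a geni_code is present and nonzero.
    """
    result = ast.literal_eval(text)
    code = result.get("code", {}).get("geni_code", 0)
    if code != 0:
        raise ValueError("getversion reported geni_code %r" % code)
    return result["geni_api"]


if __name__ == "__main__":
    print(check_getversion(ORCA_RESULT))
    print(check_getversion(FOAM_RESULT))
```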

Step 1B: simulate a network problem

Overview of Step 1B

Using:

  • On maple (lab subnet router), do:
    conf t
      ip access-list standard gst-4083-eg-adm-6
        deny 152.54.0.0 0.0.255.255
        permit any
      exit
      int gi 0/1.829
        ip access-group gst-4083-eg-adm-6 out
      exit
    exit
    
  • Wait 5 minutes
  • Attempt to SSH to bbn-hn.exogeni.gpolab.bbn.com:
    ssh bbn-hn.exogeni.gpolab.bbn.com
    
  • If SSH is successful, attempt to sudo on bbn-hn:
    sudo -v
    
  • Browse to https://bbn-hn.exogeni.net/rack_bbn/ and attempt to log in to nagios
  • If successful, enumerate any errors currently outstanding in rack nagios
  • Browse to http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3 and ensure that GPO nagios is available
  • If successful, enumerate any exogeni-relevant errors currently outstanding in GPO nagios
  • Run omni getversion and listresources against the BBN rack ORCA AM:
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc getversion
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc listresources
    
  • Run omni getversion and listresources against the BBN rack FOAM AM:
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 getversion
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 listresources
    
  • Look at http://monitor.gpolab.bbn.com/connectivity/campus.html to see the status of the connection from argos to exogeni-vlan1750.bbn.dataplane.geni.net
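
The access list on maple is a standard ACL, which matches on source address, and its wildcard mask 0.0.255.255 is the inverse of the netmask 255.255.0.0. The entry therefore covers 152.54.0.0/16, the range named in the Step 1 heading. As a rough illustration (the names BLOCKED and is_blocked are hypothetical), Python's ipaddress module can confirm the mask and show that the addresses reported for geni.renci.org and control.exogeni.net in the ping output on this page fall inside the range:

```python
import ipaddress

# The ACL entry "deny 152.54.0.0 0.0.255.255" expressed as a prefix.
BLOCKED = ipaddress.ip_network("152.54.0.0/16")


def is_blocked(address, network=BLOCKED):
    """Return True if the address falls inside the denied source range."""
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    # The wildcard mask used in the ACL is the network's host mask.
    print(BLOCKED.hostmask)
    # Addresses reported by ping for geni.renci.org and control.exogeni.net.
    for addr in ("152.54.3.34", "152.54.1.65"):
        print(addr, is_blocked(addr))
```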

Verify:

  • Site admins should be able to log in to bbn-hn and perform sudo operations
  • Rack nagios should be available (though it may show some errors based on the disconnection, obviously)
  • Rack ORCA and FOAM aggregates should respond to getversion and listresources
  • Existing experiments should continue to run, and should be able to respond to dataplane traffic

Results of testing step 1B on 2012-08-29

  • SSH is still successful
  • I can still run sudo whoami
  • I can still log in to rack nagios, and the set of problems is exactly as before
  • GPO nagios is still available, and the set of down services is exactly as before
  • ORCA omni getversion succeeds:
    jericho,[~],08:09(0)$ omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc getversion
    INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
    INFO:omni:Using control framework pg
    INFO:omni:AM URN: unspecified_AM_URN (url: https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc) has version:
    INFO:omni:{   'code': {   'geni_code': 0},
        'geni_api': 2,
        'value': {   'geni_ad_rspec_versions': [   {   'extensions': [   'http://hpn.east.isi.edu/rspec/ext/stitch/0.1/stitch-schema.xsd',
                                                                         'http://www.protogeni.net/resources/rspec/ext/emulab/1/ptop_extension.xsd'],
                                                       'namespace': 'http://www.geni.net/resources/rspec/3',
                                                       'schema': 'http://www.geni.net/resources/rspec/3/ad.xsd',
                                                       'type': 'GENI',
                                                       'version': '3'}],
                     'geni_api': 2,
                     'geni_api_versions': {   '1': 'https://geni.renci.org:11443/orca/xmlrpc/geniV1',
                                              '2': 'https://geni.renci.org:11443/orca/xmlrpc/geni'},
                     'geni_request_rspec_versions': [   {   'extensions': [   'http://www.protogeni.net/resources/rspec/ext/shared-vlan/1',
                                                                              'http://www.geni.net/resources/rspec/ext/postBootScript/1'],
                                                            'namespace': 'http://www.geni.net/resources/rspec/3',
                                                            'schema': 'http://www.geni.net/resources/rspec/3/request.xsd',
                                                            'type': 'GENI',
                                                            'version': '3'}],
                     'orca_version': 'ORCA Camano: v.3.1-extended.build-4724'}}
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed getversion:
    
      Options as run:
                    aggregate: https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc
                    configfile: /home/chaos/omni/omni_pgeni
                    framework: pg
                    native: True
    
      Args: getversion
    
      Result Summary: 
    Got version for 1 out of 1 aggregates
     
    INFO:omni: ============================================================
    
  • ORCA omni listresources succeeds, and I've saved that output as:
    rspec-bbn-hn-exogeni-gpolab-bbn-com-11443-orca.20120829.082044.xml
    
  • FOAM omni getversion succeeds:
    jericho,[~],08:20(0)$ omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 getversion
    INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
    INFO:omni:Using control framework pg
    INFO:omni:AM URN: unspecified_AM_URN (url: https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1) has version:
    INFO:omni:{   'ad_rspec_versions': [   {   'extensions': [   'http://www.geni.net/resources/rspec/ext/openflow/3'],
                                     'namespace': 'http://www.geni.net/resources/rspec/3',
                                     'schema': 'http://www.geni.net/resources/rspec/3/ad.xsd',
                                     'type': 'GENI',
                                     'version': '3'}],
        'foam_version': '0.8.2',
        'geni_api': 1,
        'request_rspec_versions': [   {   'extensions': [   'http://www.geni.net/resources/rspec/ext/openflow/3',
                                                            'http://www.geni.net/resources/rspec/ext/openflow/4',
                                                            'http://www.geni.net/resources/rspec/ext/flowvisor/1'],
                                          'namespace': 'http://www.geni.net/resources/rspec/3',
                                          'schema': 'http://www.geni.net/resources/rspec/3/request.xsd',
                                          'type': 'GENI',
                                          'version': '3'}],
        'site_info': {   }}
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed getversion:
    
      Options as run:
                    aggregate: https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1
                    configfile: /home/chaos/omni/omni_pgeni
                    framework: pg
                    native: True
    
      Args: getversion
    
      Result Summary: 
    Got version for 1 out of 1 aggregates
     
    INFO:omni: ============================================================
    
  • FOAM omni listresources succeeds, and I've saved that output as:
    rspec-bbn-hn-exogeni-gpolab-bbn-com-3626-foam-gapi.20120829.082137.xml
    
  • The dataplane connection between argos and Tim's test VM is still working
  • On bbn-hn, I verified that I couldn't ping geni.renci.org or control.exogeni.net:
    bbn-hn,[~],12:16(0)$ ping geni.renci.org
    PING geni.renci.org (152.54.3.34) 56(84) bytes of data.
    ^C
    --- geni.renci.org ping statistics ---
    3 packets transmitted, 0 received, 100% packet loss, time 2170ms
    
    bbn-hn,[~],12:19(1)$ ping control.exogeni.net
    PING control.exogeni.net (152.54.1.65) 56(84) bytes of data.
    ^C
    --- control.exogeni.net ping statistics ---
    2 packets transmitted, 0 received, 100% packet loss, time 1198ms
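
The "exactly as before" comparisons of nagios state in these results were made by eye. One way to make such a comparison repeatable is to record the baseline as a set and compare later observations against it. The sketch below uses the GPO nagios errors recorded in Step 1A; it assumes the current list is copied by hand from the nagios page (no nagios API is assumed), and the name compare_errors is hypothetical.

```python
# GPO nagios errors recorded in the Step 1A baseline.
BASELINE = {
    "TANGO-openflow4.stanford.edu/gmoc_monitoring",
    "ganel.gpolab.bbn.com/cpu_idle",
    "gardil.gpolab.bbn.com/cpu_idle",
    "ilian.gpolab.bbn.com/ping_192.1.248.250",
}


def compare_errors(baseline, current):
    """Return (new, cleared) error sets relative to the baseline."""
    baseline, current = set(baseline), set(current)
    return current - baseline, baseline - current


if __name__ == "__main__":
    # In Step 1B the observed set matched the baseline.
    new, cleared = compare_errors(BASELINE, BASELINE)
    print("new:", sorted(new))
    print("cleared:", sorted(cleared))
```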
    

Step 1C: undo the changes

Overview of Step 1C

Using:

  • On maple (lab subnet router), do:
    conf t
      int gi 0/1.829
        no ip access-group gst-4083-eg-adm-6 out
      exit
      no ip access-list standard gst-4083-eg-adm-6
    exit
    
  • Wait 5 minutes
  • Attempt to SSH to bbn-hn.exogeni.gpolab.bbn.com:
    ssh bbn-hn.exogeni.gpolab.bbn.com
    
  • If SSH is successful, attempt to sudo on bbn-hn:
    sudo -v
    
  • Browse to https://bbn-hn.exogeni.net/rack_bbn/ and attempt to log in to nagios
  • Run omni getversion and listresources against the BBN rack ORCA AM:
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc getversion
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc listresources
    
  • Run omni getversion and listresources against the BBN rack FOAM AM:
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 getversion
    omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 listresources
    
  • Look at http://monitor.gpolab.bbn.com/connectivity/campus.html to see the status of the connection from argos to exogeni-vlan1750.bbn.dataplane.geni.net

Verify:

  • Site admins should be able to log in to bbn-hn and perform sudo operations
  • Rack nagios should be available (though it may show some errors based on the disconnection, obviously)
  • Rack ORCA and FOAM aggregates should respond to getversion and listresources
  • Existing experiments should continue to run, and should be able to respond to dataplane traffic

Results of testing step 1C on 2012-08-29

  • I can log in to bbn-hn and sudo (no surprises there)
  • I verified that I can now ping geni.renci.org and control.exogeni.net:
    bbn-hn,[~],12:19(1)$ ping geni.renci.org
    PING geni.renci.org (152.54.3.34) 56(84) bytes of data.
    64 bytes from geni.renci.org (152.54.3.34): icmp_seq=1 ttl=49 time=23.7 ms
    64 bytes from geni.renci.org (152.54.3.34): icmp_seq=2 ttl=49 time=22.7 ms
    ^C
    --- geni.renci.org ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1369ms
    rtt min/avg/max/mdev = 22.794/23.277/23.760/0.483 ms
    bbn-hn,[~],12:24(0)$ ping control.exogeni.net
    PING control.exogeni.net (152.54.1.65) 56(84) bytes of data.
    64 bytes from control.exogeni.net (152.54.1.65): icmp_seq=1 ttl=49 time=22.8 ms
    64 bytes from control.exogeni.net (152.54.1.65): icmp_seq=2 ttl=49 time=23.1 ms
    ^C
    --- control.exogeni.net ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1390ms
    rtt min/avg/max/mdev = 22.896/23.005/23.114/0.109 ms
    
  • Everything else seems to be unchanged.
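
The ping summaries recorded in Steps 1B and 1C (100% loss while the ACL was in place, 0% after it was removed) can also be read by a script. This is a small sketch that assumes the Linux ping summary format shown in the output on this page; other ping implementations may word the summary differently. The name packet_loss is hypothetical.

```python
import re

# Matches the summary line printed by ping in the output on this page.
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) received, (\d+)% packet loss"
)


def packet_loss(ping_output):
    """Return the packet-loss percentage from a ping summary, or None."""
    match = SUMMARY.search(ping_output)
    return int(match.group(3)) if match else None


if __name__ == "__main__":
    during = "3 packets transmitted, 0 received, 100% packet loss, time 2170ms"
    after = "2 packets transmitted, 2 received, 0% packet loss, time 1369ms"
    print(packet_loss(during), packet_loss(after))
```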

I tried one additional thing:

  • I ran omni listresources against geni.renci.org's ORCA
  • I redid the changes on maple to block 152.54.0.0/16 access
  • I ran omni listresources against geni.renci.org's ORCA again
  • Those two listresources outputs also did not differ
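
A comparison like the one above can be done with any diff tool. As a hedged sketch in Python, the hypothetical function changed_lines below returns the lines that differ between two saved outputs, so an empty list corresponds to "did not differ". It compares the text literally and makes no attempt to ignore fields that might vary between runs.

```python
import difflib


def changed_lines(before_text, after_text):
    """Return removed ("- ") and added ("+ ") lines; empty if identical."""
    diff = difflib.ndiff(before_text.splitlines(), after_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Placeholder strings; in practice, pass the contents of the two
    # saved listresources output files.
    before = "<rspec>\n  <node/>\n</rspec>\n"
    after = "<rspec>\n  <node/>\n</rspec>\n"
    print(changed_lines(before, after))
```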