Changes between Initial Version and Version 1 of GENIRacksHome/InstageniRacks/AcceptanceTestStatus/IG-ADM-6


Timestamp: 02/28/13 09:35:25
Author: chaos@bbn.com

[[PageOutline]]

= Detailed test plan for IG-ADM-6: Control Network Disconnection Test =

''This page is GPO's working page for performing IG-ADM-6.  It is public for informational purposes, but it is not an official status report.  See [wiki:GENIRacksHome/InstageniRacks/AcceptanceTestStatus] for the current status of InstaGENI acceptance tests.''

''Last substantive edit of this page: 2013-02-28''

== Page format ==

 * The status chart summarizes the state of this test.
 * The high-level description from the test plan contains text copied exactly from the public test plan and acceptance criteria pages.
 * The steps contain things I will actually do/verify:
   * Steps may be composed of related substeps where I find this useful for clarity.
   * Each step is either a preparatory step (identified by "(prep)") or a verification step (the default):
     * Preparatory steps are just things we have to do.  They're not tests of the rack, but are prerequisites for subsequent verification steps.
     * Verification steps are steps in which we will actually look at rack output and make sure it is as expected.  They contain a '''Using:''' block, which lists the steps to run the verification, and an '''Expect:''' block, which lists what outcome is expected for the test to pass.

== Status of test ==

|| '''Step''' || '''State'''           || '''Date completed''' || '''Open Tickets''' || '''Closed Tickets/Comments''' ||
|| 1A         ||                       ||                      ||                    ||                               ||
|| 1B         ||                       ||                      ||                    ||                               ||
|| 1C         ||                       ||                      ||                    ||                               ||

== High-level description from test plan ==

In this test, we disconnect parts of the rack control network or its dependencies to test partial rack functionality in an outage situation.

=== Procedure ===

 * Simulate an outage of emulab.net by inserting a firewall rule on the GPO router blocking the rack from reaching it. Verify that an administrator can still access the rack, that rack monitoring to GMOC continues through the outage, and that some experimenter operations still succeed.
 * Simulate an outage of the rack management network by disabling its interface on the GPO's control network switch. Verify that GPO and GMOC monitoring see the outage.

=== Criteria to verify as part of this test ===

 * V.09. When the rack control network is partially down or the rack vendor's home site is inaccessible from the rack, it is still possible to access the primary control network device and server for recovery. All devices/networks which must be operational in order for the control network switch and primary server to be reachable, are documented. (C.3.b)
 * VII.14. A site administrator can locate information about the network reachability of all rack infrastructure which should live on the control network, and can get alerts when any rack infrastructure control IP becomes unavailable from the rack server host, or when the rack server host cannot reach the commodity internet. (D.6.c)

== Step 1: prevent 155.98.33.0/24 from sending traffic to the InstaGENI rack ==

=== Step 1A: baseline before simulating control network outage ===

==== Overview of Step 1A ====

Run all checks before performing the disconnection test, so that anything which is already down is not mistaken for an effect of the test:
 * Attempt to SSH to boss.instageni.gpolab.bbn.com:
{{{
ssh boss.instageni.gpolab.bbn.com
}}}
   * If SSH is successful, attempt to sudo on boss:
{{{
sudo -v
}}}
   * If SSH is successful, ping ops.emulab.net to verify connectivity:
{{{
ping ops.emulab.net
}}}
 * Attempt to SSH to gpolab.control-nodes.geniracks.net:
{{{
ssh gpolab.control-nodes.geniracks.net
}}}
   * If SSH is successful, attempt to sudo on control:
{{{
sudo -v
}}}
 * Attempt to SSH to foam.instageni.gpolab.bbn.com:
{{{
ssh foam.instageni.gpolab.bbn.com
}}}
   * If SSH is successful, attempt to sudo on foam:
{{{
sudo -v
}}}
 * Browse to [https://instageni.gpolab.bbn.com/] and attempt to log in as chaos
   * If successful, enter red dot mode
 * Browse to [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3] and ensure that GPO Nagios is available
   * If successful, enumerate any InstaGENI-relevant errors currently outstanding in GPO Nagios
 * Run omni getversion and listresources against the BBN rack ProtoGENI AM:
{{{
omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am getversion
omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am listresources -o
}}}
 * Run omni getversion and listresources against the BBN rack FOAM AM:
{{{
omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 listresources -o
}}}
 * Verify that [http://monitor.gpolab.bbn.com/connectivity/campus.html] currently shows a successful connection from argos to instageni-vlan1750.bbn.dataplane.geni.net
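
The same checks are repeated at baseline, during the outage, and after recovery, so it may help to record each outcome in a form that can be compared later. The following is only a sketch, not part of the test plan: `run_check` and the `RESULTS` file name are hypothetical, and it assumes every command signals failure with a non-zero exit status (worth confirming for omni before relying on it).

```shell
#!/bin/sh
# Hypothetical helper: run one check and append "label<TAB>PASS|FAIL"
# to a results file. RESULTS is an assumed name, not defined by the test plan.
RESULTS="${RESULTS:-ig-adm-6-results.txt}"

run_check() {
    label="$1"
    shift
    # Run the remaining arguments as the check command, discarding its output.
    if "$@" >/dev/null 2>&1; then
        status=PASS
    else
        status=FAIL
    fi
    printf '%s\t%s\n' "$label" "$status" >> "$RESULTS"
    printf '%s: %s\n' "$label" "$status"
}

# Example usage (left commented out so that loading this file runs nothing):
# run_check boss-ssh ssh -o BatchMode=yes boss.instageni.gpolab.bbn.com true
# run_check foam-getversion omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
```

The browser-based checks (Emulab web UI, Nagios, campus connectivity page) still need to be done by hand; this only covers commands that can run unattended.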

=== Step 1B: simulate a network problem ===

==== Overview of Step 1B ====

'''Using:'''
 * On maple (lab subnet router), do:
{{{
conf t
  ! Standard ACL: matches on source address only; 0.0.0.255 is a wildcard mask
  ip access-list standard gst-4544-ig-adm-6
    deny 155.98.33.0 0.0.0.255
    permit any
  exit
  ! Filter traffic leaving this subinterface
  int gi 0/1.830
    ip access-group gst-4544-ig-adm-6 out
  exit
exit
}}}
 * Wait 5 minutes
 * Attempt to SSH to boss.instageni.gpolab.bbn.com:
{{{
ssh boss.instageni.gpolab.bbn.com
}}}
   * If SSH is successful, attempt to sudo on boss:
{{{
sudo -v
}}}
   * If SSH is successful, ping ops.emulab.net to verify connectivity:
{{{
ping ops.emulab.net
}}}
 * Attempt to SSH to gpolab.control-nodes.geniracks.net:
{{{
ssh gpolab.control-nodes.geniracks.net
}}}
   * If SSH is successful, attempt to sudo on control:
{{{
sudo -v
}}}
 * Attempt to SSH to foam.instageni.gpolab.bbn.com:
{{{
ssh foam.instageni.gpolab.bbn.com
}}}
   * If SSH is successful, attempt to sudo on foam:
{{{
sudo -v
}}}
 * Browse to [https://instageni.gpolab.bbn.com/] and attempt to log in as chaos
   * If successful, enter red dot mode
 * Browse to [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3] and ensure that GPO Nagios is available
   * If successful, enumerate any InstaGENI-relevant errors currently outstanding in GPO Nagios
 * Run omni getversion and listresources against the BBN rack ProtoGENI AM:
{{{
omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am getversion
omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am listresources -o
}}}
 * Run omni getversion and listresources against the BBN rack FOAM AM:
{{{
omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 listresources -o
}}}
 * Verify that [http://monitor.gpolab.bbn.com/connectivity/campus.html] currently shows a successful connection from argos to instageni-vlan1750.bbn.dataplane.geni.net
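
To make the effect of the access list in Step 1B concrete: a standard ACL compares only the packet's source address, and bits set in the wildcard mask are ignored in that comparison. The sketch below reproduces that matching rule in shell for illustration only. The function names are hypothetical and this is not how the router implements it; the addresses and mask are the ones from the access list above.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if SRC matches ADDR under WILDCARD.
# Bits set in the wildcard mask are "don't care" bits.
acl_source_matches() {
    src=$(ip_to_int "$1")
    addr=$(ip_to_int "$2")
    wild=$(ip_to_int "$3")
    [ $(( (src ^ addr) & ~wild & 0xFFFFFFFF )) -eq 0 ]
}

# First-match evaluation of the two entries in gst-4544-ig-adm-6.
acl_decision() {
    if acl_source_matches "$1" 155.98.33.0 0.0.0.255; then
        echo deny
    else
        echo permit    # falls through to "permit any"
    fi
}

# Example: acl_decision 155.98.33.17
```

Any source in 155.98.33.0/24 is denied and everything else is permitted, which is why only traffic from that subnet toward the rack should be affected.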

'''Expect:'''
 * Site admins should be able to log in to boss, control, and foam, and perform sudo operations
 * The Emulab web UI should be available, and red dot mode should work
 * Monitoring of the rack should be available (though it may show some errors due to the disconnection)
 * Rack ProtoGENI and FOAM aggregates should respond to getversion and listresources
 * Existing experiments should continue to run, and should be able to respond to dataplane traffic
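
Some checks may legitimately change state during the outage, so it can be useful to list exactly which ones differ from the baseline. The sketch below assumes the outcomes of each run were saved as tab-separated `label` and `PASS`/`FAIL` lines, one file per run; that file format and the name `compare_results` are hypothetical, not something the test plan defines.

```shell
#!/bin/sh
# Print every check whose status differs between two result files.
# Each file is assumed to hold lines of the form "label<TAB>PASS|FAIL".
compare_results() {
    awk -F '\t' '
        NR == FNR { before[$1] = $2; next }      # first file: remember baseline
        ($1 in before) && before[$1] != $2 {     # second file: report changes
            printf "%s: %s -> %s\n", $1, before[$1], $2
        }
    ' "$1" "$2"
}

# Example: compare_results baseline.txt outage.txt
```

An empty result means every recorded check behaved the same in both runs. Note that the awk idiom used here assumes the baseline file is not empty.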

=== Step 1C: undo the changes ===

==== Overview of Step 1C ====

'''Using:'''
 * On maple (lab subnet router), do:
{{{
conf t
  ! Detach the ACL from the subinterface before deleting it
  int gi 0/1.830
    no ip access-group gst-4544-ig-adm-6 out
  exit
  no ip access-list standard gst-4544-ig-adm-6
exit
}}}
 * Wait 5 minutes
 * Attempt to SSH to boss.instageni.gpolab.bbn.com:
{{{
ssh boss.instageni.gpolab.bbn.com
}}}
   * If SSH is successful, attempt to sudo on boss:
{{{
sudo -v
}}}
   * If SSH is successful, ping ops.emulab.net to verify connectivity:
{{{
ping ops.emulab.net
}}}
 * Attempt to SSH to gpolab.control-nodes.geniracks.net:
{{{
ssh gpolab.control-nodes.geniracks.net
}}}
   * If SSH is successful, attempt to sudo on control:
{{{
sudo -v
}}}
 * Attempt to SSH to foam.instageni.gpolab.bbn.com:
{{{
ssh foam.instageni.gpolab.bbn.com
}}}
   * If SSH is successful, attempt to sudo on foam:
{{{
sudo -v
}}}
 * Browse to [https://instageni.gpolab.bbn.com/] and attempt to log in as chaos
   * If successful, enter red dot mode
 * Browse to [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3] and ensure that GPO Nagios is available
   * If successful, enumerate any InstaGENI-relevant errors currently outstanding in GPO Nagios
 * Run omni getversion and listresources against the BBN rack ProtoGENI AM:
{{{
omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am getversion
omni -a https://utah.geniracks.net:12369/protogeni/xmlrpc/am listresources -o
}}}
 * Run omni getversion and listresources against the BBN rack FOAM AM:
{{{
omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 listresources -o
}}}
 * Verify that [http://monitor.gpolab.bbn.com/connectivity/campus.html] currently shows a successful connection from argos to instageni-vlan1750.bbn.dataplane.geni.net
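
The fixed five-minute wait in Step 1C could be complemented by polling a check until it passes, which also shows roughly how long recovery takes. This is a sketch under assumptions rather than part of the procedure: `wait_until` is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempt budget runs out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS command [args...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "succeeded after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt is still to come.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    echo "still failing after $attempts attempt(s)"
    return 1
}

# Example (run from boss): wait_until 10 30 ping -c 1 ops.emulab.net
```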

'''Expect:'''
 * Site admins should be able to log in to boss, control, and foam, and perform sudo operations
 * The Emulab web UI should be available, and red dot mode should work
 * Monitoring of the rack should be available (though it may show some errors due to the disconnection)
 * Rack ProtoGENI and FOAM aggregates should respond to getversion and listresources
 * Existing experiments should continue to run, and should be able to respond to dataplane traffic