Changes between Initial Version and Version 1 of GENIRacksHome/ExogeniRacks/AcceptanceTestStatus/EG-ADM-6


Timestamp: 08/24/12 13:55:33
Author: chaos@bbn.com
[[PageOutline]]

= Detailed test plan for EG-ADM-6: Control Network Disconnection Test =

''This page is GPO's working page for performing EG-ADM-6.  It is public for informational purposes, but it is not an official status report.  See [wiki:GENIRacksHome/ExogeniRacks/AcceptanceTestStatus] for the current status of ExoGENI acceptance tests.''

''Last substantive edit of this page: 2012-08-24''

== Page format ==

 * The status chart summarizes the state of this test.
 * The high-level description from test plan contains text copied exactly from the public test plan and acceptance criteria pages.
 * The steps contain things I will actually do/verify:
   * Steps may be composed of related substeps where I find this useful for clarity.
   * Each step is either a preparatory step (identified by "(prep)") or a verification step (the default):
     * Preparatory steps are just things we have to do.  They're not tests of the rack, but are prerequisites for subsequent verification steps.
     * Verification steps are steps in which we will actually look at rack output and make sure it is as expected.  They contain a '''Using:''' block, which lists the steps to run the verification, and an '''Expect:''' block, which lists what outcome is expected for the test to pass.

== Status of test ==

|| '''Step''' || '''State''' || '''Date completed''' || '''Open Tickets''' || '''Closed Tickets/Comments''' ||
|| 1A         ||             ||                      ||                    ||                               ||
|| 1B         ||             ||                      ||                    ||                               ||
|| 1C         ||             ||                      ||                    ||                               ||

== High-level description from test plan ==

In this test, we disconnect parts of the rack control network or its dependencies to test partial rack functionality in an outage situation.

=== Procedure ===

 * Simulate an outage of geni.renci.org by inserting a firewall rule on the GPO router blocking the rack from reaching it. Verify that an administrator can still access the rack, that rack monitoring to GMOC continues through the outage, and that some experimenter operations still succeed.
 * Simulate an outage of each of the rack head node and management switch by disabling their respective interfaces on the GPO's control network switch. Verify that GPO, ExoGENI, and GMOC monitoring all see the outage.

=== Criteria to verify as part of this test ===

 * V.09. When the rack control network is partially down or the rack vendor's home site is inaccessible from the rack, it is still possible to access the primary control network device and server for recovery. All devices/networks which must be operational in order for the control network switch and primary server to be reachable, are documented. (C.3.b)
 * VII.14. A site administrator can locate information about the network reachability of all rack infrastructure which should live on the control network, and can get alerts when any rack infrastructure control IP becomes unavailable from the rack server host, or when the rack server host cannot reach the commodity internet. (D.6.c)

== Step 1: prevent 152.54.0.0/16 from sending traffic to the ExoGENI rack ==

=== Step 1A: baseline before simulating control network outage ===

==== Overview of Step 1A ====

Run all checks before performing the disconnection test, so that if something is already down, we won't be confused:
 * Attempt to SSH to bbn-hn.exogeni.gpolab.bbn.com:
{{{
ssh bbn-hn.exogeni.gpolab.bbn.com
}}}
 * If SSH is successful, attempt to sudo on bbn-hn:
{{{
sudo -v
}}}
 * Browse to [https://bbn-hn.exogeni.net/rack_bbn/] and attempt to log in to Nagios
 * Run omni getversion and listresources against the BBN rack ORCA AM:
{{{
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc getversion
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc listresources
}}}
 * Run omni getversion and listresources against the BBN rack FOAM AM:
{{{
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 getversion
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 listresources
}}}
 * Verify that [http://monitor.gpolab.bbn.com/connectivity/campus.html] currently shows a successful connection from argos to exogeni-vlan1750.bbn.dataplane.geni.net

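The same set of command-line checks is repeated in Steps 1A, 1B, and 1C, so it may be convenient to wrap them in a small script and save the output of each run for comparison. The following is only a sketch, not part of the official test plan: the names `run_check` and `rack_checks` are hypothetical, the hostname and AM URLs are copied from the steps above, and `ssh -o BatchMode=yes` is used on the assumption that key-based login is already set up. The sudo and Nagios checks are left as manual steps.

```shell
#!/bin/sh
# Hypothetical helper: run one named check, discard its output,
# and print a single PASS/FAIL line.
# Usage: run_check NAME COMMAND [ARGS...]
run_check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $name"
    else
        echo "FAIL $name"
    fi
}

# Endpoints copied from the steps above.
HN=bbn-hn.exogeni.gpolab.bbn.com
ORCA_AM="https://$HN:11443/orca/xmlrpc"
FOAM_AM="https://$HN:3626/foam/gapi/1"

# Hypothetical wrapper around the scriptable checks. BatchMode=yes makes
# ssh fail instead of prompting, so this assumes key-based login works.
rack_checks() {
    run_check ssh-login          ssh -o BatchMode=yes "$HN" true
    run_check orca-getversion    omni -a "$ORCA_AM" getversion
    run_check orca-listresources omni -a "$ORCA_AM" listresources
    run_check foam-getversion    omni -a "$FOAM_AM" getversion
    run_check foam-listresources omni -a "$FOAM_AM" listresources
}
```

Running `rack_checks | tee baseline.log` before the outage, and again with a different log name during and after it, would give three short files that can be compared with `diff`.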
=== Step 1B: simulate a network problem ===

==== Overview of Step 1B ====

'''Using:'''
 * On maple (lab subnet router), do:
{{{
conf t
  ip access-list standard gst-4083-eg-adm-6
    deny 152.54.0.0 0.0.255.255
    permit any
  exit
  int gi 0/1.829
    ip access-group gst-4083-eg-adm-6 out
  exit
exit
}}}
 * Wait 5 minutes
 * Attempt to SSH to bbn-hn.exogeni.gpolab.bbn.com:
{{{
ssh bbn-hn.exogeni.gpolab.bbn.com
}}}
 * If SSH is successful, attempt to sudo on bbn-hn:
{{{
sudo -v
}}}
 * Browse to [https://bbn-hn.exogeni.net/rack_bbn/] and attempt to log in to Nagios
 * Run omni getversion and listresources against the BBN rack ORCA AM:
{{{
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc getversion
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc listresources
}}}
 * Run omni getversion and listresources against the BBN rack FOAM AM:
{{{
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 getversion
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 listresources
}}}
 * Look at [http://monitor.gpolab.bbn.com/connectivity/campus.html] to see the status of the connection from argos to exogeni-vlan1750.bbn.dataplane.geni.net

'''Verify:'''
 * Site admins should be able to log in to bbn-hn and perform sudo operations
 * Rack Nagios should be available (though it may show some errors caused by the disconnection)
 * Rack ORCA and FOAM aggregates should respond to getversion and listresources
 * Existing experiments should continue to run, and should be able to respond to dataplane traffic

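The access list in Step 1B is a standard ACL, so it matches on source address only, and `0.0.255.255` is a wildcard mask: bits set to 1 are ignored and the remaining bits must equal the base address. That is why the single `deny` entry covers all of 152.54.0.0/16. The sketch below illustrates the matching rule in shell. The function names `ip_to_int`, `acl_matches`, and `acl_action` are hypothetical, and the addresses in the usage note are illustrative only.

```shell
#!/bin/sh
# ip_to_int A.B.C.D -> prints the dotted-quad address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# acl_matches SRC BASE WILDCARD -> exit status 0 if SRC matches the entry.
acl_matches() {
    src=$(ip_to_int "$1")
    base=$(ip_to_int "$2")
    wild=$(ip_to_int "$3")
    # Wildcard bits set to 1 are "don't care"; invert to get the bits
    # that must match (4294967295 is 0xFFFFFFFF).
    care=$(( ~wild & 4294967295 ))
    [ $(( src & care )) -eq $(( base & care )) ]
}

# acl_action SRC -> prints the action the two-entry list would take.
acl_action() {
    if acl_matches "$1" 152.54.0.0 0.0.255.255; then
        echo deny
    else
        echo permit    # falls through to "permit any"
    fi
}
```

For example, `acl_action 152.54.14.1` prints `deny`, while `acl_action 152.55.0.1` and `acl_action 192.0.2.1` both print `permit`.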
=== Step 1C: undo the changes ===

==== Overview of Step 1C ====

'''Using:'''
 * On maple (lab subnet router), do:
{{{
conf t
  int gi 0/1.829
    no ip access-group gst-4083-eg-adm-6 out
  exit
  no ip access-list standard gst-4083-eg-adm-6
exit
}}}
 * Wait 5 minutes
 * Attempt to SSH to bbn-hn.exogeni.gpolab.bbn.com:
{{{
ssh bbn-hn.exogeni.gpolab.bbn.com
}}}
 * If SSH is successful, attempt to sudo on bbn-hn:
{{{
sudo -v
}}}
 * Browse to [https://bbn-hn.exogeni.net/rack_bbn/] and attempt to log in to Nagios
 * Run omni getversion and listresources against the BBN rack ORCA AM:
{{{
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc getversion
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc listresources
}}}
 * Run omni getversion and listresources against the BBN rack FOAM AM:
{{{
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 getversion
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 listresources
}}}
 * Look at [http://monitor.gpolab.bbn.com/connectivity/campus.html] to see the status of the connection from argos to exogeni-vlan1750.bbn.dataplane.geni.net

'''Verify:'''
 * Site admins should be able to log in to bbn-hn and perform sudo operations
 * Rack Nagios should be available (though it may show some errors caused by the disconnection)
 * Rack ORCA and FOAM aggregates should respond to getversion and listresources
 * Existing experiments should continue to run, and should be able to respond to dataplane traffic

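The fixed five-minute wait after changing the router configuration may be deliberate, for example to give monitoring time to catch up. If the goal is only to know when a check starts succeeding again, a bounded retry loop is one alternative. The sketch below is an assumption on my part, not part of the test plan, and the name `wait_for` is hypothetical.

```shell
#!/bin/sh
# wait_for ATTEMPTS INTERVAL COMMAND [ARGS...]
# Retry COMMAND up to ATTEMPTS times, sleeping INTERVAL seconds between
# tries. Returns 0 on the first success and 1 if every attempt fails.
wait_for() {
    attempts=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "ok after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        # Only sleep if another attempt is still to come.
        if [ "$i" -le "$attempts" ]; then
            sleep "$interval"
        fi
    done
    echo "gave up after $attempts attempt(s)"
    return 1
}
```

For instance, `wait_for 10 30 ssh -o BatchMode=yes bbn-hn.exogeni.gpolab.bbn.com true` would retry the SSH check every 30 seconds and give up after ten attempts, assuming key-based login is available.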