Changes between Version 1 and Version 2 of GENIRacksHome/InstageniRacks/AcceptanceTestStatus/IG-ADM-3


Timestamp: 03/08/13 15:31:29
Author: Josh Smift
Comment: --

  • GENIRacksHome/InstageniRacks/AcceptanceTestStatus/IG-ADM-3

    v1 v2  
    1010 * The high-level description from the test plan contains text copied exactly from the public test plan and acceptance criteria pages.
    1111 * The steps contain things I will actually do/verify:
    12    * Steps may be composed of related substeps where I find this useful for clarity 
     12   * Steps may be composed of related substeps where I find this useful for clarity
    1313   * Each step is either a preparatory step (identified by "(prep)") or a verification step (the default):
    1414     * Preparatory steps are just things we have to do.  They're not tests of the rack, but are prerequisites for subsequent verification steps.
     
    2020
    2121|| '''Step''' || '''State'''                    || '''Date completed''' || '''Open Tickets''' || '''Closed Tickets/Comments''' ||
    22 || 1A         || [[Color(#63B8FF,In Progress)]] ||                      ||                    ||                               ||
    23 || 1B         || [[Color(#63B8FF,In Progress)]] ||                      ||                    ||                               ||
    24 || 2A         || [[Color(#63B8FF,In Progress)]] ||                      ||                    ||                               ||
    25 || 2B         || [[Color(#63B8FF,In Progress)]] ||                      ||                    ||                               ||
     22|| 1A         || [[Color(green,Pass)]]          || 2013-03-08           ||                    ||                               ||
     23|| 1B         || [[Color(green,Pass)]]          || 2013-03-08           ||                    ||                               ||
     24|| 2A         || [[Color(green,Pass)]]          || 2013-03-08           ||                    ||                               ||
     25|| 2B         || [[Color(green,Pass)]]          || 2013-03-08           ||                    ||                               ||
    2626
    2727[[BR]]
     
    3535 1. Review relevant rack documentation about shutdown options and make a plan for the order in which to shut down each component.
    3636 2. Cleanly shut down and/or hard-power-off all devices in the rack, and verify that everything in the rack is powered down.
    37  3. Power on all devices, bring all logical components back online, and use monitoring and comprehensive health tests to verify that the rack is healthy again. 
     37 3. Power on all devices, bring all logical components back online, and use monitoring and comprehensive health tests to verify that the rack is healthy again.
    3838
    3939== Criteria to verify as part of this test ==
     
    4343 * V.11. Site administrators can authenticate remotely and virtually power on, power off, or power-cycle all virtual rack resources, including server and experimental VMs. (C.3.c)
    4444 * VI.16. A procedure is documented for cleanly shutting down the entire rack in case of a scheduled site outage. (C.3.c)
    45  * VII.16. A public document explains how to perform comprehensive health checks for a rack (or, if those health checks are being run automatically, how to view the current/recent results). (F.8) 
     45 * VII.16. A public document explains how to perform comprehensive health checks for a rack (or, if those health checks are being run automatically, how to view the current/recent results). (F.8)
    4646
    4747= Step 1: Shut down the rack =
     
    6262}}}
    6363 * Browse to [https://www.instageni.gpolab.bbn.com/] and attempt to log in
    64  * If successful, browse to https://boss.instageni.gpolab.bbn.com/nodecontrol_list.php3?showtype=dl360 and confirm that all five nodes are up (green dot in the "Up?" column of the table)
     64 * If successful, browse to https://instageni.gpolab.bbn.com/nodecontrol_list.php3?showtype=dl360 and confirm that all five nodes are up or free (green dot or white dot in the "Up?" column of the table)
    6565 * Browse to [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3] and ensure that GPO nagios is available
    6666 * If successful, enumerate any InstaGENI-relevant errors currently outstanding in GPO nagios
     
    8181=== Results of testing Step 1A on 2013-03-08 ===
    8282
    83 FIXME: Update the date to the date on which we actually do the test.
    84 
    85 FIXME: Fill this in when we actually do the test.
     83I was able to log in to control.instageni.gpolab.bbn.com, and sudo:
     84
     85{{{
     86[14:12:37] jbs@gpolab:/home/jbs
     87+$ hostname
     88gpolab.control-nodes.geniracks.net
     89
     90[14:12:46] jbs@gpolab:/home/jbs
     91+$ sudo whoami
     92root
     93}}}
     94
     95I browsed to https://www.instageni.gpolab.bbn.com/, logged in, and clicked on the green dot (which turned red) to enter "red dot mode".
     96
     97I browsed to https://boss.instageni.gpolab.bbn.com/nodecontrol_list.php3?showtype=dl360, and observed that all five nodes are up or free (green dot or white dot in the "Up?" column of the table).
     98
     99I browsed to http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3, and found no InstaGENI-relevant errors.
     100
     101I ran getversion and listresources against the BBN rack PG AM:
     102
     103{{{
     104omni -a https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am getversion
     105omni -a https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am listresources
     106}}}
     107
     108The output was quite long, so I didn't paste it here.
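
An aside for future runs: rather than pasting long output, omni can write results to a file via its -o option. A minimal sketch, assuming the stock omni client and the same AM URL as above:

{{{
# Save the listresources result to a file (omni picks the filename)
# instead of dumping the RSpec to the terminal.
omni -o -a https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am listresources
}}}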
     109
     110I ran omni getversion and listresources against the BBN rack FOAM AM:
     111
     112{{{
     113omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
     114omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 listresources
     115}}}
     116
     117The output was quite long, so I didn't paste it here.
     118
     119I verified that http://monitor.gpolab.bbn.com/connectivity/campus.html showed that the connection from argos to instageni-vlan1750.bbn.dataplane.geni.net is OK.
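
For a manual spot-check of that same dataplane path (a sketch, assuming a login on argos and that the dataplane hostname resolves there; the monitoring page remains the authoritative check):

{{{
# From argos, send a few pings to the rack's dataplane interface.
ping -c 5 instageni-vlan1750.bbn.dataplane.geni.net
}}}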
    86120
    87121== Step 1B: Shut down all rack components ==
     
    91125Shut everything down:
    92126
    93  * Browse to https://boss.instageni.gpolab.bbn.com/showpool.php, and use the table at the top of the page to identify which nodes are in the shared pool, and which are exclusive nodes.
    94127 * Log in to boss, and from there:
    95128   * Shut down the testbed:
     
    101134for node in pc{1..5} ; do sudo ssh $node shutdown -H now ; done
    102135}}}
     136     Note that nodes which are not in use (free) may not be reachable (so the SSH connection may time out); a timeout-capped variant of the loop is sketched just before the results below.
    103137 * Log out of boss.
    104138 * Log in to control, and from there:
     
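Since free nodes may not answer SSH at all, the shutdown loop above can hang on them; this is the timeout-capped variant mentioned in the note (a sketch, not part of the documented procedure):

{{{
# Same loop as above, but cap each SSH connection attempt at 5 seconds
# so unreachable (free) nodes don't stall the run.
for node in pc{1..5} ; do sudo ssh -o ConnectTimeout=5 $node shutdown -H now ; done
}}}
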
    116150=== Results of testing Step 1B on 2013-03-08 ===
    117151
    118 FIXME: Update the date to the date on which we actually do the test.
    119 
    120 FIXME: Fill this in when we actually do the test.
     152On boss, I shut down the testbed:
     153
     154{{{
     155sudo testbed-control shutdown
     156}}}
     157
     158I then shut down the nodes:
     159
     160{{{
     161for node in pc{1..5} ; do sudo ssh $node shutdown -H now ; done
     162}}}
     163
     164This didn't actually seem to shut down pc1 or pc2, which I was still able to ping and SSH in to, even after five minutes. I logged in interactively to pc1, and it said:
     165
     166{{{
     167System is going down.
     168
     169Last login: Fri Mar  8 14:25:04 2013 from boss.instageni.gpolab.bbn.com
     170[root@vhost1 ~]#
     171}}}
     172
     173I tried doing the shutdown command there:
     174
     175{{{
     176[root@vhost1 ~]# shutdown -H now
     177
     178Broadcast message from root@vhost1.shared-nodes.emulab-ops.instageni.gpolab.bbn.com on pts/0 (Fri, 08 Mar 2013 14:26:52 -0500):
     179
     180The system is going down for system halt NOW!
     181}}}
     182
     183But the system still didn't shut down. I tried init 0 instead:
     184
     185{{{
     186[root@vhost1 ~]# sudo init 0
     187
     188Broadcast message from root@vhost1.shared-nodes.emulab-ops.instageni.gpolab.bbn.com on pts/0 (Fri, 08 Mar 2013 14:27:18 -0500):
     189
     190The system is going down for power-off NOW!
     191}}}
     192
     193But the system still didn't shut down.
     194
     195...ah, but then finally, about six or seven minutes after I'd first run the shutdown, they did shut down. Ok.
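
For future runs, a quick way to watch for the nodes actually halting (a sketch, run from boss or any Linux host that can reach the nodes by name; GNU ping flags assumed):

{{{
# One ping per node with a 2-second timeout; a node that stops answering
# ICMP is what we expect to see once the halt completes.
for node in pc{1..5} ; do
  ping -c 1 -W 2 $node > /dev/null 2>&1 && echo "$node still up" || echo "$node down"
done
}}}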
     196
     197I logged out of boss, and logged in to control, and shut down the VMs that run there:
     198
     199{{{
     200sudo xm shutdown -a -w
     201}}}
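
Before taking control itself down, a quick sanity check that the VMs are really gone (assuming the Xen xm toolstack in use on the control node):

{{{
# After "xm shutdown -a -w" returns, only Domain-0 should be listed.
sudo xm list
}}}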
     202
     203I then shut down control itself:
     204
     205{{{
     206sudo init 0 && exit
     207}}}
     208
     209I was out of the office, but remote hands (aka "Peter") visually verified that only the control node and pc1 actually powered off; he powered everything else off.
    121210
    122211= Step 2: Start up the rack =
     
    137226=== Results of testing Step 2A on 2013-03-08 ===
    138227
    139 FIXME: Update the date to the date on which we actually do the test.
    140 
    141 FIXME: Fill this in when we actually do the test.
     228Things generally came up ok, except that the ops VM didn't fully come up. Leigh fixed it, and explained that he "removed a bad script from your ops that we fixed right around the time the BBN rack was installed. It might have got caught in the mix. The script is unnecessary on the racks at this time." So that should be ok now.
    142229
    143230== Step 2B: Test functionality after starting up the rack ==
     
    156243}}}
    157244 * Browse to [https://www.instageni.gpolab.bbn.com/] and attempt to log in
    158  * If successful, browse to https://boss.instageni.gpolab.bbn.com/nodecontrol_list.php3?showtype=dl360 and confirm that all five nodes are up (green dot in the "Up?" column of the table)
     245 * If successful, browse to https://instageni.gpolab.bbn.com/nodecontrol_list.php3?showtype=dl360 and confirm that all five nodes are up or free (green dot or white dot in the "Up?" column of the table)
    159246 * Browse to [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3] and ensure that GPO nagios is available
    160247 * If successful, enumerate any InstaGENI-relevant errors currently outstanding in GPO nagios
     
    175262=== Results of testing Step 2B on 2013-03-08 ===
    176263
    177 FIXME: Update the date to the date on which we actually do the test.
    178 
    179 FIXME: Fill this in when we actually do the test.
     264I was able to log in to control.instageni.gpolab.bbn.com, and sudo:
     265
     266{{{
     267[15:07:23] jbs@gpolab:/home/jbs
     268+$ hostname
     269gpolab.control-nodes.geniracks.net
     270
     271[15:07:24] jbs@gpolab:/home/jbs
     272+$ sudo whoami
     273root
     274}}}
     275
     276I browsed to https://www.instageni.gpolab.bbn.com/, and it said:
     277
     278{{{
     279                  Web Interface Temporarily Unavailable
     280                          Please Try Again Later
     281              Testbed going offline; back in a little while
     282}}}
     283
     284Leigh reports that this was due to a missing instruction in the docs, so I logged in to boss, and ran:
     285
     286{{{
     287sudo testbed-control boot
     288}}}
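
A scriptable version of the same check, for future runs (a sketch; -k skips certificate verification in case the rack's CA isn't trusted locally):

{{{
# Prints the outage banner while the testbed web interface is down;
# no output means it's back up.
curl -k -s https://www.instageni.gpolab.bbn.com/ | grep -i "Temporarily Unavailable"
}}}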
     289
     290I was then able to browse to https://www.instageni.gpolab.bbn.com/ and log in.
     291
     292I browsed to https://boss.instageni.gpolab.bbn.com/nodecontrol_list.php3?showtype=dl360, and observed that all five nodes are up or free (green dot or white dot in the "Up?" column of the table).
     293
     294I browsed to http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3, and found errors related to amcanary trying to getversion and listresources from FOAM and PG on the rack. I scheduled a manual check of those, and it passed.
     295
     296I ran getversion and listresources against the BBN rack PG AM:
     297
     298{{{
     299omni -a https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am getversion
     300omni -a https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am listresources
     301}}}
     302
     303The output was quite long, so I didn't paste it here.
     304
     305I ran omni getversion and listresources against the BBN rack FOAM AM:
     306
     307{{{
     308omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
     309omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 listresources
     310}}}
     311
     312The output was quite long, so I didn't paste it here.
     313
     314I verified that http://monitor.gpolab.bbn.com/connectivity/campus.html showed that the connection from argos to instageni-vlan1750.bbn.dataplane.geni.net is OK.