Changes between Initial Version and Version 1 of GENIRacksHome/InstageniRacks/AcceptanceTestStatus/IG-ADM-3


Timestamp:
03/06/13 11:24:25 (11 years ago)
Author:
Josh Smift

     1[[PageOutline]]
     2
     3= Detailed test plan for IG-ADM-3: Full Rack Reboot Test =
     4
     5''This page is GPO's working page for performing IG-ADM-3.  It is public for informational purposes, but it is not an official status report.  See [wiki:GENIRacksHome/InstageniRacks/AcceptanceTestStatus] for the current status of InstaGENI acceptance tests.''
     6
     7= Page format =
     8
     9 * The status chart summarizes the state of this test
     10 * The high-level description from test plan contains text copied exactly from the public test plan and acceptance criteria pages.
     11 * The steps contain things I will actually do/verify:
     12   * Steps may be composed of related substeps where I find this useful for clarity
     13   * Each step is either a preparatory step (identified by "(prep)") or a verification step (the default):
     14     * Preparatory steps are just things we have to do.  They're not tests of the rack, but are prerequisites for subsequent verification steps
     15     * Verification steps are steps in which we will actually look at rack output and make sure it is as expected.  They contain a '''Using:''' block, which lists the steps to run the verification, and an '''Expect:''' block which lists what outcome is expected for the test to pass.
     16
     17= Status of test =
     18
     19See [wiki:GENIRacksHome/InstageniRacks/AcceptanceTestStatus#Legend] for the meanings of test states.
     20
     21|| '''Step''' || '''State'''                    || '''Date completed''' || '''Open Tickets''' || '''Closed Tickets/Comments''' ||
     22|| 1A         || [[Color(#63B8FF,In Progress)]] ||                      ||                    ||                               ||
     23|| 1B         || [[Color(#63B8FF,In Progress)]] ||                      ||                    ||                               ||
     24|| 2A         || [[Color(#63B8FF,In Progress)]] ||                      ||                    ||                               ||
     25|| 2B         || [[Color(#63B8FF,In Progress)]] ||                      ||                    ||                               ||
     26
     27[[BR]]
     28
     29= High-level description from test plan =
     30
     31In this test, a full rack reboot is performed as a drill of a procedure which a site administrator may need to perform for site maintenance.
     32
     33== Procedure ==
     34
     35 1. Review relevant rack documentation about shutdown options and make a plan for the order in which to shutdown each component.
     36 2. Cleanly shutdown and/or hard-power-off all devices in the rack, and verify that everything in the rack is powered down.
     37 3. Power on all devices, bring all logical components back online, and use monitoring and comprehensive health tests to verify that the rack is healthy again.
     38
     39== Criteria to verify as part of this test ==
     40
     41 * IV.01. All experimental hosts are configured to boot (rather than stay off pending manual intervention) when they are cleanly shut down and then remotely power-cycled. (C.3.c)
     42 * V.10. Site administrators can authenticate remotely and power on, power off, or power-cycle, all physical rack devices, including experimental hosts, servers, and network devices. (C.3.c)
     43 * V.11. Site administrators can authenticate remotely and virtually power on, power off, or power-cycle all virtual rack resources, including server and experimental VMs. (C.3.c)
     44 * VI.16. A procedure is documented for cleanly shutting down the entire rack in case of a scheduled site outage. (C.3.c)
     45 * VII.16. A public document explains how to perform comprehensive health checks for a rack (or, if those health checks are being run automatically, how to view the current/recent results). (F.8)
     46
     47= Step 1: Shut down the rack =
     48
     49== Step 1A: Test functionality before shutting down the rack ==
     50
     51=== Overview of Step 1A ===
     52
     53Check the state of the rack before shutting down:
     54
     55 * Attempt to SSH to control.instageni.gpolab.bbn.com:
     56{{{
     57ssh control.instageni.gpolab.bbn.com
     58}}}
     59 * If SSH is successful, attempt to sudo:
     60{{{
     61sudo whoami
     62}}}
     63 * Browse to [https://www.instageni.gpolab.bbn.com/] and attempt to log in
     64 * If successful, browse to https://boss.instageni.gpolab.bbn.com/nodecontrol_list.php3?showtype=dl360 and confirm that all five nodes are up (green dot in the "Up?" column of the table)
     65 * Browse to [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3] and ensure that GPO nagios is available
     66 * If successful, enumerate any InstaGENI-relevant errors currently outstanding in GPO nagios
     67 * Run omni getversion and listresources against the BBN rack PG AM:
     68{{{
     69omni -a https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am getversion
     70omni -a https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am listresources
     71}}}
     72 * Run omni getversion and listresources against the BBN rack FOAM AM:
     73{{{
     74omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
     75omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 listresources
     76}}}
     77 * Verify that [http://monitor.gpolab.bbn.com/connectivity/campus.html] currently shows that the connection from argos to instageni-vlan1750.bbn.dataplane.geni.net is OK
     78
     79Make a note of anything that wasn't OK before shutting down: it may still not be OK after starting up, and if so, that won't indicate a problem caused by the reboot.
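
One way to make this note-taking repeatable for the command-line checks is to log each result to a file. The following is a minimal sketch, not part of the test plan: `record_check`, `RESULTS_FILE`, and the check names are hypothetical, and it assumes that each command's exit status reflects success, which has not been verified for omni. The browser-based checks still have to be done by hand.

```shell
# Hypothetical helper: run one check and append "<name> ok" or "<name> FAIL"
# to a results file. RESULTS_FILE and the check names are illustrative only.
RESULTS_FILE="${RESULTS_FILE:-ig-adm-3-before.txt}"

record_check() {
    name="$1"
    shift
    # Assumption: the command exits 0 on success and non-zero on failure.
    if "$@" >/dev/null 2>&1; then
        echo "$name ok" >> "$RESULTS_FILE"
    else
        echo "$name FAIL" >> "$RESULTS_FILE"
    fi
}

# Example calls, commented out so that sourcing this file contacts nothing:
# record_check ssh-control ssh -o BatchMode=yes control.instageni.gpolab.bbn.com true
# record_check pg-getversion omni -a https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am getversion
```

Check names should not contain spaces, so that each line of the file stays a simple name/status pair.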
     80
     81=== Results of testing Step 1A on 2013-03-08 ===
     82
     83FIXME: Update the date to the date on which we actually do the test.
     84
     85FIXME: Fill this in when we actually do the test.
     86
     87== Step 1B: Shut down all rack components ==
     88
     89=== Overview of Step 1B ===
     90
     91Shut everything down:
     92
     93 * Browse to https://boss.instageni.gpolab.bbn.com/showpool.php, and use the table at the top of the page to identify which nodes are in the shared pool, and which are exclusive nodes.
     94 * Log in to boss, and from there:
     95   * Shut down the testbed:
     96{{{
     97sudo testbed-control shutdown
     98}}}
     99   * Shut down the nodes:
     100{{{
     101for node in pc{1..5} ; do sudo ssh $node shutdown -H now ; done
     102}}}
     103 * Log out of boss.
     104 * Log in to control, and from there:
     105   * Shut down all of the VMs that run there:
     106{{{
     107sudo xm shutdown -a -w
     108}}}
     109   * Shut down control itself:
     110{{{
     111sudo init 0 && exit
     112}}}
     113
     114Visually verify that everything in the rack is powered off, and physically power off anything that isn't already.
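
Before running any of this for real, it may help to print the planned sequence and review the order. The following is a dry-run sketch only: `print_shutdown_plan` and `NODES` are hypothetical names, and the node list assumes the five nodes are called pc1 through pc5, as in the loop above. It executes nothing on the rack.

```shell
# Dry-run sketch: print the shutdown plan in order without executing anything.
# Assumption: the five nodes are named pc1..pc5, matching the loop in this step.
NODES="pc1 pc2 pc3 pc4 pc5"

print_shutdown_plan() {
    # Commands to run while logged in to boss
    echo "boss: sudo testbed-control shutdown"
    for node in $NODES; do
        echo "boss: sudo ssh $node shutdown -H now"
    done
    # Commands to run while logged in to control
    echo "control: sudo xm shutdown -a -w"
    echo "control: sudo init 0"
}

print_shutdown_plan
```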
     115
     116=== Results of testing Step 1B on 2013-03-08 ===
     117
     118FIXME: Update the date to the date on which we actually do the test.
     119
     120FIXME: Fill this in when we actually do the test.
     121
     122= Step 2: Start up the rack =
     123
     124== Step 2A: Start up all rack components ==
     125
     126=== Overview of Step 2A ===
     127
     128Turn everything on:
     129
     130 * Power on the switches
     131 * Power on the control node
     132 * Wait for the control node and its VMs to come up
     133 * Power on the experimenter nodes
     134
     135Verify that everything in the rack is powered on.
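
The "wait" step can be done by polling rather than by watching a console. The following is a minimal sketch under stated assumptions: `wait_for` is a hypothetical helper, the timeout and interval values in the example call are guesses, and a successful SSH login is treated as evidence that a host is up, which says nothing about whether its services are healthy.

```shell
# wait_for TIMEOUT INTERVAL COMMAND...: retry COMMAND every INTERVAL seconds
# until it succeeds (return 0) or about TIMEOUT seconds of sleeping have
# passed (return 1). Time spent inside COMMAND itself is not counted.
wait_for() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Example call, commented out so that sourcing this file contacts nothing:
# wait_for 600 10 ssh -o BatchMode=yes -o ConnectTimeout=5 control.instageni.gpolab.bbn.com true
```

The same call can be repeated for each VM on the control node before powering on the experimenter nodes.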
     136
     137=== Results of testing Step 2A on 2013-03-08 ===
     138
     139FIXME: Update the date to the date on which we actually do the test.
     140
     141FIXME: Fill this in when we actually do the test.
     142
     143== Step 2B: Test functionality after starting up the rack ==
     144
     145=== Overview of Step 2B ===
     146
     147Check the state of the rack after starting up:
     148
     149 * Attempt to SSH to control.instageni.gpolab.bbn.com:
     150{{{
     151ssh control.instageni.gpolab.bbn.com
     152}}}
     153 * If SSH is successful, attempt to sudo:
     154{{{
     155sudo whoami
     156}}}
     157 * Browse to [https://www.instageni.gpolab.bbn.com/] and attempt to log in
     158 * If successful, browse to https://boss.instageni.gpolab.bbn.com/nodecontrol_list.php3?showtype=dl360 and confirm that all five nodes are up (green dot in the "Up?" column of the table)
     159 * Browse to [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3] and ensure that GPO nagios is available
     160 * If successful, enumerate any InstaGENI-relevant errors currently outstanding in GPO nagios
     161 * Run omni getversion and listresources against the BBN rack PG AM:
     162{{{
     163omni -a https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am getversion
     164omni -a https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am listresources
     165}}}
     166 * Run omni getversion and listresources against the BBN rack FOAM AM:
     167{{{
     168omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 getversion
     169omni -a https://foam.instageni.gpolab.bbn.com:3626/foam/gapi/1 listresources
     170}}}
     171 * Verify that [http://monitor.gpolab.bbn.com/connectivity/campus.html] currently shows that the connection from argos to instageni-vlan1750.bbn.dataplane.geni.net is OK
     172
     173Make a note of anything that isn't OK now but '''was''' OK before shutting down.
     174
     175=== Results of testing Step 2B on 2013-03-08 ===
     176
     177FIXME: Update the date to the date on which we actually do the test.
     178
     179FIXME: Fill this in when we actually do the test.