Changes between Initial Version and Version 1 of GENIRacksHome/ExogeniRacks/AcceptanceTestStatus/EG-ADM-3


Timestamp: 08/29/12 13:39:37
Author: chaos@bbn.com
[[PageOutline]]

= Detailed test plan for EG-ADM-3: Full Rack Reboot Test =

''This page is GPO's working page for performing EG-ADM-3.  It is public for informational purposes, but it is not an official status report.  See [wiki:GENIRacksHome/ExogeniRacks/AcceptanceTestStatus] for the current status of ExoGENI acceptance tests.''

''Last substantive edit of this page: 2012-08-29''

== Page format ==

 * The status chart summarizes the state of this test.
 * The high-level description from test plan contains text copied exactly from the public test plan and acceptance criteria pages.
 * The steps contain things I will actually do/verify:
   * Steps may be composed of related substeps where I find this useful for clarity.
   * Each step is either a preparatory step (identified by "(prep)") or a verification step (the default):
     * Preparatory steps are just things we have to do.  They're not tests of the rack, but are prerequisites for subsequent verification steps.
     * Verification steps are steps in which we will actually look at rack output and make sure it is as expected.  They contain a '''Using:''' block, which lists the steps to run the verification, and an '''Expect:''' block, which lists what outcome is expected for the test to pass.

== Status of test ==

|| '''Step''' || '''State'''           || '''Date completed''' || '''Open Tickets''' || '''Closed Tickets/Comments''' ||
|| 1A         ||                       ||                      ||                    ||                               ||

== High-level description from test plan ==

In this test, a full rack reboot is performed as a drill of a procedure which a site administrator may need to perform for site maintenance.

=== Procedure ===

 1. Review relevant rack documentation about shutdown options and make a plan for the order in which to shutdown each component.
 2. Cleanly shutdown and/or hard-power-off all devices in the rack, and verify that everything in the rack is powered down.
 3. Power on all devices, bring all logical components back online, and use monitoring and comprehensive health tests to verify that the rack is healthy again.

=== Criteria to verify as part of this test ===

 * IV.01. All experimental hosts are configured to boot (rather than stay off pending manual intervention) when they are cleanly shut down and then remotely power-cycled. (C.3.c)
 * V.10. Site administrators can authenticate remotely and power on, power off, or power-cycle, all physical rack devices, including experimental hosts, servers, and network devices. (C.3.c)
 * V.11. Site administrators can authenticate remotely and virtually power on, power off, or power-cycle all virtual rack resources, including server and experimental VMs. (C.3.c)
 * VI.16. A procedure is documented for cleanly shutting down the entire rack in case of a scheduled site outage. (C.3.c)
 * VII.16. A public document explains how to perform comprehensive health checks for a rack (or, if those health checks are being run automatically, how to view the current/recent results). (F.8)

== Step 1: shut down the rack ==

=== Step 1A: baseline before shutting down the rack ===

==== Overview of Step 1A ====

Run all checks before performing the reboot test, so that anything already down beforehand is not mistaken for a result of the reboot:
 * Attempt to SSH to bbn-hn.exogeni.gpolab.bbn.com:
{{{
ssh bbn-hn.exogeni.gpolab.bbn.com
}}}
 * If SSH is successful, attempt to sudo on bbn-hn:
{{{
sudo -v
}}}
 * Browse to [https://bbn-hn.exogeni.net/rack_bbn/] and attempt to log in to Nagios
 * If successful, enumerate any errors currently outstanding in rack Nagios
 * Browse to [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3] and ensure that GPO Nagios is available
 * If successful, enumerate any ExoGENI-relevant errors currently outstanding in GPO Nagios
 * Run omni getversion and listresources against the BBN rack ORCA AM:
{{{
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc getversion
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc listresources
}}}
 * Run omni getversion and listresources against the BBN rack FOAM AM:
{{{
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 getversion
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 listresources
}}}
 * Verify that [http://monitor.gpolab.bbn.com/connectivity/campus.html] currently shows a successful connection from argos to exogeni-vlan1750.bbn.dataplane.geni.net
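
The command-line checks in this step could be wrapped in a small script so that exactly the same checks can be rerun after the reboot. The following is a minimal sketch, not part of the test plan: the `run_check` helper, the `OUT_DIR` directory, and the `RUN_CHECKS` switch are hypothetical names, and it assumes `ssh` and `omni` are already configured to run without prompting. The `sudo -v` check and the Nagios and connectivity pages are left as manual steps.

{{{
#!sh
# Hypothetical directory where the output of each check is saved.
OUT_DIR="${OUT_DIR:-./eg-adm-3-baseline}"
# Aggregate manager URLs, copied from the omni commands in this step.
ORCA_AM="https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc"
FOAM_AM="https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1"

# run_check NAME COMMAND...: run COMMAND, save its output to
# $OUT_DIR/NAME.txt, and print PASS or FAIL based on its exit status.
run_check() {
    name="$1"; shift
    mkdir -p "$OUT_DIR"
    if "$@" > "$OUT_DIR/$name.txt" 2>&1; then
        echo "PASS $name"
    else
        echo "FAIL $name"
    fi
}

run_all_checks() {
    # "true" runs on the remote host, so this only tests that login works.
    run_check ssh ssh bbn-hn.exogeni.gpolab.bbn.com true
    run_check orca-getversion omni -a "$ORCA_AM" getversion
    run_check orca-listresources omni -a "$ORCA_AM" listresources
    run_check foam-getversion omni -a "$FOAM_AM" getversion
    run_check foam-listresources omni -a "$FOAM_AM" listresources
}

# Only contact the rack when explicitly asked to (hypothetical switch).
if [ "${RUN_CHECKS:-0}" = "1" ]; then
    run_all_checks
fi
}}}

PASS and FAIL here reflect only each command's exit status. This sketch does not verify that omni returns a non-zero status on every kind of failure, so the saved output files should still be read.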

=== Step 1B: shut down all rack components ===

==== Overview of Step 1B ====

 * Log in to bbn-hn from the console, and shut it down:
{{{
sudo init 0
}}}
 Wait for successful shutdown and poweroff.
 * Log in to each worker node (bbn-w1 through bbn-w10) from its console, and shut it down:
{{{
sudo init 0
}}}
 Wait for successful shutdown and poweroff.
 * Power off the iSCSI
 * Power off bbn-8264
 * Power off bbn-ssg
 * Power off bbn-8052

Verify that everything in the rack is powered off.
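
As a supplement to inspecting the rack directly, a remote probe can confirm that no device still answers on the network. The sketch below is an illustration only: the `probe`, `rack_hosts`, and `report_still_up` functions are hypothetical, the `ping` flags assume Linux iputils, and it assumes the short device names used on this page resolve and normally answer ping from the machine running the script. The iSCSI device is omitted because this page gives no hostname for it. A host that does not answer is not proof that it is powered off.

{{{
#!sh
# Hypothetical probe: one ICMP echo with a 2-second timeout
# (flags as in Linux iputils ping; other ping versions differ).
probe() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# Print the device names used on this page, one per line.
rack_hosts() {
    echo bbn-hn
    i=1
    while [ "$i" -le 10 ]; do
        echo "bbn-w$i"
        i=$((i + 1))
    done
    echo bbn-8264
    echo bbn-ssg
    echo bbn-8052
}

# Print every host that still answers the probe.
# Returns non-zero if at least one host is still up.
report_still_up() {
    status=0
    for host in $(rack_hosts); do
        if probe "$host"; then
            echo "STILL UP: $host"
            status=1
        fi
    done
    return "$status"
}
}}}

Running `report_still_up` after the shutdown should print nothing if every listed device has stopped answering.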

== Step 2: power the rack back on ==

=== Step 2A: power on all rack components ===

==== Overview of Step 2A ====

 * Power on bbn-8052
 * Power on bbn-ssg
 * Power on bbn-8264
 * Power on the iSCSI
 * Power on each worker (bbn-w1 - bbn-w10), and wait until all workers have booted to login prompts
 * Power on bbn-hn and wait until it has booted to a login prompt

Verify that everything in the rack is powered on.
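
Waiting for a host to come back can be done by polling rather than watching a console. The following is a minimal sketch of such a polling loop; the `wait_until` function and the retry counts are hypothetical, and the commented usage line assumes that key-based SSH login to bbn-hn works without a prompt.

{{{
#!sh
# wait_until TRIES DELAY COMMAND...: rerun COMMAND until it succeeds
# or TRIES attempts have been made, sleeping DELAY seconds between
# attempts. Returns 0 on success and 1 if every attempt failed.
wait_until() {
    tries="$1"; delay="$2"; shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (not run here): try an SSH login every 10 seconds, up to 60 times.
# wait_until 60 10 ssh -o BatchMode=yes -o ConnectTimeout=5 \
#     bbn-hn.exogeni.gpolab.bbn.com true
}}}

A successful SSH login shows only that the host has booted far enough to accept logins. It does not show that ORCA or other rack services are running.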

=== Step 2B: RENCI performs manual ORCA startup steps ===

==== Overview of Step 2B ====

 * Notify exogeni-design that the rack is powered on and ready for ORCA to be brought online.
 * Wait for a response that the rack is now healthy.

=== Step 2C: test functionality after bringing up the rack ===

==== Overview of Step 2C ====

Run all checks again and report on any discrepancies:
 * Attempt to SSH to bbn-hn.exogeni.gpolab.bbn.com:
{{{
ssh bbn-hn.exogeni.gpolab.bbn.com
}}}
 * If SSH is successful, attempt to sudo on bbn-hn:
{{{
sudo -v
}}}
 * Browse to [https://bbn-hn.exogeni.net/rack_bbn/] and attempt to log in to Nagios
 * If successful, enumerate any errors currently outstanding in rack Nagios
 * Browse to [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail&sorttype=2&sortoption=3] and ensure that GPO Nagios is available
 * If successful, enumerate any ExoGENI-relevant errors currently outstanding in GPO Nagios
 * Run omni getversion and listresources against the BBN rack ORCA AM:
{{{
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc getversion
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:11443/orca/xmlrpc listresources
}}}
 * Run omni getversion and listresources against the BBN rack FOAM AM:
{{{
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 getversion
omni -a https://bbn-hn.exogeni.gpolab.bbn.com:3626/foam/gapi/1 listresources
}}}
 * Verify that [http://monitor.gpolab.bbn.com/connectivity/campus.html] currently shows a successful connection from argos to exogeni-vlan1750.bbn.dataplane.geni.net
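
If the output of the command-line checks was saved to files both before and after the reboot, the discrepancies can be listed mechanically. The sketch below assumes two directories of `.txt` files with matching names; the `compare_runs` function and that directory layout are hypothetical. Output that legitimately varies between runs, such as timestamps, would show up as a difference and still needs a human reading.

{{{
#!sh
# compare_runs BEFORE_DIR AFTER_DIR: for each .txt file in BEFORE_DIR,
# report whether AFTER_DIR holds an identical file, a different file,
# or no file at all. Returns non-zero if anything differs or is missing.
compare_runs() {
    before="$1"; after="$2"
    status=0
    for f in "$before"/*.txt; do
        name=$(basename "$f")
        if [ ! -f "$after/$name" ]; then
            echo "MISSING $name"
            status=1
        elif cmp -s "$f" "$after/$name"; then
            echo "SAME $name"
        else
            echo "DIFFERENT $name"
            status=1
        fi
    done
    return "$status"
}
}}}

For any file reported as DIFFERENT, running `diff` on the two copies shows what changed.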
     144