Changes between Initial Version and Version 1 of GENIRacksAdministration/ExoGENIShutdownProcedures


Timestamp:
10/03/13 13:29:04 (11 years ago)
Author:
lnevers@bbn.com
Comment:

--

  • GENIRacksAdministration/ExoGENIShutdownProcedures

     1[[PageOutline]]
     2
     3= ExoGENI Shutdown Procedures =
     4
     5You may need to shut down your ExoGENI rack in an orderly fashion (e.g. for a planned reboot or a scheduled power outage), or disconnect it from the network in an emergency.  This page captures the current GPO procedure for the BBN ExoGENI rack, and can be used as a reference.
     6
     7== Emergency disconnect ==
     8
     9If the ExoGENI rack is exhibiting incorrect or strange behavior and we need to take an action to protect our lab, RENCI's preferred emergency response procedure is that we disconnect all rack interfaces except the SSG5 VPN (which they will use to access the rack and work on the problem), and contact `exogeni-ops@renci.org`.
     10
     11In the GPO lab, as of 2012-02-28, this means taking down the following connections (note: check [gsw:OpsConnectionInventory] (GPO internal) to verify this before disconnecting anything):
     12 * Control network: the rack is connected to jalapeno on vlan 829.  The SSG5 (which should remain online in an emergency) is on port `gi0/18`.  We should disconnect the head node and control switch connections, on `gi0/19` and `gi0/20`.  On jalapeno:
     13{{{
     14conf t
     15  int gi 0/19
     16    shutdown
     17    exit
     18  int gi 0/20
     19    shutdown
     20    exit
     21  save
     22  exit
     23}}}
     24 * Dataplane network: the rack has a trunk interface to `poblano[gi0/15]`.  On poblano:
     25{{{
     26conf t
     27  int gi 0/15
     28    shutdown
     29    exit
     30  save
     31  exit
     32}}}
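
The two command blocks above share one pattern: enter configuration mode, shut down each interface, then save and exit. As a rough sketch only, the helper below prints that same sequence for any list of ports, so it can be reviewed or pasted during an emergency. The function name `emit_port_shutdown` is hypothetical, and the script only prints text; it does not connect to jalapeno or poblano. Port numbers should still be checked against the connection inventory first.

```shell
#!/bin/sh
# Hypothetical helper (not part of the GPO procedure): print the command
# sequence used above for shutting down one or more gi interfaces.
# It only writes text to stdout and never contacts a switch.
emit_port_shutdown() {
    echo "conf t"
    for port in "$@"; do
        echo "  int gi $port"
        echo "    shutdown"
        echo "    exit"
    done
    echo "  save"
    echo "  exit"
}

# Control network on jalapeno: head node and control switch ports.
# The SSG5 port (gi0/18) is deliberately left out so it stays online.
emit_port_shutdown 0/19 0/20
```

Calling `emit_port_shutdown 0/15` would print the dataplane sequence for poblano in the same form.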
     33
     34== Orderly shutdown ==
     35
     36Here's how to shut down the entire rack in an orderly fashion.
     37
     38=== Shut down the worker nodes ===
     39
     40Log in to each worker node (bbn-w1 through bbn-w10) from the console, and shut each down:
     41
     42{{{
     43sudo init 0 && exit
     44}}}
     45
     46You don't need to wait for each one to shut down before shutting down the next one.
     47
     48DO wait for all of them to finish shutting down, and powering off, before continuing to the next step, though.
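
Waiting for the workers can be partly automated. The sketch below polls a list of hosts until none of them answers a probe. It rests on several assumptions that are not part of the procedure above: that it runs somewhere the worker hostnames resolve (for example on bbn-hn, which is still up at this point), that ICMP is allowed, and that `ping` accepts the Linux iputils flags `-c` and `-W`. The names `probe`, `wait_until_down`, and `POLL_INTERVAL` are hypothetical. A host that has stopped answering ping is not necessarily powered off, so still confirm power state at the rack or console.

```shell
#!/bin/sh
# Assumed probe: one ICMP echo with a 2-second timeout (Linux iputils flags).
# Replace with another check if ping is not suitable in your environment.
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Hypothetical helper: block until none of the given hosts answers the probe.
wait_until_down() {
    remaining="$*"
    while [ -n "$remaining" ]; do
        still_up=""
        for host in $remaining; do
            if probe "$host"; then
                still_up="$still_up $host"
            fi
        done
        remaining="$still_up"
        if [ -n "$remaining" ]; then
            echo "still responding:$remaining"
            sleep "${POLL_INTERVAL:-10}"
        fi
    done
    echo "no hosts responding"
}

# Example call (not run here): wait_until_down bbn-w1 bbn-w2 bbn-w3
```

Only hosts that answered in the previous round are probed again, so the loop gets shorter as workers go down.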
     49
     50=== Shut down bbn-hn ===
     51
     52Log in to bbn-hn from the console, and shut it down:
     53
     54{{{
     55sudo init 0 && exit
     56}}}
     57
     58Then wait for it to shut down and power off.
     59
     60=== Power off the iSCSI device ===
     61
     62Turn off the two power switches on the back of the iSCSI device.
     63
     64=== Power off bbn-8264 ===
     65
     66Unplug the power cable on the IBM 8264 switch.
     67
     68=== Power off bbn-8052 ===
     69
     70Unplug the power cable on the IBM 8052 switch.
     71
     72=== Power off bbn-ssg ===
     73
     74Unplug the power cable on the SSG appliance.
     75
     76= Startup procedures =
     77
     78Here's how to start up the rack in an orderly fashion after a power loss.
     79
     80=== Power on bbn-ssg ===
     81
     82Plug in the power cable on the SSG appliance, and wait for the status light to turn green (should take only a few seconds).
     83
     84=== Power on bbn-8052 ===
     85
     86Plug in the power cable on the IBM 8052 switch, and wait for the startup fans (which are very loud) to quiet down (should take about twenty seconds).
     87
     88=== Power on bbn-8264 ===
     89
     90Plug in the power cable on the IBM 8264 switch, and wait for the startup fans (which are very loud) to quiet down (should take about twenty seconds).
     91
     92=== Power on the iSCSI device ===
     93
     94Turn on the two power switches on the back of the iSCSI device, and wait for the startup fans (which are very loud) to quiet down (should take about twenty seconds).
     95
     96=== Boot up bbn-hn ===
     97
     98Power on bbn-hn, and watch the console until it boots to a login prompt.
     99
     100=== Boot up the worker nodes ===
     101
     102Power on all of the worker nodes (bbn-w1 through bbn-w10).
     103
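     104On bbn-hn, run this command until it succeeds for all nodes (the example below covers bbn-w1 through bbn-w8; extend the range if your rack has more worker nodes, e.g. through bbn-w10):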
     105
     106{{{
     107for n in {1..8} ; do ssh bbn-w${n} hostname ; done
     108}}}
     109
     110Success looks like this:
     111
     112{{{
     113+$ for n in {1..8} ; do ssh bbn-w${n} hostname ; done
     114bbn-w1
     115bbn-w2
     116bbn-w3
     117bbn-w4
     118bbn-w5
     119bbn-w6
     120bbn-w7
     121bbn-w8
     122}}}
     123
     124If any are failing in a way that isn't obviously going to get better by waiting, debug.
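
Rather than re-running the check by hand, the retry can be wrapped in a loop. This is a minimal sketch, assuming key-based ssh from bbn-hn to the workers, so that `BatchMode=yes` (a standard OpenSSH client option) fails instead of prompting for a password. The names `check_worker`, `wait_for_workers`, `MAX_TRIES`, and `POLL_INTERVAL` are hypothetical and not part of the GPO procedure.

```shell
#!/bin/sh
# Assumed check: the worker answers "hostname" over ssh within 5 seconds.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_worker() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" hostname >/dev/null 2>&1
}

# Hypothetical helper: poll until every listed worker passes check_worker,
# or give up after MAX_TRIES rounds (default 30).
# Returns 0 when all workers respond, 1 if the retry budget runs out.
wait_for_workers() {
    tries=0
    while [ "$tries" -lt "${MAX_TRIES:-30}" ]; do
        failed=""
        for host in "$@"; do
            check_worker "$host" || failed="$failed $host"
        done
        if [ -z "$failed" ]; then
            echo "all workers responding"
            return 0
        fi
        echo "not yet responding:$failed"
        tries=$((tries + 1))
        sleep "${POLL_INTERVAL:-10}"
    done
    return 1
}

# Example call (not run here): wait_for_workers bbn-w1 bbn-w2 bbn-w3
```

A non-zero return means some worker never answered within the retry budget, which is the point at which to start debugging that node.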
     125
     126=== Hand off to RENCI ===
     127
     128Various pieces of software on the rack don't start automatically at boot time, so the last step is to contact the BBN ExoGENI list at RENCI (xo-bbn@renci.org) and hand off to them to bring everything else up.