
Version 2 (modified by lnevers@bbn.com, 10 years ago)


ExoGENI Shutdown Procedures

You may need to shut down your ExoGENI rack in an orderly fashion (e.g. for a planned reboot or a scheduled power outage), or disconnect it from the network in an emergency. This page captures the GPO's current procedure for the BBN ExoGENI rack, which other sites can use as a reference.

This page is referenced in the ExoGENI Administrative FAQ page.

Emergency disconnect

If the ExoGENI rack is exhibiting incorrect or strange behavior and we need to take action to protect our lab, RENCI's preferred emergency response is for us to disconnect all rack interfaces except the SSG5 VPN (which RENCI will use to access the rack and work on the problem), and then contact exogeni-ops@renci.org.

In the GPO lab, as of 2012-02-28, this means taking down the following connections (note: check OpsConnectionInventory (GPO internal) to verify this before disconnecting anything):

  • Control network: the rack is connected to jalapeno on vlan 829. The SSG5 (which should remain online in an emergency) is on port gi0/18. We should disconnect the head node and control switch connections, on gi0/19 and gi0/20. On jalapeno:
    conf t
      int gi 0/19
        shutdown
        exit
      int gi 0/20
        shutdown
        exit
      save
      exit
    
  • Dataplane network: the rack has a trunk interface to poblano on port gi0/15. On poblano:
    conf t
      int gi 0/15
        shutdown
        exit
      save
      exit
    
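Typing these stanzas by hand during an emergency invites mistakes, such as shutting down the SSG5 port by accident. A small helper can print them from a list of ports instead. The name shutdown_stanza is hypothetical, and the sketch simply reuses the command sequence shown above for jalapeno and poblano, so its output is only as correct as that sequence is for your switch. Verify the port list against OpsConnectionInventory before pasting anything into a switch session.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any GPO or RENCI tooling): print the
# switch commands that shut down each port given as an argument, using the
# same command sequence as the jalapeno and poblano examples on this page.
# It only prints text; it never connects to a switch.
shutdown_stanza() {
  local port
  echo "conf t"
  for port in "$@"; do
    echo "  int ${port}"
    echo "    shutdown"
    echo "    exit"
  done
  echo "  save"
  echo "  exit"
}

# Example for the control network on jalapeno. The SSG5 port (gi 0/18) is
# deliberately left out so that RENCI keeps VPN access to the rack:
#   shutdown_stanza "gi 0/19" "gi 0/20"
```

Quote each port ("gi 0/19") so the space inside it survives as a single argument.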

Orderly shutdown

Here's how to shut down the entire rack in an orderly fashion.

Shut down the worker nodes

Log in to each worker node (bbn-w1 through bbn-w10) from the console, and shut each down:

sudo init 0 && exit 

You don't need to wait for each one to shut down before shutting down the next one.

DO wait for all of them to finish shutting down, and powering off, before continuing to the next step, though.
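The procedure above uses each node's console. Where bbn-hn can already reach the workers over SSH (the startup check on this page relies on that), the shutdown and the wait can be scripted from bbn-hn instead. This is a minimal sketch under stated assumptions: key-based SSH from bbn-hn to every bbn-wN, and an account that can run sudo on the workers (if sudo asks for a password, type it at each prompt). The function names shutdown_workers and wait_for_workers_down are hypothetical. A node that stops answering ping has only lost its network stack and may still be powering off, so confirm power-off at the consoles before moving on.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS: run on bbn-hn, key-based SSH to the workers,
# sudo rights on each worker, node names bbn-w1 .. bbn-wN.

# Ask every worker to halt. "-t" allocates a terminal so sudo can prompt
# (and so sudoers setups with requiretty still work). The SSH session is
# cut off as the node goes down, so a non-zero exit status is expected.
shutdown_workers() {
  local count=$1 n
  for ((n = 1; n <= count; n++)); do
    echo "halting bbn-w${n}"
    ssh -t -o ConnectTimeout=5 "bbn-w${n}" sudo init 0 || true
  done
}

# Poll every 10 seconds until no worker answers a single ping.
# This shows only that the network is down, not that power is off.
wait_for_workers_down() {
  local count=$1 n up
  while :; do
    up=0
    for ((n = 1; n <= count; n++)); do
      if ping -c 1 -W 2 "bbn-w${n}" > /dev/null 2>&1; then
        up=$((up + 1))
      fi
    done
    echo "${up} worker(s) still answering ping"
    [ "$up" -eq 0 ] && return 0
    sleep 10
  done
}
```

For example, `shutdown_workers 10` followed by `wait_for_workers_down 10` covers bbn-w1 through bbn-w10; bbn-hn itself is still shut down from its console as the next step describes.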

Shut down bbn-hn

Log in to bbn-hn from the console, and shut it down:

sudo init 0 && exit 

Then wait for it to shut down and power off.

Power off the iSCSI device

Turn off the two power switches on the back of the iSCSI device.

Power off bbn-8264

Unplug the power cable on the IBM 8264 switch.

Power off bbn-8052

Unplug the power cable on the IBM 8052 switch.

Power off bbn-ssg

Unplug the power cable on the SSG appliance.

Startup procedures

Here's how to start up the rack in an orderly fashion after a power loss.

Power on bbn-ssg

Plug in the power cable on the SSG appliance, and wait for the status light to turn green (should take only a few seconds).

Power on bbn-8052

Plug in the power cable on the IBM 8052 switch, and wait for the startup fans (which are very loud) to quiet down (should take about twenty seconds).

Power on bbn-8264

Plug in the power cable on the IBM 8264 switch, and wait for the startup fans (which are very loud) to quiet down (should take about twenty seconds).

Power on the iSCSI device

Turn on the two power switches on the back of the iSCSI device, and wait for the startup fans (which are very loud) to quiet down (should take about twenty seconds).

Boot up bbn-hn

Power on bbn-hn and watch the console until it boots to a login prompt.

Boot up the worker nodes

Power on all of the worker nodes (bbn-w1 through bbn-w10).

On bbn-hn, run this command until it succeeds for all nodes:

for n in {1..8} ; do ssh bbn-w${n} hostname ; done

Success looks like this:

+$ for n in {1..8} ; do ssh bbn-w${n} hostname ; done
bbn-w1
bbn-w2
bbn-w3
bbn-w4
bbn-w5
bbn-w6
bbn-w7
bbn-w8

If any nodes are still failing in a way that clearly won't improve with more waiting, debug them.
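Re-running the hostname loop by hand works, but the check can also be wrapped in a retry loop that reports which nodes are still missing and gives up after a bounded number of rounds. This is a sketch, assuming it runs on bbn-hn with the same key-based SSH access the loop above uses; unreachable_workers and wait_for_workers_up are hypothetical names, and the 30-second interval and 20-round default are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch only. ASSUMPTIONS: run on bbn-hn, key-based SSH to bbn-w1 .. bbn-wN.

# Print the names of workers that do not answer yet (nothing if all are up).
unreachable_workers() {
  local count=$1 n
  for ((n = 1; n <= count; n++)); do
    # -n keeps ssh off stdin; BatchMode stops it from waiting on a password
    # prompt; ConnectTimeout bounds the wait for a node that is still booting.
    if ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 "bbn-w${n}" hostname > /dev/null 2>&1; then
      echo "bbn-w${n}"
    fi
  done
}

# Poll every 30 seconds, giving up after max_tries rounds (default 20) so a
# node that never comes back is left for you to debug.
wait_for_workers_up() {
  local count=$1 max_tries=${2:-20} try missing
  for ((try = 1; try <= max_tries; try++)); do
    missing=$(unreachable_workers "$count")
    if [ -z "$missing" ]; then
      echo "all ${count} workers are up"
      return 0
    fi
    # $missing is left unquoted on purpose so the names print on one line.
    echo "round ${try}: still waiting for:" $missing
    sleep 30
  done
  echo "giving up; debug:" $missing
  return 1
}
```

Pass the number of worker nodes in the rack as the first argument, e.g. `wait_for_workers_up 10` for bbn-w1 through bbn-w10. A non-zero exit status means at least one node needs attention before handing off to RENCI.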

Hand off to RENCI

Various pieces of software on the rack don't start automatically at boot time, so the last step is to contact the BBN ExoGENI list at RENCI (xo-bbn@renci.org) and hand off to them to bring everything else up.