Detailed test plan for IG-ADM-1: Rack Receipt and Inventory Test
- Page format
- Status of test
- High-level description from test plan
- Step 1 (prep): InstaGENI and GPO power and wire the BBN rack
- Step 2: Configure and verify DNS
- Step 3: GPO requests and receives administrator accounts
- Step 3A: GPO requests access to boss and ops nodes
- Step 3B: GPO requests access to FOAM VM
- Step 3C: GPO requests access to FlowVisor VM
- Step 3D: GPO requests access to infrastructure server
- Step 3E: GPO requests access to network devices
- Step 3F: GPO requests access to shared OpenVZ nodes
- Step 3G: GPO requests access to iLO remote management interfaces for …
- Step 3H: GPO gets access to allocated bare metal nodes by default
- Step 4: GPO inventories the rack based on our own processes
- Step 5: Configure operational alerting for the rack
- Step 6: Setup contact info and change control procedures
Detailed test plan for IG-ADM-1: Rack Receipt and Inventory Test
This page is GPO's working page for performing IG-ADM-1. It is public for informational purposes, but it is not an official status report. See GENIRacksHome/InstageniRacks/AcceptanceTestStatus for the current status of InstaGENI acceptance tests.
Last substantive edit of this page: 2012-11-12
Page format
- The status chart summarizes the state of this test
- The high-level description from test plan contains text copied exactly from the public test plan and acceptance criteria pages.
- The steps contain things i will actually do/verify:
- Steps may be composed of related substeps where i find this useful for clarity
- Each step is identified as either "(prep)" or "(verify)":
- Prep steps are just things we have to do. They're not tests of the rack, but are prerequisites for subsequent verification steps
- Verify steps are steps in which we will actually look at rack output and make sure it is as expected. They contain a Using: block, which lists the steps to run the verification, and an Expect: block which lists what outcome is expected for the test to pass.
Status of test
See GENIRacksHome/InstageniRacks/AcceptanceTestStatus for the meanings of test states.
Note: all steps of this test are blocked on the arrival of the BBN rack at BBN. However, we plan to do preliminary testing of some steps using the rack at Utah. For the time being, we need to differentiate between steps which are blocked until the BBN rack arrives, and steps which may be blocked from testing at Utah by some shorter-term requirement, as follows:
- Blocked-site (yellow): A step which will not be tested on the Utah rack, and is blocked until the BBN site rack arrives.
- Blocked-Utah (orange): A step which will be tested on the Utah rack, and is blocked on a requirement for access or configuration of the Utah rack.
| Step | State | Date completed | Open Tickets | Closed Tickets/Comments |
|---|---|---|---|---|
| 1 | Blocked-site | | | blocked on purchase and shipping of BBN rack |
| 2A | Blocked-site | | 23 | blocked on 1; i've done no testing of this yet, but opened a ticket anyway based on a DNS mismatch i found on the Utah rack |
| 2B | Blocked-site | | | blocked on 2A |
| 2C | Blocked-site | | | blocked on 2B |
| 3A | Complete | | | requires re-testing after 1 is completed |
| 3B | Complete | | | requires re-testing after 1 is completed |
| 3C | Complete | | | (42) requires re-testing after 1 is completed |
| 3D | Complete | | | requires re-testing after 1 is completed |
| 3E | Complete | | | requires re-testing after 1 is completed |
| 3F | Complete | | | requires re-testing after 1 is completed |
| 3G | Complete | | | (21) the SSL (issuer,serial) collision problem described in 21 will be a problem for site admins who use Firefox; requires re-testing after 1 is completed |
| 3H | Complete | | | requires re-testing after 1 is completed |
| 4A | Blocked-site | | | blocked on 1 |
| 4B | Blocked-site | | | blocked on 1 |
| 4C | Blocked-site | | | blocked on 1 |
| 4D | Blocked-site | | | blocked on 1 |
| 5A | Complete | | 23 | may need to be revisited pending resolution of control host's hostname; blocked on 1 for testing of private switch IPs |
| 5B | Complete | | | requires re-testing after Utah mesoscale dataplane is connected via UEN; requires re-testing after 1 is completed |
| 5C | Fail | 2012-05-27 | | the InstaGENI model does not include a "rack health" high-level UI |
| 5D | Complete | | | requires re-testing after 1 is completed |
| 6A | Complete | | 43 | (43) requires re-testing after 1 is completed |
| 6B | In progress | | | waiting for verification response from GMOC |
| 6C | Complete | | | interim notifications during the test period have been agreed on; revisit longer-term plan for notifications later |
High-level description from test plan
This "test" uses BBN as an example site by verifying that we can do all the things we need to do to integrate the rack into our standard local procedures for systems we host.
Procedure
- InstaGENI and GPO power and wire the BBN rack
- GPO configures the instageni.gpolab.bbn.com DNS namespace and 192.1.242.128/25 IP space, and enters all public IP addresses used by the rack into DNS.
- GPO requests and receives administrator accounts on the rack and GPO sysadmins receive read access to all InstaGENI monitoring of the rack.
- GPO inventories the physical rack contents, network connections and VLAN configuration, and power connectivity, using our standard operational inventories.
- GPO, InstaGENI, and GMOC share information about contact information and change control procedures, and InstaGENI operators subscribe to GENI operations mailing lists and submit their contact information to GMOC.
- GPO reviews the documented parts list, power requirements, physical and logical network connectivity requirements, and site administrator community requirements, verifying that these documents should be sufficient for a new site to use when setting up a rack.
Criteria to verify as part of this test
- VI.02. A public document contains a parts list for each rack. (F.1)
- VI.03. A public document states the detailed power requirements of the rack, including how many PDUs are shipped with the rack, how many of the PDUs are required to power the minimal set of shipped equipment, the part numbers of the PDUs, and the NEMA input connector type needed by each PDU. (F.1)
- VI.04. A public document states the physical network connectivity requirements between the rack and the site network, including number, allowable bandwidth range, and allowed type of physical connectors, for each of the control and dataplane networks. (F.1)
- VI.05. A public document states the minimal public IP requirements for the rack, including: number of distinct IP ranges and size of each range, hostname to IP mappings which should be placed in site DNS, whether the last-hop routers for public IP ranges subnets sit within the rack or elsewhere on the site, and what firewall configuration is desired for the control network. (F.1)
- VI.06. A public document states the dataplane network requirements and procedures for a rack, including necessary core backbone connectivity and documentation, any switch configuration options needed for compatibility with the L2 core, and the procedure for connecting non-rack-controlled VLANs and resources to the rack dataplane. (F.1)
- VI.07. A public document explains the requirements that site administrators have to the GENI community, including how to join required mailing lists, how to keep their support contact information up-to-date, how and under what circumstances to work with Legal, Law Enforcement and Regulatory(LLR) Plan, how to best contact the rack vendor with operational problems, what information needs to be provided to GMOC to support emergency stop, and how to interact with GMOC when an Emergency Stop request is received. (F.3, C.3.d)
- VI.14. A procedure is documented for creating new site administrator and operator accounts. (C.3.a)
- VII.01. Using the provided documentation, GPO is able to successfully power and wire their rack, and to configure all needed IP space within a per-rack subdomain of gpolab.bbn.com. (F.1)
- VII.02. Site administrators can understand the physical power, console, and network wiring of components inside their rack and document this in their preferred per-site way. (F.1)
Step 1 (prep): InstaGENI and GPO power and wire the BBN rack
This step covers the physical delivery of the rack to BBN, the transport of the rack inside BBN to the GPO lab, and the cabling, powering, and initial configuration of the rack.
Step 2: Configure and verify DNS
Step 2A (verify): Find out what IP-to-hostname mapping to use
Using:
- If the rack IP requirements documentation for the rack exists:
  - Review that documentation and determine what IP to hostname mappings should exist for 192.1.242.128/25
- Otherwise:
  - Iterate with instageni-ops to determine the IP to hostname mappings to use for 192.1.242.128/25
Expect:
- Reasonable IP-to-hostname mappings for 126 valid IPs allocated for InstaGENI use in 192.1.242.128/25
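The figure of 126 valid IPs follows from the prefix length: a /25 holds 2^(32-25) = 128 addresses, and the network and broadcast addresses are not assignable to hosts. A minimal sketch of that arithmetic is below; the function names `usable_hosts` and `host_range` are illustrative only, and `host_range` assumes a subnet of /24 or smaller.

```shell
# usable_hosts PREFIX: number of usable host addresses in an IPv4 subnet
# (all addresses minus the network and broadcast addresses).
usable_hosts() {
    echo $(( (1 << (32 - $1)) - 2 ))
}

# host_range BASE PREFIX: first and last usable last-octet values for a
# subnet (/24 or smaller) whose network address ends in BASE.
host_range() {
    local base="$1"
    local size=$(( 1 << (32 - $2) ))
    echo "$(( base + 1 )) $(( base + size - 2 ))"
}
```

For this rack, `usable_hosts 25` gives the host count and `host_range 128 25` gives the last-octet bounds of the assignable addresses in 192.1.242.128/25.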
Step 2B (prep): Insert IP-to-hostname mapping in DNS
- Fully populate 192.1.242.128/25 PTR entries in GPO lab DNS
- Fully populate instageni.gpolab.bbn.com A entries in GPO lab DNS
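One way to keep the A and PTR entries consistent is to generate both from a single mapping list. The sketch below assumes BIND-style zone file syntax, which may not match how GPO lab DNS is actually managed. The function name `gen_records` and the input format ("last octet, short hostname" per line) are hypothetical.

```shell
# gen_records: read "LAST_OCTET SHORT_NAME" pairs on stdin and print one
# A record and one matching PTR record per pair (BIND-style syntax assumed).
gen_records() {
    local octet name
    while read -r octet name; do
        [ -n "$octet" ] || continue   # skip blank lines
        printf '%s.instageni.gpolab.bbn.com. IN A 192.1.242.%s\n' "$name" "$octet"
        printf '%s.242.1.192.in-addr.arpa. IN PTR %s.instageni.gpolab.bbn.com.\n' "$octet" "$name"
    done
}
```

Feeding it a line such as `130 boss` (an invented address, used only for illustration) prints the forward and reverse records for that host, so neither zone can be updated without the other.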
Step 2C (verify): Test all PTR records
Using:
- From a BBN desktop host:
    for lastoct in {129..255}; do
      host 192.1.242.$lastoct
    done
Expect:
- All results look like:
    $lastoct.242.1.192.in-addr.arpa domain name pointer <something reasonable>
  and none look like:
    Host $lastoct.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
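Reading over a hundred lookup results by eye is error-prone, so the check above could be summarized automatically. The following is a sketch only: `ptr_status` and `check_ptrs` are illustrative names, and the lookup command is passed in as a parameter so the logic can be exercised without touching DNS.

```shell
# ptr_status: read one line of `host` output on stdin; print OK if it
# reports a PTR record, MISSING otherwise.
ptr_status() {
    local line=""
    IFS= read -r line || true
    case "$line" in
        *"domain name pointer"*) echo OK ;;
        *) echo MISSING ;;
    esac
}

# check_ptrs LOOKUP_CMD FIRST LAST: run LOOKUP_CMD on 192.1.242.N for each
# N in FIRST..LAST, list addresses without a PTR record, and return
# non-zero if any are missing.
check_ptrs() {
    local lookup="$1" n="$2" last="$3" missing=0
    while [ "$n" -le "$last" ]; do
        if [ "$("$lookup" "192.1.242.$n" 2>&1 | head -n 1 | ptr_status)" != OK ]; then
            echo "no PTR for 192.1.242.$n"
            missing=$((missing + 1))
        fi
        n=$((n + 1))
    done
    echo "missing=$missing"
    [ "$missing" -eq 0 ]
}
```

Running `check_ptrs host 129 255` covers the same range as the loop above and ends with a single `missing=` count.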
Step 3: GPO requests and receives administrator accounts
Step 3A: GPO requests access to boss and ops nodes
Using:
- Request accounts for GPO ops staffers on boss.instageni.gpolab.bbn.com and ops.instageni.gpolab.bbn.com
- Chaos tries to SSH to chaos@boss.instageni.gpolab.bbn.com
- Josh tries to SSH to jbs@boss.instageni.gpolab.bbn.com
- Tim tries to SSH to tupty@boss.instageni.gpolab.bbn.com
- Chaos tries to SSH to chaos@ops.instageni.gpolab.bbn.com
- Josh tries to SSH to jbs@ops.instageni.gpolab.bbn.com
- Tim tries to SSH to tupty@ops.instageni.gpolab.bbn.com
- Chaos tries to run a minimal command as sudo on boss:
sudo whoami
- Chaos tries to run a minimal command as sudo on ops:
sudo whoami
Verify:
- Logins succeed for Chaos, Josh, and Tim on both nodes
- The commands work:
    $ sudo whoami
    root
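The login and sudo checks in steps 3A through 3D all follow the same pattern, so they could be scripted once per account. This is a sketch under assumptions: `check_sudo` is an illustrative name, the remote-command runner is passed in as a parameter, and it only works non-interactively if key-based login is set up and sudo does not prompt for a password on the target.

```shell
# check_sudo RUNNER TARGET...: for each target, run `sudo whoami` through
# RUNNER (e.g. ssh) and report PASS if the answer is "root".
# Returns non-zero if any target fails.
check_sudo() {
    local runner="$1" target who failed=0
    shift
    for target in "$@"; do
        who=$("$runner" "$target" sudo whoami 2>/dev/null) || who=""
        if [ "$who" = root ]; then
            echo "PASS $target"
        else
            echo "FAIL $target"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `check_sudo ssh chaos@boss.instageni.gpolab.bbn.com chaos@ops.instageni.gpolab.bbn.com` would cover Chaos's part of step 3A.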
Results of testing step 3A: 2012-05-15
Note: this is being tested on the Utah GENI rack, where only Chaos has an account. So Tim and Josh will not be testing, and the hosts to test are boss.utah.geniracks.net and ops.utah.geniracks.net.
- Chaos successfully used public-key login and sudo from a BBN subnet (128.89.68.0/23) to boss:
    capybara,[~],11:39(0)$ ssh chaos@boss.utah.geniracks.net
    Last login: Tue May 15 07:29:07 2012 from capybara.bbn.co
    Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
    The Regents of the University of California. All rights reserved.
    FreeBSD 8.3-RC1 (XEN) #0: Tue Mar 13 16:27:12 MDT 2012
    Welcome to FreeBSD!
    Need to see the calendar for this month? Simply type "cal". To see the whole year, type "cal -y".
    -- Dru <genesis@istar.ca>
    > bash
    boss,[~],09:39(0)$ sudo whoami
    root
- Chaos successfully used public-key login and sudo from a BBN subnet (128.89.68.0/23) to ops:
    capybara,[~],11:40(0)$ ssh chaos@ops.utah.geniracks.net
    Last login: Sat May 12 15:41:57 2012 from capybara.bbn.co
    Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
    The Regents of the University of California. All rights reserved.
    FreeBSD 8.3-RC1 (XEN) #0: Tue Mar 13 16:27:12 MDT 2012
    Welcome to FreeBSD!
    ops,[~],09:40(0)$ sudo whoami
    root
Step 3B: GPO requests access to FOAM VM
- Request accounts for GPO ops staffers on foam.instageni.gpolab.bbn.com
- Chaos tries to SSH to chaos@foam.instageni.gpolab.bbn.com
- Josh tries to SSH to jbs@foam.instageni.gpolab.bbn.com
- Tim tries to SSH to tupty@foam.instageni.gpolab.bbn.com
- Chaos tries to run a minimal command as sudo on foam:
sudo whoami
Verify:
- Logins succeed for Chaos, Josh, and Tim on the FOAM VM
- The command works:
    $ sudo whoami
    root
Results of testing step 3B: 2012-07-04
- Chaos can SSH to foam.utah.geniracks.net:
    $ ssh foam.utah.geniracks.net
    Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
    * Documentation: https://help.ubuntu.com/
    Last login: Tue Jul 3 12:57:20 2012 from capybara.bbn.com
    foam,[~],09:33(0)$
- Chaos can sudo on foam:
    foam,[~],09:33(0)$ sudo whoami
    root
Step 3C: GPO requests access to FlowVisor VM
- Request accounts for GPO ops staffers on flowvisor.instageni.gpolab.bbn.com
- Chaos tries to SSH to chaos@flowvisor.instageni.gpolab.bbn.com
- Josh tries to SSH to jbs@flowvisor.instageni.gpolab.bbn.com
- Tim tries to SSH to tupty@flowvisor.instageni.gpolab.bbn.com
- Chaos tries to run a minimal command as sudo on flowvisor:
sudo whoami
Verify:
- Logins succeed for Chaos, Josh, and Tim on the FlowVisor VM
- The command works:
    $ sudo whoami
    root
Results of testing step 3C: 2012-11-12
- Chaos can SSH to flowvisor.utah.geniracks.net:
    $ ssh flowvisor.utah.geniracks.net
    Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
    * Documentation: https://help.ubuntu.com/
    Last login: Fri Jul 27 08:09:08 2012 from capybara.bbn.com
- Chaos can sudo on flowvisor:
    flowvisor,[~],10:14(0)$ sudo whoami
    root
Step 3D: GPO requests access to infrastructure server
- Request accounts for GPO ops staffers on the VM server node which runs boss, ops, foam, and flowvisor
- Chaos tries to SSH to the VM server node
- Josh tries to SSH to the VM server node
- Tim tries to SSH to the VM server node
- Chaos tries to run a minimal command as sudo on the VM server node:
sudo whoami
Verify:
- Logins succeed for Chaos, Josh, and Tim on the host
- The command works:
    $ sudo whoami
    root
Results of testing step 3D: 2012-05-15
Note: this is being tested on the Utah GENI rack, where only Chaos has an account. So Tim and Josh will not be testing, and the host to test is utah.control.geniracks.net.
- Chaos successfully used public-key login and sudo from a BBN subnet (128.89.68.0/23) to the control node:
    capybara,[~],14:53(0)$ ssh utah.control.geniracks.net
    Welcome to Ubuntu precise (development branch) (GNU/Linux 3.2.0-23-generic x86_64)
    * Documentation: https://help.ubuntu.com/
    System information as of Tue May 15 12:53:17 MDT 2012
    System load:  0.0               Users logged in:       1
    Usage of /:   44.3% of 5.85GB   IP address for xenbr0: 155.98.34.2
    Memory usage: 1%                IP address for xenbr1: 10.1.1.254
    Swap usage:   0%                IP address for xenbr2: 10.2.1.254
    Processes:    142               IP address for xenbr3: 10.3.1.254
    Graph this data and manage this system at https://landscape.canonical.com/
    Last login: Tue May 15 12:52:31 2012 from capybara.bbn.com
    control,[~],12:53(0)$ sudo whoami
    root
Step 3E: GPO requests access to network devices
Using:
- Request accounts for GPO ops staffers on the InstaGENI rack control and dataplane network devices from instageni-ops
Verify:
- I know what hostname or IP address to login to to reach each of the control and dataplane switches
- I know what source IPs are allowed to remotely access the control and dataplane switches via SSH
- I can successfully perform those logins at least once
- I can successfully run a few test commands to verify enable mode:
    show running-config
    show mac-address-table
Results of testing step 3E: 2012-05-16
Note: testing using Utah rack
- IP addresses:
  - The control switch's IP is 10.1.1.253 (procurve1 in boss's /etc/hosts file)
  - The dataplane switch's IP is 10.2.1.253 (procurve2 in boss's /etc/hosts file)
- These control IPs are on a private network, so they can only be accessed from other things on that network. For our purposes, that is boss.utah.geniracks.net, which has an interface on each network.
- Login from boss to procurve1 using password in /usr/testbed/etc/switch.pswd:
    boss,[~],09:23(0)$ telnet procurve1
    Trying 10.1.1.253...
    Connected to procurve1.
    Escape character is '^]'.
    ...
    Password:
    ...
    ProCurve Switch 2610-24# show running-config
    Running configuration:
    ; J9085A Configuration Editor; Created on release #R.11.70
    ...
    ProCurve Switch 2610-24# show mac-address
    Status and Counters - Port Address Table
    ...
    ProCurve Switch 2610-24# exit
    ProCurve Switch 2610-24> exit
    Do you want to log out [y/n]? y
- Login from boss to procurve2 using password in /usr/testbed/etc/switch.pswd:
    boss,[~],10:23(0)$ telnet procurve2
    Trying 10.2.1.253...
    Connected to procurve2.
    Escape character is '^]'.
    ...
    Password:
    ...
    ProCurve Switch 6600ml-48G-4XG# show running-config
    Running configuration:
    ; J9452A Configuration Editor; Created on release #K.14.41
    ...
    ProCurve Switch 6600ml-48G-4XG# show mac-address
    Status and Counters - Port Address Table
    ...
    ProCurve Switch 6600ml-48G-4XG# exit
    ProCurve Switch 6600ml-48G-4XG> exit
    Do you want to log out [y/n]? y
Step 3F: GPO requests access to shared OpenVZ nodes
Using:
- Determine an experimental host which is currently configured as a shared OpenVZ node
- From boss.instageni.gpolab.bbn.com, try to SSH to the node
- On the node, try to run a minimal command as sudo:
sudo whoami
Verify:
- Login to the OpenVZ host is successful
- Access to the node is as root and/or it is possible to run a command via sudo
Results of testing step 3F: 2012-05-16
- pc5 is currently configured as a shared OpenVZ node: it is in the experiment emulab-ops/shared-nodes
- From boss:
    boss,[~],20:49(0)$ sudo ssh pc5
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
    Someone could be eavesdropping on you right now (man-in-the-middle attack)!
    It is also possible that the RSA host key has just been changed.
    The fingerprint for the RSA key sent by the remote host is
    46:63:92:67:c8:75:20:4e:52:9f:2d:f6:cb:58:16:77.
    Please contact your system administrator.
    Add correct host key in /root/.ssh/known_hosts to get rid of this message.
    Offending key in /root/.ssh/known_hosts:18
    Password authentication is disabled to avoid man-in-the-middle attacks.
    Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
    [root@vhost1 ~]# whoami
    root
Step 3G: GPO requests access to iLO remote management interfaces for experimental nodes
Using:
- GPO requests access to the experimental node iLO management interfaces from instageni-ops
- Determine how to use these interfaces to access remote control and remote console interfaces for experimental nodes
- For each experimental node in the BBN rack:
- Access the iLO interface and view status information
- View the interface for remotely power-cycling the node
- Launch the remote console for the node
Verify:
- GPO is able to determine the procedure for accessing the iLO interfaces
- Login to each iLO succeeds
- The remote power-cycle interface exists and appears to be usable (don't try power-cycling any nodes during this test)
- Launching the remote console at each iLO succeeds
Results of testing step 3G: 2012-05-16
- Here's Utah's information about how to use the elabman consoles:
  - The password for the elabman account is in boss:/usr/testbed/etc/ilo.pswd
  - The IP address for each node can be found via the web UI:
    - In red dot mode, pull up the page for the individual node, e.g. https://boss.utah.geniracks.net/shownode.php3?node_id=pc3
    - Find the "Management IP" line
- This worked for pc3:
    $ ssh elabman@155.98.34.104
    The authenticity of host '155.98.34.104 (155.98.34.104)' can't be established.
    DSA key fingerprint is 93:b0:c9:e7:bd:cd:bd:e4:94:b6:a2:12:aa:17:80:e8.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '155.98.34.104' (DSA) to the list of known hosts.
    elabman@155.98.34.104's password:
    User:elabman logged-in to ILOUSE211XXJS.utah.geniracks.net(155.98.34.104)
    iLO 3 Advanced 1.28 at Jan 13 2012
    Server Name: USE211XXJS
    Server Power: On
    </>hpiLO->
- There is a command called power, and it seems clear that it could be used to reset the device:
    </>hpiLO-> help power
    status=0
    status_tag=COMMAND COMPLETED
    Wed May 16 17:18:09 2012
    POWER : The power command is used to change the power state of the server and is limited to users with the Power and Reset privilege
    Usage:
      power -- Displays the current server power state
      power on -- Turns the server on
      power off -- Turns the server off
      power off hard -- Force the server off using press and hold
      power reset -- Reset the server
    </>hpiLO-> power
    status=0
    status_tag=COMMAND COMPLETED
    Wed May 16 17:18:15 2012
    power: server power is currently: On
- I can also browse to https://155.98.34.104/ and login as elabman there too
- I can get MAC address information for 4x iSCSI connections and 4x NICs, as well as the iLO itself, from the web UI
- From the web UI, trying: Remote Console -> Remote Console -> Java Integrated Remote Console -> Launch
- This launches, but the console says "No Video"
- I can browse to https://155.98.34.103/ (supposed to be the iLO for pc1), but i get:
    Secure Connection Failed
    An error occurred during a connection to 155.98.34.103.
    You have received an invalid certificate. Please contact the server administrator or email correspondent and give them the following information:
    Your certificate contains the same serial number as another certificate issued by the certificate authority. Please get a new certificate containing a unique serial number.
    (Error code: sec_error_reused_issuer_and_serial)
    The page you are trying to view can not be shown because the authenticity of the received data could not be verified.
    Please contact the web site owners to inform them of this problem. Alternatively, use the command found in the help menu to report this broken site.
- On the pc3 iLO, i can browse to: Administration -> Security -> SSL Certificate, and see various SSL information. Let's look into that on the various iLOs:
  - pc1 (.103):
      Issued To CN=ILOUSE211XXJR.utah.geniracks.net, OU=ISS, O=Hewlett-Packard Company, L=Houston, ST=Texas, C=US
      Issued By C=US, ST=TX, L=Houston, O=Hewlett-Packard Company, OU=ISS, CN=iLO3 Default Issuer (Do not trust)
      Valid From Wed, 11 Jan 2012
      Valid Until Mon, 12 Jan 2037
      Serial Number 57
  - pc2 (.102):
      Issued To CN=ILOUSE211XXJP.utah.geniracks.net, OU=ISS, O=Hewlett-Packard Company, L=Houston, ST=Texas, C=US
      Issued By C=US, ST=TX, L=Houston, O=Hewlett-Packard Company, OU=ISS, CN=iLO3 Default Issuer (Do not trust)
      Valid From Wed, 11 Jan 2012
      Valid Until Mon, 12 Jan 2037
      Serial Number 55
  - pc3 (.104):
      Issued To CN=ILOUSE211XXJS.utah.geniracks.net, OU=ISS, O=Hewlett-Packard Company, L=Houston, ST=Texas, C=US
      Issued By C=US, ST=TX, L=Houston, O=Hewlett-Packard Company, OU=ISS, CN=iLO3 Default Issuer (Do not trust)
      Valid From Wed, 11 Jan 2012
      Valid Until Mon, 12 Jan 2037
      Serial Number 57
  - pc4 (.105):
      Issued To CN=ILOUSE211XXJT.utah.geniracks.net, OU=ISS, O=Hewlett-Packard Company, L=Houston, ST=Texas, C=US
      Issued By C=US, ST=TX, L=Houston, O=Hewlett-Packard Company, OU=ISS, CN=iLO3 Default Issuer (Do not trust)
      Valid From Wed, 11 Jan 2012
      Valid Until Mon, 12 Jan 2037
      Serial Number 53
  - pc5 (.101):
      Issued To CN=ILOUSE211XXJN.utah.geniracks.net, OU=ISS, O=Hewlett-Packard Company, L=Houston, ST=Texas, C=US
      Issued By C=US, ST=TX, L=Houston, O=Hewlett-Packard Company, OU=ISS, CN=iLO3 Default Issuer (Do not trust)
      Valid From Wed, 11 Jan 2012
      Valid Until Mon, 12 Jan 2037
      Serial Number 54
- Testing remote console:
  - pc5 (.101), currently allocated into emulab-ops/shared-nodes experiment: says "No Video"
  - pc2 (.102), currently idle: video works, shows PXE boot prompt
  - pc1 (.103), currently idle: video works, shows PXE boot prompt
  - pc3 (.104), currently allocated into pgeni-gpolab-bbn-com/ecgtest experiment: says "No Video"
    - i did a deletesliver on ecgtest while looking at the console. After a while, it rebooted, and i watched the BIOS splash screen and the load into frisbee. The frisbee MFS appears to support the VGA console as well: i was able to watch frisbee progress.
  - pc4 (.105), currently idle: video works, shows PXE boot prompt
Summary: the Fedora OS images don't have VGA support, so they don't work with the remote consoles. The MFSes and the boot sequence itself do have such support.
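The Firefox error in the step 3G results is triggered when two certificates from the same issuer carry the same serial number. All five iLO certificates recorded there share one issuer, so a duplicate serial is enough to identify a collision. The sketch below takes "node serial" pairs and reports any serial used by more than one node; `find_serial_collisions` is an illustrative name, and the input format is an assumption.

```shell
# find_serial_collisions: read "NODE SERIAL" pairs on stdin and print each
# serial number that appears on more than one node, with the nodes sharing it.
# Assumes all certificates come from the same issuer.
find_serial_collisions() {
    awk '{
             nodes[$2] = (nodes[$2] == "" ? $1 : (nodes[$2] "," $1))
             count[$2]++
         }
         END {
             for (s in count)
                 if (count[s] > 1)
                     print s ": " nodes[s]
         }' | sort
}

# Serial numbers as recorded in the results for the Utah rack iLOs.
printf 'pc1 57\npc2 55\npc3 57\npc4 53\npc5 54\n' | find_serial_collisions
```

With the recorded values, pc1 and pc3 are reported as sharing serial 57, which matches the failure seen when browsing to the pc1 iLO after visiting pc3.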
Step 3H: GPO gets access to allocated bare metal nodes by default
Prerequisites:
- A bare metal node is available for allocation by InstaGENI
- Someone has successfully allocated the node for a bare metal experiment
Using:
- From boss, try to SSH into root on the allocated worker node
Verify:
- We find out the IP address/hostname at which to reach the allocated worker node
- Login to the node using root's SSH key succeeds
Results of testing step 3H: 2012-05-16
- The second prerequisite has not been met (all bare metal nodes are unallocated at this time). To meet it, i will try this via omni:
    $ omni createslice ecgtest
    ...
    Result Summary: Created slice with Name ecgtest, URN urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest, Expiration 2012-05-17 17:32:24+00:00
    ...
    $ omni createsliver -a http://www.utah.geniracks.net/protogeni/xmlrpc/am ecgtest ~/omni/rspecs/request/misc/protogeni-any-one-node.rspec
    INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
    INFO:omni:Using control framework pg
    INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest expires on 2012-05-17 17:32:24 UTC
    INFO:omni:Creating sliver(s) from rspec file /home/chaos/omni/rspecs/request/misc/protogeni-any-one-node.rspec for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest
    INFO:omni:Asked http://www.utah.geniracks.net/protogeni/xmlrpc/am to reserve resources. Result:
    INFO:omni:<?xml version="1.0" ?>
    INFO:omni:<!-- Reserved resources for:
      Slice: ecgtest
      At AM:
      URL: http://www.utah.geniracks.net/protogeni/xmlrpc/am
    -->
    INFO:omni:<rspec xmlns="http://protogeni.net/resources/rspec/0.2">
      <node component_manager_urn="urn:publicid:IDN+utah.geniracks.net+authority+cm" component_manager_uuid="c133552e-688f-11e1-8314-00009b6224df" component_urn="urn:publicid:IDN+utah.geniracks.net+node+pc3" component_uuid="49574b15-753a-11e1-a16c-00009b6224df" exclusive="1" hostname="pc3.utah.geniracks.net" sliver_urn="urn:publicid:IDN+utah.geniracks.net+sliver+364" sliver_uuid="e326971a-9f74-11e1-af1c-00009b6224df" sshdport="22" virtual_id="geni1" virtualization_subtype="raw" virtualization_type="emulab-vnode">
        <services>
          <login authentication="ssh-keys" hostname="pc3.utah.geniracks.net" port="22" username="chaos"/>
        </services>
      </node>
    </rspec>
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed createsliver:
      Options as run:
        aggregate: http://www.utah.geniracks.net/protogeni/xmlrpc/am
        configfile: /home/chaos/omni/omni_pgeni
        framework: pg
        native: True
      Args: createsliver ecgtest /home/chaos/omni/rspecs/request/misc/protogeni-any-one-node.rspec
      Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest expires on 2012-05-17 17:32:24 UTC
      Reserved resources on http://www.utah.geniracks.net/protogeni/xmlrpc/am.
    INFO:omni: ============================================================
  That looks good, and implies that pc3.utah.geniracks.net was allocated for my experiment.
- From boss, i can login to the allocated pc3 as root, and the correct key is used by default:
    boss,[~],10:31(0)$ sudo ssh pc3
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
    Someone could be eavesdropping on you right now (man-in-the-middle attack)!
    It is also possible that the RSA host key has just been changed.
    The fingerprint for the RSA key sent by the remote host is
    46:63:92:67:c8:75:20:4e:52:9f:2d:f6:cb:58:16:77.
    Please contact your system administrator.
    Add correct host key in /root/.ssh/known_hosts to get rid of this message.
    Offending key in /root/.ssh/known_hosts:15
    Password authentication is disabled to avoid man-in-the-middle attacks.
    Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
    [root@geni1 ~]#
Step 4: GPO inventories the rack based on our own processes
Step 4A: Inventory and label physical rack contents
Using:
- Enumerate all physical objects in the rack
- Use available rack documentation to determine the correct name of each object
- If any objects can't be found in public documentation, compare to internal notes, and iterate with InstaGENI
- Physically label each device in the rack with its name on front and back
- Inventory all hardware details for rack contents on OpsHardwareInventory
- Add an ascii rack diagram to OpsHardwareInventory
Verify:
- Public documentation and/or rack diagrams identify all rack objects
- There is a public parts list which matches the parts we received
- We succeed in labelling the devices and adding their hardware details and locations to our inventory
Step 4B: Inventory rack power requirements
Using:
- Add rack circuit information to OpsPowerConnectionInventory
Verify:
- We succeed in locating and documenting information about rack power circuits in use
Step 4C: Inventory rack network connections
Using:
- Add all rack ethernet and fiber connections and their VLAN configurations to OpsConnectionInventory
- Add static rack OpenFlow datapath information to OpsDpidInventory
Verify:
- We are able to identify and determine all rack network connections and VLAN configurations
- We are able to determine the OpenFlow configuration of the rack dataplane switch
Step 4D: Verify government property accounting for the rack
Using:
- Receive a completed DD1149 form from InstaGENI
- Receive and inventory a property tag number for the BBN InstaGENI rack
Verify:
- The DD1149 paperwork is complete to BBN government property standards
- We receive a single property tag for the rack, as expected
Step 5: Configure operational alerting for the rack
Step 5A: GPO installs active control network monitoring
Using:
- Add a monitored control network ping from ilian.gpolab.bbn.com to the boss VM
- Add a monitored control network ping from ilian.gpolab.bbn.com to the ops VM
- Add a monitored control network ping from ilian.gpolab.bbn.com to the foam VM
- Add a monitored control network ping from ilian.gpolab.bbn.com to the flowvisor VM
- Add a monitored control network ping from ilian.gpolab.bbn.com to the infrastructure VM host
- Add a monitored control network ping from ilian.gpolab.bbn.com to the control switch's management IP
- Add a monitored control network ping from ilian.gpolab.bbn.com to the dataplane switch's management IP
Verify:
- Active monitoring of the control network is successful
- Each monitored IP is successfully available at least once
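The "available at least once" criterion can be spot-checked by hand before the monitoring system has collected any history. This is a sketch only, not the monitoring configuration actually used on ilian.gpolab.bbn.com: `never_seen` and `ping_once` are illustrative names, and the probe command is passed in so the retry logic can be run without network access.

```shell
# never_seen PROBE ATTEMPTS HOST...: try PROBE against each host up to
# ATTEMPTS times and print every host that never answered.
never_seen() {
    local probe="$1" attempts="$2" host i ok
    shift 2
    for host in "$@"; do
        ok=0
        i=0
        while [ "$i" -lt "$attempts" ]; do
            if "$probe" "$host" >/dev/null 2>&1; then
                ok=1
                break
            fi
            i=$((i + 1))
        done
        if [ "$ok" -eq 0 ]; then echo "$host"; fi
    done
}

# ping_once HOST: a single ICMP echo request as the probe.
ping_once() { ping -c 1 "$1"; }
```

For example, `never_seen ping_once 3 boss.instageni.gpolab.bbn.com ops.instageni.gpolab.bbn.com` prints nothing if both hosts answered at least one of three pings.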
Results of testing step 5A: 2012-05-18
Note: this is partially blocked, but i want to get a basic ability to detect outages, so am installing what i can.
- Pings for boss and ops can be added right now, doing so
- Pings for FOAM and FV are blocked pending creation of those VMs
- I am adding a ping for utah.control.geniracks.net, but this may change depending on the conclusion in terms of what that host should be named.
- The switch management IPs are private, so i think this is effectively blocked until the BBN rack arrives, since i don't want to install ganglia on Utah's rack (and i think the right thing to do here is to ping the devices from boss).
Results of testing step 5A: 2012-07-04
- Pings for boss and ops are still operating
- I am adding pings for foam and flowvisor now.
- Ping for utah.control.geniracks.net is still operating, but this may change depending on the conclusion in terms of what that host should be named.
- The switch management IPs are private, so i think this is effectively blocked until the BBN rack arrives, since i don't want to install ganglia on Utah's rack (and i think the right thing to do here is to ping the devices from boss).
Step 5B: GPO installs active shared dataplane monitoring
Using:
- Add a monitored dataplane network ping from a lab dataplane test host on vlan 1750 to the rack dataplane
- If necessary, add an OpenFlow controller to handle traffic for the monitoring subnet
Verify:
- Active monitoring of the dataplane network is successful
- The monitored IP is successfully available at least once
Results of testing step 5B: 2012-07-04
- Tim added a sliver for monitoring the dataplane subnet yesterday, under slice URN urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+tuptymon
- Pings from BBN NLR to the Utah testpoint are showing up at http://monitor.gpolab.bbn.com/connectivity/campus.html, and have been successful at least once.
- When the UEN flowvisor is installed, this sliver will need to be updated to use the new path.
Step 5C: GPO gets access to monitoring information about the BBN rack
Using:
- GPO determines what monitoring tool InstaGENI will make available for site administrators
- GPO successfully accesses and views status data about the BBN rack
Verify:
- I can see general data about all devices in the BBN rack
- I can see detailed information about any services checked
Step 5D: GPO receives e-mail about BBN rack alerts
Using:
- Request e-mail notifications for BBN rack problems to be sent to GPO ops
- Collect a number of notifications
- Inspect three representative messages
Verify:
- E-mail messages about rack problems are received
- For each inspected message, i can determine:
- The affected device
- The affected service
- The type of problem being reported
- The duration of the outage
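The four-item check above can be recorded per inspected message. The following is a minimal sketch with hypothetical field names and a hypothetical record layout; it does not parse any real InstaGENI alert format, and it assumes the reviewer fills in each record by hand after reading a message.

```python
# Hypothetical field names for the four items the step asks about.
REQUIRED_FIELDS = ("device", "service", "problem_type", "outage_duration")


def missing_fields(inspected):
    """Return the required fields that are absent or empty in one record."""
    return [name for name in REQUIRED_FIELDS if not inspected.get(name)]


def review(messages):
    """Map each message label to its missing fields.

    messages: dict mapping a label to a dict of field values.
    Only messages with at least one gap appear in the result, so an empty
    dict means every inspected message passes.
    """
    gaps = {}
    for label, fields in messages.items():
        missing = missing_fields(fields)
        if missing:
            gaps[label] = missing
    return gaps
```

Running `review` over the three representative messages yields an empty dict when the step passes, and otherwise names exactly which information could not be determined from which message.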
Results of testing step 5D: 2012-07-04
- I've been subscribed to genirack-ops@flux.utah.edu since 2012-05-24, and have received a number of e-mail messages.
- Most of these messages are notifications of GENI operations
- There is also a nightly e-mail, subject "Testbed Audit Results", which reports on the number of experiments which have been swapped in for longer than a day.
- There are also notifications of account requests and changes in local projects
- This list does not detect and notify about rack problems: InstaGENI does not have a list for that purpose.
Step 6: Setup contact info and change control procedures
Step 6A: InstaGENI operations staff should subscribe to response-team
Using:
- Ask InstaGENI operators to subscribe instageni-ops@flux.utah.edu (or individual operators) to response-team@geni.net
Verify:
- This subscription has happened. On daulis:
sudo -u mailman /usr/lib/mailman/bin/find_member -l response-team utah.edu
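The check above can also be scripted. This sketch wraps the same command and looks for the expected address anywhere in its output. The function names are hypothetical, and the plain substring match is an assumption that avoids depending on the exact output layout of find_member.

```python
import subprocess

# Path taken from the command above.
FIND_MEMBER = "/usr/lib/mailman/bin/find_member"


def find_member_output(listname, pattern):
    """Run find_member as the mailman user and return its standard output."""
    result = subprocess.run(
        ["sudo", "-u", "mailman", FIND_MEMBER, "-l", listname, pattern],
        capture_output=True,
        text=True,
    )
    return result.stdout


def is_subscribed(output, address):
    """Return True if address appears anywhere in the output.

    The comparison is case-insensitive. It is a substring match, so a longer
    address that merely ends with the one being checked would also match.
    """
    return address.lower() in output.lower()
```

On daulis, `is_subscribed(find_member_output("response-team", "utah.edu"), "instageni-ops@flux.utah.edu")` would report whether the list address is subscribed.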
Results of testing step 6A: 2012-07-04
Per daulis, Rob is subscribed, but no other InstaGENI operators, and not the list itself. I opened 43 for this.
Results of testing step 6A: 2012-11-12
Verified that instageni-ops@flux.utah.edu is subscribed to response-team.
Step 6B: InstaGENI operations staff should provide contact info to GMOC
Using:
- Ask InstaGENI operators to submit primary and secondary e-mail and phone contact information to GMOC
Verify:
- E-mail gmoc@grnoc.iu.edu and request verification that the InstaGENI organization contact info is up-to-date.
Results of testing step 6B: 2012-07-04
- I e-mailed GMOC, and will follow up when i get a response.
Step 6C: Negotiate an interim change control notification procedure
Using:
- Ask InstaGENI operators to notify either instageni-design@geni.net or gpo-infra@geni.net about planned outages and changes.
Verify:
- InstaGENI agrees to send notifications about planned outages and changes.
Results of testing step 6C: 2012-05-29
InstaGENI has agreed to notify instageni-design when there are outages.
We will want to revisit this test when GMOC has workflows in place to handle notifications for rack outages, and before there are additional rack sites and users who may need to be notified.