wiki:GENIRacksHome/InstageniRacks/AcceptanceTestStatus/IG-ADM-1

Version 41 (modified by Josh Smift, 7 years ago)


  1. Detailed test plan for IG-ADM-1: Rack Receipt and Inventory Test
  2. Page format
  3. Status of test
  4. High-level description from test plan
    1. Procedure
    2. Criteria to verify as part of this test
  5. Step 1 (prep): InstaGENI and GPO power and wire the BBN rack
  6. Step 2: Configure and verify DNS
    1. Step 2A (verify): Find out what IP-to-hostname mapping to use
      1. Results of testing step 2A: 2012-12-20
    2. Step 2B (prep): Insert IP-to-hostname mapping in DNS
    3. Step 2C (verify): Test all PTR records
      1. Results of testing step 2C: 2012-12-20
  7. Step 3: GPO requests and receives administrator accounts
    1. Step 3A: GPO requests access to boss and ops nodes
      1. Results of testing step 3A: 2012-12-20
      2. Results of testing step 3A: 2012-05-15
    2. Step 3B: GPO requests access to FOAM VM
      1. Results of testing step 3B: 2012-12-20
      2. Results of testing step 3B: 2012-07-04
    3. Step 3C: GPO requests access to FlowVisor VM
      1. Results of testing step 3C: 2012-12-20
      2. Results of testing step 3C: 2012-11-12
    4. Step 3D: GPO requests access to infrastructure server
      1. Results of testing step 3D: 2012-12-20
      2. Results of testing step 3D: 2012-05-15
    5. Step 3E: GPO requests access to network devices
      1. Results of testing step 3E: 2012-12-20
      2. Results of testing step 3E: 2012-05-16
    6. Step 3F: GPO requests access to shared OpenVZ nodes
      1. Results of testing step 3F: 2012-12-20
      2. Results of testing step 3F: 2012-05-16
    7. Step 3G: GPO requests access to iLO remote management interfaces for …
      1. Results of testing step 3G: 2012-12-20
      2. Results of testing step 3G: 2012-05-16
    8. Step 3H: GPO gets access to allocated bare metal nodes by default
      1. Results of testing step 3H: 2012-12-20
      2. Results of testing step 3H: 2012-05-16
  8. Step 4: GPO inventories the rack based on our own processes
    1. Step 4A: Inventory and label physical rack contents
      1. Results of testing step 4A: 2013-01-17
    2. Step 4B: Inventory rack power requirements
      1. Results of testing step 4B: 2013-01-17
    3. Step 4C: Inventory rack network connections
      1. Results of testing step 4C: 2013-01-17
    4. Step 4D: Verify government property accounting for the rack
      1. Results of testing step 4D: 2013-01-17
  9. Step 5: Configure operational alerting for the rack
    1. Step 5A: GPO installs active control network monitoring
      1. Results of testing step 5A: 2013-01-17
      2. Results of testing step 5A: 2012-05-18
      3. Results of testing step 5A: 2012-07-04
    2. Step 5B: GPO installs active shared dataplane monitoring
      1. Results of testing step 5B: 2013-01-17
      2. Results of testing step 5B: 2012-07-04
    3. Step 5C: GPO gets access to monitoring information about the BBN rack
      1. Results of testing step 5C: 2013-01-17
    4. Step 5D: GPO receives e-mail about BBN rack alerts
      1. Results of testing step 5D: 2013-01-17
      2. Results of testing step 5D: 2012-07-04
  10. Step 6: Setup contact info and change control procedures
    1. Step 6A: InstaGENI operations staff should subscribe to response-team
      1. Results of testing step 6A: 2012-07-04
      2. Results of testing step 6A: 2012-11-12
    2. Step 6B: InstaGENI operations staff should provide contact info to GMOC
      1. Results of testing step 6B: 2012-07-04
      2. Results of testing step 6B: 2012-11-12
    3. Step 6C: Negotiate an interim change control notification procedure
      1. Results of testing step 6C: 2012-05-29
      2. Results of testing step 6C: 2013-01-31

Detailed test plan for IG-ADM-1: Rack Receipt and Inventory Test

This page is GPO's working page for performing IG-ADM-1. It is public for informational purposes, but it is not an official status report. See GENIRacksHome/InstageniRacks/AcceptanceTestStatus for the current status of InstaGENI acceptance tests.

Last substantive edit of this page: 2013-01-31

Page format

  • The status chart summarizes the state of this test
  • The high-level description from test plan contains text copied exactly from the public test plan and acceptance criteria pages.
  • The steps contain things I will actually do/verify:
    • Steps may be composed of related substeps where I find this useful for clarity
    • Each step is identified as either "(prep)" or "(verify)":
      • Prep steps are just things we have to do. They're not tests of the rack, but are prerequisites for subsequent verification steps
      • Verify steps are steps in which we actually look at rack output and make sure it is as expected. They contain a Using: block, which lists the steps to run the verification, and an Expect: block, which lists what outcome is expected for the test to pass.

Status of test

Note: As of Version 35 of this page, this status chart refers only to the BBN rack. We ran some of these tests on the Utah rack; those results are still documented below, but they are no longer part of this table.

See GENIRacksHome/InstageniRacks/AcceptanceTestStatus for the meanings of test states.

Step | State   | Date completed | Open Tickets | Closed Tickets/Comments
-----|---------|----------------|--------------|------------------------
1    | Pass    | 2012-12-20     |              |
2A   | Pass    | 2012-12-20     |              |
2B   | Pass    | 2012-12-20     |              |
2C   | Pass    | 2012-12-20     |              |
3A   | Pass    | 2012-12-20     |              |
3B   | Pass    | 2012-12-20     |              |
3C   | Pass    | 2012-12-20     |              |
3D   | Pass    | 2012-12-20     |              |
3E   | Pass    | 2013-01-21     |              | 71
3F   | Pass    | 2012-12-20     |              |
3G   | Pass    | 2013-01-17     |              | 72
3H   | Pass    | 2012-12-20     |              |
4A   | Pass    | 2013-01-21     |              | 85
4B   | Pass    | 2013-01-17     |              |
4C   | Pass    | 2013-01-17     |              |
4D   | Blocked |                | 90           |
5A   | Pass    | 2013-01-17     |              |
5B   | Pass    | 2013-01-17     |              |
5C   | Blocked |                | 91           |
5D   | Pass    | 2013-01-17     |              |
6A   | Pass    | 2013-01-31     |              |
6B   | Pass    | 2013-01-31     |              |
6C   | Pass    | 2013-01-31     |              |

High-level description from test plan

This "test" uses BBN as an example site by verifying that we can do all the things we need to do to integrate the rack into our standard local procedures for systems we host.

Procedure

  • InstaGENI and GPO power and wire the BBN rack
  • GPO configures the instageni.gpolab.bbn.com DNS namespace and 192.1.242.128/25 IP space, and enters all public IP addresses used by the rack into DNS.
  • GPO requests and receives administrator accounts on the rack and GPO sysadmins receive read access to all InstaGENI monitoring of the rack.
  • GPO inventories the physical rack contents, network connections and VLAN configuration, and power connectivity, using our standard operational inventories.
  • GPO, InstaGENI, and GMOC share information about contact information and change control procedures, and InstaGENI operators subscribe to GENI operations mailing lists and submit their contact information to GMOC.
  • GPO reviews the documented parts list, power requirements, physical and logical network connectivity requirements, and site administrator community requirements, verifying that these documents should be sufficient for a new site to use when setting up a rack.

Criteria to verify as part of this test

  • VI.02. A public document contains a parts list for each rack. (F.1)
  • VI.03. A public document states the detailed power requirements of the rack, including how many PDUs are shipped with the rack, how many of the PDUs are required to power the minimal set of shipped equipment, the part numbers of the PDUs, and the NEMA input connector type needed by each PDU. (F.1)
  • VI.04. A public document states the physical network connectivity requirements between the rack and the site network, including number, allowable bandwidth range, and allowed type of physical connectors, for each of the control and dataplane networks. (F.1)
  • VI.05. A public document states the minimal public IP requirements for the rack, including: number of distinct IP ranges and size of each range, hostname to IP mappings which should be placed in site DNS, whether the last-hop routers for public IP ranges subnets sit within the rack or elsewhere on the site, and what firewall configuration is desired for the control network. (F.1)
  • VI.06. A public document states the dataplane network requirements and procedures for a rack, including necessary core backbone connectivity and documentation, any switch configuration options needed for compatibility with the L2 core, and the procedure for connecting non-rack-controlled VLANs and resources to the rack dataplane. (F.1)
  • VI.07. A public document explains the requirements that site administrators have to the GENI community, including how to join required mailing lists, how to keep their support contact information up-to-date, how and under what circumstances to work with Legal, Law Enforcement and Regulatory(LLR) Plan, how to best contact the rack vendor with operational problems, what information needs to be provided to GMOC to support emergency stop, and how to interact with GMOC when an Emergency Stop request is received. (F.3, C.3.d)
  • VI.14. A procedure is documented for creating new site administrator and operator accounts. (C.3.a)
  • VII.01. Using the provided documentation, GPO is able to successfully power and wire their rack, and to configure all needed IP space within a per-rack subdomain of gpolab.bbn.com. (F.1)
  • VII.02. Site administrators can understand the physical power, console, and network wiring of components inside their rack and document this in their preferred per-site way. (F.1)

Step 1 (prep): InstaGENI and GPO power and wire the BBN rack

This step covers the physical delivery of the rack to BBN, the transport of the rack inside BBN to the GPO lab, and the cabling, powering, and initial configuration of the rack.

Step 2: Configure and verify DNS

Step 2A (verify): Find out what IP-to-hostname mapping to use

Using:

  • If the rack IP requirements documentation for the rack exists:
    • Review that documentation and determine what IP to hostname mappings should exist for 192.1.242.128/25
  • Otherwise:
    • Iterate with instageni-ops to determine the IP to hostname mappings to use for 192.1.242.128/25

Expect:

  • Reasonable IP-to-hostname mappings for 126 valid IPs allocated for InstaGENI use in 192.1.242.128/25
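For reference, the 126 in the expectation above is the number of usable host addresses in a /25: 128 addresses, minus the network (.128) and broadcast (.255) addresses, which leaves .129 through .254. A minimal sketch of that arithmetic (usable_hosts is just an illustrative name):

```shell
# Usable host addresses in an IPv4 prefix: 2^(32 - prefix_len), minus the
# network and broadcast addresses.
usable_hosts() {
    echo $(( (1 << (32 - $1)) - 2 ))
}

usable_hosts 25   # prints 126 (192.1.242.129 through 192.1.242.254)
```

Note that the loop in step 2C runs through 255, so it also queries the broadcast address.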

Results of testing step 2A: 2012-12-20

We discussed this via e-mail, and concluded that we should create these DNS entries in gpolab.bbn.com:

;; 192.1.242.128/25: InstaGENI rack

; Delegate the whole subdomain to boss.instageni.gpolab.bbn.com, with
; ns.emulab.net as a secondary.
ns.instageni            IN      A       192.1.242.132
instageni               IN      NS      ns.instageni
instageni               IN      NS      ns.emulab.net.

And these in 242.1.192.in-addr.arpa:

;; 192.1.242.129/25: instageni.gpolab.bbn.com (InstaGENI rack control network)

; Delegate a subdomain to boss.instageni.gpolab.bbn.com, and generate
; CNAMEs pointing to it.
129/25               IN   NS      ns.instageni.gpolab.bbn.com.
129/25               IN   NS      ns.emulab.net.
$GENERATE 129-255 $  IN   CNAME   $.129/25.242.1.192.in-addr.arpa.
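The $GENERATE line is BIND shorthand for one record per value in the range, with $ replaced by the current value. This is the RFC 2317 style of classless reverse delegation: each ordinary in-addr.arpa name becomes a CNAME into the delegated 129/25 subzone, which the rack's nameserver then answers for. A rough sketch that prints the explicit records the directive stands for (expand_generate is a hypothetical name; the subzone label and suffix are copied from the zone snippet above):

```shell
# Print the records that "$GENERATE FIRST-LAST $ IN CNAME $.129/25.242.1.192.in-addr.arpa."
# expands to. Owner names are relative to the 242.1.192.in-addr.arpa zone.
expand_generate() {
    first=$1
    last=$2
    suffix="129/25.242.1.192.in-addr.arpa."
    i=$first
    while [ "$i" -le "$last" ]; do
        printf '%s IN CNAME %s.%s\n' "$i" "$i" "$suffix"
        i=$((i + 1))
    done
}

expand_generate 129 131
# prints:
#   129 IN CNAME 129.129/25.242.1.192.in-addr.arpa.
#   130 IN CNAME 130.129/25.242.1.192.in-addr.arpa.
#   131 IN CNAME 131.129/25.242.1.192.in-addr.arpa.
```

Running expand_generate 129 255 prints all 127 records the directive covers.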

Step 2B (prep): Insert IP-to-hostname mapping in DNS

  • Fully populate 192.1.242.128/25 PTR entries in GPO lab DNS
  • Fully populate instageni.gpolab.bbn.com A entries in GPO lab DNS

Step 2C (verify): Test all PTR records

Using:

  • From a BBN desktop host:
    for lastoct in {129..255}; do
    host 192.1.242.$lastoct
    done
    

Expect:

  • All results look like:
    $lastoct.242.1.192.in-addr.arpa domain name pointer <something reasonable>
    
    and none look like:
    Host $lastoct.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
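With 127 lookups, it's easy to miss a stray NXDOMAIN by eye. A small filter like the one below (a sketch; classify_host_output is a made-up name) reduces the host output to one line per address, "ok" or "missing". It assumes the output format shown above, and skips the CNAME "is an alias for" lines, since the PTR answer follows them.

```shell
# Read `host` output on stdin and print one summary line per answer:
#   ok <ptr-name> <hostname>   for PTR answers
#   missing <ptr-name>         for NXDOMAIN answers
# Alias (CNAME) lines are ignored.
classify_host_output() {
    while IFS= read -r line; do
        case $line in
            *"domain name pointer"*)
                printf 'ok %s %s\n' "${line%% *}" "${line##* }" ;;
            "Host "*"not found: 3(NXDOMAIN)")
                rest=${line#Host }
                printf 'missing %s\n' "${rest%% *}" ;;
        esac
    done
}

# Usage against live DNS (needs the `host` tool and network access):
#   for lastoct in $(seq 129 255); do host 192.1.242.$lastoct; done | classify_host_output
```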
    

Results of testing step 2C: 2012-12-20

Many addresses aren't defined:

[13:46:15] jbs@anubis:/home/jbs
+$ for lastoct in {129..255} ; do host 192.1.242.$lastoct ; done
Host 129.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
130.242.1.192.in-addr.arpa is an alias for 130.129/25.242.1.192.in-addr.arpa.
130.129/25.242.1.192.in-addr.arpa domain name pointer control.instageni.gpolab.bbn.com.
131.242.1.192.in-addr.arpa is an alias for 131.129/25.242.1.192.in-addr.arpa.
131.129/25.242.1.192.in-addr.arpa domain name pointer control-ilo.instageni.gpolab.bbn.com.
132.242.1.192.in-addr.arpa is an alias for 132.129/25.242.1.192.in-addr.arpa.
132.129/25.242.1.192.in-addr.arpa domain name pointer boss.instageni.gpolab.bbn.com.
133.242.1.192.in-addr.arpa is an alias for 133.129/25.242.1.192.in-addr.arpa.
133.129/25.242.1.192.in-addr.arpa domain name pointer ops.instageni.gpolab.bbn.com.
134.242.1.192.in-addr.arpa is an alias for 134.129/25.242.1.192.in-addr.arpa.
134.129/25.242.1.192.in-addr.arpa domain name pointer foam.instageni.gpolab.bbn.com.
135.242.1.192.in-addr.arpa is an alias for 135.129/25.242.1.192.in-addr.arpa.
135.129/25.242.1.192.in-addr.arpa domain name pointer flowvisor.instageni.gpolab.bbn.com.
Host 136.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 137.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 138.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 139.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
140.242.1.192.in-addr.arpa is an alias for 140.129/25.242.1.192.in-addr.arpa.
140.129/25.242.1.192.in-addr.arpa domain name pointer pc1.instageni.gpolab.bbn.com.
141.242.1.192.in-addr.arpa is an alias for 141.129/25.242.1.192.in-addr.arpa.
141.129/25.242.1.192.in-addr.arpa domain name pointer pc2.instageni.gpolab.bbn.com.
142.242.1.192.in-addr.arpa is an alias for 142.129/25.242.1.192.in-addr.arpa.
142.129/25.242.1.192.in-addr.arpa domain name pointer pc3.instageni.gpolab.bbn.com.
143.242.1.192.in-addr.arpa is an alias for 143.129/25.242.1.192.in-addr.arpa.
143.129/25.242.1.192.in-addr.arpa domain name pointer pc4.instageni.gpolab.bbn.com.
144.242.1.192.in-addr.arpa is an alias for 144.129/25.242.1.192.in-addr.arpa.
144.129/25.242.1.192.in-addr.arpa domain name pointer pc5.instageni.gpolab.bbn.com.
Host 145.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 146.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 147.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 148.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 149.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 150.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 151.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 152.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 153.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 154.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 155.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 156.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 157.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 158.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 159.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 160.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 161.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 162.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 163.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 164.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 165.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 166.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 167.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 168.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 169.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 170.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 171.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 172.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 173.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 174.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 175.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 176.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 177.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 178.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 179.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 180.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 181.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 182.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 183.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 184.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 185.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 186.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 187.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 188.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 189.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 190.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 191.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 192.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 193.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 194.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 195.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 196.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 197.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 198.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 199.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 200.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 201.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 202.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 203.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 204.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 205.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 206.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 207.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 208.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 209.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 210.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 211.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 212.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 213.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 214.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 215.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 216.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 217.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 218.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 219.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 220.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 221.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 222.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 223.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 224.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 225.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 226.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 227.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 228.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 229.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 230.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 231.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 232.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 233.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 234.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 235.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 236.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 237.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 238.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 239.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 240.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 241.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 242.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 243.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 244.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 245.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 246.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 247.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 248.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 249.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 250.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 251.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 252.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 253.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 254.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)
Host 255.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)

We think that's expected: addresses that are in use have records in DNS, and addresses that aren't in use don't.

I tried creating a VM with a public IP address, using this rspec:

<?xml version="1.0" encoding="UTF-8"?>
<rspec xmlns="http://www.geni.net/resources/rspec/3"
       xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:emulab="http://www.protogeni.net/resources/rspec/ext/emulab/1"
       xs:schemaLocation="http://www.geni.net/resources/rspec/3
           http://www.geni.net/resources/rspec/3/request.xsd"
       type="request">
  <node client_id="carlin" exclusive="false">
    <sliver_type name="emulab-openvz" />
    <emulab:routable_control_ip />
  </node>
</rspec>

According to my manifest rspec, I got:

    <emulab:vnode name="pcvm2-3"/>
    <host name="carlin.jbs.pgeni-gpolab-bbn-com.instageni.gpolab.bbn.com"/>
    <services>
      <login authentication="ssh-keys" hostname="pcvm2-3.instageni.gpolab.bbn.com" port="22" username="jbs"/>
    </services>
  </node>

That hostname and IP address now resolve:

[15:03:32] jbs@anubis:/home/jbs/rspecs/request
+$ host pcvm2-3.instageni.gpolab.bbn.com
pcvm2-3.instageni.gpolab.bbn.com has address 192.1.242.150

[15:03:34] jbs@anubis:/home/jbs/rspecs/request
+$ host 192.1.242.150
150.242.1.192.in-addr.arpa is an alias for 150.129/25.242.1.192.in-addr.arpa.
150.129/25.242.1.192.in-addr.arpa domain name pointer pcvm2-3.instageni.gpolab.bbn.com.

After I deleted my sliver:

[15:03:58] jbs@anubis:/home/jbs/rspecs/request
+$ omni -a $am deletesliver $slicename
[* snip *]
  Result Summary: Deleted sliver urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs on unspecified_AM_URN at https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am
INFO:omni: ============================================================

[15:04:43] jbs@anubis:/home/jbs/rspecs/request
+$ host pcvm2-3.instageni.gpolab.bbn.com
Host pcvm2-3.instageni.gpolab.bbn.com not found: 3(NXDOMAIN)

[15:05:57] jbs@anubis:/home/jbs/rspecs/request
+$ host 192.1.242.150
150.242.1.192.in-addr.arpa is an alias for 150.129/25.242.1.192.in-addr.arpa.
150.129/25.242.1.192.in-addr.arpa domain name pointer pcvm2-3.instageni.gpolab.bbn.com.

That second one still works because it's cached on my local nameserver; if I ask the source, it's gone:

[15:32:13] jbs@ops.instageni.gpolab.bbn.com:/users/jbs
+$ host 192.1.242.150
Host 150.242.1.192.in-addr.arpa. not found: 3(NXDOMAIN)

So I think this is fine: records exist for addresses while they're in use, and go away when they're no longer in use.
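To tell a stale cached answer from a live one without logging in to ops, you can point host at a specific nameserver ("host NAME SERVER"), which bypasses the local cache. A sketch comparing the two; HOST_CMD and AUTH_NS are hypothetical knobs, and AUTH_NS defaults to the ns.instageni name from the zone delegation in step 2A:

```shell
# Compare a PTR lookup through the default resolver with one sent straight
# to the rack's nameserver. HOST_CMD exists only so this can be run offline;
# it defaults to the real `host` command.
HOST_CMD=${HOST_CMD:-host}
AUTH_NS=${AUTH_NS:-ns.instageni.gpolab.bbn.com}

# Prints "present" if the lookup returned a PTR answer, else "absent".
ptr_status() {
    if $HOST_CMD "$@" 2>/dev/null | grep -q 'domain name pointer'; then
        echo present
    else
        echo absent
    fi
}

compare_ptr() {
    local_answer=$(ptr_status "$1")
    auth_answer=$(ptr_status "$1" "$AUTH_NS")
    echo "$1: local=$local_answer authoritative=$auth_answer"
}
```

Right after deleting a sliver, compare_ptr 192.1.242.150 would be expected to report local=present authoritative=absent until the cached record's TTL runs out.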

Step 3: GPO requests and receives administrator accounts

Step 3A: GPO requests access to boss and ops nodes

Using:

Verify:

  • Logins succeed for Chaos, Josh, and Tim on both nodes
  • The commands work:
    $ sudo whoami
    root
    

Results of testing step 3A: 2012-12-20

I followed the procedure at https://users.emulab.net/trac/protogeni/wiki/RackAdminAccounts#AdminAccountsinEmulab to join the emulab-ops project. Once the Utah folks approved my request and made me an admin, I was able to log in to boss and ops and use sudo:

[15:50:40] jbs@anubis:/home/jbs
+$ ssh ops.instageni.gpolab.bbn.com sudo whoami
root

[15:50:50] jbs@anubis:/home/jbs
+$ ssh boss.instageni.gpolab.bbn.com sudo whoami
root

I asked Chaos and Tim to follow the same procedure, and they did. I then approved their accounts by following the procedure at https://users.emulab.net/trac/protogeni/wiki/RackAdminAccounts#AddingmoreadminaccountstoEmulab, and they confirmed that they could log in to boss and ops.
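Since the same check (ssh to a host, run sudo whoami, expect root) repeats for every host in step 3, a small loop can run it across several hosts at once. A sketch; check_sudo and the SSH override are hypothetical, and it assumes key-based login already works so ssh doesn't prompt for a password:

```shell
# Run `sudo whoami` on each host and report PASS/FAIL per host.
# Returns non-zero if any host failed. SSH exists only so the check can be
# exercised offline; it defaults to the real ssh client.
SSH=${SSH:-ssh}

check_sudo() {
    failures=0
    for h in "$@"; do
        if [ "$($SSH "$h" sudo whoami 2>/dev/null)" = "root" ]; then
            printf 'PASS %s\n' "$h"
        else
            printf 'FAIL %s\n' "$h"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}
```

For example: check_sudo boss.instageni.gpolab.bbn.com ops.instageni.gpolab.bbn.com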

Results of testing step 3A: 2012-05-15

Note: This test was run on the Utah rack, where only Chaos has an account. So Tim and Josh will not be testing, and the hosts to test are boss.utah.geniracks.net and ops.utah.geniracks.net.

  • Chaos successfully used public-key login and sudo from a BBN subnet (128.89.68.0/23) to boss:
    capybara,[~],11:39(0)$ ssh chaos@boss.utah.geniracks.net
    Last login: Tue May 15 07:29:07 2012 from capybara.bbn.co
    Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
            The Regents of the University of California.  All rights reserved.
    
    FreeBSD 8.3-RC1 (XEN) #0: Tue Mar 13 16:27:12 MDT 2012
    
    Welcome to FreeBSD!
    
    Need to see the calendar for this month? Simply type "cal".  To see the
    whole year, type "cal -y".
                    -- Dru <genesis@istar.ca>
    > bash
    boss,[~],09:39(0)$ sudo whoami
    root
    
  • Chaos successfully used public-key login and sudo from a BBN subnet (128.89.68.0/23) to ops:
    capybara,[~],11:40(0)$ ssh chaos@ops.utah.geniracks.net
    Last login: Sat May 12 15:41:57 2012 from capybara.bbn.co
    Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
            The Regents of the University of California.  All rights reserved.
    
    FreeBSD 8.3-RC1 (XEN) #0: Tue Mar 13 16:27:12 MDT 2012
    
    Welcome to FreeBSD!
    
    ops,[~],09:40(0)$ sudo whoami
    root
    

Step 3B: GPO requests access to FOAM VM

Verify:

  • Logins succeed for Chaos, Josh, and Tim on the FOAM VM
  • The command works:
    $ sudo whoami
    root
    

Results of testing step 3B: 2012-12-20

I was named as the site admin in the site survey, and was given an account on the FOAM VM. I was able to log in and use sudo:

[15:57:46] jbs@anubis:/home/jbs
+$ ssh foam.instageni.gpolab.bbn.com sudo whoami
root

I then created accounts for Chaos and Tim, following the procedure at https://users.emulab.net/trac/protogeni/wiki/RackAdminAccounts#AdminAccountsonInstaGeniRacks. I got their public keys from their Emulab accounts, and put them into chaos.keys and tupty.keys in my homedir, and then:

sudo /usr/local/bin/mkadmin.pl chaos chaos.keys
sudo /usr/local/bin/mkadmin.pl tupty tupty.keys

They then confirmed that they could log in and run 'sudo whoami'.
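When there are several admins to add, the mkadmin.pl commands above can be wrapped in a loop. A sketch, assuming the same <user>.keys naming used above; make_admins, SUDO, and MKADMIN are hypothetical names, and the script path defaults to the one used above:

```shell
# Run mkadmin.pl for each user, reading keys from ./<user>.keys.
# SUDO and MKADMIN exist only so the loop can be dry-run.
SUDO=${SUDO-sudo}
MKADMIN=${MKADMIN:-/usr/local/bin/mkadmin.pl}

make_admins() {
    rc=0
    for user in "$@"; do
        # Skip users whose key file is missing or empty.
        if [ ! -s "$user.keys" ]; then
            echo "skipping $user: $user.keys is missing or empty" >&2
            rc=1
            continue
        fi
        $SUDO "$MKADMIN" "$user" "$user.keys" || rc=1
    done
    return "$rc"
}
```

For example: make_admins chaos tupty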

Results of testing step 3B: 2012-07-04

Note: This test was run on the Utah rack.

  • Chaos can SSH to foam.utah.geniracks.net:
    $ ssh foam.utah.geniracks.net
    Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com/
    Last login: Tue Jul  3 12:57:20 2012 from capybara.bbn.com
    foam,[~],09:33(0)$
    
  • Chaos can sudo on foam:
    foam,[~],09:33(0)$ sudo whoami
    root
    

Step 3C: GPO requests access to FlowVisor VM

Verify:

  • Logins succeed for Chaos, Josh, and Tim on the FlowVisor VM
  • The command works:
    $ sudo whoami
    root
    

Results of testing step 3C: 2012-12-20

I was named as the site admin in the site survey, and was given an account on the FlowVisor VM. I was able to log in and use sudo:

[16:11:41] jbs@anubis:/home/jbs
+$ ssh flowvisor.instageni.gpolab.bbn.com sudo whoami
root

I then created accounts for Chaos and Tim, following the procedure at https://users.emulab.net/trac/protogeni/wiki/RackAdminAccounts#AdminAccountsonInstaGeniRacks. I got their public keys from their Emulab accounts, and put them into chaos.keys and tupty.keys in my homedir, and then:

sudo /usr/local/bin/mkadmin.pl chaos chaos.keys
sudo /usr/local/bin/mkadmin.pl tupty tupty.keys

They then confirmed that they could log in and run 'sudo whoami'.

Results of testing step 3C: 2012-11-12

Note: This test was run on the Utah rack.

  • Chaos can SSH to flowvisor.utah.geniracks.net:
    $ ssh flowvisor.utah.geniracks.net
    Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com/
    Last login: Fri Jul 27 08:09:08 2012 from capybara.bbn.com
    
  • Chaos can sudo on flowvisor:
    flowvisor,[~],10:14(0)$ sudo whoami
    root
    

Step 3D: GPO requests access to infrastructure server

  • Request accounts for GPO ops staffers on the VM server node which runs boss, ops, foam, and flowvisor
  • Chaos tries to SSH to the VM server node
  • Josh tries to SSH to the VM server node
  • Tim tries to SSH to the VM server node
  • Chaos tries to run a minimal command as sudo on the VM server node:
    sudo whoami
    

Verify:

  • Logins succeed for Chaos, Josh, and Tim on the host
  • The command works:
    $ sudo whoami
    root
    

Results of testing step 3D: 2012-12-20

I was named as the site admin in the site survey, and was given an account on the control host. I was able to log in and use sudo:

[16:14:44] jbs@anubis:/home/jbs
+$ ssh control.instageni.gpolab.bbn.com sudo whoami
root

I then created accounts for Chaos and Tim, following the procedure at https://users.emulab.net/trac/protogeni/wiki/RackAdminAccounts#AdminAccountsonInstaGeniRacks. I got their public keys from their Emulab accounts, and put them into chaos.keys and tupty.keys in my homedir, and then:

sudo /usr/local/bin/mkadmin.pl chaos chaos.keys
sudo /usr/local/bin/mkadmin.pl tupty tupty.keys

They then confirmed that they could log in and run 'sudo whoami'.

Results of testing step 3D: 2012-05-15

Note: This test was run on the Utah rack, where only Chaos has an account. So Tim and Josh will not be testing, and the host to test is utah.control.geniracks.net.

  • Chaos successfully used public-key login and sudo from a BBN subnet (128.89.68.0/23) to the control node:
    capybara,[~],14:53(0)$ ssh utah.control.geniracks.net
    Welcome to Ubuntu precise (development branch) (GNU/Linux 3.2.0-23-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com/
    
      System information as of Tue May 15 12:53:17 MDT 2012
    
      System load:  0.0               Users logged in:       1
      Usage of /:   44.3% of 5.85GB   IP address for xenbr0: 155.98.34.2
      Memory usage: 1%                IP address for xenbr1: 10.1.1.254
      Swap usage:   0%                IP address for xenbr2: 10.2.1.254
      Processes:    142               IP address for xenbr3: 10.3.1.254
    
      Graph this data and manage this system at https://landscape.canonical.com/
    Last login: Tue May 15 12:52:31 2012 from capybara.bbn.com
    
    control,[~],12:53(0)$ sudo whoami
    root
    

Step 3E: GPO requests access to network devices

Using:

  • Request accounts for GPO ops staffers on the InstaGENI rack control and dataplane network devices from instageni-ops

Verify:

  • I know which hostname or IP address to log in to in order to reach each of the control and dataplane switches
  • I know what source IPs are allowed to remotely access the control and dataplane switches via SSH
  • I can successfully perform those logins at least once
  • I can successfully run a few test commands to verify enable mode:
    show running-config
    show mac-address
    

Results of testing step 3E: 2012-12-20

I logged in to boss, and found the switch IP addresses in /etc/hosts:

[16:24:49] jbs@boss.instageni.gpolab.bbn.com:/users/jbs
+$ grep procurve /etc/hosts
10.2.1.253      procurve1-alt
10.1.1.253      procurve1
10.3.1.253      procurve2

I predict that I can reach these from boss, which has an interface on 10.1.1.0/24 and 10.3.1.0/24.
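The prediction comes down to whether each switch address shares a /24 with one of boss's interfaces. A sketch of that check, using the three /etc/hosts entries above and the two boss networks just named; it only handles /24s (same first three octets) and says nothing about routing between VLANs. The function names are hypothetical.

```shell
# Two addresses are on the same /24 exactly when their first three octets match.
same_slash24() {
    [ "${1%.*}" = "${2%.*}" ]
}

boss_nets="10.1.1.0 10.3.1.0"   # boss's interfaces, per the note above

on_boss_net() {
    for net in $boss_nets; do
        if same_slash24 "$1" "$net"; then
            return 0
        fi
    done
    return 1
}

for entry in "10.2.1.253 procurve1-alt" "10.1.1.253 procurve1" "10.3.1.253 procurve2"; do
    ip=${entry%% *}
    name=${entry#* }
    if on_boss_net "$ip"; then
        echo "$name ($ip): on a boss /24"
    else
        echo "$name ($ip): not on a boss /24"
    fi
done
```

By this check procurve1 and procurve2 are on boss's networks, while procurve1-alt (10.2.1.253) is not.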

I got the switch passwords out of /usr/testbed/etc/switch.pswd, and logged in and ran those commands.

On procurve1, the control switch:

HP-E2620-24# show running-config

Running configuration:

; J9623A Configuration Editor; Created on release #RA.15.05.0006
; Ver #01:01:00

hostname "HP-E2620-24"
ip default-gateway 10.1.1.254
vlan 1
   name "DEFAULT_VLAN"
   untagged 1-22,25-28
   ip address 10.254.254.253 255.255.255.0
   no untagged 23-24
   ip igmp
   exit
vlan 11
   name "control-alternate"
   untagged 24
   ip address 10.2.1.253 255.255.255.0
   ip igmp
   exit
vlan 10
   name "control-hardware"
   untagged 23
   ip address 10.1.1.253 255.255.255.0
   exit
no web-management
snmp-server community "e8074ebc557d" unrestricted
aaa authentication ssh login public-key
aaa authentication ssh enable public-key
management-vlan 10
no dhcp config-file-update
password manager
password operator

and

HP-E2620-24# show mac-address

 Status and Counters - Port Address Table

  MAC Address   Port  VLAN
  ------------- ----- ----
  00163e-f0a000 25    1
  00163e-f0a100 25    1
  02072d-51f46a 25    1
  022b51-e42941 1     1
  02325e-a545df 3     1
  0240fd-8370c0 3     1
  026c2b-9278af 1     1
  029e26-f15299 25    1
  02e3b1-81dd08 3     1
  10604b-9717cc 6     1
  10604b-976a78 22    1
  10604b-97dae2 2     1
  10604b-97f7e4 4     1
  10604b-980386 10    1
  10604b-9b47fc 1     1
  10604b-9b69b8 25    1
  10604b-9b8214 3     1
  b4b52f-69db40 8     1
  ccef48-7a7aa9 26    1
  000099-989701 23    10
  10604b-9b69ba 23    10
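If we want to fold tables like this into the connection inventory later, the rows are easy to pull out. A minimal Python sketch, assuming the output has been saved as plain text; parse_mac_table and normalize_mac are hypothetical helpers, and the sample rows are copied from the procurve1 table above. The same pattern also matches the procurve2 table, whose ports look like "E20".

```python
import re
from collections import Counter

# One data row of HP "show mac-address": MAC (xxxxxx-xxxxxx), port, VLAN.
ROW = re.compile(r"^\s*([0-9a-f]{6}-[0-9a-f]{6})\s+(\S+)\s+(\d+)\s*$", re.IGNORECASE)


def normalize_mac(hp_mac):
    """Turn HP's 'aabbcc-ddeeff' notation into 'aa:bb:cc:dd:ee:ff'."""
    digits = hp_mac.replace("-", "").lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def parse_mac_table(text):
    """Return (mac, port, vlan) tuples for each data row; header lines don't match."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((normalize_mac(m.group(1)), m.group(2), int(m.group(3))))
    return rows


if __name__ == "__main__":
    sample = """\
  MAC Address   Port  VLAN
  ------------- ----- ----
  00163e-f0a000 25    1
  10604b-9b69b8 25    1
  000099-989701 23    10
  10604b-9b69ba 23    10
"""
    rows = parse_mac_table(sample)
    print(Counter(vlan for _, _, vlan in rows))  # number of MACs seen per VLAN
```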

On procurve2, the dataplane switch:

HP-E5406zl# show running-config

Running configuration:

; J8697A Configuration Editor; Created on release #K.15.06.5008
; Ver #02:10.0d:1f

hostname "HP-E5406zl"
module 1 type J9550A
module 5 type J9550A
interface E1
   speed-duplex auto-1000
exit
interface E2
   speed-duplex auto-1000
exit

[* snip -- long *]

and

HP-E5406zl# show mac-address

 Status and Counters - Port Address Table

  MAC Address   Port   VLAN
  ------------- ------ ----
  000099-989703 E20    10
  00163e-f0a103 E20    10
  10604b-9b69ce E20    10
  0012e2-b8a5d0 E24    1750

It wasn't entirely obvious how to find the switch names, usernames, and passwords; I created InstaGENI ticket 71 to track the task of documenting that. This is otherwise complete.

2013-01-21 update: They've added a page with switch login information (http://www.protogeni.net/wiki/rack-deployment), closing InstaGENI ticket 71, so this is all set.

Results of testing step 3E: 2012-05-16

Note: This test was run on the Utah rack.

  • IP addresses:
    • The control switch's IP is 10.1.1.253 (procurve1 in boss's /etc/hosts file)
    • The dataplane switch's IP is 10.2.1.253 (procurve2 in boss's /etc/hosts file)
  • These control IPs are on a private network, so they can only be accessed from other things on that network. For our purposes, that is boss.utah.geniracks.net, which has an interface on each network.
  • Login from boss to procurve1 using password in /usr/testbed/etc/switch.pswd:
    boss,[~],09:23(0)$ telnet procurve1
    Trying 10.1.1.253...
    Connected to procurve1.
    Escape character is '^]'.
    ...
    Password:
    ...
    ProCurve Switch 2610-24# show running-config
    
    Running configuration:
    
    ; J9085A Configuration Editor; Created on release #R.11.70
    ...
    ProCurve Switch 2610-24# show mac-address
    
     Status and Counters - Port Address Table
    ...
    ProCurve Switch 2610-24# exit
    ProCurve Switch 2610-24> exit
    Do you want to log out [y/n]? y
    
  • Login from boss to procurve2 using password in /usr/testbed/etc/switch.pswd:
    boss,[~],10:23(0)$ telnet procurve2
    Trying 10.2.1.253...
    Connected to procurve2.
    Escape character is '^]'.
    ...
    Password:
    ...
    ProCurve Switch 6600ml-48G-4XG# show running-config
    
    Running configuration:
    
    ; J9452A Configuration Editor; Created on release #K.14.41
    ...
    ProCurve Switch 6600ml-48G-4XG# show mac-address
    
     Status and Counters - Port Address Table
    ...
    ProCurve Switch 6600ml-48G-4XG# exit
    ProCurve Switch 6600ml-48G-4XG> exit
    Do you want to log out [y/n]? y
    

Step 3F: GPO requests access to shared OpenVZ nodes

Using:

  • Determine an experimental host which is currently configured as a shared OpenVZ node
  • From boss.instageni.gpolab.bbn.com, try to SSH to the node
  • On the node, try to run a minimal command as sudo:
    sudo whoami
    

Verify:

  • Login to the OpenVZ host is successful
  • Access to the node is as root and/or it is possible to run a command via sudo

Results of testing step 3F: 2012-12-20

I looked at https://boss.instageni.gpolab.bbn.com/nodecontrol_list.php3?showtype=dl360 to find a shared node, and found that pc1 is one (it's in EID 'shared-nodes').

On boss, I was able to run 'sudo whoami' on pc1:

[16:51:06] jbs@boss.instageni.gpolab.bbn.com:/users/jbs
+$ ssh pc1 sudo whoami
root

And I was able to log in directly to pc1 as root:

[16:51:50] jbs@boss.instageni.gpolab.bbn.com:/users/jbs
+$ sudo ssh pc1 whoami
root
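To repeat both of these checks across several shared nodes, something like the following Python sketch would do. It assumes it runs on boss with the same passwordless SSH setup shown above; ssh_as_root_ok is a hypothetical helper, and the run parameter exists only so the logic can be exercised without a rack.

```python
import subprocess


def ssh_as_root_ok(host, run=subprocess.run):
    """Return (sudo_on_node, root_key_from_boss) pass/fail booleans for one node.

    A check passes only if the command exits 0 and prints exactly "root".
    """
    checks = (
        ["ssh", host, "sudo", "whoami"],  # my own account, then sudo on the node
        ["sudo", "ssh", host, "whoami"],  # boss's root key, straight to root
    )
    results = []
    for argv in checks:
        try:
            proc = run(argv, capture_output=True, text=True, timeout=30)
        except (OSError, subprocess.TimeoutExpired):
            results.append(False)  # ssh missing, or the node never answered
            continue
        results.append(proc.returncode == 0 and proc.stdout.strip() == "root")
    return tuple(results)
```

On boss, ssh_as_root_ok("pc1") would be expected to return (True, True), given the two transcripts above.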

Results of testing step 3F: 2012-05-16

Note: This test was run on the Utah rack.

  • pc5 is currently configured as a shared OpenVZ node: it is in the experiment emulab-ops/shared-nodes
  • From boss:
    boss,[~],20:49(0)$ sudo ssh pc5
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
    Someone could be eavesdropping on you right now (man-in-the-middle attack)!
    It is also possible that the RSA host key has just been changed.
    The fingerprint for the RSA key sent by the remote host is
    46:63:92:67:c8:75:20:4e:52:9f:2d:f6:cb:58:16:77.
    Please contact your system administrator.
    Add correct host key in /root/.ssh/known_hosts to get rid of this message.
    Offending key in /root/.ssh/known_hosts:18
    Password authentication is disabled to avoid man-in-the-middle attacks.
    Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
    [root@vhost1 ~]# whoami
    root
    

Step 3G: GPO requests access to iLO remote management interfaces for experimental nodes

Using:

  • GPO requests access to the experimental node iLO management interfaces from instageni-ops
  • Determine how to use these interfaces to access remote control and remote console interfaces for experimental nodes
  • For each experimental node in the BBN rack:
    • Access the iLO interface and view status information
    • View the interface for remotely power-cycling the node
    • Launch the remote console for the node

Verify:

  • GPO is able to determine the procedure for accessing the iLO interfaces
  • Login to each iLO succeeds
  • The remote power-cycle interface exists and appears to be usable (don't try power-cycling any nodes during this test)
  • Launching the remote console at each iLO succeeds

Results of testing step 3G: 2012-12-20

I'm not sure why this test only mentions the experiment nodes, and not the control node; I'd think that we'd especially want ILO access to that.

The control node's ILO IP address is in DNS:

[23:22:39] jbs@boss.instageni.gpolab.bbn.com:/users/jbs
+$ host -l 129/25.242.1.192.in-addr.arpa | grep ilo
131.129/25.242.1.192.in-addr.arpa domain name pointer control-ilo.instageni.gpolab.bbn.com.

I was able to SSH to elabman@control-ilo.instageni.gpolab.bbn.com, using the password in boss:/usr/testbed/etc/ilo.pswd.

Once there, I used 'help power' to show some power-control type commands. I created InstaGENI ticket 72 to track the task of documenting this somewhere, although 'help' and then 'help power' is pretty intuitive. I didn't actually try any power commands on the control node.

I then used the 'textcons' command to get a text console, where I saw a bunch of syslog messages, hit return a few times, and found myself at a "gpolab.control-nodes.geniracks.net login:" prompt. I wasn't able to log in as 'jbs' with my ProtoGENI password, which makes sense; how would my password have gotten set? And in fact I don't have one:

[23:31:44] jbs@gpolab:/home/jbs
+$ sudo grep jbs /etc/shadow
jbs:*:15682:0:99999:7:::

I set myself one, with 'sudo passwd jbs', but still wasn't able to log in. Also, the login prompt seemed to change from "gpolab.control-nodes.geniracks.net login:" to "gpolab login:", and back again, at seemingly random times. I mentioned this on IG ticket 72 too.

I was able to SSH to elabman@192.1.242.182, the ILO for pc3, and observe the console in boot wait mode. I reset the system with 'power reset', and that seemed to work. I noted on IG ticket 72 that the ILO addresses for the experiment nodes aren't in DNS.

2013-01-17 update: They've added a page about the ILOs (http://www.protogeni.net/ProtoGeni/wiki/InstaGeni/ILO), closing InstaGENI ticket 72, so this is all set.

Results of testing step 3G: 2012-05-16

Note: This test was run on the Utah rack.

  • Here's Utah's information about how to use the elabman consoles:
  • This worked for pc3:
    $ ssh elabman@155.98.34.104
    The authenticity of host '155.98.34.104 (155.98.34.104)' can't be established.
    DSA key fingerprint is 93:b0:c9:e7:bd:cd:bd:e4:94:b6:a2:12:aa:17:80:e8.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '155.98.34.104' (DSA) to the list of known hosts.
    elabman@155.98.34.104's password:
    User:elabman logged-in to ILOUSE211XXJS.utah.geniracks.net(155.98.34.104)
    iLO 3 Advanced 1.28 at  Jan 13 2012
    Server Name: USE211XXJS
    Server Power: On
    
    </>hpiLO->
    
  • There is a command called power, and it seems clear that it could be used to reset the device:
    </>hpiLO-> help power
    
    status=0
    status_tag=COMMAND COMPLETED
    Wed May 16 17:18:09 2012
    
    
    POWER  : The power command is used to change the power state of the server
     and is limited to users with the Power and Reset privilege
    Usage:
      power -- Displays the current server power state
      power on -- Turns the server on
      power off -- Turns the server off
      power off hard -- Force the server off using press and hold
      power reset -- Reset the server
    
    
    </>hpiLO-> power
    
    status=0
    status_tag=COMMAND COMPLETED
    Wed May 16 17:18:15 2012
    
    
    
    power: server power is currently: On
    
  • I can also browse to https://155.98.34.104/ and login as elabman there too
  • I can get MAC address information for 4x iSCSI connections and 4x NICs, as well as the iLO itself, from the web UI
  • From the web UI, trying: Remote Console -> Remote Console -> Java Integrated Remote Console -> Launch
    • This launches, but the console says "No Video"
  • I can browse to https://155.98.34.103/ (supposed to be the iLO for pc1), but I get:
    Secure Connection Failed
    
    An error occurred during a connection to 155.98.34.103.
    
    You have received an invalid certificate.  Please contact the server
    administrator or email correspondent and give them the following
    information:
    
    Your certificate contains the same serial number as another certificate
    issued by the certificate authority.  Please get a new certificate
    containing a unique serial number.
    
    (Error code: sec_error_reused_issuer_and_serial)
    
     The page you are trying to view can not be shown because the
     authenticity of the received data could not be verified.
    
    Please contact the web site owners to inform them of this problem.
    Alternatively, use the command found in the help menu to report
    this broken site.
    
  • On the pc3 iLO, I can browse to Administration -> Security -> SSL Certificate and see various SSL information. Let's look at that on each of the iLOs:
    • pc1 (.103):
      Issued To       CN=ILOUSE211XXJR.utah.geniracks.net, OU=ISS, O=Hewlett-Packard Company, L=Houston, ST=Texas, C=US
      Issued By       C=US, ST=TX, L=Houston, O=Hewlett-Packard Company, OU=ISS, CN=iLO3 Default Issuer (Do not trust)
      Valid From      Wed, 11 Jan 2012
      Valid Until     Mon, 12 Jan 2037
      Serial Number   57
      
    • pc2 (.102):
      Issued To       CN=ILOUSE211XXJP.utah.geniracks.net, OU=ISS, O=Hewlett-Packard Company, L=Houston, ST=Texas, C=US
      Issued By       C=US, ST=TX, L=Houston, O=Hewlett-Packard Company, OU=ISS, CN=iLO3 Default Issuer (Do not trust)
      Valid From      Wed, 11 Jan 2012
      Valid Until     Mon, 12 Jan 2037
      Serial Number   55
      
    • pc3 (.104):
      Issued To       CN=ILOUSE211XXJS.utah.geniracks.net, OU=ISS, O=Hewlett-Packard Company, L=Houston, ST=Texas, C=US
      Issued By       C=US, ST=TX, L=Houston, O=Hewlett-Packard Company, OU=ISS, CN=iLO3 Default Issuer (Do not trust)
      Valid From      Wed, 11 Jan 2012
      Valid Until     Mon, 12 Jan 2037
      Serial Number   57
      
    • pc4 (.105):
      Issued To       CN=ILOUSE211XXJT.utah.geniracks.net, OU=ISS, O=Hewlett-Packard Company, L=Houston, ST=Texas, C=US
      Issued By       C=US, ST=TX, L=Houston, O=Hewlett-Packard Company, OU=ISS, CN=iLO3 Default Issuer (Do not trust)
      Valid From      Wed, 11 Jan 2012
      Valid Until     Mon, 12 Jan 2037
      Serial Number   53
      
    • pc5 (.101):
      Issued To       CN=ILOUSE211XXJN.utah.geniracks.net, OU=ISS, O=Hewlett-Packard Company, L=Houston, ST=Texas, C=US
      Issued By       C=US, ST=TX, L=Houston, O=Hewlett-Packard Company, OU=ISS, CN=iLO3 Default Issuer (Do not trust)
      Valid From      Wed, 11 Jan 2012
      Valid Until     Mon, 12 Jan 2037
      Serial Number   54
      
  • Testing remote console:
    • pc5 (.101), currently allocated into emulab-ops/shared-nodes experiment: says "No Video"
    • pc2 (.102), currently idle: video works, shows PXE boot prompt
    • pc1 (.103), currently idle: video works, shows PXE boot prompt
    • pc3 (.104), currently allocated into pgeni-gpolab-bbn-com/ecgtest experiment: says "No Video"
      • I ran deletesliver on ecgtest while watching the console. After a while, it rebooted, and I watched the BIOS splash screen and the load into frisbee. The frisbee MFS appears to support the VGA console as well: I was able to watch frisbee's progress.
    • pc4 (.105), currently idle: video works, shows PXE boot prompt
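Firefox's sec_error_reused_issuer_and_serial means it has seen two different certificates with the same issuer and serial number. The listing above shows pc1 and pc3 both presenting serial 57 from the same "iLO3 Default Issuer", which would explain the pc1 error if the browser had already accepted pc3's certificate. Here is a small Python sketch that flags such collisions. The CERTS list is just a transcription of that listing, and reused_issuer_serials is a hypothetical helper:

```python
from collections import defaultdict

# (node, iLO address, issuer CN, serial) as recorded above from each iLO's
# Administration -> Security -> SSL Certificate page.
ISSUER = "iLO3 Default Issuer (Do not trust)"
CERTS = [
    ("pc1", "155.98.34.103", ISSUER, 57),
    ("pc2", "155.98.34.102", ISSUER, 55),
    ("pc3", "155.98.34.104", ISSUER, 57),
    ("pc4", "155.98.34.105", ISSUER, 53),
    ("pc5", "155.98.34.101", ISSUER, 54),
]


def reused_issuer_serials(certs):
    """Group nodes by (issuer, serial) and keep only groups with more than one node."""
    groups = defaultdict(list)
    for node, addr, issuer, serial in certs:
        groups[(issuer, serial)].append(f"{node} ({addr})")
    return {key: nodes for key, nodes in groups.items() if len(nodes) > 1}


for (issuer, serial), nodes in reused_issuer_serials(CERTS).items():
    print(f"serial {serial} from '{issuer}' is reused by: {', '.join(nodes)}")
```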

Summary: the Fedora OS images don't have VGA support, so they don't work with the remote consoles. The MFSes and the boot sequence itself do have such support.
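The iLO 3 CLI replies begin with status= and status_tag= lines, as in the 'help power' and 'power' transcripts above, which makes them easy to check from a script. A minimal Python sketch follows. It assumes the reply text has already been captured from an SSH session to elabman at the iLO address. parse_ilo_reply is a hypothetical helper, and it only reads the reply; it never issues power commands.

```python
import re


def parse_ilo_reply(text):
    """Return (status, status_tag, power_state) from one iLO 3 CLI reply.

    Any field missing from the reply comes back as None; for example the
    reply to 'help power' has no 'server power is currently' line.
    """
    status = re.search(r"^\s*status=(\d+)", text, re.MULTILINE)
    tag = re.search(r"^\s*status_tag=(.*\S)", text, re.MULTILINE)
    power = re.search(r"server power is currently:\s*(\w+)", text)
    return (
        int(status.group(1)) if status else None,
        tag.group(1) if tag else None,
        power.group(1) if power else None,
    )


if __name__ == "__main__":
    # Reply to 'power', copied from the pc3 session above.
    reply = (
        "status=0\n"
        "status_tag=COMMAND COMPLETED\n"
        "Wed May 16 17:18:15 2012\n\n\n\n"
        "power: server power is currently: On\n"
    )
    print(parse_ilo_reply(reply))
```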

Step 3H: GPO gets access to allocated bare metal nodes by default

Prerequisites:

  • A bare metal node is available for allocation by InstaGENI
  • Someone has successfully allocated the node for a bare metal experiment

Using:

  • From boss, try to SSH into root on the allocated worker node

Verify:

  • We find out the IP address/hostname at which to reach the allocated worker node
  • Login to the node using root's SSH key succeeds

Results of testing step 3H: 2012-12-20

I allocated an exclusive experiment node, with this rspec:

<?xml version="1.0" encoding="UTF-8"?>
<!-- 
This rspec will reserve one PG host in the InstaGENI rack at BBN, running
Ubuntu 10.04.

AM: https://instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am
--> 

<rspec xmlns="http://www.geni.net/resources/rspec/3"
       xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
       xs:schemaLocation="http://www.geni.net/resources/rspec/3
           http://www.geni.net/resources/rspec/3/request.xsd"
       type="request">

  <node client_id="carlin" exclusive="true">
    <sliver_type name="raw">
      <disk_image name="urn:publicid:IDN+instageni.gpolab.bbn.com+image+emulab-ops:UBUNTU12-64-STD" /> 
    </sliver_type>
  </node>

</rspec>
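If we need more single-node requests like this one, the rspec can be generated rather than hand-edited. Here is a Python sketch using the standard library's ElementTree. It assumes the same GENI v3 request namespace, schema location, and raw sliver type as the rspec above. one_node_request is a hypothetical helper, and it uses the conventional xsi prefix rather than xs, which makes no difference to an XML parser.

```python
import xml.etree.ElementTree as ET

RSPEC_NS = "http://www.geni.net/resources/rspec/3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def one_node_request(client_id, image_urn):
    """Build a GENI v3 request rspec for one exclusive raw (bare metal) node."""
    # Note: register_namespace is global; this makes RSPEC_NS the default namespace.
    ET.register_namespace("", RSPEC_NS)
    ET.register_namespace("xsi", XSI_NS)
    rspec = ET.Element(f"{{{RSPEC_NS}}}rspec", {
        "type": "request",
        f"{{{XSI_NS}}}schemaLocation": f"{RSPEC_NS} {RSPEC_NS}/request.xsd",
    })
    node = ET.SubElement(rspec, f"{{{RSPEC_NS}}}node",
                         {"client_id": client_id, "exclusive": "true"})
    sliver = ET.SubElement(node, f"{{{RSPEC_NS}}}sliver_type", {"name": "raw"})
    ET.SubElement(sliver, f"{{{RSPEC_NS}}}disk_image", {"name": image_urn})
    return ET.tostring(rspec, encoding="unicode")


if __name__ == "__main__":
    print(one_node_request(
        "carlin",
        "urn:publicid:IDN+instageni.gpolab.bbn.com+image+emulab-ops:UBUNTU12-64-STD",
    ))
```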

My manifest rspec said that I got pc3.instageni.gpolab.bbn.com.

After it became ready, I confirmed that I could log in to it as myself as usual, i.e. that sshd was up and running properly.

I then confirmed that I could log in from boss to pc3, and run commands as root:

[00:13:04] jbs@boss.instageni.gpolab.bbn.com:/users/jbs
+$ sudo ssh pc3 whoami
root

[00:13:11] jbs@boss.instageni.gpolab.bbn.com:/users/jbs
+$ sudo ssh pc3 uname -a
Linux carlin.jbs.pgeni-gpolab-bbn-com.instageni.gpolab.bbn.com 2.6.38.7-1.0emulab #1 SMP Wed Aug 24 09:55:34 MDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Results of testing step 3H: 2012-05-16

Note: This test was run on the Utah rack.

  • The second prerequisite has not been met (all bare metal nodes are unallocated at this time). To meet it, I will allocate a node via omni:
    $ omni createslice ecgtest
    ...
      Result Summary: Created slice with Name ecgtest, URN urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest, Expiration 2012-05-17 17:32:24+00:00
    ...
    $ omni createsliver -a http://www.utah.geniracks.net/protogeni/xmlrpc/am ecgtest ~/omni/rspecs/request/misc/protogeni-any-one-node.rspec
    INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
    INFO:omni:Using control framework pg
    INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest expires on 2012-05-17 17:32:24 UTC
    INFO:omni:Creating sliver(s) from rspec file /home/chaos/omni/rspecs/request/misc/protogeni-any-one-node.rspec for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest
    INFO:omni:Asked http://www.utah.geniracks.net/protogeni/xmlrpc/am to reserve resources. Result:
    INFO:omni:<?xml version="1.0" ?>
    INFO:omni:<!-- Reserved resources for:
            Slice: ecgtest
            At AM:
            URL: http://www.utah.geniracks.net/protogeni/xmlrpc/am
     -->
    INFO:omni:<rspec xmlns="http://protogeni.net/resources/rspec/0.2">
        <node component_manager_urn="urn:publicid:IDN+utah.geniracks.net+authority+cm" component_manager_uuid="c133552e-688f-11e1-8314-00009b6224df" component_urn="urn:publicid:IDN+utah.geniracks.net+node+pc3" component_uuid="49574b15-753a-11e1-a16c-00009b6224df" exclusive="1" hostname="pc3.utah.geniracks.net" sliver_urn="urn:publicid:IDN+utah.geniracks.net+sliver+364" sliver_uuid="e326971a-9f74-11e1-af1c-00009b6224df" sshdport="22" virtual_id="geni1" virtualization_subtype="raw" virtualization_type="emulab-vnode">
          <services>      <login authentication="ssh-keys" hostname="pc3.utah.geniracks.net" port="22" username="chaos"/>    </services>  </node>
    </rspec>
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed createsliver:
    
      Options as run:
                    aggregate: http://www.utah.geniracks.net/protogeni/xmlrpc/am
                    configfile: /home/chaos/omni/omni_pgeni
                    framework: pg
                    native: True
    
      Args: createsliver ecgtest /home/chaos/omni/rspecs/request/misc/protogeni-any-one-node.rspec
    
      Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest expires on 2012-05-17 17:32:24 UTC
    Reserved resources on http://www.utah.geniracks.net/protogeni/xmlrpc/am.
    INFO:omni: ===========================================================
    
    That looks good, and implies that pc3.utah.geniracks.net was allocated for my experiment.
  • From boss, I can log in to the allocated pc3 as root, and the correct key is used by default:
    boss,[~],10:31(0)$ sudo ssh pc3
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
    Someone could be eavesdropping on you right now (man-in-the-middle attack)!
    It is also possible that the RSA host key has just been changed.
    The fingerprint for the RSA key sent by the remote host is
    46:63:92:67:c8:75:20:4e:52:9f:2d:f6:cb:58:16:77.
    Please contact your system administrator.
    Add correct host key in /root/.ssh/known_hosts to get rid of this message.
    Offending key in /root/.ssh/known_hosts:15
    Password authentication is disabled to avoid man-in-the-middle attacks.
    Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
    [root@geni1 ~]#
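Finding the host to log in to amounts to reading the login element out of the manifest. A minimal Python sketch, assuming the manifest has been saved to a string. It matches on local tag names, so it should handle both the ProtoGENI 0.2 manifest in the omni output above and GENI v3 manifests, whose namespaces differ. login_endpoints is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def login_endpoints(manifest_xml):
    """Return (hostname, username, port) for every <login> service in a manifest."""
    root = ET.fromstring(manifest_xml)
    return [
        (elem.get("hostname"), elem.get("username"), int(elem.get("port", "22")))
        for elem in root.iter()
        if local_name(elem.tag) == "login"
    ]


if __name__ == "__main__":
    # Trimmed from the Utah manifest in the omni output above.
    manifest = (
        '<rspec xmlns="http://protogeni.net/resources/rspec/0.2">'
        '<node hostname="pc3.utah.geniracks.net" virtual_id="geni1"><services>'
        '<login authentication="ssh-keys" hostname="pc3.utah.geniracks.net" '
        'port="22" username="chaos"/></services></node></rspec>'
    )
    print(login_endpoints(manifest))
```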
    

Step 4: GPO inventories the rack based on our own processes

Step 4A: Inventory and label physical rack contents

Using:

  • Enumerate all physical objects in the rack
  • Use available rack documentation to determine the correct name of each object
  • If any objects can't be found in public documentation, compare to internal notes, and iterate with InstaGENI
  • Physically label each device in the rack with its name on front and back
  • Inventory all hardware details for rack contents on OpsHardwareInventory
  • Add an ascii rack diagram to OpsHardwareInventory

Verify:

  • Public documentation and/or rack diagrams identify all rack objects
  • There is a public parts list which matches the parts we received
  • We succeed in labelling the devices and adding their hardware details and locations to our inventory

Results of testing step 4A: 2013-01-17

It wasn't obvious to me how to find out the proper names of the things in the rack, so I created InstaGENI ticket 85 to find out.

http://groups.geni.net/geni/wiki/GENIRacksHome#InstaGENISpecifications is enough of a "public parts list" for our purposes.

I'd previously added an ASCII rack diagram to OpsHardwareInventory, and I've now also added the details of the hardware to a table under OpsHardwareInventory#HardwareforInstaGENIrack.

2013-01-21 update: http://www.protogeni.net/wiki/rack-deployment now documents the names of the objects in the rack, closing InstaGENI ticket 85. And I've labeled each device in the rack, so this is all set.

Step 4B: Inventory rack power requirements

Using:

  • Add rack circuit information to OpsPowerConnectionInventory

Verify:

  • We succeed in locating and documenting information about rack power circuits in use

Results of testing step 4B: 2013-01-17

We added this information to OpsPowerConnectionInventory#InstaGENIrackcircuits on 2012-12-04.

Step 4C: Inventory rack network connections

Using:

  • Add all rack ethernet and fiber connections and their VLAN configurations to OpsConnectionInventory
  • Add static rack OpenFlow datapath information to OpsDpidInventory

Verify:

  • We are able to identify and determine all rack network connections and VLAN configurations
  • We are able to determine the OpenFlow configuration of the rack dataplane switch

Results of testing step 4C: 2013-01-17

I had VLAN information for procurve1 from step 3E; the VLAN configuration on procurve2 is managed by ProtoGENI.
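For the inventory, the vlan stanzas in the procurve1 running-config from step 3E can be extracted mechanically. A minimal Python sketch, assuming the config was saved as plain text and follows the 'vlan N ... exit' layout shown there. parse_vlans is a hypothetical helper. It ignores everything outside vlan stanzas, and inside them it keeps only the name, untagged, and ip address lines.

```python
import re


def parse_vlans(config_text):
    """Return {vlan_id: {"name", "untagged", "ip"}} from 'vlan N ... exit' stanzas."""
    vlans = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        m = re.fullmatch(r"vlan (\d+)", line)
        if m:
            current = vlans.setdefault(
                int(m.group(1)), {"name": None, "untagged": None, "ip": None})
        elif current is not None:
            if line == "exit":
                current = None  # end of this vlan stanza
            elif line.startswith("name "):
                current["name"] = line[len("name "):].strip('"')
            elif line.startswith("untagged "):  # "no untagged ..." doesn't match
                current["untagged"] = line[len("untagged "):]
            elif line.startswith("ip address "):
                current["ip"] = line[len("ip address "):]
    return vlans


if __name__ == "__main__":
    sample = """vlan 10
   name "control-hardware"
   untagged 23
   ip address 10.1.1.253 255.255.255.0
   exit
"""
    print(parse_vlans(sample))
```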

I was able to log in to procurve2 and show its OpenFlow information:

HP-E5406zl# show openflow

 Openflow Configuration

  Openflow aggregate VLANs [Disabled] :           
  Openflow aggregate management VlanId [0] : 0     
  Openflow second aggregate management VlanId [0] : 0     
  Openflow aggregate configuration VlanId [0] : 0     

  VID  State HW  Active controller Pseudo-URL                       Conn
  ---- ----- --- -------------------------------------------------- ----
  1750 On    On  tcp:10.3.1.7:6633                                  Yes 
 

HP-E5406zl# show openflow 1750

 Openflow Configuration - VLAN 1750

  Openflow state [Disabled] : Enabled   
  Controller pseudo-URL : tcp:10.3.1.7:6633 *                               
  Listener pseudo-URL :                                                   
  Openflow software rate limit [100] : 100   
  Openflow connecting max backoff [60] : 60       
  Openflow hardware acceleration [Enabled] : Enabled   
  Openflow hardware rate limit [0] : 0        
  Openflow hardware stats max refresh rate [0] : 0        
  Openflow fail-secure [Disabled] : Enabled   
  Second Controller pseudo-URL :                                              
  Third Controller pseudo-URL :                                              

 Openflow Status - VLAN 1750

  Switch MAC address : 84:34:97:C6:C9:00
  Openflow datapath ID : 06D6843497C6C900
  Controller connection status (1/1) : connected ; state: ACTIVE
  Listening connection status : listening (1 connections)
  SW Dpif n_flows: 1 ; cur_capacity:63 ; n_lost: 0
          n_hit: 67958 ; n_missed: 984994 ; n_frags: 0
  Number of hardware rules: 5

I added the connection, VLAN, and datapath information to OpsConnectionInventory and OpsDpidInventory.
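The datapath ID above appears to be the OpenFlow instance's VLAN ID (1750 is 0x06D6) in the top 16 bits, followed by the switch MAC address 84:34:97:C6:C9:00 in the low 48 bits. That's an observation from this one output rather than something checked against HP's documentation, but it's handy when recording DPIDs in OpsDpidInventory. Here is a Python sketch of the split. decode_dpid is a hypothetical helper.

```python
def decode_dpid(dpid_hex):
    """Split a 64-bit OpenFlow datapath ID into (upper 16 bits, MAC from lower 48 bits)."""
    value = int(dpid_hex, 16)
    prefix = value >> 48            # here: the OpenFlow VLAN ID
    mac = value & 0xFFFFFFFFFFFF    # here: the switch MAC address
    mac_str = ":".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -1, -8))
    return prefix, mac_str


print(decode_dpid("06D6843497C6C900"))  # (1750, '84:34:97:C6:C9:00')
```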

Step 4D: Verify government property accounting for the rack

Using:

  • Receive a completed DD1149 form from InstaGENI
  • Receive and inventory a property tag number for the BBN InstaGENI rack

Verify:

  • The DD1149 paperwork is complete to BBN government property standards
  • We receive a single property tag for the rack, as expected

Results of testing step 4D: 2013-01-17

We have not yet received these forms and tags; I created InstaGENI ticket 90 to track that.

Step 5: Configure operational alerting for the rack

Step 5A: GPO installs active control network monitoring

Using:

  • Add a monitored control network ping from ilian.gpolab.bbn.com to the boss VM
  • Add a monitored control network ping from ilian.gpolab.bbn.com to the ops VM
  • Add a monitored control network ping from ilian.gpolab.bbn.com to the foam VM
  • Add a monitored control network ping from ilian.gpolab.bbn.com to the flowvisor VM
  • Add a monitored control network ping from ilian.gpolab.bbn.com to the infrastructure VM host
  • Add a monitored control network ping from ilian.gpolab.bbn.com to the control switch's management IP
  • Add a monitored control network ping from ilian.gpolab.bbn.com to the dataplane switch's management IP

Verify:

  • Active monitoring of the control network is successful
  • Each monitored IP is successfully available at least once

Results of testing step 5A: 2013-01-17

I added a monitored control network ping to control, boss, ops, foam, and flowvisor, and it's working fine; see http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?host=ilian.gpolab.bbn.com and http://monitor.gpolab.bbn.com/connectivity/instageni.html for example.
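For reference, a Nagios ping check boils down to mapping reachability onto the standard plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). This Python sketch is a hypothetical stand-in, not the check actually configured on ilian. It assumes Linux iputils ping, where -W is a timeout in seconds, and the run parameter exists only so the logic can be tested without a network.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes (1, WARNING, isn't needed for a plain ping).
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_ping(host, run=subprocess.run):
    """Send one ping to HOST; return (nagios_exit_code, status_line)."""
    argv = ["ping", "-c", "1", "-W", "2", host]  # Linux iputils: -W is in seconds
    try:
        proc = run(argv, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"PING UNKNOWN - {host}: {exc}"
    if proc.returncode == 0:
        return OK, f"PING OK - {host} answered"
    return CRITICAL, f"PING CRITICAL - {host} did not answer"


if __name__ == "__main__" and len(sys.argv) > 1:
    code, line = check_ping(sys.argv[1])
    print(line)
    sys.exit(code)
```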

Results of testing step 5A: 2012-05-18

Note: This test was run on the Utah rack.

Note: this is partially blocked, but I want a basic ability to detect outages, so I am installing what I can.

  • Pings for boss and ops can be added right now, so I am adding them.
  • Pings for FOAM and FV are blocked pending creation of those VMs
  • I am adding a ping for utah.control.geniracks.net, but this may change depending on what that host is ultimately named.
  • The switch management IPs are private, so I think this is effectively blocked until the BBN rack arrives, since I don't want to install ganglia on Utah's rack (and I think the right thing to do here is to ping the devices from boss).

Results of testing step 5A: 2012-07-04

Note: This test was run on the Utah rack.

  • Pings for boss and ops are still operating
  • I am adding pings for foam and flowvisor now.
  • Ping for utah.control.geniracks.net is still operating, but this may change depending on what that host is ultimately named.
  • The switch management IPs are private, so I think this is effectively blocked until the BBN rack arrives, since I don't want to install ganglia on Utah's rack (and I think the right thing to do here is to ping the devices from boss).

Step 5B: GPO installs active shared dataplane monitoring

Using:

  • Add a monitored dataplane network ping from a lab dataplane test host on vlan 1750 to the rack dataplane
  • If necessary, add an openflow controller to handle traffic for the monitoring subnet

Verify:

  • Active monitoring of the dataplane network is successful
  • The monitored IP is successfully available at least once

Results of testing step 5B: 2013-01-17

Tim added a BBN InstaGENI sliver to his 'tuptymon' slice for this; the Nagios check is http://monitor.gpolab.bbn.com/nagios/cgi-bin/extinfo.cgi?type=2&host=argos.gpolab.bbn.com&service=ping_instageni-vlan1750.bbn.dataplane.geni.net and http://monitor.gpolab.bbn.com/connectivity/campus.html has a graph.

Results of testing step 5B: 2012-07-04

Note: This test was run on the Utah rack.

  • Tim added a sliver for monitoring the dataplane subnet yesterday, under slice URN urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+tuptymon
  • Pings from BBN NLR to the Utah testpoint are showing up at http://monitor.gpolab.bbn.com/connectivity/campus.html, and have been successful at least once.
  • When the UEN flowvisor is installed, this sliver will need to be updated to use the new path.

Step 5C: GPO gets access to monitoring information about the BBN rack

Using:

  • GPO determines what monitoring tool InstaGENI will make available for site administrators
  • GPO successfully accesses and views status data about the BBN rack

Verify:

  • I can see general data about all devices in the BBN rack
  • I can see detailed information about any services checked

Results of testing step 5C: 2013-01-17

We haven't been told by the InstaGENI team about any monitoring tools, so I created InstaGENI ticket 91 to track that.

Step 5D: GPO receives e-mail about BBN rack alerts

Using:

  • Request e-mail notifications for BBN rack problems to be sent to GPO ops
  • Collect a number of notifications
  • Inspect three representative messages

Verify:

  • E-mail messages about rack problems are received
  • For each inspected message, I can determine:
    • The affected device
    • The affected service
    • The type of problem being reported
    • The duration of the outage

Results of testing step 5D: 2013-01-17

We created three ProtoGENI notification aliases (instageni-ops, instageni-logs, and instageni-stated), and we've been receiving the usual array of ProtoGENI e-mail through them.

We also created a FOAM alias, instageni-foam-admin, and have gotten FOAM-related notifications there.

None of these track outages or rack problems.

Results of testing step 5D: 2012-07-04

Note: This test was run on the Utah rack.

  • I've been subscribed to genirack-ops@flux.utah.edu since 2012-05-24, and have received a number of e-mail messages.
  • Most of these messages are notifications of GENI operations
  • There is also a nightly e-mail, subject "Testbed Audit Results", which reports on the number of experiments which have been swapped in for longer than a day.
  • There are also notifications of account requests and changes in local projects
  • This list does not detect and notify about rack problems: InstaGENI does not have a list for that purpose.

Step 6: Setup contact info and change control procedures

Step 6A: InstaGENI operations staff should subscribe to response-team

Using:

  • Ask InstaGENI operators to subscribe instageni-ops@flux.utah.edu (or individual operators) to response-team@geni.net

Verify:

  • This subscription has happened. On daulis:
    sudo -u mailman /usr/lib/mailman/bin/find_member -l response-team utah.edu
    

Results of testing step 6A: 2012-07-04

Per daulis, Rob is subscribed, but no other InstaGENI operators are, and neither is the list itself. I opened ticket 43 for this.

Results of testing step 6A: 2012-11-12

Verified that instageni-ops@flux.utah.edu is subscribed to response-team.

Step 6B: InstaGENI operations staff should provide contact info to GMOC

Using:

  • Ask InstaGENI operators to submit primary and secondary e-mail and phone contact information to GMOC

Verify:

  • E-mail gmoc@grnoc.iu.edu and request verification that the InstaGENI organization contact info is up-to-date.

Results of testing step 6B: 2012-07-04

  • I e-mailed GMOC, and will follow up when I get a response.

Results of testing step 6B: 2012-11-12

  • I iterated with Eldar at GMOC and determined that GMOC has primary and escalation e-mail contacts for InstaGENI Utah; we also agreed that phone contacts aren't required if nothing appropriate exists.

Step 6C: Negotiate an interim change control notification procedure

Using:

Verify:

  • InstaGENI agrees to send notifications about planned outages and changes.

Results of testing step 6C: 2012-05-29

InstaGENI has agreed to notify instageni-design when there are outages.

We will want to revisit this test when GMOC has workflows in place to handle notifications for rack outages, and before there are additional rack sites and users who may need to be notified.

Results of testing step 6C: 2013-01-31

We revisited this test now that GMOC has workflows in place for notifications of rack outages, and Rob and Eldar confirmed via e-mail that they were all set with that, so this is done.