wiki:GENIRacksHome/InstageniRacks/AcceptanceTestStatus/IG-MON-3


Detailed test plan for IG-MON-3: GENI Active Experiment Inspection Test

This page is GPO's working page for performing IG-MON-3. It is public for informational purposes, but it is not an official status report. See GENIRacksHome/InstageniRacks/AcceptanceTestStatus for the current status of InstaGENI acceptance tests.

Last substantive edit of this page: 2012-05-18

Page format

  • The status chart summarizes the state of this test
  • The high-level description from the test plan contains text copied exactly from the public test plan and acceptance criteria pages.
  • The steps list the things I will actually do or verify:
    • Steps may be composed of related substeps where I find this useful for clarity.
    • Each step is either a preparatory step (identified by "(prep)") or a verification step (the default):
      • Preparatory steps are just things we have to do. They are not tests of the rack, but they are prerequisites for subsequent verification steps.
      • Verification steps are steps in which we actually look at rack output and make sure it is as expected. They contain a Using: block, which lists the actions needed to run the verification, and a Verify: block, which lists the outcomes expected for the test to pass.

Status of test

Step  State                     Date completed  Tickets         Comments
1     Color(yellow,Completed)?                                  needs retesting when 3 is retested
2                                                               needs retesting when 3 is retested
3     Color(yellow,Completed)?                                  needs retesting once OpenFlow resources are available from InstaGENI AM
4     Color(orange,Blocked)?                    instaticket:26  blocked on resolution of MAC reporting issue
5                                                               ready to test non-OpenFlow functionality
6     Color(orange,Blocked)?                                    ready to test non-OpenFlow functionality
7     Color(orange,Blocked)?                                    ready to test non-OpenFlow functionality
8     Color(orange,Blocked)?                                    ready to test non-OpenFlow functionality

High-level description from test plan

This test inspects the state of the rack data plane and control networks when experiments are running, and verifies that a site administrator can find information about running experiments.

Procedure

  • An experimenter from the GPO starts up experiments to ensure there is data to look at:
    • An experimenter runs an experiment containing at least one rack OpenVZ VM, and terminates it.
    • An experimenter runs an experiment containing at least one rack OpenVZ VM, and leaves it running.
  • A site administrator uses available system and experiment data sources to determine current experimental state, including:
    • How many VMs are running and which experimenters own them
    • How many physical hosts are in use by experiments, and which experimenters own them
    • How many VMs were terminated within the past day, and which experimenters owned them
    • What OpenFlow controllers the data plane switch, the rack FlowVisor, and the rack FOAM are communicating with
  • A site administrator examines the switches and other rack data sources, and determines:
    • What MAC addresses are currently visible on the data plane switch and what experiments do they belong to?
    • For some experiment which was terminated within the past day, what data plane and control MAC and IP addresses did the experiment use?
    • For some experimental data path which is actively sending traffic on the data plane switch, do changes in interface counters show approximately the expected amount of traffic into and out of the switch?

Criteria to verify as part of this test

  • VII.09. A site administrator can determine the MAC addresses of all physical host interfaces, all network device interfaces, all active experimental VMs, and all recently-terminated experimental VMs. (C.3.f)
  • VII.11. A site administrator can locate current configuration of flowvisor, FOAM, and any other OpenFlow services, and find logs of recent activity and changes. (D.6.a)
  • VII.18. Given a public IP address and port, an exclusive VLAN, a sliver name, or a piece of user-identifying information such as e-mail address or username, a site administrator or GMOC operator can identify the email address, username, and affiliation of the experimenter who controlled that resource at a particular time. (D.7)

Step 1 (prep): start a VM experiment and terminate it

  • An experimenter requests an experiment from the InstaGENI AM containing two rack VMs and a dataplane VLAN
  • The experimenter logs into a VM, and sends dataplane traffic
  • The experimenter terminates the experiment

Results of testing: 2012-05-18

  • I'll use the following rspec to get two VMs:
    jericho,[~],05:29(0)$ cat IG-MON-nodes-C.rspec 
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- This rspec will reserve two openvz nodes, each with no OS specified,
         and create a single dataplane link between them.  It should work
         on any Emulab which has nodes available and supports OpenVZ.  -->
    <rspec xmlns="http://www.geni.net/resources/rspec/3"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
           xsi:schemaLocation="http://www.geni.net/resources/rspec/3
                               http://www.geni.net/resources/rspec/3/request.xsd" 
           type="request">
    
      <node client_id="virt1" exclusive="false">
        <sliver_type name="emulab-openvz" />
        <interface client_id="virt1:if0" />
      </node>
      <node client_id="virt2" exclusive="false">
        <sliver_type name="emulab-openvz" />
        <interface client_id="virt2:if0" />
      </node>
    
      <link client_id="virt1-virt2-0">
        <interface_ref client_id="virt1:if0"/>
        <interface_ref client_id="virt2:if0"/>
        <property source_id="virt1:if0" dest_id="virt2:if0"/>
        <property source_id="virt2:if0" dest_id="virt1:if0"/>
      </link>
    </rspec>
    
  • Then create a slice:
    omni createslice ecgtest2
    
  • Then create a sliver using that rspec:
    jericho,[~],05:31(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am createsliver ecgtest2 ~/IG-MON-nodes-C.rspec
    INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
    INFO:omni:Using control framework pg
    ERROR:omni.protogeni:Call for Get Slice Cred for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 failed.: Exception: PG Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 does not exist.
    ERROR:omni.protogeni:    ..... Run with --debug for more information
    ERROR:omni:Cannot create sliver urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2: Could not get slice credential: Exception: PG Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 does not exist.
    
  • It looks like the slice just wasn't ready yet. Trying again after a minute, the same command worked:
    jericho,[~],05:31(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am createsliver ecgtest2 ~/IG-MON-nodes-C.rspec
    INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
    INFO:omni:Using control framework pg
    INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 expires on 2012-05-19 10:30:51 UTC
    INFO:omni:Creating sliver(s) from rspec file /home/chaos/IG-MON-nodes-C.rspec for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2
    INFO:omni:Asked http://www.utah.geniracks.net/protogeni/xmlrpc/am to reserve resources. Result:
    INFO:omni:<?xml version="1.0" ?>
    INFO:omni:<!-- Reserved resources for:
            Slice: ecgtest2
            At AM:
            URL: http://www.utah.geniracks.net/protogeni/xmlrpc/am
     -->
    INFO:omni:<rspec type="manifest" xmlns="http://www.geni.net/resources/rspec/3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.geni.net/resources/rspec/3                            http://www.geni.net/resources/rspec/3/manifest.xsd">  
    
        <node client_id="virt1" component_id="urn:publicid:IDN+utah.geniracks.net+node+pc2" component_manager_id="urn:publicid:IDN+utah.geniracks.net+authority+cm" exclusive="true" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+372">    
            <sliver_type name="emulab-openvz"/>    
            <interface client_id="virt1:if0" component_id="urn:publicid:IDN+utah.geniracks.net+interface+pc2:eth1" mac_address="00000a0a0101" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+375">      <ip address="10.10.1.1" type="ipv4"/>    </interface>    
          <rs:vnode name="pcvm2-1" xmlns:rs="http://www.protogeni.net/resources/rspec/ext/emulab/1"/>    <host name="virt1.ecgtest2.pgeni-gpolab-bbn-com.utah.geniracks.net"/>    <services>      <login authentication="ssh-keys" hostname="pc2.utah.geniracks.net" port="30266" username="chaos"/>    </services>  </node>  
        <node client_id="virt2" component_id="urn:publicid:IDN+utah.geniracks.net+node+pc5" component_manager_id="urn:publicid:IDN+utah.geniracks.net+authority+cm" exclusive="false" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+373">    
            <sliver_type name="emulab-openvz"/>    
            <interface client_id="virt2:if0" component_id="urn:publicid:IDN+utah.geniracks.net+interface+pc5:eth1" mac_address="00000a0a0102" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+376">      <ip address="10.10.1.2" type="ipv4"/>    </interface>    
          <rs:vnode name="pcvm5-2" xmlns:rs="http://www.protogeni.net/resources/rspec/ext/emulab/1"/>    <host name="virt2.ecgtest2.pgeni-gpolab-bbn-com.utah.geniracks.net"/>    <services>      <login authentication="ssh-keys" hostname="pc5.utah.geniracks.net" port="30266" username="chaos"/>    </services>  </node>  
    
        <link client_id="virt1-virt2-0" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+374" vlantag="260">    
            <interface_ref client_id="virt1:if0" component_id="urn:publicid:IDN+utah.geniracks.net+interface+pc2:eth1" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+375"/>    
            <interface_ref client_id="virt2:if0" component_id="urn:publicid:IDN+utah.geniracks.net+interface+pc5:eth1" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+376"/>    
            <property dest_id="virt2:if0" source_id="virt1:if0"/>    
            <property dest_id="virt1:if0" source_id="virt2:if0"/>    
        </link>  
    </rspec>
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed createsliver:
    
      Options as run:
                    aggregate: http://www.utah.geniracks.net/protogeni/xmlrpc/am
                    configfile: /home/chaos/omni/omni_pgeni
                    framework: pg
                    native: True
    
      Args: createsliver ecgtest2 /home/chaos/IG-MON-nodes-C.rspec
    
      Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 expires on 2012-05-19 10:30:51 UTC
    Reserved resources on http://www.utah.geniracks.net/protogeni/xmlrpc/am.  
    INFO:omni: ============================================================
    
  • According to sliverstatus (a sketch of that lookup follows this list), my nodes are:
    pc2.utah.geniracks.net port 30266
    pc5.utah.geniracks.net port 30266
    
  • However, pc2 needs to run frisbee before this is ready. Wait awhile.
  • Log in to pc2.utah.geniracks.net on port 30266 with agent forwarding
  • Find that it is virt1 and has eth1=10.10.1.1
  • Find a big file:
    [chaos@virt1 ~]$ ls -l /usr/lib/locale/locale-archive-rpm 
    -rw-r--r-- 1 root root 99154656 May 20  2011 /usr/lib/locale/locale-archive-rpm
    
  • Copy the big file over the dataplane:
    [chaos@virt1 ~]$ scp /usr/lib/locale/locale-archive 10.10.1.2:/tmp/
    The authenticity of host '10.10.1.2 (10.10.1.2)' can't be established.
    RSA key fingerprint is 6d:1d:76:53:a5:25:99:39:e2:89:ea:b0:99:e3:d3:b9.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '10.10.1.2' (RSA) to the list of known hosts.
    locale-archive                                100%   95MB  11.8MB/s   00:08    
    
  • Look at the ARP tables on virt1 and virt2:
    [chaos@virt1 ~]$ /sbin/arp -a
    virt2-virt1-virt2-0 (10.10.1.2) at 82:02:0a:0a:01:02 [ether] on mv1.1
    pc2.utah.geniracks.net (155.98.34.12) at 00:01:ac:11:02:01 [ether] on eth999
    boss.utah.geniracks.net (155.98.34.4) at 00:01:ac:11:02:01 [ether] on eth999
    
    [chaos@virt1 ~]$ ssh 10.10.1.2
    Last login: Fri May 18 13:35:41 2012 from capybara.bbn.com
    
    [chaos@virt2 ~]$ /sbin/arp -a
    virt1-virt1-virt2-0 (10.10.1.1) at 82:01:0a:0a:01:01 [ether] on mv2.2
    boss.utah.geniracks.net (155.98.34.4) at 00:01:ac:11:05:02 [ether] on eth999
    pc5.utah.geniracks.net (155.98.34.15) at 00:01:ac:11:05:02 [ether] on eth999
    
  • Delete the sliver:
    jericho,[~],05:53(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am deletesliver ecgtest2
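
The node and port details quoted above came from sliverstatus. As a minimal sketch of that lookup (same aggregate URL and slice as above; the grep is just a crude way to pull the login endpoints out of omni's verbose output, assuming they appear as hostname/port fields the way I read them above):

    jericho,[~]$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am sliverstatus ecgtest2 2>&1 \
        | grep -E 'hostname|port'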
    

Step 2 (prep): start a bare metal node experiment and terminate it

  • An experimenter requests an experiment from the InstaGENI AM containing two rack hosts and a dataplane VLAN
  • The experimenter logs into a host, and sends dataplane traffic
  • The experimenter terminates the experiment

Results of testing: 2012-05-18

  • Here is an rspec for two physical nodes with no OS specified:
    jericho,[~],05:39(0)$ cat IG-MON-nodes-D.rspec 
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- This rspec will reserve two physical nodes, each with no OS specified,
         and create a single dataplane link between them.  It should work
         on any Emulab which has physical nodes available.  -->
    <rspec xmlns="http://www.geni.net/resources/rspec/3"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
           xsi:schemaLocation="http://www.geni.net/resources/rspec/3
                               http://www.geni.net/resources/rspec/3/request.xsd" 
           type="request">
    
      <node client_id="phys1" exclusive="true">
        <sliver_type name="raw" />
        <interface client_id="phys1:if0" />
      </node>
      <node client_id="phys2" exclusive="true">
        <sliver_type name="raw" />
        <interface client_id="phys2:if0" />
      </node>
    
      <link client_id="phys1-phys2-0">
        <interface_ref client_id="phys1:if0"/>
        <interface_ref client_id="phys2:if0"/>
        <property source_id="phys1:if0" dest_id="phys2:if0"/>
        <property source_id="phys2:if0" dest_id="phys1:if0"/>
      </link>
    </rspec>
    
  • Create a slice for this experiment:
    omni createslice ecgtest3
    
  • Create a sliver using this rspec:
    jericho,[~],05:40(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am createsliver ecgtest3 ~/IG-MON-nodes-D.rspec
    INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
    INFO:omni:Using control framework pg
    INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest3 expires on 2012-05-19 10:40:34 UTC
    INFO:omni:Creating sliver(s) from rspec file /home/chaos/IG-MON-nodes-D.rspec for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest3
    INFO:omni:Asked http://www.utah.geniracks.net/protogeni/xmlrpc/am to reserve resources. Result:
    INFO:omni:<?xml version="1.0" ?>
    INFO:omni:<!-- Reserved resources for:
            Slice: ecgtest3
            At AM:
            URL: http://www.utah.geniracks.net/protogeni/xmlrpc/am
     -->
    INFO:omni:<rspec type="manifest" xmlns="http://www.geni.net/resources/rspec/3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.geni.net/resources/rspec/3                            http://www.geni.net/resources/rspec/3/manifest.xsd">  
    
        <node client_id="phys1" component_id="urn:publicid:IDN+utah.geniracks.net+node+pc4" component_manager_id="urn:publicid:IDN+utah.geniracks.net+authority+cm" exclusive="true" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+378">    
            <sliver_type name="raw-pc"/>    
            <interface client_id="phys1:if0" component_id="urn:publicid:IDN+utah.geniracks.net+interface+pc4:eth1" mac_address="e83935b1ec9e" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+381">      <ip address="10.10.1.1" type="ipv4"/>    </interface>    
          <rs:vnode name="pc4" xmlns:rs="http://www.protogeni.net/resources/rspec/ext/emulab/1"/>    <host name="phys1.ecgtest3.pgeni-gpolab-bbn-com.utah.geniracks.net"/>    <services>      <login authentication="ssh-keys" hostname="pc4.utah.geniracks.net" port="22" username="chaos"/>    </services>  </node>  
        <node client_id="phys2" component_id="urn:publicid:IDN+utah.geniracks.net+node+pc1" component_manager_id="urn:publicid:IDN+utah.geniracks.net+authority+cm" exclusive="true" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+379">    
            <sliver_type name="raw-pc"/>    
            <interface client_id="phys2:if0" component_id="urn:publicid:IDN+utah.geniracks.net+interface+pc1:eth1" mac_address="e83935b10f96" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+382">      <ip address="10.10.1.2" type="ipv4"/>    </interface>    
          <rs:vnode name="pc1" xmlns:rs="http://www.protogeni.net/resources/rspec/ext/emulab/1"/>    <host name="phys2.ecgtest3.pgeni-gpolab-bbn-com.utah.geniracks.net"/>    <services>      <login authentication="ssh-keys" hostname="pc1.utah.geniracks.net" port="22" username="chaos"/>    </services>  </node>  
    
        <link client_id="phys1-phys2-0" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+380" vlantag="261">    
            <interface_ref client_id="phys1:if0" component_id="urn:publicid:IDN+utah.geniracks.net+interface+pc4:eth1" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+381"/>    
            <interface_ref client_id="phys2:if0" component_id="urn:publicid:IDN+utah.geniracks.net+interface+pc1:eth1" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+382"/>    
            <property dest_id="phys2:if0" source_id="phys1:if0"/>    
            <property dest_id="phys1:if0" source_id="phys2:if0"/>    
        </link>  
    </rspec>
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed createsliver:
    
      Options as run:
                    aggregate: http://www.utah.geniracks.net/protogeni/xmlrpc/am
                    configfile: /home/chaos/omni/omni_pgeni
                    framework: pg
                    native: True
    
      Args: createsliver ecgtest3 /home/chaos/IG-MON-nodes-D.rspec
    
      Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest3 expires on 2012-05-19 10:40:34 UTC
    Reserved resources on http://www.utah.geniracks.net/protogeni/xmlrpc/am.  
    INFO:omni: ============================================================
    
  • According to sliverstatus, my nodes are pc1 and pc4.
  • Log in to pc1.utah.geniracks.net with agent forwarding
  • Find that it is phys2 and has eth1=10.10.1.2
  • Find a big file:
    [chaos@phys2 ~]$ ls -l /usr/lib/locale/locale-archive
    -rw-r--r-- 1 root root 104997424 Aug 10  2011 /usr/lib/locale/locale-archive
    
  • Copy the big file over the dataplane in a loop:
    [chaos@phys2 ~]$ while [ 1 ]; do scp /usr/lib/locale/locale-archive 10.10.1.1:/tmp/; done
    locale-archive                                100%  100MB  50.1MB/s   00:02
    locale-archive                                100%  100MB  50.1MB/s   00:02
    locale-archive                                100%  100MB  50.1MB/s   00:02
    ...
    
  • After a bit of that, delete the sliver:
    jericho,[~],05:53(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am deletesliver ecgtest3
    

Step 3 (prep): start an experiment and leave it running

  • An experimenter requests an experiment from the InstaGENI AM containing two rack VMs connected by an OpenFlow-controlled dataplane VLAN
  • The experimenter configures a simple OpenFlow controller to pass dataplane traffic between the VMs
  • The experimenter logs into one VM, and begins sending a continuous stream of dataplane traffic

Results of testing: 2012-05-18

Note: per discussion on instageni-design on 2012-05-17, requesting an OpenFlow-controlled dataplane is not yet possible, so this step will need to be retested once OpenFlow control is available.

  • Not creating a new experiment here, but instead reusing my experiment, ecgtest, created yesterday for IG-MON-1.
  • Log in to pc3, whose eth1 is 10.10.1.1
  • Make a bigger dataplane file by concatenating a few copies of the earlier one, then start copying it around again:
    [chaos@phys1 ~]$ ls -l /tmp/locale-archive 
    -rw-r--r-- 1 chaos pgeni-gpolab-bbn 3149922720 May 18 04:14 /tmp/locale-archive
    
    while [ 1 ]; do scp /tmp/locale-archive 10.10.1.2:/tmp/; done
    
  • This lets me see that the first instance of the file copy takes about a minute, at about 55 MB/s (a quick sanity check on that rate follows this list):
    [chaos@phys1 ~]$ while [ 1 ]; do scp /tmp/locale-archive 10.10.1.2:/tmp/; done
    locale-archive                                100% 3004MB  55.6MB/s   00:54    
    
  • Leave this running.
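
Assuming scp's "MB" means MiB, that copy works out to roughly 466 Mbit/s on the dataplane, a number worth remembering for the interface-counter estimate in Step 8:

    jericho,[~]$ echo "3004 * 2^20 * 8 / 54 / 10^6" | bc
    466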

Step 4: view running VMs

Using:

  • On boss, use AM state, logs, or administrator interfaces to determine:
    • What experiments are running right now
    • How many VMs are allocated for those experiments
    • Which OpenVZ node is each VM running on
  • On OpenVZ nodes, use system state, logs, or administrative interfaces to determine what VMs are running right now, and look at any available configuration or logs of each.

Verify:

  • A site administrator can determine what experiments are running on the InstaGENI AM
  • A site administrator can determine the mapping of VMs to active experiments
  • A site administrator can view some state of running VMs on the VM server

Results of testing: 2012-05-18

  • Per-host view of current state:
  • Per-experiment view of current state:
    • Browse to https://boss.utah.geniracks.net/genislices.php and find one slice running on the Component Manager:
      ID   HRN                         Created             Expires
      362  bbn-pgeni.ecgtest (ecgtest) 2012-05-17 08:12:37 2012-05-18 18:00:00
      
    • Click (ecgtest) to view the details of that experiment at https://boss.utah.geniracks.net/showexp.php3?experiment=363#details.
    • This shows what nodes it's using, including that its VM has been put on pc5:
      Physical Node Mapping:
      ID              Type         OS              Physical
      --------------- ------------ --------------- ------------
      phys1           dl360        FEDORA15-STD    pc3
      virt1           pcvm         OPENVZ-STD      pcvm5-1 (pc5)
      
    • Here are some other interesting things:
      IP Port allocation:
      Low             High
      --------------- ------------
      30000           30255
      
      SSHD Port allocation ('ssh -p portnum'):
      ID              Port       SSH command
      --------------- ---------- ----------------------
      
      Physical Lan/Link Mapping:
      ID              Member          IP              MAC                  NodeID
      --------------- --------------- --------------- -------------------- ---------
      phys1-virt1-0   phys1:0         10.10.1.1       e8:39:35:b1:4e:8a    pc3
                                                      1/1 <-> 1/34         procurve2
      phys1-virt1-0   virt1:0         10.10.1.2                            pcvm5-1
      
    • That last one is mysterious, because the experimenter's sliverstatus command contains:
        { 'attributes':
          { 'client_id': 'phys1:if0',
            'component_id': 'urn:publicid:IDN+utah.geniracks.net+interface+pc3:eth1',
            'mac_address': 'e83935b14e8a',
      ...
        { 'attributes':
          { 'client_id': 'virt1:if0',
            'component_id': 'urn:publicid:IDN+utah.geniracks.net+interface+pc5:eth1',
            'mac_address': '00000a0a0102',
      
    • So I think it should be possible for the admin interface to know that virtual MAC address too.
    • However, the MAC address reported in sliverstatus is in fact wrong. To summarize:
      MAC addrs reported for phys1:0 == 10.10.1.1
        E8:39:35:B1:4E:8A: from /sbin/ifconfig eth1 run on phys1 (authoritative)
        e83935b14e8a:      from sliverstatus as experimenter (correct)
        e8:39:35:b1:4e:8a: from: https://boss.utah.geniracks.net/showexp.php3?experiment=363#details (correct)
      
      MAC addrs reported for virt1:0 == 10.10.1.2
        82:01:0A:0A:01:02: from /sbin/ifconfig mv1.1 run on virt1 (authoritative)
        00000a0a0102:      from sliverstatus as experimenter (incorrect: first four digits are wrong)
        -                : from https://boss.utah.geniracks.net/showexp.php3?experiment=363#details (not reported)
      
      I opened instaticket:26 for this issue.
  • Now, use the OpenVZ host itself to view activity:
    • As an admin, log in to pc5.utah.geniracks.net
    • Poking around, I was led to a couple of prospective data sources:
      • Logs in /var/emulab
      • The vzctl RPM, containing a number of OpenVZ control commands
    • The latter seems to give a list of running VMs easily:
      vhost1,[/var/emulab],05:00(1)$ sudo vzlist -a
            CTID      NPROC STATUS    IP_ADDR         HOSTNAME
               1         15 running   -               virt1.ecgtest.pgeni-gpolab-bbn-com.utah.geniracks.net
      
    • I also see a command to figure out which container is running a given PID. Suppose I run top and am concerned about an sshd process chewing up CPU:
          PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND         
        51817 20001     20   0  116m 3780  872 R 94.4  0.0   0:05.74 sshd             
      
    • Since the user is numeric, I can assume this process is probably running in a container, so I find out which one:
      vhost1,[/var/emulab],05:05(0)$ sudo vzpid 51766
      Pid     CTID    Name
      51766   1       sshd
      
    • and then look up the container info as above (a scripted version of this lookup is sketched after this list).
    • The files in /var/emulab give details about how each experiment was created. In particular:
      Information about experiment startup attributes:
        /var/emulab/boot/tmcc.pcvm5-1/
        /var/emulab/boot/tmcc.pcvm5-2/
      
      Logs of experiment progress:
        /var/emulab/logs/tbvnode-pcvm5-1.log
        /var/emulab/logs/tbvnode-pcvm5-2.log
        /var/emulab/logs/tmccproxy.pcvm5-1.log
        /var/emulab/logs/tmccproxy.pcvm5-2.log
      
    • These may be useful for both running and terminated experiments if the container IDs (CTIDs) are unique over time.
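
Putting those pieces together, a scripted version of the PID-to-experiment lookup might look like the following (vzpid and vzlist are the commands shown above; the awk glue and field positions are assumptions based on the output formats captured above):

    # find the container ID owning a suspicious PID (second line, second column of vzpid output)
    CTID=$(sudo vzpid 51766 | awk 'NR==2 {print $2}')
    # map that container ID back to its slice hostname, which names the sliver, slice, and project
    sudo vzlist -a | awk -v id="$CTID" '$1 == id'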

Side test: are container IDs (CTIDs) unique over time on an OpenVZ server?

  • rspec to create a single OpenVZ container:
    jericho,[~],07:12(0)$ cat IG-MON-nodes-E.rspec 
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- This rspec will reserve one openvz node.  It should work on any
         Emulab which has nodes available and supports OpenVZ.  -->
    <rspec xmlns="http://www.geni.net/resources/rspec/3"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
           xsi:schemaLocation="http://www.geni.net/resources/rspec/3
                               http://www.geni.net/resources/rspec/3/request.xsd" 
           type="request">
    
      <node client_id="virt1" exclusive="false">
        <sliver_type name="emulab-openvz" />
      </node>
    </rspec>
    
  • Use the existing slice ecgtest2 to create a sliver:
    jericho,[~],07:13(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am createsliver ecgtest2 IG-MON-nodes-E.rspec
    INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
    INFO:omni:Using control framework pg
    INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 expires within 1 day on 2012-05-19 10:30:51 UTC
    INFO:omni:Creating sliver(s) from rspec file IG-MON-nodes-E.rspec for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2
    INFO:omni:Asked http://www.utah.geniracks.net/protogeni/xmlrpc/am to reserve resources. Result:
    INFO:omni:<?xml version="1.0" ?>
    INFO:omni:<!-- Reserved resources for:
            Slice: ecgtest2
            At AM:
            URL: http://www.utah.geniracks.net/protogeni/xmlrpc/am
     -->
    INFO:omni:<rspec type="manifest" xmlns="http://www.geni.net/resources/rspec/3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.geni.net/resources/rspec/3                            http://www.geni.net/resources/rspec/3/manifest.xsd">  
    
        <node client_id="virt1" component_id="urn:publicid:IDN+utah.geniracks.net+node+pc5" component_manager_id="urn:publicid:IDN+utah.geniracks.net+authority+cm" exclusive="false" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+384">    
            <sliver_type name="emulab-openvz"/>    
          <rs:vnode name="pcvm5-2" xmlns:rs="http://www.protogeni.net/resources/rspec/ext/emulab/1"/>    <host name="virt1.ecgtest2.pgeni-gpolab-bbn-com.utah.geniracks.net"/>    <services>      <login authentication="ssh-keys" hostname="pc5.utah.geniracks.net" port="30266" username="chaos"/>    </services>  </node>  
    </rspec>
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed createsliver:
    
      Options as run:
                    aggregate: http://www.utah.geniracks.net/protogeni/xmlrpc/am
                    configfile: /home/chaos/omni/omni_pgeni
                    framework: pg
                    native: True
    
      Args: createsliver ecgtest2 IG-MON-nodes-E.rspec
    
      Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 expires within 1 day(s) on 2012-05-19 10:30:51 UTC
    Reserved resources on http://www.utah.geniracks.net/protogeni/xmlrpc/am.  
    INFO:omni: ============================================================
    

Summary: the new single-VM sliver was assigned pcvm5-2 on pc5, the same vnode ID used by the second VM of the earlier (now deleted) ecgtest2 sliver, so VM IDs are reused over time.

At this point, I was about to gather more information about logs when the Utah rack became totally unavailable: I could no longer use my shell sessions to any machines in the rack, and got ping timeouts to boss.

After about 8 minutes, things became available again. I went looking for logs of my dataplane file-copy activity to see whether the dataplane had been interrupted, and found that sshd activity on the dataplane does not appear to be logged anywhere, either in /var/log within the container or on pc5 itself. That is not a rack requirement, but it seems non-ideal for experimenters. I opened instaticket:27 to report it.

Results of testing: 2012-05-18 13:00

I revisited this test briefly at 13:00, because Luisa had started 45 experiments on pc5, consuming all of its available VM resources, and I wanted to look at pc5 again.

  • For the record, the machine's load average is high (mostly I/O wait), but the machine is responsive and the CPU is still partly idle. Here's the header of the top output:
    top - 11:06:22 up 1 day, 20:06,  1 user,  load average: 10.13, 9.29, 7.58
    Tasks: 1283 total,   2 running, 1281 sleeping,   0 stopped,   0 zombie
    Cpu(s):  7.7%us,  1.2%sy,  0.0%ni, 34.2%id, 56.3%wa,  0.0%hi,  0.6%si,  0.0%st
    Mem:  49311612k total,  7634748k used, 41676864k free,   255532k buffers
    Swap:  1050168k total,        0k used,  1050168k free,  4821552k cached
    
  • Here is the output of vzlist:
    vhost1,[/var/emulab],11:07(0)$ sudo vzlist
          CTID      NPROC STATUS    IP_ADDR         HOSTNAME
             1         19 running   -               virt1.ecgtest.pgeni-gpolab-bbn-com.utah.geniracks.net
             2         11 running   -               virt1.ecgtest2.pgeni-gpolab-bbn-com.utah.geniracks.net
             3         11 running   -               host2.what-image.pgeni-gpolab-bbn-com.utah.geniracks.net
             4         11 running   -               host1.what-image2.pgeni-gpolab-bbn-com.utah.geniracks.net
             5         11 running   -               host2.what-image3.pgeni-gpolab-bbn-com.utah.geniracks.net
             6         15 running   -               host1.singlevm-1.pgeni-gpolab-bbn-com.utah.geniracks.net
             7         15 running   -               host1.singlevm-2.pgeni-gpolab-bbn-com.utah.geniracks.net
             8         15 running   -               host1.singlevm-3.pgeni-gpolab-bbn-com.utah.geniracks.net
             9         15 running   -               host1.singlevm-4.pgeni-gpolab-bbn-com.utah.geniracks.net
            10         15 running   -               host1.singlevm-5.pgeni-gpolab-bbn-com.utah.geniracks.net
            11         15 running   -               host1.singlevm-6.pgeni-gpolab-bbn-com.utah.geniracks.net
            12         15 running   -               host1.singlevm-7.pgeni-gpolab-bbn-com.utah.geniracks.net
            13         15 running   -               host1.singlevm-8.pgeni-gpolab-bbn-com.utah.geniracks.net
            14         15 running   -               host1.singlevm-9.pgeni-gpolab-bbn-com.utah.geniracks.net
            15         15 running   -               host1.singlevm-10.pgeni-gpolab-bbn-com.utah.geniracks.net
            16         15 running   -               host1.singlevm-11.pgeni-gpolab-bbn-com.utah.geniracks.net
            17         15 running   -               host1.singlevm-12.pgeni-gpolab-bbn-com.utah.geniracks.net
            18         15 running   -               host1.singlevm-13.pgeni-gpolab-bbn-com.utah.geniracks.net
            19         11 running   -               host1.singlevm-14.pgeni-gpolab-bbn-com.utah.geniracks.net
            20         15 running   -               host1.singlevm-15.pgeni-gpolab-bbn-com.utah.geniracks.net
            21         15 running   -               host1.singlevm-16.pgeni-gpolab-bbn-com.utah.geniracks.net
            22         15 running   -               host1.singlevm-17.pgeni-gpolab-bbn-com.utah.geniracks.net
            23         15 running   -               host1.singlevm-18.pgeni-gpolab-bbn-com.utah.geniracks.net
            24         15 running   -               host1.singlevm-19.pgeni-gpolab-bbn-com.utah.geniracks.net
            25         15 running   -               host1.singlevm-20.pgeni-gpolab-bbn-com.utah.geniracks.net
            26         16 running   -               host1.singlevm-21.pgeni-gpolab-bbn-com.utah.geniracks.net
            27         16 running   -               host1.singlevm-22.pgeni-gpolab-bbn-com.utah.geniracks.net
            28         15 running   -               host1.singlevm-23.pgeni-gpolab-bbn-com.utah.geniracks.net
            29         15 running   -               host1.singlevm-24.pgeni-gpolab-bbn-com.utah.geniracks.net
            30         15 running   -               host1.singlevm-25.pgeni-gpolab-bbn-com.utah.geniracks.net
            31         15 running   -               host1.singlevm-26.pgeni-gpolab-bbn-com.utah.geniracks.net
            32         15 running   -               host1.singlevm-27.pgeni-gpolab-bbn-com.utah.geniracks.net
            33         15 running   -               host1.singlevm-28.pgeni-gpolab-bbn-com.utah.geniracks.net
            34         15 running   -               host1.singlevm-29.pgeni-gpolab-bbn-com.utah.geniracks.net
            35         15 running   -               host1.singlevm-30.pgeni-gpolab-bbn-com.utah.geniracks.net
            36         15 running   -               host1.singlevm-31.pgeni-gpolab-bbn-com.utah.geniracks.net
            37         15 running   -               host1.singlevm-32.pgeni-gpolab-bbn-com.utah.geniracks.net
            38         15 running   -               host1.singlevm-33.pgeni-gpolab-bbn-com.utah.geniracks.net
            39         15 running   -               host1.singlevm-34.pgeni-gpolab-bbn-com.utah.geniracks.net
            40         15 running   -               host1.singlevm-35.pgeni-gpolab-bbn-com.utah.geniracks.net
            41         15 running   -               host1.singlevm-36.pgeni-gpolab-bbn-com.utah.geniracks.net
            42         15 running   -               host1.singlevm-37.pgeni-gpolab-bbn-com.utah.geniracks.net
            43         15 running   -               host1.singlevm-38.pgeni-gpolab-bbn-com.utah.geniracks.net
            44         15 running   -               host1.singlevm-39.pgeni-gpolab-bbn-com.utah.geniracks.net
            45         15 running   -               host1.singlevm-40.pgeni-gpolab-bbn-com.utah.geniracks.net
            46         15 running   -               host1.singlevm-41.pgeni-gpolab-bbn-com.utah.geniracks.net
            47         15 running   -               host1.singlevm-42.pgeni-gpolab-bbn-com.utah.geniracks.net
            48         15 running   -               host1.singlevm-43.pgeni-gpolab-bbn-com.utah.geniracks.net
            49         15 running   -               host1.singlevm-44.pgeni-gpolab-bbn-com.utah.geniracks.net
            50         11 running   -               host1.singlevm-45.pgeni-gpolab-bbn-com.utah.geniracks.net
    
  • Here is the iptables NAT table, which is being used to forward ssh connections to each VM (a sketch of tracing a public port back to its container follows this list):
    vhost1,[/var/emulab],11:08(0)$ sudo iptables -L -n -t nat
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination         
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:30010 to:172.17.5.1:30010 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:30266 to:172.17.5.2:30266 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:30522 to:172.17.5.3:30522 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:30778 to:172.17.5.4:30778 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:31034 to:172.17.5.5:31034 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:31290 to:172.17.5.6:31290 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:31546 to:172.17.5.7:31546 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:31802 to:172.17.5.8:31802 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:32058 to:172.17.5.9:32058 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:32314 to:172.17.5.10:32314 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:32570 to:172.17.5.11:32570 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:32826 to:172.17.5.12:32826 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:33082 to:172.17.5.13:33082 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:33338 to:172.17.5.14:33338 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:33594 to:172.17.5.15:33594 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:33850 to:172.17.5.16:33850 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:34106 to:172.17.5.17:34106 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:34362 to:172.17.5.18:34362 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:34618 to:172.17.5.19:34618 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:34874 to:172.17.5.20:34874 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:35130 to:172.17.5.21:35130 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:35386 to:172.17.5.22:35386 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:35642 to:172.17.5.23:35642 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:35898 to:172.17.5.24:35898 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:36154 to:172.17.5.25:36154 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:36410 to:172.17.5.26:36410 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:36666 to:172.17.5.27:36666 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:36922 to:172.17.5.28:36922 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:37178 to:172.17.5.29:37178 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:37434 to:172.17.5.30:37434 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:37690 to:172.17.5.31:37690 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:37946 to:172.17.5.32:37946 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:38202 to:172.17.5.33:38202 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:38458 to:172.17.5.34:38458 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:38714 to:172.17.5.35:38714 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:38970 to:172.17.5.36:38970 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:39226 to:172.17.5.37:39226 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:39482 to:172.17.5.38:39482 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:39738 to:172.17.5.39:39738 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:39994 to:172.17.5.40:39994 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:40250 to:172.17.5.41:40250 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:40506 to:172.17.5.42:40506 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:40762 to:172.17.5.43:40762 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:41018 to:172.17.5.44:41018 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:41274 to:172.17.5.45:41274 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:41530 to:172.17.5.46:41530 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:41786 to:172.17.5.47:41786 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:42042 to:172.17.5.48:42042 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:42298 to:172.17.5.49:42298 
    DNAT       tcp  --  0.0.0.0/0            155.98.34.15        tcp dpt:42554 to:172.17.5.50:42554 
    
    Chain POSTROUTING (policy ACCEPT)
    target     prot opt source               destination         
    ACCEPT     all  --  172.16.0.0/12        155.98.34.0/24      
    ACCEPT     all  --  172.16.0.0/12        172.16.0.0/12       
    SNAT       all  --  172.16.0.0/12        0.0.0.0/0           to:155.98.34.15 
    
    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination         
    
  • As expected, /var/emulab/{boot,logs,vms} contain subdirectories or files related to each of the 50 running VMs.
  • As expected, a randomly-sampled experiment with dataplane interfaces does not list any MAC addresses in the admin UI at https://boss.utah.geniracks.net/showexp.php3?experiment=379#details. This is consistent with instaticket:26.
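
That NAT table also gives an administrator a path from the "public IP address and port" of criterion VII.18 to a specific experiment: find the DNAT rule for the port, note the internal 172.17.5.x address it forwards to, and match that container up with vzlist. A rough sketch, using port 30266 (the ecgtest2 VM) as an example; the assumption that 172.17.5.N corresponds to CTID N is based on the orderings in the two listings above:

    # which internal address answers ssh on public port 30266?
    sudo iptables -L -n -t nat | grep 'dpt:30266'
    #   DNAT  tcp  --  0.0.0.0/0  155.98.34.15  tcp dpt:30266 to:172.17.5.2:30266
    # 172.17.5.2 appears to be container 2, whose hostname identifies the experiment:
    sudo vzlist -a | awk '$1 == 2'
    #   2  11 running  -  virt1.ecgtest2.pgeni-gpolab-bbn-com.utah.geniracks.net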

Step 5: get information about terminated experiments

Using:

  • On boss, use AM state, logs, or administrator interfaces to find evidence of the two terminated experiments.
  • Determine how many other experiments were run in the past day.
  • Determine which GENI user created each of the terminated experiments.
  • Determine the mapping of experiments to OpenVZ or exclusive hosts for each of the terminated experiments.
  • Determine the control and dataplane MAC addresses assigned to each VM in each terminated experiment.
  • Determine any IP addresses assigned by InstaGENI to each VM in each terminated experiment.

Verify:

  • A site administrator can get ownership and resource allocation information for recently-terminated experiments which used OpenVZ VMs.
  • A site administrator can get ownership and resource allocation information for recently-terminated experiments which used physical hosts.
  • A site administrator can get information about MAC addresses and IP addresses used by recently-terminated experiments.
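
No results are recorded for this step yet. As a starting point, here is a minimal sketch of where I expect to look, reusing the data sources found in Step 4 (the boss pages are the same admin URLs used there; grepping /var/emulab assumes the per-vnode boot and log files survive sliver teardown, which still needs to be confirmed):

    # on boss: the slice list and per-experiment detail pages used in Step 4
    #   https://boss.utah.geniracks.net/genislices.php
    #   https://boss.utah.geniracks.net/showexp.php3?experiment=<id>#details
    # on an OpenVZ host: look for traces of a terminated slice by name
    sudo grep -rl ecgtest2 /var/emulab/boot /var/emulab/logs 2>/dev/null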

Step 6: get OpenFlow state information

Using:

  • On the dataplane switch, get a list of controllers, and see if any additional controllers are serving experiments.
  • On the flowvisor VM, get a list of active FV slices from the FlowVisor
  • On the FOAM VM, get a list of active slivers from FOAM
  • Use FV, FOAM, or the switch to list the flowspace of a running OpenFlow experiment.

Verify:

  • A site administrator can get information about the OpenFlow resources used by running experiments.
  • When an OpenFlow experiment is started by InstaGENI, a new controller is added directly to the switch.
  • No new FlowVisor slices are added for new OpenFlow experiments started by InstaGENI.
  • No new FOAM slivers are added for new OpenFlow experiments started by InstaGENI.
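
No results yet: this step is blocked until OpenFlow resources are available from the InstaGENI AM. For when it can be retested, here is a rough sketch of the checks I have in mind. The fvctl and foamctl invocations are from memory of contemporary FlowVisor and FOAM releases, and the password-file paths are assumptions, so expect to adjust them on the actual rack VMs:

    # on the dataplane switch (procurve2): controller status
    show openflow
    # on the flowvisor VM: list FlowVisor slices and their flowspace
    fvctl --passwd-file=/etc/flowvisor/fvpasswd listSlices
    fvctl --passwd-file=/etc/flowvisor/fvpasswd listFlowSpace
    # on the foam VM: list active FOAM slivers
    foamctl geni:list-slivers --passwd-file=/opt/foam/etc/foampasswd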

Step 7: verify MAC addresses on the rack dataplane switch

Using:

  • Establish a privileged login to the dataplane switch
  • Obtain a list of the full MAC address table of the switch
  • On boss and the experimental hosts, use available data sources to determine which host or VM owns each MAC address.

Verify:

  • It is possible to identify and classify every MAC address visible on the switch
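
Also untested so far. A minimal sketch, assuming the rack dataplane switch is the HP ProCurve seen in Step 4 (procurve2) and that its CLI provides the usual ProCurve MAC-table command:

    # on the dataplane switch: dump the learned MAC address table
    show mac-address
    # optionally narrow to one experiment's dataplane VLAN, e.g. the vlantag from its manifest
    show mac-address vlan <dataplane-vlan-tag>

Each entry can then be cross-referenced against the mac_address fields in the sliver manifests and sliverstatus output (for example, e83935b14e8a for pc3:eth1 in the ecgtest sliverstatus excerpt in Step 4).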

Step 8: verify active dataplane traffic

Using:

  • Establish a privileged login to the dataplane switch
  • Based on the information from Step 7, determine which interfaces are carrying traffic between the experimental VMs
  • Collect interface counters for those interfaces over a period of 10 minutes
  • Estimate the rate at which the experiment is sending traffic

Verify:

  • The switch reports interface counters, and an administrator can obtain plausible estimates of dataplane traffic quantities by looking at them.
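
Also untested. A sketch of the measurement I plan, using the ecgtest copy loop left running in Step 3 (roughly 466 Mbit/s by the earlier estimate). The switch command is the usual ProCurve counters display and the byte counts below are hypothetical, so treat this as an outline rather than a verified procedure:

    # on the dataplane switch: sample the byte counters on the port carrying the experiment,
    # wait ~10 minutes, then sample them again
    show interfaces <port>        # note the Bytes Rx / Bytes Tx totals each time
    # estimate the rate; e.g. if Rx grows by 35,000,000,000 bytes over the 10 minutes:
    echo "35000000000 * 8 / 600 / 10^6" | bc       # ~466 Mbit/s, comparable to the scp rate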
