[[PageOutline]]

= Detailed test plan for IG-MON-3: GENI Active Experiment Inspection Test =

''This page is GPO's working page for performing IG-MON-3. It is public for informational purposes, but it is not an official status report. See [wiki:GENIRacksHome/InstageniRacks/AcceptanceTestStatus] for the current status of InstaGENI acceptance tests.''

''Last substantive edit of this page: 2012-05-18''

== Page format ==

 * The status chart summarizes the state of this test
 * The high-level description from test plan contains text copied exactly from the public test plan and acceptance criteria pages.
 * The steps contain things I will actually do/verify:
   * Steps may be composed of related substeps where I find this useful for clarity
   * Each step is either a preparatory step (identified by "(prep)") or a verification step (the default):
     * Preparatory steps are just things we have to do. They're not tests of the rack, but are prerequisites for subsequent verification steps
     * Verification steps are steps in which we will actually look at rack output and make sure it is as expected. They contain a '''Using:''' block, which lists the steps to run the verification, and a '''Verify:''' block, which lists what outcome is expected for the test to pass.
== Status of test ==

|| '''Step''' || '''State''' || '''Date completed''' || '''Tickets''' || '''Comments''' ||
|| 1 || || || || ready to test ||
|| 2 || || || || ready to test ||
|| 3 || [[Color(orange,Blocked)]] || || || blocked on ability to request !OpenFlow resources from InstaGENI AM ||
|| 4 || [[Color(orange,Blocked)]] || || || blocked on 3 ||
|| 5 || [[Color(orange,Blocked)]] || || || blocked on 3 ||
|| 6 || [[Color(orange,Blocked)]] || || || blocked on 3, availability of FOAM ||
|| 7 || [[Color(orange,Blocked)]] || || || blocked on 3 ||
|| 8 || [[Color(orange,Blocked)]] || || || blocked on 3 ||

== High-level description from test plan ==

This test inspects the state of the rack data plane and control networks when experiments are running, and verifies that a site administrator can find information about running experiments.

=== Procedure ===

 * An experimenter from the GPO starts up experiments to ensure there is data to look at:
   * An experimenter runs an experiment containing at least one rack OpenVZ VM, and terminates it.
   * An experimenter runs an experiment containing at least one rack OpenVZ VM, and leaves it running.
 * A site administrator uses available system and experiment data sources to determine current experimental state, including:
   * How many VMs are running and which experimenters own them
   * How many physical hosts are in use by experiments, and which experimenters own them
   * How many VMs were terminated within the past day, and which experimenters owned them
   * What !OpenFlow controllers the data plane switch, the rack !FlowVisor, and the rack FOAM are communicating with
 * A site administrator examines the switches and other rack data sources, and determines:
   * What MAC addresses are currently visible on the data plane switch and what experiments do they belong to?
   * For some experiment which was terminated within the past day, what data plane and control MAC and IP addresses did the experiment use?
   * For some experimental data path which is actively sending traffic on the data plane switch, do changes in interface counters show approximately the expected amount of traffic into and out of the switch?

=== Criteria to verify as part of this test ===

 * VII.09. A site administrator can determine the MAC addresses of all physical host interfaces, all network device interfaces, all active experimental VMs, and all recently-terminated experimental VMs. (C.3.f)
 * VII.11. A site administrator can locate current configuration of flowvisor, FOAM, and any other !OpenFlow services, and find logs of recent activity and changes. (D.6.a)
 * VII.18. Given a public IP address and port, an exclusive VLAN, a sliver name, or a piece of user-identifying information such as e-mail address or username, a site administrator or GMOC operator can identify the email address, username, and affiliation of the experimenter who controlled that resource at a particular time. (D.7)

== Step 1 (prep): start a VM experiment and terminate it ==

 * An experimenter requests an experiment from the InstaGENI AM containing two rack VMs and a dataplane VLAN
 * The experimenter logs into a VM, and sends dataplane traffic
 * The experimenter terminates the experiment

=== Results of testing: 2012-05-18 ===

 * I'll use the following rspec to get two VMs:
{{{
jericho,[~],05:29(0)$ cat IG-MON-nodes-C.rspec
}}}
 * Then create a slice:
{{{
omni createslice ecgtest2
}}}
 * Then create a sliver using that rspec:
{{{
jericho,[~],05:31(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am createsliver ecgtest2 ~/IG-MON-nodes-C.rspec
INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
INFO:omni:Using control framework pg
ERROR:omni.protogeni:Call for Get Slice Cred for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 failed.: Exception: PG Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 does not exist.
ERROR:omni.protogeni: ..... Run with --debug for more information
ERROR:omni:Cannot create sliver urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2: Could not get slice credential: Exception: PG Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 does not exist.
}}}
 * It looks like the slice just wasn't ready yet. When I tried again after a minute, the same command worked:
{{{
jericho,[~],05:31(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am createsliver ecgtest2 ~/IG-MON-nodes-C.rspec
INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
INFO:omni:Using control framework pg
INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 expires on 2012-05-19 10:30:51 UTC
INFO:omni:Creating sliver(s) from rspec file /home/chaos/IG-MON-nodes-C.rspec for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2
INFO:omni:Asked http://www.utah.geniracks.net/protogeni/xmlrpc/am to reserve resources. Result:
INFO:omni:
INFO:omni:
INFO:omni:
INFO:omni: ------------------------------------------------------------
INFO:omni: Completed createsliver:

  Options as run:
    aggregate: http://www.utah.geniracks.net/protogeni/xmlrpc/am
    configfile: /home/chaos/omni/omni_pgeni
    framework: pg
    native: True

  Args: createsliver ecgtest2 /home/chaos/IG-MON-nodes-C.rspec

  Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 expires on 2012-05-19 10:30:51 UTC
Reserved resources on http://www.utah.geniracks.net/protogeni/xmlrpc/am.
INFO:omni: ============================================================
}}}
 * According to sliverstatus, my nodes are:
{{{
pc2.utah.geniracks.net port 30266
pc5.utah.geniracks.net port 30266
}}}
 * However, pc2 needs to run frisbee before the nodes are ready, so I wait a while.
 * Log in to pc2.utah.geniracks.net on port 30266 with agent forwarding
 * Find that it is virt1 and has eth1=10.10.1.1
 * Find a big file:
{{{
[chaos@virt1 ~]$ ls -l /usr/lib/locale/locale-archive-rpm
-rw-r--r-- 1 root root 99154656 May 20 2011 /usr/lib/locale/locale-archive-rpm
}}}
 * Copy the big file over the dataplane:
{{{
[chaos@virt1 ~]$ scp /usr/lib/locale/locale-archive 10.10.1.2:/tmp/
The authenticity of host '10.10.1.2 (10.10.1.2)' can't be established.
RSA key fingerprint is 6d:1d:76:53:a5:25:99:39:e2:89:ea:b0:99:e3:d3:b9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.10.1.2' (RSA) to the list of known hosts.
locale-archive 100% 95MB 11.8MB/s 00:08
}}}
 * Look at the ARP tables on virt1 and virt2:
{{{
[chaos@virt1 ~]$ /sbin/arp -a
virt2-virt1-virt2-0 (10.10.1.2) at 82:02:0a:0a:01:02 [ether] on mv1.1
pc2.utah.geniracks.net (155.98.34.12) at 00:01:ac:11:02:01 [ether] on eth999
boss.utah.geniracks.net (155.98.34.4) at 00:01:ac:11:02:01 [ether] on eth999
[chaos@virt1 ~]$ ssh 10.10.1.2
Last login: Fri May 18 13:35:41 2012 from capybara.bbn.com
[chaos@virt2 ~]$ /sbin/arp -a
virt1-virt1-virt2-0 (10.10.1.1) at 82:01:0a:0a:01:01 [ether] on mv2.2
boss.utah.geniracks.net (155.98.34.4) at 00:01:ac:11:05:02 [ether] on eth999
pc5.utah.geniracks.net (155.98.34.15) at 00:01:ac:11:05:02 [ether] on eth999
}}}
 * Delete the sliver:
{{{
jericho,[~],05:53(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am deletesliver ecgtest2
}}}

== Step 2 (prep): start a bare metal node experiment and terminate it ==

 * An experimenter requests an experiment from the InstaGENI AM containing two rack hosts and a dataplane VLAN
 * The experimenter logs into a host, and sends dataplane traffic
 * The experimenter terminates the experiment

=== Results of testing: 2012-05-18 ===

 * Here is an rspec for two physical nodes with no OS specified:
{{{
jericho,[~],05:39(0)$ cat IG-MON-nodes-D.rspec
}}}
 * Create a slice for this experiment:
{{{
omni createslice ecgtest3
}}}
 * Create a sliver using this rspec:
{{{
jericho,[~],05:40(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am createsliver ecgtest3 ~/IG-MON-nodes-D.rspec
INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
INFO:omni:Using control framework pg
INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest3 expires on 2012-05-19 10:40:34 UTC
INFO:omni:Creating sliver(s) from rspec file /home/chaos/IG-MON-nodes-D.rspec for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest3
INFO:omni:Asked http://www.utah.geniracks.net/protogeni/xmlrpc/am to reserve resources. Result:
INFO:omni:
INFO:omni:
INFO:omni:
INFO:omni: ------------------------------------------------------------
INFO:omni: Completed createsliver:

  Options as run:
    aggregate: http://www.utah.geniracks.net/protogeni/xmlrpc/am
    configfile: /home/chaos/omni/omni_pgeni
    framework: pg
    native: True

  Args: createsliver ecgtest3 /home/chaos/IG-MON-nodes-D.rspec

  Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest3 expires on 2012-05-19 10:40:34 UTC
Reserved resources on http://www.utah.geniracks.net/protogeni/xmlrpc/am.
INFO:omni: ============================================================
}}}
 * According to sliverstatus, my nodes are pc1 and pc4.
 * Log in to pc1.utah.geniracks.net with agent forwarding
 * Find that it is phys2 and has eth1=10.10.1.2
 * Find a big file:
{{{
[chaos@phys2 ~]$ ls -l /usr/lib/locale/locale-archive
-rw-r--r-- 1 root root 104997424 Aug 10 2011 /usr/lib/locale/locale-archive
}}}
 * Copy the big file over the dataplane in a loop:
{{{
[chaos@phys2 ~]$ while [ 1 ]; do scp /usr/lib/locale/locale-archive 10.10.1.1:/tmp/; done
locale-archive 100% 100MB 50.1MB/s 00:02
locale-archive 100% 100MB 50.1MB/s 00:02
locale-archive 100% 100MB 50.1MB/s 00:02
...
}}}
 * After a bit of that, delete the sliver:
{{{
jericho,[~],05:53(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am deletesliver ecgtest3
}}}

== Step 3 (prep): start an experiment and leave it running ==

 * An experimenter requests an experiment from the InstaGENI AM containing two rack VMs connected by an !OpenFlow-controlled dataplane VLAN
 * The experimenter configures a simple !OpenFlow controller to pass dataplane traffic between the VMs
 * The experimenter logs into one VM, and begins sending a continuous stream of dataplane traffic

=== Results of testing: 2012-05-18 ===

''Note: per discussion on instageni-design on 2012-05-17, requesting an !OpenFlow-controlled dataplane is not yet possible, so this step will need to be retested once !OpenFlow control is available.''

 * I am not creating a new experiment here. Instead, I am reusing my experiment ecgtest, created yesterday for `IG-MON-1`.
 * Log in to pc3, whose eth1 is 10.10.1.1
 * Make a bigger dataplane file by catting the other one together a few times, then start copying it around again:
{{{
[chaos@phys1 ~]$ ls -l /tmp/locale-archive
-rw-r--r-- 1 chaos pgeni-gpolab-bbn 3149922720 May 18 04:14 /tmp/locale-archive
while [ 1 ]; do scp /tmp/locale-archive 10.10.1.2:/tmp/; done
}}}
 * This lets me see that the first instance of the file copy takes about a minute, at about 55 MB/s:
{{{
[chaos@phys1 ~]$ while [ 1 ]; do scp /tmp/locale-archive 10.10.1.2:/tmp/; done
locale-archive 100% 3004MB 55.6MB/s 00:54
}}}
 * Leave this running.

== Step 4: view running VMs ==

'''Using:'''
 * On boss, use AM state, logs, or administrator interfaces to determine:
   * What experiments are running right now
   * How many VMs are allocated for those experiments
   * Which OpenVZ node is each VM running on
 * On OpenVZ nodes, use system state, logs, or administrative interfaces to determine what VMs are running right now, and look at any available configuration or logs of each.
'''Verify:'''
 * A site administrator can determine what experiments are running on the InstaGENI AM
 * A site administrator can determine the mapping of VMs to active experiments
 * A site administrator can view some state of running VMs on the VM server

=== Results of testing: 2012-05-18 ===

== Step 5: get information about terminated experiments ==

'''Using:'''
 * On boss, use AM state, logs, or administrator interfaces to find evidence of the two terminated experiments.
 * Determine how many other experiments were run in the past day.
 * Determine which GENI user created each of the terminated experiments.
 * Determine the mapping of experiments to OpenVZ or exclusive hosts for each of the terminated experiments.
 * Determine the control and dataplane MAC addresses assigned to each VM in each terminated experiment.
 * Determine any IP addresses assigned by InstaGENI to each VM in each terminated experiment.

'''Verify:'''
 * A site administrator can get ownership and resource allocation information for recently-terminated experiments which used OpenVZ VMs.
 * A site administrator can get ownership and resource allocation information for recently-terminated experiments which used physical hosts.
 * A site administrator can get information about MAC addresses and IP addresses used by recently-terminated experiments.

== Step 6: get !OpenFlow state information ==

'''Using:'''
 * On the dataplane switch, get a list of controllers, and see if any additional controllers are serving experiments.
 * On the flowvisor VM, get a list of active FV slices from the !FlowVisor
 * On the FOAM VM, get a list of active slivers from FOAM
 * Use FV, FOAM, or the switch to list the flowspace of a running !OpenFlow experiment.

'''Verify:'''
 * A site administrator can get information about the !OpenFlow resources used by running experiments.
 * When an !OpenFlow experiment is started by InstaGENI, a new controller is added directly to the switch.
 * No new !FlowVisor slices are added for new !OpenFlow experiments started by InstaGENI.
 * No new FOAM slivers are added for new !OpenFlow experiments started by InstaGENI.

== Step 7: verify MAC addresses on the rack dataplane switch ==

'''Using:'''
 * Establish a privileged login to the dataplane switch
 * Obtain a list of the full MAC address table of the switch
 * On boss and the experimental hosts, use available data sources to determine which host or VM owns each MAC address.

'''Verify:'''
 * It is possible to identify and classify every MAC address visible on the switch

== Step 8: verify active dataplane traffic ==

'''Using:'''
 * Establish a privileged login to the dataplane switch
 * Based on the information from Step 7, determine which interfaces are carrying traffic between the experimental VMs
 * Collect interface counters for those interfaces over a period of 10 minutes
 * Estimate the rate at which the experiment is sending traffic

'''Verify:'''
 * The switch reports interface counters, and an administrator can obtain plausible estimates of dataplane traffic quantities by looking at them.
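
The rate estimate in Step 8 comes down to simple arithmetic on two counter readings. The sketch below is only an illustration of that arithmetic, not part of the test plan. The function names, the counter width, the 25% tolerance, and the sample counter values are all assumptions made for the example. The expected rate is taken from the 55.6MB/s that scp reported in Step 3, and scp reports rates in units of 2^20 bytes (the 3149922720-byte file is shown as 3004MB). The switch counters also include protocol overhead that scp does not report, so the observed rate should come out somewhat above the scp figure.

```python
MIB = 1 << 20  # scp reports sizes and rates in units of 2**20 bytes


def counter_delta(first, second, width_bits=64):
    """Octets counted between two readings, tolerating a single counter wrap.

    width_bits is an assumption about the switch: use 32 if the counters
    being read are 32 bits wide.
    """
    return (second - first) % (1 << width_bits)


def average_rate_bps(first_octets, second_octets, interval_s, width_bits=64):
    """Average rate in bits per second over the sampling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(first_octets, second_octets, width_bits) * 8 / interval_s


def is_plausible(observed_bps, expected_bps, tolerance=0.25):
    """True if observed is within +/- tolerance (a fraction) of expected.

    The 25% default is an arbitrary choice for this sketch.
    """
    return abs(observed_bps - expected_bps) <= tolerance * expected_bps


if __name__ == "__main__":
    # Payload rate reported by scp in Step 3, converted to bits per second.
    expected = 55.6 * MIB * 8
    # Hypothetical octet counter readings taken 600 seconds (10 minutes) apart.
    observed = average_rate_bps(1_000_000, 1_000_000 + 36_000_000_000, 600)
    print(f"expected about {expected / 1e6:.0f} Mbit/s, "
          f"observed {observed / 1e6:.0f} Mbit/s")
    print("plausible" if is_plausible(observed, expected) else "not plausible")
```

Taking the difference modulo the counter width keeps the estimate correct if the counter wraps once during the 10-minute window. That matters mainly for 32-bit octet counters, which can wrap in well under ten minutes at these rates.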