Changes between Version 4 and Version 5 of GENIRacksHome/InstageniRacks/AcceptanceTestStatus/IG-MON-3


Timestamp:
05/18/12 07:57:59
Author:
chaos@bbn.com
Comment:

add results of partial testing on the morning of 2012-05-18

  • GENIRacksHome/InstageniRacks/AcceptanceTestStatus/IG-MON-3

    v4 v5  
    19 19 == Status of test ==
    20 20
    21 || '''Step''' || '''State'''               || '''Date completed''' || '''Tickets''' || '''Comments'''                                                     ||
    22 || 1          ||                           ||                      ||               || ready to test                                                      ||
    23 || 2          ||                           ||                      ||               || ready to test                                                      ||
    24 || 3          || [[Color(orange,Blocked)]] ||                      ||               || blocked on ability to request OpenFlow resources from InstaGENI AM ||
    25 || 4          || [[Color(orange,Blocked)]] ||                      ||               || blocked on 3                                                       ||
    26 || 5          || [[Color(orange,Blocked)]] ||                      ||               || blocked on 3                                                       ||
    27 || 6          || [[Color(orange,Blocked)]] ||                      ||               || blocked on 3, availability of FOAM                                 ||
    28 || 7          || [[Color(orange,Blocked)]] ||                      ||               || blocked on 3                                                       ||
    29 || 8          || [[Color(orange,Blocked)]] ||                      ||               || blocked on 3                                                       ||
     21|| '''Step''' || '''State'''                 || '''Date completed''' || '''Tickets'''  || '''Comments'''                                                          ||
     22|| 1          || [[Color(yellow,Completed)]] ||                      ||                || needs retesting when 3 is retested                                      ||
     23|| 2          ||                             ||                      ||                || needs retesting when 3 is retested                                      ||
     24|| 3          || [[Color(yellow,Completed)]] ||                      ||                || needs retesting once OpenFlow resources are available from InstaGENI AM ||
     25|| 4          || [[Color(orange,Blocked)]]   ||                      || instaticket:26 || blocked on resolution of MAC reporting issue                            ||
     26|| 5          || [[Color(orange,Blocked)]]   ||                      ||                || blocked on 3                                                            ||
     27|| 6          || [[Color(orange,Blocked)]]   ||                      ||                || blocked on 3, availability of FOAM                                      ||
     28|| 7          || [[Color(orange,Blocked)]]   ||                      ||                || blocked on 3                                                            ||
     29|| 8          || [[Color(orange,Blocked)]]   ||                      ||                || blocked on 3                                                            ||
    30 30 == High-level description from test plan ==
    31 31
     
    345 345 === Results of testing: 2012-05-18 ===
    346 346
     347 * Per-host view of current state:
     348   * From [https://boss.utah.geniracks.net/nodecontrol_list.php3?showtype=dl360] in red dot mode, i can once again see that pc3 is allocated as phys1 to `pgeni-gpolab-bbn-com/ecgtest`.
     349   * I can see that pc5 is configured as an OpenVZ shared host, but i can't see how many experiments it is running.
     350 * Per-experiment view of current state:
     351   * Browse to [https://boss.utah.geniracks.net/genislices.php] and find one slice running on the Component Manager:
     352{{{
     353ID   HRN                         Created             Expires
     354362  bbn-pgeni.ecgtest (ecgtest) 2012-05-17 08:12:37 2012-05-18 18:00:00
     355}}}
     356   * Click `(ecgtest)` to view the details of that experiment at [https://boss.utah.geniracks.net/showexp.php3?experiment=363#details].
     357   * This shows what nodes it's using, including that its VM has been put on pc5:
     358{{{
     359Physical Node Mapping:
     360ID              Type         OS              Physical
     361--------------- ------------ --------------- ------------
     362phys1           dl360        FEDORA15-STD    pc3
     363virt1           pcvm         OPENVZ-STD      pcvm5-1 (pc5)
     364}}}
     365   * Here are some other interesting things:
     366{{{
     367IP Port allocation:
     368Low             High
     369--------------- ------------
     37030000           30255
     371
     372SSHD Port allocation ('ssh -p portnum'):
     373ID              Port       SSH command
     374--------------- ---------- ----------------------
     375
     376Physical Lan/Link Mapping:
     377ID              Member          IP              MAC                  NodeID
     378--------------- --------------- --------------- -------------------- ---------
     379phys1-virt1-0   phys1:0         10.10.1.1       e8:39:35:b1:4e:8a    pc3
     380                                                1/1 <-> 1/34         procurve2
     381phys1-virt1-0   virt1:0         10.10.1.2                            pcvm5-1
     382}}}
     383   * That last entry is mysterious (`virt1:0` is listed without a MAC address), because the experimenter's sliverstatus output contains:
     384{{{
     385  { 'attributes':
     386    { 'client_id': 'phys1:if0',
     387      'component_id': 'urn:publicid:IDN+utah.geniracks.net+interface+pc3:eth1',
     388      'mac_address': 'e83935b14e8a',
     389...
     390  { 'attributes':
     391    { 'client_id': 'virt1:if0',
     392      'component_id': 'urn:publicid:IDN+utah.geniracks.net+interface+pc5:eth1',
     393      'mac_address': '00000a0a0102',
     394}}}
     395   * So i think it should be possible for the admin interface to know that virtual mac address too.
     396   * Huh, but also, the MAC address that sliverstatus reports for `virt1:0` is in fact wrong.  Let me summarize:
     397{{{
     398MAC addrs reported for phys1:0 == 10.10.1.1
     399  E8:39:35:B1:4E:8A: from /sbin/ifconfig eth1 run on phys1 (authoritative)
     400  e83935b14e8a:      from sliverstatus as experimenter (correct)
     401  e8:39:35:b1:4e:8a: from: https://boss.utah.geniracks.net/showexp.php3?experiment=363#details (correct)
     402
     403MAC addrs reported for virt1:0 == 10.10.1.2
     404  82:01:0A:0A:01:02: from /sbin/ifconfig mv1.1 run on virt1 (authoritative)
     405  00000a0a0102:      from sliverstatus as experimenter (incorrect: first four digits are wrong)
     406  -                : from https://boss.utah.geniracks.net/showexp.php3?experiment=363#details (not reported)
     407}}}
     408   I opened [instaticket:26] for this issue.
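
The comparison summarized above can be repeated mechanically. The following is a minimal Python sketch, not part of the test procedure: the helper names `normalize_mac` and `compare_reports` are hypothetical, and the sketch assumes each source reports either a 12-hex-digit MAC (with or without separators) or a placeholder such as `-`. The sample values are the ones recorded above.

```python
import re

def normalize_mac(text):
    """Return a MAC address as 12 lowercase hex digits, or None if text is not one."""
    digits = re.sub(r"[:\-.]", "", text.strip()).lower()
    if re.fullmatch(r"[0-9a-f]{12}", digits):
        return digits
    return None

def compare_reports(authoritative, reports):
    """Compare each reported MAC against the authoritative (ifconfig) value.

    Returns a dict mapping source name to 'correct', 'incorrect',
    or 'not reported'.
    """
    expected = normalize_mac(authoritative)
    verdicts = {}
    for source, value in reports.items():
        mac = normalize_mac(value) if value else None
        if mac is None:
            verdicts[source] = "not reported"
        elif mac == expected:
            verdicts[source] = "correct"
        else:
            verdicts[source] = "incorrect"
    return verdicts

# Values copied from the summary above.
phys1 = compare_reports(
    "E8:39:35:B1:4E:8A",
    {"sliverstatus": "e83935b14e8a", "showexp": "e8:39:35:b1:4e:8a"},
)
virt1 = compare_reports(
    "82:01:0A:0A:01:02",
    {"sliverstatus": "00000a0a0102", "showexp": "-"},
)
print(phys1)
print(virt1)
```

Normalizing before comparing avoids false mismatches caused only by case or separator differences between the three sources.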
     409 * Now, use the OpenVZ host itself to view activity:
     410   * As an admin, login to pc5.utah.geniracks.net
     411   * Poking around, i was led to a couple of prospective data sources:
     412     * Logs in `/var/emulab`
     413     * The `vzctl` RPM, containing a number of OpenVZ control commands
     414   * The latter seems to give a list of running VMs easily:
     415{{{
     416vhost1,[/var/emulab],05:00(1)$ sudo vzlist -a
     417      CTID      NPROC STATUS    IP_ADDR         HOSTNAME
     418         1         15 running   -               virt1.ecgtest.pgeni-gpolab-bbn-com.utah.geniracks.net
     419}}}
     420   * I also see a command to figure out which container is running a given PID.  Suppose i run top and am concerned about an sshd process chewing up all system CPU:
     421{{{
     422    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND         
     423  51817 20001     20   0  116m 3780  872 R 94.4  0.0   0:05.74 sshd             
     424}}}
     425   * Since the user is numeric, i can assume this process is probably running in a container, so find out which one:
     426{{{
     427vhost1,[/var/emulab],05:05(0)$ sudo vzpid 51766
     428Pid     CTID    Name
     42951766   1       sshd
     431}}}
     432   * and then look up the container info as above.
     433   * The files in `/var/emulab` give details about how each experiment was created.  In particular:
     434{{{
     435Information about experiment startup attributes:
     436  /var/emulab/boot/tmcc.pcvm5-1/
     437  /var/emulab/boot/tmcc.pcvm5-2/
     438
     439Logs of experiment progress:
     440  /var/emulab/logs/tbvnode-pcvm5-1.log
     441  /var/emulab/logs/tbvnode-pcvm5-2.log
     442  /var/emulab/logs/tmccproxy.pcvm5-1.log
     443  /var/emulab/logs/tmccproxy.pcvm5-2.log
     444}}}
     445   * These may be useful for running and terminated experiments ''if'' the context IDs are unique.
     446
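
The PID-to-container lookup described above (run `vzpid`, then match the CTID against `vzlist -a`) could be scripted along these lines. This is only a sketch: the function names are hypothetical, and it assumes the two commands print the column layouts captured on this page. It parses saved text and does not invoke either command itself.

```python
def parse_vzpid(output):
    """Parse `vzpid` text output into {pid: (ctid, process_name)}."""
    rows = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly three fields with numeric PID and CTID;
        # the header line and anything else is skipped.
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            rows[int(fields[0])] = (int(fields[1]), fields[2])
    return rows

def parse_vzlist(output):
    """Parse `vzlist -a` text output into {ctid: hostname}."""
    containers = {}
    for line in output.splitlines():
        fields = line.split()
        # Expected columns: CTID NPROC STATUS IP_ADDR HOSTNAME
        if len(fields) == 5 and fields[0].isdigit():
            containers[int(fields[0])] = fields[4]
    return containers

def container_for_pid(pid, vzpid_output, vzlist_output):
    """Return the hostname of the container running pid, or None if unknown."""
    entry = parse_vzpid(vzpid_output).get(pid)
    if entry is None:
        return None
    return parse_vzlist(vzlist_output).get(entry[0])

# Sample text copied from the command output recorded above.
VZPID_SAMPLE = """Pid     CTID    Name
51766   1       sshd
"""
VZLIST_SAMPLE = """      CTID      NPROC STATUS    IP_ADDR         HOSTNAME
         1         15 running   -               virt1.ecgtest.pgeni-gpolab-bbn-com.utah.geniracks.net
"""
print(container_for_pid(51766, VZPID_SAMPLE, VZLIST_SAMPLE))
```

The hostname encodes the node, slice, and project names, so it is enough to identify the experiment that owns the process.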
     447==== Side test: are experiment context IDs unique over time on an OpenVZ server? ====
     448
     449 * rspec to create a single OpenVZ container:
     450{{{
     451jericho,[~],07:12(0)$ cat IG-MON-nodes-E.rspec
     452<?xml version="1.0" encoding="UTF-8"?>
     453<!-- This rspec will reserve one openvz node.  It should work on any
     454     Emulab which has nodes available and supports OpenVZ.  -->
     455<rspec xmlns="http://www.geni.net/resources/rspec/3"
     456       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     457       xsi:schemaLocation="http://www.geni.net/resources/rspec/3
     458                           http://www.geni.net/resources/rspec/3/request.xsd"
     459       type="request">
     460
     461  <node client_id="virt1" exclusive="false">
     462    <sliver_type name="emulab-openvz" />
     463  </node>
     464</rspec>
     465}}}
     466 * use existing slice `ecgtest2` to create a sliver:
     467{{{
     468jericho,[~],07:13(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am
     469createsliver ecgtest2 IG-MON-nodes-E.rspec
     470INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
     471INFO:omni:Using control framework pg
     472INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 expires within 1 day on 2012-05-19 10:30:51 UTC
     473INFO:omni:Creating sliver(s) from rspec file IG-MON-nodes-E.rspec for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2
     474INFO:omni:Asked http://www.utah.geniracks.net/protogeni/xmlrpc/am to reserve resources. Result:
     475INFO:omni:<?xml version="1.0" ?>
     476INFO:omni:<!-- Reserved resources for:
     477        Slice: ecgtest2
     478        At AM:
     479        URL: http://www.utah.geniracks.net/protogeni/xmlrpc/am
     480 -->
     481INFO:omni:<rspec type="manifest" xmlns="http://www.geni.net/resources/rspec/3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.geni.net/resources/rspec/3                            http://www.geni.net/resources/rspec/3/manifest.xsd"> 
     482
     483    <node client_id="virt1" component_id="urn:publicid:IDN+utah.geniracks.net+node+pc5" component_manager_id="urn:publicid:IDN+utah.geniracks.net+authority+cm" exclusive="false" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+384">   
     484        <sliver_type name="emulab-openvz"/>   
     485      <rs:vnode name="pcvm5-2" xmlns:rs="http://www.protogeni.net/resources/rspec/ext/emulab/1"/>    <host name="virt1.ecgtest2.pgeni-gpolab-bbn-com.utah.geniracks.net"/>    <services>      <login authentication="ssh-keys" hostname="pc5.utah.geniracks.net" port="30266" username="chaos"/>    </services>  </node> 
     486</rspec>
     487INFO:omni: ------------------------------------------------------------
     488INFO:omni: Completed createsliver:
     489
     490  Options as run:
     491                aggregate: http://www.utah.geniracks.net/protogeni/xmlrpc/am
     492                configfile: /home/chaos/omni/omni_pgeni
     493                framework: pg
     494                native: True
     495
     496  Args: createsliver ecgtest2 IG-MON-nodes-E.rspec
     497
     498  Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 expires within 1 day(s) on 2012-05-19 10:30:51 UTC
     499Reserved resources on http://www.utah.geniracks.net/protogeni/xmlrpc/am. 
     500INFO:omni: ============================================================
     501}}}
     502
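     503Summary: the new sliver was assigned vnode `pcvm5-2`, a name that already has startup and log files under `/var/emulab` (listed above), so this means that VM IDs are reused.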
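
One way to check for reuse without reading the manifest by eye is sketched below in Python. The function names are hypothetical. The sketch assumes that a pre-existing `tbvnode-<name>.log` file indicates that the vnode name was used by an earlier experiment, and that the log listing was taken before the sliver was created. The manifest here is trimmed from the createsliver output above.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the Emulab rspec extension, as it appears in the manifest above.
EMULAB_NS = "http://www.protogeni.net/resources/rspec/ext/emulab/1"

def vnode_names(manifest_xml):
    """Return the vnode names (e.g. 'pcvm5-2') assigned in a manifest rspec."""
    root = ET.fromstring(manifest_xml)
    return [v.get("name") for v in root.iter("{%s}vnode" % EMULAB_NS)]

def reused_vnodes(manifest_xml, log_filenames):
    """Return vnode names from the manifest that already have a tbvnode log."""
    seen = set()
    for filename in log_filenames:
        match = re.fullmatch(r"tbvnode-(.+)\.log", filename)
        if match:
            seen.add(match.group(1))
    return [name for name in vnode_names(manifest_xml) if name in seen]

# Manifest trimmed from the createsliver output above.
MANIFEST = """<rspec type="manifest" xmlns="http://www.geni.net/resources/rspec/3">
  <node client_id="virt1" exclusive="false">
    <sliver_type name="emulab-openvz"/>
    <rs:vnode name="pcvm5-2" xmlns:rs="http://www.protogeni.net/resources/rspec/ext/emulab/1"/>
  </node>
</rspec>"""

# Log file names as listed earlier under /var/emulab/logs.
LOGS = ["tbvnode-pcvm5-1.log", "tbvnode-pcvm5-2.log"]
print(reused_vnodes(MANIFEST, LOGS))
```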
     504
     505At this point, i was going to gather more information about logs, when the Utah rack became totally unavailable: i was no longer able to use my shell sessions to any machines in the rack, and got ping timeouts to boss.
     506
     507After about 8 minutes, things became available again.  I went looking for logs of my dataplane file copy activity to see whether the dataplane had been interrupted, at which point i found out that sshd activity on the dataplane does not appear to be logged anywhere, either in `/var/log` within the container or on pc5 itself.  That's not a rack requirement, but it seems non-ideal for experimenters.  I opened [instaticket:27] to report it.
     508
    347 509 == Step 5: get information about terminated experiments ==
    348 510