Changes between Version 4 and Version 5 of PlasticSlices/BaselineEvaluation/Baseline6Details


Timestamp:
07/11/11 15:48:50 (11 years ago)
Author:
Josh Smift
Comment:

--

  • PlasticSlices/BaselineEvaluation/Baseline6Details

    v4 v5  
    55The raw logs of each experiment are at http://www.gpolab.bbn.com/plastic-slices/baseline-logs/baseline-6/.
    66
     7This baseline was plagued by a variety of outages. Here's a timeline of when we observed problems:
     8
     9 * 2011-06-09 @ 18:10: All experiments started.
     10
     11 * 2011-06-10 @ 17:15: We observed problems with all connections involving TCP and Stanford. We believe that load on the Stanford !FlowVisor was the underlying cause.
     12
     13 * 2011-06-10 @ 17:30: A critical VM server at BBN needed to be rebooted, which included the server where the controllers for the slices were running. In the course of that, the Stanford and Internet2 !FlowVisors died completely. We haven't yet tried to reproduce this; we don't know of any reason why shutting down an experimenter's controller (even in a sudden emergency fashion) should affect upstream !FlowVisors.
     14
     15 * 2011-06-10 @ 21:00: The !FlowVisors at Indiana and NLR also crashed, when the VM server that hosts them crashed. This again seems oddly coincidental, but there's no obvious causal chain. While investigating, we discovered that an unrelated test slice at BBN had partially cross-connected VLANs 3715 and 3716, which created a loop when the Indiana switches failed open rather than closed. All of these problems were eventually corrected.
     16
     17 * 2011-06-11 @ 00:00: The NLR and Internet2 !FlowVisors came back online, and we were able to revive all of the experiments, except for a few involving Indiana.
     18
     19 * 2011-06-12 @ 01:45: The Stanford !FlowVisor was down again.
     20
     21 * 2011-06-12 @ 13:30: One of the Rutgers MyPLC nodes (orbitplc2) rebooted, and lost its static ARP table, killing all of the experiments that used that node.
     22
     23 * 2011-06-13 @ 12:30: Stanford upgraded their !FlowVisor to a specific Git commit that addressed bugs that were affecting them, and Indiana's switch/!FlowVisor configuration problems were corrected. orbitplc2 still didn't have its static ARP table.
     24
     25 * 2011-06-13 @ 15:15: orbitplc2's static ARP table was fixed; everything was running smoothly again at this point (for the first time in three days).
     26
     27 * 2011-06-14 @ 11:30: We changed the controller configuration to not include a !FlowVisor between the switches and the controllers; shortly thereafter, Stanford's !FlowVisor crashed, and all traffic (including e.g. between hosts at BBN) stopped flowing reliably. As before, we don't see any reason why one of these events should have caused the others, but the timing is very suspicious. Once Stanford restarted their !FlowVisor, traffic within BBN (and elsewhere) returned to normal, and we were able to revive all of the experiments.
     28
     29 * 2011-06-14 @ 17:20: A routing configuration in Internet2 cut off the I2 OpenFlow switches from the I2 !FlowVisor, which was running in Indiana University testlab IP space; we worked with I2 engineers to change the I2 switches to point to a new OpenFlow software stack in I2 production IP space (which we had planned to do at some point anyway, but this proved an opportune time). All traffic involving Internet2 was down at this point.
     30
     31 * 2011-06-15 @ 14:10: The Internet2 move was complete, I2 traffic resumed flowing, and things continued to run smoothly for the brief remainder of the baseline, once we revived the experiments.
     32
     33 * 2011-06-15 @ 19:30: All experiments shut down. We discovered shortly thereafter that we'd lost some log data (see below for more details).
     34
     35That isn't a comprehensive record of when things actually went down and came back; we plan to add that. We believe that these outages explain most (if not all) of the anomalies in the results below.
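
As a rough cross-check of the durations implied by the timeline above, here is a minimal sketch that computes the elapsed time between a few of the listed timestamps. It assumes all timestamps are in the same time zone (the timeline doesn't say which), and the `elapsed` helper and variable names are hypothetical, introduced only for illustration.

```python
from datetime import datetime

# Timestamp layout used in the timeline, with "@" dropped.
FMT = "%Y-%m-%d %H:%M"


def elapsed(start: str, end: str):
    """Return the timedelta between two timeline timestamps.

    Assumes both timestamps share one time zone (not stated in the source).
    """
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Start of all experiments to final shutdown.
baseline = elapsed("2011-06-09 18:10", "2011-06-15 19:30")
# First observed problem to "everything was running smoothly again".
first_outage = elapsed("2011-06-10 17:15", "2011-06-13 15:15")
# Internet2 traffic down to Internet2 move complete.
i2_outage = elapsed("2011-06-14 17:20", "2011-06-15 14:10")

print(baseline)      # 6 days, 1:20:00
print(first_outage)  # 2 days, 22:00:00
print(i2_outage)     # 20:50:00
```

The second figure is consistent with the "first time in three days" remark in the 2011-06-13 @ 15:15 entry. These are intervals between observations, not measured outage durations.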
     36
    737= plastic-101 =
    838
    9 [PlasticSlices/Experiments#GigaPing GigaPing], using count=100000, and this table of client/server pairs:
     39[PlasticSlices/Experiments#SteadyPing SteadyPing], using interval=.006, and this table of client/server pairs:
    1040
    1141|| '''client'''                   || '''server'''                   || '''server address''' ||
     
    4272(no output)
    4373}}}
     74
    4475planetlab4.clemson.edu:
    4576
     
    114145== Analysis ==
    115146
    116 ''(forthcoming)''
     147We lost the final stats for BBN - Clemson, Clemson - GT, Washington - Wisconsin, and Wisconsin - BBN, and haven't gone back and tried to reconstruct them from the full logs.
    117148
    118149= plastic-102 =
    119150
    120 [PlasticSlices/Experiments#GigaPing GigaPing], using count=100000, and this table of client/server pairs:
     151[PlasticSlices/Experiments#SteadyPing SteadyPing], using interval=.006, and this table of client/server pairs:
    121152
    122153|| '''client'''                   || '''server'''                   || '''server address''' ||
     
    206237== Analysis ==
    207238
    208 ''(forthcoming)''
     239We lost the final stats for BBN - Clemson, Clemson - GT, Indiana - Rutgers, Stanford - Washington, Washington - Wisconsin, and Wisconsin - BBN, and haven't gone back and tried to reconstruct them from the full logs.
    209240
    210241= plastic-103 =
    211242
    212 [PlasticSlices/Experiments#GigaPerfTCP GigaPerf TCP], using port=5103, size=350, and this table of client/server pairs:
     243[PlasticSlices/Experiments#SteadyPerfTCP SteadyPerf TCP], using port=5103, time=518400, and this table of client/server pairs:
    213244
    214245|| '''client'''                   || '''server'''                   || '''server address''' ||
     
    579610= plastic-104 =
    580611
    581 [PlasticSlices/Experiments#GigaPerfUDP GigaPerf UDP], using port=5104, size=500, rate=100, and this table of client/server pairs:
     612[PlasticSlices/Experiments#SteadyPerfUDP SteadyPerf UDP], using port=5104, time=518400, rate=3, and this table of client/server pairs:
    582613
    583614|| '''client'''                   || '''server'''                   || '''server address''' ||
     
    739770= plastic-105 =
    740771
    741 [PlasticSlices/Experiments#GigaPerfTCP GigaPerf TCP], using port=5105, size=250, and this table of client/server pairs:
     772[PlasticSlices/Experiments#SteadyPerfTCP SteadyPerf TCP], using port=5105, time=518400, and this table of client/server pairs:
    742773
    743774|| '''client'''                   || '''server'''                   || '''server address''' ||
     
    945976= plastic-106 =
    946977
    947 [PlasticSlices/Experiments#GigaPerfUDP GigaPerf UDP], using port=5106, size=500, rate=100, and this table of client/server pairs:
     978[PlasticSlices/Experiments#SteadyPerfUDP SteadyPerf UDP], using port=5106, time=518400, rate=3, and this table of client/server pairs:
    948979
    949980|| '''client'''                   || '''server'''                   || '''server address''' ||
     
    10541085= plastic-107 =
    10551086
    1056 [PlasticSlices/Experiments#GigaWeb GigaWeb], using count=30, port=4107, file=substrate.doc, md5sum=d4fcf71833327fbfef98be09deef8bfb, and this table of client/server pairs:
     1087[PlasticSlices/Experiments#SteadyWeb SteadyWeb], using port=4107, file=substrate.doc, md5sum=d4fcf71833327fbfef98be09deef8bfb, and this table of client/server pairs:
    10571088
    10581089|| '''client'''                   || '''server'''                   || '''server address''' ||
     
    11531184= plastic-108 =
    11541185
    1155 [PlasticSlices/Experiments#GigaWeb GigaWeb], using count=40, port=4108, file=substrate.doc, md5sum=d4fcf71833327fbfef98be09deef8bfb, and this table of client/server pairs:
     1186[PlasticSlices/Experiments#SteadyWeb SteadyWeb], using port=4108, file=substrate.doc, md5sum=d4fcf71833327fbfef98be09deef8bfb, and this table of client/server pairs:
    11561187
    11571188|| '''client'''                   || '''server'''                   || '''server address''' ||
     
    12521283= plastic-109 =
    12531284
    1254 [PlasticSlices/Experiments#GigaNetcat GigaNetcat], using count=25, port=6109, file=substrate.doc, and this table of client/server pairs:
     1285[PlasticSlices/Experiments#SteadyNetcat SteadyNetcat], using port=6109, file=substrate.doc, md5sum=d4fcf71833327fbfef98be09deef8bfb, and this table of client/server pairs:
    12551286
    12561287|| '''client'''                   || '''server'''                   || '''server address''' ||
     
    13551386= plastic-110 =
    13561387
    1357 [PlasticSlices/Experiments#GigaNetcat GigaNetcat], using count=25, port=6110, file=substrate.doc, and this table of client/server pairs:
     1388[PlasticSlices/Experiments#SteadyNetcat SteadyNetcat], using port=6110, file=substrate.doc, md5sum=d4fcf71833327fbfef98be09deef8bfb, and this table of client/server pairs:
    13581389
    13591390|| '''client'''                   || '''server'''                   || '''server address''' ||