
Version 3 (modified by hdempsey@bbn.com, 11 years ago)


This is the final report on the conclusions and results of the Plastic Slices project, which ran ten GENI slices continuously over a period of three months, in order to gain experience managing and operating production-quality GENI resources, and to discover and record issues that early experimenters are likely to encounter. Our conclusions are that mesoscale GENI is generally ready for operations with more experimenters, that GENI software is mostly ready to use in a production-quality environment, that GENI experiments are somewhat isolated from each other, and that GENI is easy to use at a simple level but more challenging to use for more complex purposes. All of these areas need continuing improvement, and we make suggestions for ways to do that. The report also includes more details about the environment, experiments, baselines, and tools we used, and further discussion of some of the results that highlight the ways in which GENI is a unique research environment. More details are available at http://groups.geni.net/geni/wiki/PlasticSlices.

(Figure: a diagram of one of the core VLANs in the current mesoscale GENI deployment; image TangoGENI:OF-VLAN3715.jpg.)

1. Motivation

The Plastic Slices project was the first project in the Meso-Scale Campus Experiments plan, for evolving meso-scale experiments between now and GEC 12. The overarching purpose was to set shared campus goals, and to agree on a schedule and resources to support experiments. This first project was an effort to run ten (or more) GENI slices continuously for multiple months -- not merely for the sake of having ten slices running, but to gain experience managing and operating production-quality GENI resources.

During Spiral 3, campuses have been expanding and updating their OpenFlow deployments. All campuses have agreed to run a GENI AM API compliant aggregate manager, and to support at least four multi-site GENI experiments, by GEC 12. This lays the foundation for GENI to continuously support and manage multiple simultaneous slices that contain resources from GENI AM API compliant aggregates with multiple cross-country layer 2 data paths. (http://groups.geni.net/geni/wiki/GeniApi has more information about what it means to be a "GENI AM API compliant aggregate.") This campus infrastructure can support the transition from building GENI to using GENI continuously in the meso-scale infrastructure. Longer-term, it can also support the transition to at-scale production use in 2012, as originally proposed by each campus.

This project investigated technical issues associated with managing multiple slices for long-term experiments, and also tried out early operations procedures for supporting those experiments.

2. Objectives

A high-level goal of this project was to run ten GENI slices continuously over a period of several months. This wasn't just an end in itself, though, but rather a means for us to accomplish two other objectives.

First, we wanted to gain experience managing and operating production-quality mesoscale GENI resources, at various levels:

  • Campuses and backbone providers managing their local resources
  • The GMOC performing meta-operations activities
  • Experimenters running experiments (a role that the GPO filled for this project)

Additionally, we wanted to discover and record issues that early experimenters are likely to encounter, such as:

  • Operational availability and uptime
  • Software-related issues, both with user tools and with aggregate software
  • Experiment isolation, i.e. preventing experiments from interfering with each other
  • Ease of use

Everything we did is documented on the GENI wiki, and is reproducible by others, modulo changes in the resources available from sites, versions of software deployed at each site, etc.

3. Environment

The environment we used for the project began with two engineered VLANs (VLAN 3715 and 3716), which were provisioned through the Internet2 and NLR backbones, through various regional networks, and through to the mesoscale deployments on eight campuses. I2 and NLR each provided OpenFlow network resources, and each campus provided OpenFlow network and MyPLC compute resources where we ran the experiments. The GENI Meta-Operations Center (GMOC) collected monitoring data, and provided OpenFlow support to campuses. We ran ten GENI slices, using different subsets of the eight campuses, and used those slices to run five artificial experiments on two slices each. We used those experiments in a series of eight baselines, with traffic flows that were representative of real GENI experimenters. All of these resources were allocated with the Omni command-line tool via the GENI API, and we then used simplistic command-line tools to manage the slices and experiments. The various operators used draft versions of GENI operational procedures, and communicated (with each other and with experimenters) via GENI mailing lists and chatrooms.

We had very much hoped to include real experiments and/or real experimenters in the project by the end, but we concluded that there weren't any non-artificial experiments ready to run at either of the two interim points when we reviewed this question during the project. This is one of the central things we intend to accomplish in our follow-on work.

4. Conclusions

We focused our conclusions on the four areas of concern about issues that experimenters were likely to encounter.

4.1. Operational availability

Is mesoscale GENI ready for operations with more experimenters? Our conclusion is that it is, but with some caveats.

We found that resource operators need to communicate more than they were, both with each other about their plans and about issues requiring coordination between sites, and with experimenters about outages. This improved over the course of the project, but further improvement is still needed.

We also found that it was difficult to identify the relationships between the various pieces that make up GENI. Given a resource, it's still hard to determine what sliver it's a part of, and what slice that sliver is a part of, and what user is responsible for that slice. We used naming conventions (e.g. all slices were named "plastic-" plus a number, all OpenFlow slivers started with the hostname of the system running the OpenFlow controller, all the slivers in a slice included the number of the slice they were a part of, etc) to make this easier, but real experimenters are unlikely to be so consistent (and certainly aren't required to be), so this solution won't scale well, especially to dozens or hundreds of experimenters.

We also concluded that resources weren't as available as they should be in a production-quality environment. Aggregate managers were sometimes down unexpectedly, and the software that runs them is still undesirably buggy. (On the plus side, the software developers we worked with were very responsive to our bug reports.) Also, operators sometimes found it difficult to identify which versions of software they were running, or to determine who was running a newer version, as some software is still largely identified by Git commit hashes rather than by sequential release numbers.

We identified some specific ideas for improvement:

  • We'd like to work with campuses to set and measure uptime targets for the aggregates they support. We understand that there will be variations from campus to campus, potentially fairly wide variations, and that's not fundamentally a problem, as long as the targets are well-documented and publicized, so that experimenters can set their expectations accurately.
  • Operators should continue to give feedback and input to software developers on features and priorities.
  • We hope that campus staff will reach out to prospective new experimenters at their campuses, and encourage those experimenters to contact the GPO's Experimenter Support group (via help@geni.net) about getting started with GENI.

4.2. Software

Is mesoscale GENI software ready to use in a production-quality environment? We concluded that it is, but again with some caveats.

Much of the software that we rely upon is still new, and at an early stage of its life cycle. This is unavoidable to some extent, and improves with time. Similarly, GENI may be the first large-scale deployment for some software, whether for the software as a whole, or for new versions as they come out. It may be difficult for a developer to test their software adequately in their development environment, which is unlikely to be as large, geographically diverse, and otherwise heterogeneous, as the whole of GENI.

Some ideas for improvement:

  • GENI racks (still on the horizon) will make production environments more similar, giving developers a somewhat smaller target, and reducing some of the difficulty in developing for a large heterogeneous environment.
  • InCNTRE, a Software Defined Networking initiative at Indiana University, will emphasize interoperability and commercial uses of OpenFlow, which should help to make OpenFlow in particular more mature.
  • Software developers could use GENI slices and resources to test their software in ways that aren't feasible within their development environments, such as with hardware from multiple vendors, over long-distance network links, etc -- using GENI resources before the software is fully deployed to the entirety of GENI.
  • More professional software engineers are getting involved with GENI, taking on software development tasks that had previously been handled by researchers.

4.3. Isolation

Are mesoscale GENI experiments isolated from each other? We concluded that they are only somewhat.

Many resources in GENI are still shared between multiple experiments, with only soft (e.g. advisory) partitions between them. We found many examples of this:

  • MyPLC plnodes are virtual machines on shared hardware, so an experiment that goes haywire on one VM can starve other VMs on that hardware for resources.
  • The FlowVisor flowspace is partitioned between experiments, but the flowspace as a whole is shared, so issues that affect the entire flowspace (such as the number of rules, which can reach into the thousands or tens of thousands) can make it difficult or impossible for any experiment to manipulate the flowspace.
  • Topology problems with one experiment can affect other experiments, even experiments which have a very simple topology themselves, e.g. by creating a broadcast storm, leaking traffic across VLANs unexpectedly, etc.
  • All of the bandwidth in GENI is shared, with no easy way either to set a hard limit on maximum use or to request a guaranteed minimum.

This is already an active area of work within GENI, but we had some specific ideas for improving the situation:

  • We'll encourage more information-sharing between experimenters, to help them prevent their experiments from going haywire in the first place.
  • When that isn't enough, the GPO and GMOC should work to develop better procedures to handle communication between operators, and with experimenters, when there are issues related to isolation.
  • Additional hard limits on resource usage, like QoS in the OpenFlow protocol and in backbone hardware, are the best way to truly ensure that experiments are isolated from each other.

4.4. Ease of use

Is mesoscale GENI easy for experimenters to use? We found that it is in some ways, but not in others.

Getting started, and doing very simple things, is in fact fairly easy: There are few barriers to entry, and they're generally low. However, experimenter tools to enable more sophisticated experiments are only just now interoperating with GENI -- although this is improving rapidly, and in fact improved over the course of the project. There are also specific usability concerns with OpenFlow, where the Expedient Opt-In Manager requires manual approval from an operator at every OpenFlow aggregate, whenever any OpenFlow resource is allocated; this can be daunting to new experimenters, and even for experienced folks, the need for manual approval can cause significant delays.

This is another area where work is already active within GENI (most of the Experimenter track at GEC 11 focused on tools), but we again have some ideas for additional improvements:

  • We'd like to encourage experimenters to try out newly-integrated tools, and actively request bug fixes and features. Experimenter demand is part of a very positive feedback loop, encouraging developers to create better tools, which in turn makes GENI more usable to more experimenters.
  • Similar to the idea of using GENI slices and resources to test aggregate software, tool developers may be able to use GENI resources to test their tools on a larger scale than their own development environments.
  • Work on the general issue of GENI stitching may help improve the OpenFlow opt-in situation.

5. Backbone resources

The Plastic Slices project used the GENI network core in Internet2 and NLR, which includes two OpenFlow-controlled VLANs (3715 and 3716) on a total of ten switches (five in each of I2 and NLR). The OpenFlow network in each provider was managed by an Expedient OpenFlow aggregate manager.

http://groups.geni.net/geni/wiki/NetworkCore has more information about the GENI network core and how to use it.

6. Campus resources

Each campus had a private OpenFlow-controlled VLAN (typically VLAN 1750), which was cross-connected to the two backbone VLANs (which were typically not OpenFlow-controlled on campuses), and managed by an Expedient OpenFlow aggregate manager. Each campus also had a MyPLC aggregate, with two plnodes (or in some cases more), each of which included a dataplane interface connected to VLAN 1750 (and others).

This network topology is described in more detail at http://groups.geni.net/geni/wiki/NetworkCore, and http://groups.geni.net/geni/wiki/TangoGENI#ParticipatingAggregates has links to the aggregate info page for each aggregate.

7. Monitoring

(Figure: graphs of bytes sent (TX) and bytes received (RX) during Baseline 5; images baseline5-txbytes.png and baseline5-rxbytes.png under PlasticSlices/BaselineEvaluation/Baseline5Traffic.)

During the project, all mesoscale campuses were configured to send monitoring data to the GMOC. We found that some sites had resources that weren't using NTP, which is essential for correlating data between sites. The GMOC offers an interface called SNAPP for browsing the data that they collect, visible at http://gmoc-db.grnoc.iu.edu/api-demo/. In addition, the GMOC offers an API which anyone can use to download the raw data that they collect, analyze it, graph it, etc, which we used to create some local graphs that we found useful. They collect data of interest to both operators and experimenters, at various levels of granularity, and can create some per-slice information, although this relies on naming conventions to tie together slices and slivers.

http://groups.geni.net/geni/wiki/PlasticSlices/MonitoringRecommendations has links to a variety of monitoring sites and information.

8. Slices

We created ten slices, named plastic-101 through plastic-110, to make them easy to identify, and to make it easy to identify the slivers within them. Each included a sliver on the MyPLC plnodes at each campus, and an OpenFlow sliver controlling an IP subnet throughout the network (10.42.X.0/24), each of which was controlled by a simple OpenFlow controller (the NOX 'switch' module). For simplicity's sake, we used VLAN 3715 for the odd-numbered slices and VLAN 3716 for the even-numbered ones. The slices included various subsets of the eight campuses: Two that included all eight sites, two at the endpoints of the core VLANs, etc.
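The slice conventions above are regular enough to write down in a few lines. The following Python sketch derives the VLAN and subnet for a slice name. It assumes that the X in 10.42.X.0/24 is the slice number (which is consistent with the 10.42.101.x through 10.42.104.x addresses in the ping logs in the Results section), and the function name slice_plan is hypothetical, not part of any GENI tool.

```python
import ipaddress
import re


def slice_plan(slice_name):
    """Return (vlan, subnet) for a slice named like 'plastic-101'."""
    match = re.fullmatch(r"plastic-(\d+)", slice_name)
    if match is None:
        raise ValueError(f"not a Plastic Slices name: {slice_name!r}")
    number = int(match.group(1))
    if not 101 <= number <= 110:
        raise ValueError(f"slice number out of range: {number}")
    # Odd-numbered slices used VLAN 3715, even-numbered ones VLAN 3716.
    vlan = 3715 if number % 2 == 1 else 3716
    # Assumption: the third octet of the subnet is the slice number.
    subnet = ipaddress.ip_network(f"10.42.{number}.0/24")
    return vlan, subnet


for n in range(101, 111):
    name = f"plastic-{n}"
    vlan, subnet = slice_plan(name)
    print(name, vlan, subnet)
```

A mapping like this only works because every slice followed the same convention, which is the same limitation noted in the operational availability conclusions.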

http://groups.geni.net/geni/wiki/PlasticSlices/SliceStatus has a table of which sites were in which slice during each baseline.

9. Experiments

We ran five experiments on these slices, to send various kinds of artificial but representative traffic across the network: ping for ICMP, netcat for unencrypted TCP, wget (HTTPS) for encrypted TCP, and iperf for TCP and UDP with some performance statistics. We picked these because they were simple and widely available, but still provided some variety, and are similar to the types of traffic that we expect to be used by real mesoscale GENI experiments. (Note that although ping and iperf both give you performance statistics, we weren't specifically trying to measure network performance, as this wasn't one of the goals of the project.)

http://groups.geni.net/geni/wiki/PlasticSlices/Experiments has more details about the experiments in general, and the baseline pages (see below) have more details about the exact parameters used for each experiment in each baseline.

10. Baselines

We ran a series of eight baselines using these slices and experiments. We first confirmed the basic functionality and stability of the environment, by sending 1 GB of data across each slice, then repeating that once a day for three days, and then repeating that once a day for six days. We then began sending continuous traffic, 24 hours a day; first 1 Mb/s for a full day, then 10 Mb/s for a day, and then 10 Mb/s for six days. The final two baselines tested GENI procedures at a larger scale: We performed an Emergency Stop test with BBN, and tried creating many slices very quickly (one per second, for 10, 100, and 1000 seconds).
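To put the continuous-traffic baselines in the same units as the 1 GB transfers, the arithmetic can be sketched in Python. This is a rough calculation only: it assumes a single flow held at exactly the stated rate, with 1 Mb taken as 10**6 bits and 1 GB as 10**9 bytes, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_sent(rate_mbit_per_s, days):
    """Total bytes sent at a constant rate (1 Mb = 10**6 bits)."""
    bits = rate_mbit_per_s * 10**6 * days * SECONDS_PER_DAY
    return bits // 8


for label, rate, days in [
    ("1 Mb/s for one day", 1, 1),
    ("10 Mb/s for one day", 10, 1),
    ("10 Mb/s for six days", 10, 6),
]:
    print(f"{label}: {bytes_sent(rate, days) / 10**9:.1f} GB")
```

Under those assumptions, even the slowest continuous baseline moves roughly ten times as much data per day as one of the initial 1 GB transfers.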

http://groups.geni.net/geni/wiki/PlasticSlices/BaselineEvaluation has a summary of the baselines, with links to pages with more details, which themselves have links to full logs.

11. Tools

We used a variety of simplistic command-line tools to manage the slices and experiments, including:

  • A Subversion directory full of rspecs, which we used as input to the Omni command-line client to manage the slices.
  • A directory of files with a list of logins for each slice, which we used as input to rsync (to copy files) and shmux (to execute commands).
  • The 'screen' program, with a screenrc file for each slice, which we used to get an interactive login to all slivers in a slice simultaneously, and to capture logs of the results of the experiments.
  • A directory of common user configuration files (dotfiles).

We briefly investigated tools such as Gush or Raven, but at the time the project began, neither seemed sufficiently well-integrated with GENI to be worth the overhead.

http://groups.geni.net/geni/wiki/PlasticSlices/Tools has many more details about the tools we used and how we used them, all the configuration files, etc.

12. Results

Most of the results were as we expected. We found that long-running experiments, unsurprisingly, are more vulnerable to infrastructure issues: The longer your experiment runs, the more likely it is to be affected by hardware/software bugs, upgrades, or outages. We also found that logging the results of long-running experiments poses some challenges: Logging every packet sent by an experiment that sends dozens or hundreds of packets per second, for six days, creates a very large log file, which can fill disks, and be challenging to analyze with simple tools.
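As a rough illustration of the logging problem, the sketch below estimates how many log lines a per-packet log accumulates. The packet rates are the "dozens or hundreds" mentioned above; the figure of 80 bytes per log line is purely an assumption for illustration, not a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def log_estimate(packets_per_second, days, bytes_per_line):
    """Return (line_count, total_bytes) for a one-line-per-packet log."""
    lines = packets_per_second * days * SECONDS_PER_DAY
    return lines, lines * bytes_per_line


for pps in (12, 100):
    # bytes_per_line=80 is a hypothetical average line length.
    lines, size = log_estimate(pps, days=6, bytes_per_line=80)
    print(f"{pps} packets/s for 6 days: {lines:,} lines, about {size / 10**9:.1f} GB")
```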

We also found some results that would have been surprising on the regular Internet (or a similar traditional IP network), but which in fact demonstrate precisely the ways in which GENI is different from regular networks, often in advantageous ways.

12.1. Packet loss and OpenFlow

One of the results that would've been surprising on a regular network is packet loss, e.g. the 8% loss from BBN to Clemson with UDP in a 40-second test. This turns out to be related to our simplistic use of OpenFlow: As the first packet hits each OF switch in the path to the destination, across the entire country, each has to connect back to the slice's OF controller in Boston for instructions. This can take a few seconds to complete, but once the controller has installed rules in the switch's flowtable, subsequent packets flow at line speed, as expected. Thus, packet loss statistics like this typically reflect "the first 8% of packets failed", not "out of every hundred packets, eight of them failed".

The logs from the client and server make this clear. On the client, all you see is the overall packet loss:

[  3] Server Report:
[  3]  0.0-38.5 sec   461 MBytes   100 Mbits/sec   0.067 ms 27877/356658 (7.8%)
[  3]  0.0-38.5 sec  208 datagrams received out-of-order

On the server, however, you see 76% loss in the first second, and loss is minimal after that:

[  3] local 10.42.104.52 port 5104 connected with 10.42.104.104 port 39958
[  3]  0.0- 1.0 sec  12.1 MBytes    101 Mbits/sec  0.053 ms 27604/36219 (76%)
[  3]  0.0- 1.0 sec  128 datagrams received out-of-order
[  3]  1.0- 2.0 sec  12.0 MBytes    101 Mbits/sec  465.109 ms    6/ 8523 (0.07%)
[  3]  1.0- 2.0 sec  60 datagrams received out-of-order
[  3]  2.0- 3.0 sec  12.0 MBytes    100 Mbits/sec  0.038 ms   11/ 8519 (0.13%)
[  3]  2.0- 3.0 sec  19 datagrams received out-of-order
[  3]  3.0- 4.0 sec  11.9 MBytes    100 Mbits/sec  0.043 ms    9/ 8524 (0.11%)
[  3]  4.0- 5.0 sec  12.0 MBytes    100 Mbits/sec  0.038 ms   10/ 8547 (0.12%)
[  3]  5.0- 6.0 sec  12.0 MBytes    100 Mbits/sec  0.031 ms   12/ 8546 (0.14%)
[  3]  6.0- 7.0 sec  12.0 MBytes    100 Mbits/sec  0.029 ms    4/ 8539 (0.047%)
[  3]  7.0- 8.0 sec  11.9 MBytes    100 Mbits/sec  0.032 ms    6/ 8523 (0.07%)
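The per-second log above can be checked mechanically. The following Python sketch parses a few of the interval lines (copied from the log above) and reports the loss in each interval, plus the share of the overall loss that fell in the first second. The regular expression is an assumption that fits the lines shown here, not a general parser for iperf output.

```python
import re

# Interval lines copied from the per-second log above.
SERVER_LOG = """\
[  3]  0.0- 1.0 sec  12.1 MBytes    101 Mbits/sec  0.053 ms 27604/36219 (76%)
[  3]  0.0- 1.0 sec  128 datagrams received out-of-order
[  3]  1.0- 2.0 sec  12.0 MBytes    101 Mbits/sec  465.109 ms    6/ 8523 (0.07%)
[  3]  2.0- 3.0 sec  12.0 MBytes    100 Mbits/sec  0.038 ms   11/ 8519 (0.13%)
"""

# Matches "start- end sec ... lost/total (" and skips the out-of-order lines.
INTERVAL = re.compile(r"(\d+\.\d+)-\s*(\d+\.\d+)\s+sec.*?\s(\d+)/\s*(\d+)\s+\(")


def parse_intervals(log_text):
    """Return a list of (start_s, end_s, lost, total) tuples."""
    rows = []
    for line in log_text.splitlines():
        m = INTERVAL.search(line)
        if m:
            rows.append((float(m.group(1)), float(m.group(2)),
                         int(m.group(3)), int(m.group(4))))
    return rows


rows = parse_intervals(SERVER_LOG)
for start, end, lost, total in rows:
    print(f"{start:4.1f}-{end:4.1f} s: {lost}/{total} lost ({100 * lost / total:.2f}%)")

# 27877 is the overall lost-datagram count from the summary report above.
overall_lost = 27877
first_second_share = rows[0][2] / overall_lost
print(f"share of all loss in the first second: {first_second_share:.1%}")
```

The last line makes the point of this section concrete: nearly all of the lost datagrams were lost before the flow rules were in place.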

Packet loss is generally not desirable, but it highlights the fact that OpenFlow allows you to control traffic in GENI in ways that aren't possible in a regular network. Using OpenFlow doesn't require packet loss, of course: For example, we could have used a smarter (experiment-specific) controller that added flowtable rules to the switches before we even began sending traffic. Or, if we didn't want to use a more complicated controller for other reasons, we could have sent some seed traffic to cause the simplistic controller to create the flows, before we began sending the traffic that we actually cared about. OpenFlow in GENI gives you a great deal of flexibility.
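The seed-traffic idea could look something like the following minimal Python sketch, which sends a handful of small UDP datagrams before the real test starts. This is not something we ran during the project. The function name, payload, count, and interval are all hypothetical, and whether the rules installed for the seed datagrams also match the later test traffic depends on how the controller builds its flow matches, which is an assumption here.

```python
import socket
import time


def send_seed_datagrams(host, port, count=5, interval_s=0.5, payload=b"seed"):
    """Send a few small UDP datagrams to (host, port); return how many were sent.

    The intent is to give a reactive controller time to install flow rules
    along the path before the traffic of interest begins.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sock.sendto(payload, (host, port))
            sent += 1
            time.sleep(interval_s)
    return sent
```

An experimenter would call this with the destination address and port of the experiment, wait a few seconds for the rules to be installed, and then start the real measurement.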

12.2. Latency and topology

Another result that would've been surprising on a regular network is low throughput, e.g. between two geographically nearby sites like BBN (in Boston) and Rutgers (in New Jersey). This turns out to be due to a very valuable feature of GENI: The ability to create and use different topologies in the core network. Not all GENI network paths are optimized for distance -- deliberately so, since some experiments specifically want long links with high latency. One of the paths available during this project took nearly ten thousand geographical miles to get from BBN to Rutgers, for example.

The RTT results from a ping test between BBN and Rutgers, using each of four different paths, show this clearly. BBN is a useful test case for this because we connect to both NLR and Internet2, thus giving us four possible paths to each other campus (two VLANs through each of the two providers).

When you connect via BBN's connection to NLR, on VLAN 3715, the path goes Boston - Chicago - Atlanta - DC - NJ, and the ping RTT is 74.3 ms:

PING 10.42.101.111 (10.42.101.111) 56(84) bytes of data.
64 bytes from 10.42.101.111: icmp_seq=1 ttl=64 time=74.3 ms
64 bytes from 10.42.101.111: icmp_seq=2 ttl=64 time=74.3 ms
64 bytes from 10.42.101.111: icmp_seq=3 ttl=64 time=74.3 ms

If instead you use BBN's connection to I2 on VLAN 3715, the path goes Boston - New York - Los Angeles - Houston - Atlanta - DC - NJ, and the ping RTT doubles, to 152 ms:

PING 10.42.103.111 (10.42.103.111) 56(84) bytes of data.
64 bytes from 10.42.103.111: icmp_seq=1 ttl=64 time=152 ms
64 bytes from 10.42.103.111: icmp_seq=2 ttl=64 time=152 ms
64 bytes from 10.42.103.111: icmp_seq=3 ttl=64 time=152 ms

If you use BBN's connection to NLR on VLAN 3716, the path gets even longer -- Boston - Chicago - Denver - Seattle - Sunnyvale - Atlanta - DC - NJ -- which takes 179 ms:

PING 10.42.102.111 (10.42.102.111) 56(84) bytes of data.
64 bytes from 10.42.102.111: icmp_seq=1 ttl=64 time=179 ms
64 bytes from 10.42.102.111: icmp_seq=2 ttl=64 time=179 ms
64 bytes from 10.42.102.111: icmp_seq=3 ttl=64 time=179 ms

But if you use BBN's connection to I2 on VLAN 3716, the path is much shorter -- Boston - New York - DC - NJ -- and only takes 14.8 ms:

PING 10.42.104.111 (10.42.104.111) 56(84) bytes of data.
64 bytes from 10.42.104.111: icmp_seq=1 ttl=64 time=14.8 ms
64 bytes from 10.42.104.111: icmp_seq=2 ttl=64 time=14.8 ms
64 bytes from 10.42.104.111: icmp_seq=3 ttl=64 time=14.8 ms

Thus, you can get variations of up to a factor of 10 - 12 just by choosing your sites and paths carefully (and potentially even more, by designing and engineering a new topology using a different VLAN, rather than simply using one of the existing ones). This topology flexibility is another crucial feature of GENI.
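The factor quoted above follows directly from the four ping results. The Python sketch below computes each path's RTT relative to the shortest one, and also a rough upper bound on the one-way fiber distance. The distance bound assumes that light travels about 200 km per millisecond in fiber (roughly two thirds of its speed in a vacuum) and ignores switching and queueing delay, so the real path is somewhat shorter than the bound.

```python
# RTTs in milliseconds, taken from the four ping tests above.
RTT_MS = {
    "NLR, VLAN 3715": 74.3,
    "I2, VLAN 3715": 152.0,
    "NLR, VLAN 3716": 179.0,
    "I2, VLAN 3716": 14.8,
}

FIBER_KM_PER_MS = 200.0   # assumption: propagation speed in fiber
KM_PER_MILE = 1.609344


def one_way_distance_bound_km(rtt_ms):
    """Upper bound on one-way fiber distance implied by a round-trip time."""
    return rtt_ms / 2 * FIBER_KM_PER_MS


shortest = min(RTT_MS.values())
for path, rtt in sorted(RTT_MS.items(), key=lambda item: item[1]):
    ratio = rtt / shortest
    miles = one_way_distance_bound_km(rtt) / KM_PER_MILE
    print(f"{path}: {rtt} ms, {ratio:.1f}x the shortest, at most about {miles:,.0f} miles one way")
```

For the longest path the bound is in the same range as the "nearly ten thousand miles" figure mentioned earlier in this section, as one would expect if propagation delay dominates the RTT.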

13. Future work

This report concludes the formal part of the Plastic Slices project, but we plan to continue using the mesoscale infrastructure to run experiments and tests. We'll publish our plans, and our results, on the GENI wiki. We intend to keep data flowing continuously for the next few months, to allow us to continue to develop and test monitoring, operational procedures and practices, etc. We intend to switch to running experiments that are less artificial, and dig deeper into some of the things that we didn't have time to complete.

Specific goals include:

  • Emergency stop tests with every campus
  • More tests with high throughput (UDP and TCP)
  • More tests of high user volume (like Baseline 8)
  • Switching from static ARP tables to dynamic ARP requests
  • More iterations of long-running experiments
  • Investigating more sophisticated user tools

We're also interested in ideas for things that others would like us to investigate and/or try; feel free to contact help@geni.net if you have suggestions.

Finally, one high-level conclusion is that GENI is ready to expand, so if you represent a campus that isn't yet part of the GENI mesoscale, and would like to be, let us know. (help@geni.net)

14. Thanks

This project would have gone nowhere without countless hours of work and support from all the participants: The campuses (Clemson, Georgia Tech, Indiana, Rutgers, Stanford, Washington, and Wisconsin), their regional networks (NoX, SoX, Indiana GigaPoP, MAGPI, CENIC, PNWGP, and WiscNet), the core backbones (Internet2 and NLR), the monitoring work done by the GMOC and GPO staff, and the software developers at Stanford (OpenFlow), Princeton (MyPLC), Utah (ProtoGENI), the GMOC, and the GPO. Our deep and heartfelt thanks to all of them.