wiki:PlasticSlices/FinalReport


This is the final report on the conclusions and results of the Plastic Slices project. This project ran ten GENI slices continuously over a period of three months, in a nationwide infrastructure that included eight campuses, in order to gain experience managing and operating production-quality GENI resources, and to discover and record issues that early experimenters likely would encounter. We conclude that meso-scale GENI is generally ready for operations with more experimenters, that GENI software is mostly ready to use in a production-quality environment, that GENI experiments are somewhat isolated from each other, and that GENI is easy to use initially but more challenging to use for complex experiments. All of these areas need continuing improvement, and we make suggestions for ways to do that. The report also includes more details about the environment, experiments, baselines, and tools we used, and further discussion of some of the results which highlight ways in which GENI is a unique research environment. More details are available at http://groups.geni.net/geni/wiki/PlasticSlices.

[[ThumbImage(TangoGENI:OF-VLAN3715.jpg, thumb=800)]]

(A diagram of one of the core VLANs in the current meso-scale GENI deployment.)

1. Motivation

The Plastic Slices project in the GENI meso-scale infrastructure was created to set shared campus goals for evolving GENI's infrastructure and to agree on a schedule and resources to support GENI experiments. This first project was an effort to run ten (or more) GENI slices continuously for multiple months -- not merely for the sake of having ten slices running, but to gain experience managing and operating production-quality GENI resources.

During Spiral 3, campuses expanded and updated their OpenFlow deployments. All campuses agreed to run a GENI AM API compliant aggregate manager, and to support at least four multi-site GENI experiments by GEC 12 (November 2011). This laid the foundation for GENI to continuously support and manage multiple simultaneous slices that contain resources from GENI AM API compliant aggregates with multiple cross-country layer 2 data paths. (http://groups.geni.net/geni/wiki/GeniApi has more information about what it means to be a "GENI AM API compliant aggregate.") This campus infrastructure can support the transition from building GENI to using GENI continuously in the meso-scale infrastructure. Longer-term, it can also support the transition to at-scale production use in 2012, as originally proposed by each campus.

This project investigated technical issues associated with managing multiple slices for long-term experiments, and also tried out early operations procedures for supporting those experiments.

2. Objectives

A high-level goal of this project was to run ten GENI slices continuously over a period of several months. This wasn't just an end in itself, but rather a means for us to accomplish two other objectives.

First, we wanted to gain experience managing and operating production-quality meso-scale GENI resources, at various levels:

  • Campuses and backbone providers managing their local resources
  • The GMOC performing meta-operations activities
  • Experimenters running experiments (a role that the GPO filled for this project)

Additionally, we wanted to discover and record issues that early experimenters likely would encounter, such as:

  • Operational availability and uptime
  • Software-related issues, both with user tools and with aggregate software
  • Experiment isolation, i.e. preventing experiments from interfering with each other
  • Ease of use

Everything we did is documented on the GENI wiki, and is reproducible by others, modulo changes in the resources available from sites, versions of software deployed at each site, etc.

3. Environment

The environment we used for the project began with two engineered VLANs (VLAN 3715 and 3716), which were provisioned through the Internet2 and NLR backbones, through various regional networks, and through to the meso-scale deployments on eight campuses. Each of I2 and NLR provided OpenFlow network resources, and each campus provided OpenFlow network and MyPLC compute resources where we ran the experiments. The GENI Meta-Operations Center (GMOC) collected monitoring data, and provided OpenFlow support to campuses. We ran ten GENI slices, using different subsets of the eight campuses, and used those slices to run five artificial experiments on two slices each. We used those experiments in a series of eight baselines, with traffic flows that were representative of real GENI experiments. All of these resources were allocated with the Omni GENI API client, and we then used a variety of common Linux command-line tools to manage the slices and experiments. The various operators used draft versions of GENI operational procedures, and communicated with each other and with experimenters via GENI mailing lists and chatrooms.
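
As a concrete illustration of that workflow, here's a minimal sketch of creating a slice and one sliver with the Omni client; the aggregate URL and rspec filename are placeholders rather than the actual values we used (the real rspecs are linked from the Plastic Slices wiki pages).

# Create the slice at the clearinghouse.
omni.py createslice plastic-101

# Create a sliver at one aggregate from a saved rspec, then check its status.
omni.py -a https://myplc.example.edu:12346 createsliver plastic-101 plastic-101-example.rspec
omni.py -a https://myplc.example.edu:12346 sliverstatus plastic-101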

We had very much hoped to include real experiments and/or real experimenters in the project by the end, but we concluded that there weren't any non-artificial experiments ready to run at either of the two interim points when we reviewed this question during the project. This is one of the central things we intend to accomplish in our follow-on work.

4. Conclusions

We focused our conclusions on the four areas of issues we expected early experimenters to encounter: operational availability, software, isolation, and ease of use.

4.1. Operational availability

Is meso-scale GENI ready for operations with more experimenters? Our conclusion is that it is, but with some caveats.

We found that resource operators need to communicate more than they have been: with each other, about their plans and about issues requiring coordination between sites; and with experimenters, about outages. This improved over the course of the project, but further improvement is still needed.

We also found that it was difficult to identify the relationships between the various pieces that make up GENI. Given a resource, it's still hard to determine what sliver it's a part of, what slice that sliver is a part of, and what user is responsible for that slice. We used naming conventions to make this easier (e.g. all slices were named "plastic-" plus a number, all OpenFlow slivers started with the hostname of the system running the OpenFlow controller, all the slivers in a slice included the number of the slice they were a part of, etc), but real experimenters are unlikely to be so consistent (and certainly aren't required to be), so this solution won't scale well, especially to dozens or hundreds of experimenters.
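
For illustration, names following those conventions looked roughly like this (the hostnames here are placeholders, not actual resources from the project):

plastic-104                             a slice: "plastic-" plus a number
controller.example.net-plastic-104      an OpenFlow sliver: the controller's hostname, plus the slice number
somecampus-myplc-plastic-104            a MyPLC sliver: also includes the slice number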

We also concluded that resources weren't as available as they should be in a production-quality environment. Aggregate managers were sometimes down unexpectedly, and the software that runs them is still undesirably buggy. (On the plus side, the software developers we worked with were very responsive to our bug reports.) Also, operators sometimes found it difficult to identify which versions of software they were running, or to determine who was running a newer version, as some software is still largely identified by Git commit hashes rather than by sequential release numbers.

We identified some specific ideas for improvement:

  • We'd like to work with campuses to set and measure uptime targets for the aggregates they support. We understand that there will be variations from campus to campus, potentially fairly wide variations, and that's not fundamentally a problem, as long as the targets are well-documented and publicized, so that experimenters can set their expectations accurately.
  • Operators should continue to give feedback and input to software developers on features and priorities.
  • We hope that campus staff will reach out to prospective new experimenters at their campuses, and encourage those experimenters to contact the GPO's Experimenter Support group (via help@geni.net) about getting started with GENI.

4.2. Software

Is meso-scale GENI software ready to use in a production-quality environment? We concluded that it is, but again with some caveats.

Much of the software we rely on is still new, and at an early stage of its life cycle. This is unavoidable to some extent, and improves with time. Similarly, GENI may be the first large-scale deployment for some software, whether for the software as a whole, or for new versions as they come out. It may be difficult for a developer to test their software adequately in their local development environment, which is unlikely to be as large, geographically diverse, and otherwise heterogeneous, as the whole of GENI.

Some ideas for improvement:

  • GENI racks (still on the horizon) will make production environments more similar, giving developers a somewhat smaller target, and reducing some of the difficulty in developing for a large heterogeneous environment.
  • InCNTRE, a Software Defined Networking initiative at Indiana University, will emphasize interoperability and commercial uses of OpenFlow, which should help to improve maturity for OpenFlow in particular.
  • Software developers could use GENI slices and resources to test their software in ways that aren't feasible within their development environments, such as with hardware from multiple vendors, over long-distance network links, etc -- using GENI resources before the software is fully deployed to the entirety of GENI.
  • More professional software engineers are getting involved with GENI, taking on meso-scale infrastructure software development tasks that had previously been handled by researchers and students, which is helping to make the integrated software more robust.

4.3. Isolation

Are meso-scale GENI experiments isolated from each other? We concluded that they are only somewhat.

Many resources in GENI are still shared between multiple experiments, with only soft (i.e. advisory) partitions between them. We found many examples of this:

  • MyPLC plnodes are virtual machines on shared hardware, so an experiment that goes haywire on one VM can starve other VMs on that hardware for resources.
  • The FlowVisor flowspace is partitioned between experiments, but the flowspace as a whole is shared, so issues that affect the entire flowspace (such as the number of rules, which can reach into the thousands or tens of thousands) can make it difficult or impossible for any experiment to manipulate the flowspace.
  • Topology problems with one experiment can affect other experiments, even experiments which have a very simple topology themselves, e.g. by creating a broadcast storm, leaking traffic across VLANs unexpectedly, etc.
  • All of the bandwidth in GENI is shared, with no easy ways for an experiment or slice either to set hard limits on its maximum usage, or to request a minimum requirement for its dedicated use.

Improving isolation is already an active area of work within GENI, but we had some specific ideas for improvements:

  • We'll encourage more information-sharing between experimenters, to help them prevent their experiments from going haywire in the first place.
  • When that isn't enough, the GPO and GMOC should work to develop better procedures to handle communication between operators, and with experimenters, when there are issues related to isolation.
  • Additional hard limits on resource usage, like QoS in the OpenFlow protocol and in backbone hardware (which are planned for later releases), will help keep experiments more isolated from each other.

4.4. Ease of use

Is meso-scale GENI easy for experimenters to use? We found that it is in some ways, but not in others.

Getting started and doing very simple things is in fact fairly easy: There are few barriers to entry, and they're generally low. However, experimenter tools to enable more sophisticated experiments are only just now interoperating with GENI -- although this is improving rapidly, and in fact improved over the course of the project. There are also specific usability concerns with OpenFlow, where the Expedient Opt-In Manager requires separate manual approval from an operator whenever any OpenFlow sliver is created or changed. This can be daunting to new experimenters, and can cause significant delays for anyone.

Usability is another area where work is already active in GENI (most of the Experimenter track at GEC 11 focused on tools), but we again have some ideas for additional improvements:

  • We'd like to encourage experimenters to try out newly-integrated tools, and actively request bug fixes and features. Experimenter demand is part of a very positive feedback loop, encouraging developers to create better tools, which in turn makes GENI more usable to more experimenters.
  • Similar to the idea of using GENI slices and resources to test aggregate software, tool developers may be able to use GENI resources to test their tools on a larger scale than their own local development environments.
  • Ongoing work on the general issue of stitching in GENI may also help improve the OpenFlow opt-in situation.

5. Backbone resources

The Plastic Slices project used the GENI network core in Internet2 and NLR, which includes two OpenFlow-controlled VLANs (3715 and 3716) on a total of ten switches (five in each of I2 and NLR). The OpenFlow network in each provider was managed by an Expedient OpenFlow aggregate manager.

http://groups.geni.net/geni/wiki/NetworkCore has more information about the GENI network core and how to use it.

6. Campus resources

Each campus had a private OpenFlow-controlled VLAN (typically VLAN 1750), which was cross-connected to the two backbone VLANs (which were typically not OpenFlow-controlled on campuses), and managed by an Expedient OpenFlow aggregate manager. Each campus also had a MyPLC aggregate, with two plnodes (or in some cases more), each of which included a dataplane interface connected to VLAN 1750 (and others).

This network topology is described in more detail at http://groups.geni.net/geni/wiki/OpenFlow/CampusTopology, and http://groups.geni.net/geni/wiki/TangoGENI#ParticipatingAggregates has links to the aggregate info page for each aggregate.

7. Monitoring

[[ThumbImage(PlasticSlices/BaselineEvaluation/Baseline5Traffic:baseline5-txbytes.png, thumb=500)]] [[ThumbImage(PlasticSlices/BaselineEvaluation/Baseline5Traffic:baseline5-rxbytes.png, thumb=500)]]

(Graphs of bytes sent (first graph) and bytes received (second graph) in each slice during Baseline 5.)

During the project, all meso-scale campuses were configured to send monitoring data to the GMOC. Some sites initially configured resources that didn't use NTP, but revised the configurations after starting monitoring, because NTP is essential for correlating data between sites. The GMOC offers an interface called SNAPP for browsing the data that they collect, visible at http://gmoc-db.grnoc.iu.edu/api-demo/ (despite the word "demo" in the URL, this is a production GMOC web interface with a number of options for searching and displaying data). In addition, the GMOC offers an API which anyone can use to download raw GMOC-collected data to analyze, graph, etc. The GPO used this API to create some useful Plastic Slices monitoring graphs (such as the graphs above; more are available through the GENI wiki). The GPO data is of interest to both operators and experimenters, covers various levels of granularity, and presents some per-slice information (although the per-slice information relies on naming conventions to tie together slices and slivers in the current implementation).

http://groups.geni.net/geni/wiki/PlasticSlices/MonitoringRecommendations has links to a variety of monitoring sites and information.

8. Slices

We created ten slices, named plastic-101 through plastic-110, to make them easy to identify, and to make it easy to identify the slivers within them. Each included a sliver on the MyPLC plnodes at each campus, and an OpenFlow sliver at each campus and in I2 and NLR, with an IP subnet throughout the network (10.42.X.0/24), controlled by a simple OpenFlow controller (the NOX 'switch' module). For simplicity's sake, we used VLAN 3715 for the odd-numbered slices and VLAN 3716 for the even-numbered ones. The slices included various subsets of the eight campuses: Two that included all eight sites, two at the endpoints of the core VLANs, etc.
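
For reference, a controller like the one each slice used can be started roughly like this, assuming a standard NOX checkout (the listen port shown is just the OpenFlow default and is illustrative only; each slice actually pointed at its own controller host and port):

cd nox/build/src
./nox_core -v -i ptcp:6633 switch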

http://groups.geni.net/geni/wiki/PlasticSlices/SliceStatus has a table of which sites' resources were included in which slice during each baseline.

9. Experiments

We ran five experiments on these slices, to send various kinds of artificial but representative traffic across the network: ping for ICMP, netcat for unencrypted TCP, wget (HTTPS) for encrypted TCP, and iperf for TCP and UDP with some performance statistics. We picked these because they were simple and widely available, but still provided some variety, and are similar to the types of traffic that we expect to be used by real meso-scale GENI experiments. (Note that although ping and iperf both give you performance statistics, we weren't specifically trying to measure network performance, as this wasn't one of the goals of the project.)
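
The commands below sketch the flavor of each traffic type; the addresses, ports, sizes, and durations are placeholders, and the exact parameters for each baseline are on the baseline pages.

# ICMP
ping -c 60 10.42.101.52

# Unencrypted TCP: a netcat listener on the receiver, fed by dd on the sender
nc -l -p 5101 > /dev/null                                   # receiver
dd if=/dev/zero bs=1M count=1000 | nc 10.42.101.52 5101     # sender

# Encrypted TCP: fetch a file over HTTPS and discard it
wget --no-check-certificate -O /dev/null https://10.42.101.52/testfile

# TCP and UDP with performance statistics
iperf -s -p 5101                                # receiver, TCP
iperf -s -u -p 5101                             # receiver, UDP
iperf -c 10.42.101.52 -p 5101 -t 60             # sender, TCP
iperf -u -c 10.42.101.52 -p 5101 -b 10M -t 60   # sender, UDP at 10 Mb/s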

http://groups.geni.net/geni/wiki/PlasticSlices/Experiments has more details about the experiments in general, and the baseline pages (see below) have more details about the exact parameters used for each experiment in each baseline.

10. Baselines

We ran a series of eight baselines using these slices and experiments. We first confirmed the basic functionality and stability of the environment, by sending 1 GB of data across each slice, then repeating that once a day for three days, and then repeating that once a day for six days. We then began sending continuous traffic, 24 hours a day: First 1 Mb/s for a full day, then 10 Mb/s for a day, and then 10 Mb/s for six days. The final two baselines tested GENI procedures at a larger scale: We performed an Emergency Stop test with BBN, and tried creating many slices very quickly to simulate user load (one per second, for 10, 100, and 1000 seconds).
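
The slice-creation load test, for example, amounted to a loop along these lines (a sketch rather than the actual script we ran; the slice names are hypothetical):

# Create one throwaway slice per second for 100 seconds.
for i in $(seq 1 100); do
    omni.py createslice loadtest-$i &
    sleep 1
done
wait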

http://groups.geni.net/geni/wiki/PlasticSlices/BaselineEvaluation has a summary of the baselines, with links to pages with more details, which themselves have links to full logs.

11. Tools

We used a variety of simplistic command-line tools to manage the slices and experiments, including:

  • A Subversion directory full of rspecs, which we used as input to the Omni command-line client to manage the slices.
  • A directory of files with a list of logins for each slice, which we used as input to rsync (to copy files) and shmux (to execute commands); a brief usage sketch appears after this list.
  • The 'screen' program, with a screenrc file for each slice, which we used to get an interactive login to all slivers in a slice simultaneously, and to capture logs of the results of the experiments.
  • A directory of common user configuration files (dotfiles).
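
Here's a brief sketch of how a per-slice login list can drive rsync and shmux; the filenames, hostnames, and account names are hypothetical, and the actual files are on the Tools page linked below.

# logins/plastic-101 holds one user@host entry per line, one for each sliver in the slice.

# Copy an experiment script to every sliver in the slice.
for login in $(cat logins/plastic-101); do
    rsync -a experiment.sh "$login":
done

# Run a command on every sliver at once.
shmux -c "uptime" $(cat logins/plastic-101)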

We briefly investigated experimenter tools such as Gush and Raven, but at the time the project began, neither seemed sufficiently well-integrated with GENI to be easily used. We expect to revisit experimenter tools in later projects.

http://groups.geni.net/geni/wiki/PlasticSlices/Tools has many more details about the tools we used and how we used them, all the configuration files, etc.

12. Results

Most of the results were as we expected. We found that long-running experiments, unsurprisingly, are more vulnerable to infrastructure issues: The longer your experiment runs, the more likely it is to be affected by hardware/software bugs, upgrades, or outages. We also found that logging the results of long-running experiments poses some challenges: Logging every packet sent by an experiment that sends dozens or hundreds of packets per second, for six days, creates a very large log file, which can fill disks, and be challenging to analyze with simple tools.
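
One simple mitigation (illustrative only; in the project we captured experiment output via screen, as described in the Tools section above) is to compress a long-running experiment's log as it's written, e.g.:

iperf -u -c 10.42.104.52 -b 10M -t 518400 -i 1 | gzip > plastic-104-iperf.log.gz    # six days of per-second statistics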

We also found some results that would have been surprising on the regular Internet (or a similar traditional IP network), but which in fact demonstrate precisely the ways in which GENI is different from regular networks, often in advantageous ways.

12.1. Packet loss and OpenFlow

One of the results that would've been surprising on a regular network is packet loss, e.g. nearly 8% loss from BBN to Clemson with UDP in a 40-second test. This turns out to be related to our simplistic use of OpenFlow: As the first packet hits each OF switch on the path to the destination, across the entire country, each switch has to connect back to the slice's OF controller in Boston for instructions. This can take a few seconds to complete, but once the controller has installed rules in the switch's flowtable, subsequent packets flow at line speed, as expected. Thus, packet loss statistics like this typically reflect "the first 8% of packets failed", not "out of every hundred packets, eight of them failed".

This becomes clear when the logs from the client and server are compared. On the client (summarizing what the server said it saw), all you see is the overall packet loss:

[  3] Server Report:
[  3]  0.0-38.5 sec   461 MBytes   100 Mbits/sec   0.067 ms 27877/356658 (7.8%)
[  3]  0.0-38.5 sec  208 datagrams received out-of-order

On the server (in the detailed second-by-second logs), however, you see 76% loss in the first second, and loss is minimal after that:

[  3] local 10.42.104.52 port 5104 connected with 10.42.104.104 port 39958
[  3]  0.0- 1.0 sec  12.1 MBytes    101 Mbits/sec  0.053 ms 27604/36219 (76%)
[  3]  0.0- 1.0 sec  128 datagrams received out-of-order
[  3]  1.0- 2.0 sec  12.0 MBytes    101 Mbits/sec  465.109 ms    6/ 8523 (0.07%)
[  3]  1.0- 2.0 sec  60 datagrams received out-of-order
[  3]  2.0- 3.0 sec  12.0 MBytes    100 Mbits/sec  0.038 ms   11/ 8519 (0.13%)
[  3]  2.0- 3.0 sec  19 datagrams received out-of-order
[  3]  3.0- 4.0 sec  11.9 MBytes    100 Mbits/sec  0.043 ms    9/ 8524 (0.11%)
[  3]  4.0- 5.0 sec  12.0 MBytes    100 Mbits/sec  0.038 ms   10/ 8547 (0.12%)
[  3]  5.0- 6.0 sec  12.0 MBytes    100 Mbits/sec  0.031 ms   12/ 8546 (0.14%)
[  3]  6.0- 7.0 sec  12.0 MBytes    100 Mbits/sec  0.029 ms    4/ 8539 (0.047%)
[  3]  7.0- 8.0 sec  11.9 MBytes    100 Mbits/sec  0.032 ms    6/ 8523 (0.07%)

Packet loss is generally not desirable, but it highlights the fact that OpenFlow allows you to control traffic in GENI in ways that aren't possible in a regular network. Using OpenFlow doesn't require packet loss, of course: For example, we could have used a smarter (experiment-specific) controller that added flowtable rules to the switches before we even began sending traffic. Or, if we didn't want to use a more complicated controller for other reasons, we could have sent some seed traffic to cause the simplistic controller to create the flows, before we began sending the traffic that we actually measured. OpenFlow in GENI gives experimenters a great deal of flexibility.
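
As a minimal sketch of that seed-traffic idea, assuming iperf as the measured traffic (whether a given kind of seed traffic installs the right flowtable entries depends on how the controller matches flows):

# Prime the path with a short, low-rate burst first...
iperf -u -c 10.42.104.104 -p 5104 -b 1M -t 5
# ...then run the measurement once the flowtable rules are in place.
iperf -u -c 10.42.104.104 -p 5104 -b 100M -t 40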

12.2. Latency and topology

Another result that would've been surprising on a regular network is high latency (and correspondingly low throughput), e.g. between two geographically nearby sites like BBN (in Boston) and Rutgers (in New Jersey). This turns out to be due to a very valuable feature of GENI: The ability to create and use different topologies in the core network. Not all GENI network paths are optimized for distance -- deliberately so, since some experiments specifically want long links with high latency. One of the paths available during this project covered nearly ten thousand miles to get from BBN to Rutgers, for example.

The RTT results from a ping test between BBN and Rutgers, using each of four different paths, show this clearly. BBN is a useful test case for this because we connect to both NLR and Internet2, thus giving us four possible paths to each other campus (two VLANs through each of the two providers).

When you connect via BBN's connection to NLR, on VLAN 3715, the traffic path is Boston - Chicago - Atlanta - DC - NJ, and the ping RTT is 74.3 ms:

PING 10.42.101.111 (10.42.101.111) 56(84) bytes of data.
64 bytes from 10.42.101.111: icmp_seq=1 ttl=64 time=74.3 ms
64 bytes from 10.42.101.111: icmp_seq=2 ttl=64 time=74.3 ms
64 bytes from 10.42.101.111: icmp_seq=3 ttl=64 time=74.3 ms

If instead you use BBN's connection to I2 on VLAN 3715, the path is Boston - New York - Los Angeles - Houston - Atlanta - DC - NJ, and the ping RTT doubles, to 152 ms:

PING 10.42.103.111 (10.42.103.111) 56(84) bytes of data.
64 bytes from 10.42.103.111: icmp_seq=1 ttl=64 time=152 ms
64 bytes from 10.42.103.111: icmp_seq=2 ttl=64 time=152 ms
64 bytes from 10.42.103.111: icmp_seq=3 ttl=64 time=152 ms

If you use BBN's connection to NLR on VLAN 3716, the path gets even longer -- Boston - Chicago - Denver - Seattle - Sunnyvale - Atlanta - DC - NJ -- which takes 179 ms:

PING 10.42.102.111 (10.42.102.111) 56(84) bytes of data.
64 bytes from 10.42.102.111: icmp_seq=1 ttl=64 time=179 ms
64 bytes from 10.42.102.111: icmp_seq=2 ttl=64 time=179 ms
64 bytes from 10.42.102.111: icmp_seq=3 ttl=64 time=179 ms

But if you use BBN's connection to I2 on VLAN 3716, the path is much shorter -- Boston - New York - DC - NJ -- and only takes 14.8 ms:

PING 10.42.104.111 (10.42.104.111) 56(84) bytes of data.
64 bytes from 10.42.104.111: icmp_seq=1 ttl=64 time=14.8 ms
64 bytes from 10.42.104.111: icmp_seq=2 ttl=64 time=14.8 ms
64 bytes from 10.42.104.111: icmp_seq=3 ttl=64 time=14.8 ms

Thus, you can get variations of up to a factor of 10 - 12 just by choosing your sites and paths carefully (and potentially even more, by engineering a new topology using a different VLAN, rather than simply using one of the existing ones). This topology flexibility allows GENI to support more varied experiments than an exclusively IP testbed.

13. Future work

This report concludes the formal part of the Plastic Slices project, but we plan to continue using the meso-scale infrastructure to run experiments and tests. We'll publish additional plans and results on the GENI wiki, but in general terms, we intend to keep data flowing continuously for the next few months, to allow us to continue to develop and test monitoring and operational procedures and practices, and to integrate new software and hardware. We also expect to involve actual experimenters in future work, and to investigate some of the initial Plastic Slices results in more detail.

Specific goals include:

  • Emergency stop tests with every campus
  • More tests with high throughput (UDP and TCP)
  • More tests of high user volume (like Baseline 8)
  • Switching from static ARP tables to dynamic ARP requests
  • More iterations of long-running experiments
  • Investigating more sophisticated user tools

We're also interested in ideas for things that others would like us to investigate and/or try; feel free to contact help@geni.net if you have suggestions.

Finally, one high-level conclusion is that GENI is ready to expand, so if you represent a campus that isn't yet part of the GENI meso-scale and would like to be, let us know (help@geni.net).

14. Thanks

This project would have gone nowhere without countless hours of work and support from all the participants: The campuses (Clemson, Georgia Tech, Indiana, Rutgers, Stanford, Washington, and Wisconsin), their regional networks (NoX, SoX, Indiana GigaPoP, MAGPI, CENIC, PNWGP, and WiscNet), the core backbones (Internet2 and NLR), the GMOC and GPO staff who handled monitoring, and the software developers at Stanford (OpenFlow), Princeton (MyPLC), Utah (ProtoGENI), the GMOC, and the GPO. Our deep and heartfelt thanks to all of them.