Changes between Version 4 and Version 5 of PlasticSlices/FinalReport


Timestamp: 08/03/11 17:57:05
Author: hdempsey@bbn.com
Comment: typos, minor wording fixes

= 1. Motivation =

The Plastic Slices project in the GENI meso-scale infrastructure was created to set shared campus goals for evolving GENI's infrastructure and to agree on a schedule and resources to support GENI experiments. This first project was an effort to run ten (or more) GENI slices continuously for multiple months -- not merely for the sake of having ten slices running, but to gain experience managing and operating production-quality GENI resources.

During Spiral 3, campuses expanded and updated their OpenFlow deployments. All campuses agreed to run a GENI AM API compliant aggregate manager, and to support at least four multi-site GENI experiments by GEC 12 (November 2011). This laid the foundation for GENI to continuously support and manage multiple simultaneous slices that contain resources from GENI AM API compliant aggregates with multiple cross-country layer 2 data paths. (http://groups.geni.net/geni/wiki/GeniApi has more information about what it means to be a "GENI AM API compliant aggregate.") This campus infrastructure can support the transition from building GENI to using GENI continuously in the meso-scale infrastructure. Longer-term, it can also support the transition to at-scale production use in 2012, as originally proposed by each campus.

This project investigated technical issues associated with managing multiple slices for long-term experiments, and also tried out early operations procedures for supporting those experiments.
     
= 2. Objectives =

A high-level goal of this project was to run ten GENI slices continuously over a period of several months. This wasn't just an end in itself, but rather a means for us to accomplish two other objectives.

First, we wanted to gain experience managing and operating production-quality meso-scale GENI resources, at various levels:

 * Campuses and backbone providers managing their local resources
     
 * Experimenters running experiments (a role that the GPO filled for this project)

Additionally, we wanted to discover and record issues that early experimenters would likely encounter, such as:

 * Operational availability and up-time
 * Software-related issues, both with user tools and with aggregate software
 * Experiment isolation, i.e. preventing experiments from interfering with each other
     
== 4.1. Operational availability ==

Is meso-scale GENI ready for operations with more experimenters? Our conclusion is that it is, but with some caveats.

We found that resource operators need to communicate more than they had been, both with each other about their plans and about issues requiring coordination between sites, and with experimenters about outages. This improved over the course of the project, but further improvement is still needed.
     
We identified some specific ideas for improvement:

 * We'd like to work with campuses to set and measure up-time targets for the aggregates they support. We understand that there will be variations from campus to campus, potentially fairly wide variations, and that's not fundamentally a problem, as long as the targets are well-documented and publicized, so that experimenters can set their expectations accurately. (A rough sketch of measuring against such a target appears after this list.)

 * Operators should continue to give feedback and input to software developers on features and priorities.
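
To make the idea of measuring against an up-time target concrete, here is a minimal sketch of the arithmetic; the outage windows, measurement period, and 99.5% target below are invented for illustration, and no particular GENI monitoring interface is assumed.

{{{
#!python
from datetime import datetime, timedelta

# Hypothetical outage records for one campus aggregate: (start, end) pairs.
outages = [
    (datetime(2011, 6, 3, 14, 0), datetime(2011, 6, 3, 16, 30)),
    (datetime(2011, 6, 20, 9, 0), datetime(2011, 6, 20, 9, 45)),
]

# Measurement window (e.g. one baseline period; dates are illustrative).
window_start = datetime(2011, 6, 1)
window_end = datetime(2011, 7, 1)
window = window_end - window_start

# Sum the downtime, clipped to the measurement window.
downtime = timedelta()
for start, end in outages:
    start = max(start, window_start)
    end = min(end, window_end)
    if end > start:
        downtime += end - start

uptime = 1 - downtime.total_seconds() / window.total_seconds()
target = 0.995   # example target; real targets would vary by campus

print("Measured up-time: %.3f%%" % (uptime * 100))
print("Meets %.1f%% target: %s" % (target * 100, uptime >= target))
}}}

The point is only that a target is easy to state and check once outages are recorded consistently; the harder work is agreeing on the targets and recording the outages.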
     
== 4.2. Software ==

Is meso-scale GENI software ready to use in a production-quality environment? We concluded that it is, but again with some caveats.

Much of the software we rely on is still new, and at an early stage of its life cycle. This is unavoidable to some extent, and improves with time. Similarly, GENI may be the first large-scale deployment for some software, whether for the software as a whole, or for new versions as they come out. It may be difficult for a developer to test their software adequately in their development environment, which is unlikely to be as large, geographically diverse, and otherwise heterogeneous as the whole of GENI.

Some ideas for improvement:
     
 * Software developers could use GENI slices and resources to test their software in ways that aren't feasible within their development environments, such as with hardware from multiple vendors, over long-distance network links, etc. -- using GENI resources before the software is fully deployed to the entirety of GENI.

 * More professional software engineers are getting involved with GENI, taking on meso-scale infrastructure software development tasks that had previously been handled by researchers and students, which is helping to make the integrated software more robust.

== 4.3. Isolation ==

Are meso-scale GENI experiments isolated from each other? We concluded that they are only somewhat.

Many resources in GENI are still shared between multiple experiments, with only soft (e.g. advisory) partitions between them. We found many examples of this:

 * MyPLC plnodes are virtual machines on shared hardware, so an experiment that goes haywire on one VM can starve other VMs on that hardware for resources.
     
 * The FlowVisor flowspace is partitioned between experiments, but the flowspace as a whole is shared, so issues that affect the entire flowspace (such as the number of rules, which can reach into the thousands or tens of thousands) can make it difficult or impossible for any experiment to manipulate the flowspace. (A conceptual sketch of this shared rule table appears after this list.)

 * Topology problems with one experiment can affect other experiments, even experiments that themselves have a very simple topology, e.g. by creating a broadcast storm, leaking traffic across VLANs unexpectedly, etc.

 * All of the bandwidth in GENI is shared, with no easy way either to set hard limits on maximum use or to request a guaranteed minimum.
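
To illustrate the shared-flowspace issue described in the second item above, here is a conceptual sketch of a flowspace as one shared rule table that is only soft-partitioned by per-slice match criteria. The slice names, VLANs, subnets, and port counts are made up for illustration, and this is not FlowVisor's actual API.

{{{
#!python
# Conceptual sketch only: a shared rule table, soft-partitioned per slice.
from itertools import product

def slice_rules(slice_name, vlans, subnets, ports):
    """One rule per (VLAN, source subnet, switch port) combination for a slice."""
    return [
        {"slice": slice_name, "vlan": v, "nw_src": s, "in_port": p}
        for v, s, p in product(vlans, subnets, ports)
    ]

# Hypothetical numbers: ten slices, two core VLANs, one subnet and 24 ports each.
shared_flowspace = []
for i in range(1, 11):
    shared_flowspace += slice_rules(
        slice_name="plastic-%d" % (100 + i),   # illustrative slice names
        vlans=[3715, 3716],
        subnets=["10.42.%d.0/24" % i],
        ports=range(1, 25),
    )

# Each slice's own changes are limited to its rules (the "soft" partition) ...
mine = [r for r in shared_flowspace if r["slice"] == "plastic-101"]

# ... but any operation that touches the whole table scales with everyone's
# rules, which is how counts in the thousands slow things down for everybody.
print(len(mine), "rules belong to plastic-101")
print(len(shared_flowspace), "rules in the shared flowspace")
}}}

With more slices, more subnets, and more switches, the same multiplication is what pushes real flowspaces into the thousands of rules.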

Improving isolation is already an active area of work within GENI, but we had some specific ideas for improvement:

 * We'll encourage more information-sharing between experimenters, to help them prevent their experiments from going haywire in the first place.
     
 * When that isn't enough, the GPO and GMOC should work to develop better procedures to handle communication between operators, and with experimenters, when there are issues related to isolation.

 * Additional hard limits on resource usage, like QoS in the OpenFlow protocol and in backbone hardware, which are planned for later releases, will help keep experiments more isolated from each other.

== 4.4. Ease of use ==

Is meso-scale GENI easy for experimenters to use? We found that it is in some ways, but not in others.

Getting started and doing very simple things is in fact fairly easy: there are few barriers to entry, and they're generally low. However, experimenter tools to enable more sophisticated experiments are only just now interoperating with GENI -- although this is improving rapidly, and in fact improved over the course of the project. There are also specific usability concerns with OpenFlow, where the Expedient Opt-In Manager requires manual approval from an operator at every OpenFlow aggregate whenever any OpenFlow resource is allocated. This can be daunting to new experimenters, and can cause significant delays for anyone.

Usability is another area where work is already active in GENI (most of the Experimenter track at GEC 11 focused on tools), but we again have some ideas for additional improvements:

 * We'd like to encourage experimenters to try out newly-integrated tools, and actively request bug fixes and features. Experimenter demand is part of a very positive feedback loop, encouraging developers to create better tools, which in turn makes GENI more usable to more experimenters.
     
= 5. Backbone resources =

The Plastic Slices project used the GENI network core in Internet2 and NLR, which includes two OpenFlow-controlled VLANs (3715 and 3716) on a total of ten switches (five in each of I2 and NLR). The OpenFlow network in each provider was managed by an Expedient OpenFlow aggregate manager.

http://groups.geni.net/geni/wiki/NetworkCore has more information about the GENI network core and how to use it.
     
= 6. Campus resources =

Each campus had a private OpenFlow-controlled VLAN (typically VLAN 1750), which was cross-connected to the two backbone VLANs (which were typically not OpenFlow-controlled on campuses), and managed by an Expedient OpenFlow aggregate manager. Each campus also had a MyPLC aggregate, with two plnodes (or in some cases more), each of which included a dataplane interface connected to VLAN 1750 (and others).
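
A rough way to picture one campus's footprint is as a small data structure; the campus name, hostnames, and plnode count below are invented, and the structure is a simplification of the real deployments.

{{{
#!python
# Simplified model of a single campus in the Plastic Slices setup (names invented).
CAMPUS = {
    "name": "example-campus",
    "openflow": {
        "aggregate": "Expedient OpenFlow AM",
        "campus_vlan": 1750,              # private OpenFlow-controlled VLAN
        "backbone_vlans": [3715, 3716],   # cross-connects; usually not OF-controlled on campus
    },
    "myplc": {
        "aggregate": "MyPLC AM",
        "plnodes": [
            {"host": "plnode1.example-campus.edu", "dataplane_vlan": 1750},
            {"host": "plnode2.example-campus.edu", "dataplane_vlan": 1750},
        ],
    },
}

def dataplane_hosts(campus):
    """Plnodes whose dataplane interface rides the campus OpenFlow VLAN."""
    return [n["host"] for n in campus["myplc"]["plnodes"]
            if n["dataplane_vlan"] == campus["openflow"]["campus_vlan"]]

print(dataplane_hosts(CAMPUS))
}}}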

This network topology is described in more detail at http://groups.geni.net/geni/wiki/NetworkCore, and http://groups.geni.net/geni/wiki/TangoGENI#ParticipatingAggregates has links to the aggregate info page for each aggregate.
     
|| [[ThumbImage(PlasticSlices/BaselineEvaluation/Baseline5Traffic:baseline5-txbytes.png, thumb=500)]] || [[ThumbImage(PlasticSlices/BaselineEvaluation/Baseline5Traffic:baseline5-rxbytes.png, thumb=500)]] ||

(Graphs of bytes sent (TX, first graph) and bytes received (RX, second graph) in each slice during Baseline 5.)

During the project, all meso-scale campuses were configured to send monitoring data to the GMOC. Some sites initially configured resources that didn't use NTP, but revised those configurations after monitoring started, because NTP is essential for correlating data between sites. The GMOC offers an interface called SNAPP for browsing the data it collects, visible at http://gmoc-db.grnoc.iu.edu/api-demo/. In addition, the GMOC offers an API that anyone can use to download the raw collected data, analyze it, graph it, etc. The GPO used this API to create useful local monitoring graphs (samples are included in this report, and more are available through the GENI wiki). The data is of interest to both operators and experimenters, covers various levels of granularity, and includes some per-slice information; the per-slice information relies on naming conventions to tie slices and slivers together in this implementation.
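
As an illustration of the kind of local graphing the GPO did with downloaded monitoring data, here is a minimal sketch; the CSV file, its columns, and the sliver naming convention are hypothetical stand-ins, not the GMOC API's actual format (the GMOC documentation describes the real interface).

{{{
#!python
# Sketch: turn per-sliver TX-byte counters (already downloaded from a monitoring
# API into a CSV file) into one per-slice graph. Filenames, columns, and the
# naming convention shown here are hypothetical.
import csv
from collections import defaultdict

import matplotlib.pyplot as plt

def slice_of(sliver_name):
    """Recover the slice name from a sliver name by naming convention,
    e.g. 'plastic-101-of-nlr' -> 'plastic-101' (convention is illustrative)."""
    return "-".join(sliver_name.split("-")[:2])

series = defaultdict(list)   # slice name -> [(timestamp, tx_bytes), ...]
with open("baseline5-txbytes.csv") as f:
    for row in csv.DictReader(f):   # expected columns: timestamp, sliver, tx_bytes
        series[slice_of(row["sliver"])].append(
            (int(row["timestamp"]), int(row["tx_bytes"])))

for slice_name, points in sorted(series.items()):
    points.sort()
    plt.plot([t for t, _ in points], [b for _, b in points], label=slice_name)

plt.xlabel("time (unix seconds)")
plt.ylabel("bytes sent (TX)")
plt.legend()
plt.savefig("baseline5-txbytes.png")
}}}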

http://groups.geni.net/geni/wiki/PlasticSlices/MonitoringRecommendations has links to a variety of monitoring sites and information.

= 8. Slices =