Changes between Version 4 and Version 5 of PlasticSlices/FinalReport
- Timestamp: 08/03/11 17:57:05 (13 years ago)
PlasticSlices/FinalReport
v4 v5
9 9 = 1. Motivation =
10 10
11 - The Plastic Slices project was the first project in the Meso-Scale Campus Experiments plan, for evolving meso-scale experiments between now and GEC 12. The overarching purpose was to set shared campus goals, and to agree on a schedule and resources to support experiments. This first project was an effort to run ten (or more) GENI slices continuously for multiple months -- not merely for the sake of having ten slices running, but to gain experience managing and operating production-quality GENI resources.
12 -
13 - During Spiral 3, campuses have been expanding and updating their OpenFlow deployments. All campuses have agreed to run a GENI AM API compliant aggregate manager, and to support at least four multi-site GENI experiments, by GEC 12. This lays the foundation for GENI to continuously support and manage multiple simultaneous slices that contain resources from GENI AM API compliant aggregates with multiple cross-country layer 2 data paths. (http://groups.geni.net/geni/wiki/GeniApi has more information about what it means to be a "GENI AM API compliant aggregate.") This campus infrastructure can support the transition from building GENI to using GENI continuously in the meso-scale infrastructure. Longer-term, it can also support the transition to at-scale production use in 2012, as originally proposed by each campus.
11 + The Plastic Slices project in the GENI meso-scale infrastructure was created to set shared campus goals for evolving GENI's infrastructure and to agree on a schedule and resources to support GENI experiments. This first project was an effort to run ten (or more) GENI slices continuously for multiple months -- not merely for the sake of having ten slices running, but to gain experience managing and operating production-quality GENI resources.
12 +
13 + During Spiral 3, campuses expanded and updated their OpenFlow deployments. All campuses agreed to run a GENI AM API compliant aggregate manager, and to support at least four multi-site GENI experiments by GEC 12 (November, 2011). This laid the foundation for GENI to continuously support and manage multiple simultaneous slices that contain resources from GENI AM API compliant aggregates with multiple cross-country layer 2 data paths. (http://groups.geni.net/geni/wiki/GeniApi has more information about what it means to be a "GENI AM API compliant aggregate.") This campus infrastructure can support the transition from building GENI to using GENI continuously in the meso-scale infrastructure. Longer-term, it can also support the transition to at-scale production use in 2012, as originally proposed by each campus.
14 14
15 15 This project investigated technical issues associated with managing multiple slices for long-term experiments, and also tried out early operations procedures for supporting those experiments.
… …
17 17 = 2. Objectives =
18 18
19 - A high-level goal of this project was to run ten GENI slices continuously over a period of several months. This wasn't just an end in itself, though, but rather a means for us to accomplish two other objectives.
20 -
21 - First, we wanted to gain experience managing and operating production-quality meso scale GENI resources, at various levels:
19 + A high-level goal of this project was to run ten GENI slices continuously over a period of several months. This wasn't just an end in itself, but rather a means for us to accomplish two other objectives.
20 +
21 + First, we wanted to gain experience managing and operating production-quality meso-scale GENI resources, at various levels:
22 22
23 23 * Campuses and backbone providers managing their local resources
… …
25 25 * Experimenters running experiments (a role that the GPO filled for this project)
26 26
27 - Additionally, we wanted to discover and record issues that early experimenters are likely to encounter, such as:
28 -
29 - * Operational availability and up time
27 + Additionally, we wanted to discover and record issues that early experimenters likely would encounter, such as:
28 +
29 + * Operational availability and up-time
30 30 * Software-related issues, both with user tools and with aggregate software
31 31 * Experiment isolation, i.e. preventing experiments from interfering with each other
… …
46 46 == 4.1. Operational availability ==
47 47
48 - Is meso scale GENI ready for operations with more experimenters? Our conclusion is that it is, but with some caveats.
48 + Is meso-scale GENI ready for operations with more experimenters? Our conclusion is that it is, but with some caveats.
49 49
50 50 We found that resource operators need to communicate more than they were, both with each other about their plans and about issues requiring coordination between sites, and with experimenters about outages. This improved over the course of the project, but further improvement is still needed.
… …
56 56 We identified some specific ideas for improvement:
57 57
58 - * We'd like to work with campuses to set and measure up time targets for the aggregates they support. We understand that there will be variations from campus to campus, potentially fairly wide variations, and that's not fundamentally a problem, as long as the targets are well-documented and publicized, so that experimenters can set their expectations accurately.
58 + * We'd like to work with campuses to set and measure up-time targets for the aggregates they support. We understand that there will be variations from campus to campus, potentially fairly wide variations, and that's not fundamentally a problem, as long as the targets are well-documented and publicized, so that experimenters can set their expectations accurately.
59 59
60 60 * Operators should continue to give feedback and input to software developers on features and priorites.
… …
64 64 == 4.2. Software ==
65 65
66 - Is meso scale GENI software ready to use in a production-quality environment? We concluded that it is, but again with some caveats.
67 -
68 - Much of the software that we rely upon is still new, and at an early stage of its life cycle. This is unavoidable to some extent, and improves with time. Similarly, GENI may be the first large-scale deployment for some software, whether for the software as a whole, or for new versions as they come out. It may be difficult for a developer to test their software adequately in their development environment, which is unlikely to be as large, geographically diverse, and otherwise heterogeneous, as the whole of GENI.
66 + Is meso-scale GENI software ready to use in a production-quality environment? We concluded that it is, but again with some caveats.
67 +
68 + Much of the software we rely on is still new, and at an early stage of its life cycle. This is unavoidable to some extent, and improves with time. Similarly, GENI may be the first large-scale deployment for some software, whether for the software as a whole, or for new versions as they come out. It may be difficult for a developer to test their software adequately in their development environment, which is unlikely to be as large, geographically diverse, and otherwise heterogeneous, as the whole of GENI.
69 69
70 70 Some ideas for improvement:
… …
76 76 * Software developers could use GENI slices and resources to test their software in ways that aren't feasible within their development environments, such as with hardware from multiple vendors, over long-distance network links, etc -- using GENI resources before the software is fully deployed to the entirety of GENI.
77 77
78 - * More professional software engineers are getting involved with GENI, taking on software development tasks that had previously been handled by researchers.
78 + * More professional software engineers are getting involved with GENI, taking on meso-scale infrastructure software development tasks that had previously been handled by researchers and students, which is helping to make the integrated software more robust.
79 79
80 80 == 4.3. Isolation ==
81 81
82 - Are meso scale GENI experiments isolated from each other? We concluded that they are only somewhat.
83 -
84 - Many resources in GENI are still shared between mul itiple experiments, with only soft (e.g. advisory) partitions between them. We found many examples of this:
82 + Are meso-scale GENI experiments isolated from each other? We concluded that they are only somewhat.
83 +
84 + Many resources in GENI are still shared between multiple experiments, with only soft (e.g. advisory) partitions between them. We found many examples of this:
85 85
86 86 * MyPLC plnodes are virtual machines on shared hardware, so an experiment that goes haywire on one VM can starve other VMs on that hardware for resources.
… …
88 88 * The FlowVisor flowspace is partitioned between experiments, but the flowspace as a whole is shared, so issues that affect the entire flowspace (such as the number of rules, which can reach into the thousands or tens of thousands) can make it difficult or impossible for any experiment to manipulate the flowspace.
89 89
90 - * Topology problems with one experiment can affect other experiments, even experiments which have a every simple top logy themselves, e.g. by creating a broadcast storm, leaking traffic across VLANs unexpectedly, etc.
90 + * Topology problems with one experiment can affect other experiments, even experiments which have a every simple topology themselves, e.g. by creating a broadcast storm, leaking traffic across VLANs unexpectedly, etc.
91 91
92 92 * All of the bandwidth in GENI is shared, with no easy ways to set either hard limits on maximum use, or to request a minimum requirement.
93 93
94 - This is already an active area of work within GENI, but we had some specific ideas for improving the situation:
94 + Improving isolation is already an active area of work within GENI, but we had some specific ideas for improvements:
95 95
96 96 * We'll encourage more information-sharing between experimenters, to help them prevent their experiments from going haywire in the first place.
… …
98 98 * When that isn't enough, the GPO and GMOC should work to develop better procedures to handle communication between operators, and with experimenters, when there are issues related to isolation.
99 99
100 - * Additional hard limits on resource usage, like QoS in the OpenFlow protocol and in backbone hardware, are the best way to truly ensure that experiments are isolated from each other.
100 + * Additional hard limits on resource usage, like QoS in the OpenFlow protocol and in backbone hardware, which are planned for later releases, will help keep experiments more isolated from each other.
101 101
102 102 == 4.4. Ease of use ==
103 103
104 - Is meso scale GENI easy for experimenters to use? We found that it is in some ways, but not in others.
105 -
106 - Getting started , and doing very simple things, is in fact fairly easy: There are few barriers to entry, and they're generally low. However, experimenter tools to enable more sophisticated experiments are only just now interoperating with GENI -- although this is improving rapidly, and in fact improved over the course of the project. There are also specific usability concerns with OpenFlow, where the Expedient Opt-In Manager requires manual approval from an operator at every OpenFlow aggregate, whenever any OpenFlow resource is allocated; this can be daunting to new experimenters, and even for experienced folks, the need for manual approval can cause significant delays.
107 -
108 - This is another area where work is already active within GENI (most of the Experimenter track at GEC 11 focused on tools), but we again have some ideas for additional improvements:
104 + Is meso-scale GENI easy for experimenters to use? We found that it is in some ways, but not in others.
105 +
106 + Getting started and doing very simple things is in fact fairly easy: There are few barriers to entry, and they're generally low. However, experimenter tools to enable more sophisticated experiments are only just now interoperating with GENI -- although this is improving rapidly, and in fact improved over the course of the project. There are also specific usability concerns with OpenFlow, where the Expedient Opt-In Manager requires manual approval from an operator at every OpenFlow aggregate, whenever any OpenFlow resource is allocated. This can be daunting to new experimenters, and can cause significant delays for anyone.
107 +
108 + Usability is another area where work is already active in GENI (most of the Experimenter track at GEC 11 focused on tools), but we again have some ideas for additional improvements:
109 109
110 110 * We'd like to encourage experimenters to try out newly-integrated tools, and actively request bug fixes and features. Experimenter demand is part of a very positive feedback loop, encouraging developers to create better tools, which in turn makes GENI more usable to more experimenters.
… …
116 116 = 5. Backbone resources =
117 117
118 - The Plastic eSlices project used the GENI network core in Internet2 and NLR, which includes two OpenFlow-controlled VLANs (3715 and 3716) on a total of ten switches (five in each of I2 and NLR). The OpenFlow network in each provider was managed by an Expedient OpenFlow aggregate manager.
118 + The Plastic Slices project used the GENI network core in Internet2 and NLR, which includes two OpenFlow-controlled VLANs (3715 and 3716) on a total of ten switches (five in each of I2 and NLR). The OpenFlow network in each provider was managed by an Expedient OpenFlow aggregate manager.
119 119
120 120 http://groups.geni.net/geni/wiki/NetworkCore has more information about the GENI network core and how to use it.
… …
122 122 = 6. Campus resources =
123 123
124 - Each campus had a private OpenFlow-controlled VLAN (typically VLAN 1750), which was cross-connected to the two backbone VLANs (which were typically not OpenFlow-controlled on campuses), and managed by an Expedient OpenFlow aggregate manager. Each campus also had a MyPLC aggregate, with two plnodes (or in some cases more), each of which included a dataplane interface sconnected to VLAN 1750 (and others).
124 + Each campus had a private OpenFlow-controlled VLAN (typically VLAN 1750), which was cross-connected to the two backbone VLANs (which were typically not OpenFlow-controlled on campuses), and managed by an Expedient OpenFlow aggregate manager. Each campus also had a MyPLC aggregate, with two plnodes (or in some cases more), each of which included a dataplane interface connected to VLAN 1750 (and others).
125 125
126 126 This network topology is described in more detail at http://groups.geni.net/geni/wiki/NetworkCore, and http://groups.geni.net/geni/wiki/TangoGENI#ParticipatingAggregates has links to the aggregate info page for each aggregate.
… …
130 130 || [[ThumbImage(PlasticSlices/BaselineEvaluation/Baseline5Traffic:baseline5-txbytes.png, thumb=500)]] || [[ThumbImage(PlasticSlices/BaselineEvaluation/Baseline5Traffic:baseline5-rxbytes.png, thumb=500)]] ||
131 131
132 - (Graphs of bytes sent ( TX) and bytes received (TX) during Baseline 5.)
133 -
134 - During the project, all mesoscale campuses were configured to send monitoring data to the GMOC. We found that sites had resources that didn't use NTP, which was essential for correlating data between sites. The GMOC offers an interface called SNAPP for browising the data that they collect, visible at http://gmoc-db.grnoc.iu.edu/api-demo/. In addition, the GMOC offers an API which anyone can use to download the raw data that they collect, analyze it, graph it, etc, which we used to create some local graphs that we found useful. They collect data of interest to both operators and experimenters, at various levels of granularity, and can create some per-slice information, although this relies on naming conventions to tie together slices and slivers.
135 -
136 - http://groups.geni.net/geni/wiki/PlasticSlices/MonitoringRecommendations has links a variety of monitoring sites and information.
132 + (Graphs of bytes sent (first graph TX) and bytes received (second graph RX) in each slice during Baseline 5.)
133 +
134 + During the project, all mesoscale campuses were configured to send monitoring data to the GMOC. Some sites initially configured resources that didn't use NTP, but revised the configurations after starting monitoring, because NTP is essential for correlating data between sites. The GMOC offers an interface called SNAPP for browsing the data that they collect, visible at http://gmoc-db.grnoc.iu.edu/api-demo/. In addition, the GMOC offers an API which anyone can use to download raw collected data from GMOC to analyze it, graph it, etc. The GPO used this API to create useful local monitoring graphs (samples included in this report, and more available through the GENI wiki). The GPO data is of interest to both operators and experimenters, covers various levels of granularity, and presents some per-slice information. The per-slice information relies on naming conventions to tie together slices and slivers in this implementation.
135 +
136 + http://groups.geni.net/geni/wiki/PlasticSlices/MonitoringRecommendations has links to a variety of monitoring sites and information.
137 137
138 138 = 8. Slices =