Changes between Initial Version and Version 1 of LAMP/Summer2011Experiments


Timestamp:
08/19/11 15:44:53 (13 years ago)
Author:
chaos@bbn.com

     1[[PageOutline]]
     2
     3= GENI Monitoring Slice: Enabling Network Visibility in the GENI OpenFlow Core Network =
     4
     5== Motivation ==
     6
     7'''GENI is a highly distributed infrastructure that comprises multiple aggregates at geographically diverse locations. To be useful, GENI slices must be available at all times, and any outage must be detected promptly and resolved by aggregate operators and/or the GMOC. GENI therefore needs a flexible approach for monitoring network behavior and measuring network performance, and that approach must facilitate troubleshooting in such a highly distributed environment. We demonstrate one such approach, based on principles developed by the perfSONAR-LAMP project. The approach is “GENI-centric” in that it is deployed within a GENI slice.
     8'''
     9
     10''NB: This project was experimental, so the setup detailed below is not yet generally available.''
     11
     12== Approach ==
     13
     14 * An operator sets up a measurement slice to monitor infrastructure, e.g., the GENI backbone with OpenFlow.
     15 * The measurement slice may be long-term, for service quality monitoring, or short-term, to assist with troubleshooting.
     16 * Researchers or other operators can use LAMP tools, or other tools compatible with the I&M architecture. Measurement data is registered with the Measurement Information (Lookup) service, so that other users, when authorized, can discover and retrieve it through a standardized API.
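
The register-then-discover pattern in the last bullet can be illustrated with a minimal in-memory sketch. It is not the LAMP or perfSONAR lookup API: the `LookupService` class, its method names, and the `example` URLs are all hypothetical, chosen only to show how authorization can gate discovery.

```python
from dataclasses import dataclass, field


@dataclass
class LookupService:
    """Toy in-memory stand-in for a measurement lookup service (hypothetical)."""

    # Maps a service type (e.g. "owamp") to the records registered under it.
    records: dict = field(default_factory=dict)

    def register(self, service_type, node, access_url, allowed_users):
        """Record where a measurement service runs and who may see it."""
        entry = {"node": node, "url": access_url, "allowed": set(allowed_users)}
        self.records.setdefault(service_type, []).append(entry)

    def discover(self, service_type, user):
        """Return (node, url) pairs of this type that the user is authorized to see."""
        return [
            (entry["node"], entry["url"])
            for entry in self.records.get(service_type, [])
            if user in entry["allowed"]
        ]


# Hypothetical registrations; node names follow the page, URLs are placeholders.
ls = LookupService()
ls.register("owamp", "PC UTAH 1", "http://pc-utah-1.example/owamp", ["operator", "gmoc"])
ls.register("owamp", "PC UTAH 2", "http://pc-utah-2.example/owamp", ["operator"])

visible_to_gmoc = ls.discover("owamp", "gmoc")
```

Here `visible_to_gmoc` contains only the PC UTAH 1 record, because the second registration does not list `gmoc` as an authorized user.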
     17
     18
     19
     20== Why Use a Slice? ==
     21
     22 1. Convenience – The slice mechanism allows the seamless allocation of resources
     23 across multiple aggregates.
     24 2. Efficiency – A slice can quickly be reserved and removed as necessary.
     25 3. Realism – The performance of a slice can often be evaluated using another “identical” slice.
     26'''Issue:''' ''Robust servers (as opposed to VMs) are required to obtain accurate measurements.''
     27
     28== Why LAMP? ==
     29
     30 * Adapted from perfSONAR, LAMP has a suite of good network measurement tools.
     31 * It registers daemons in a global lookup service so that other users can find the data.
     32 * LAMP provides authorization to restrict access to data.
     33 * LAMP provides easy access to other participating LAMP nodes through the use of daemons.
     34 * Provided that LAMP nodes register their services with a global lookup service, they can also run performance tests with other perfSONAR nodes.
     35
     36== Network Environments for LAMP ==
     37
     38LAMP can be used in Layer 2 or Layer 3 networks. In this project, we used LAMP to monitor the
     39OpenFlow L2 backbone core. To date, several perfSONAR nodes exist in Internet2’s (I2)
     40core network. These nodes are publicly available, through the Global Lookup Service, for
     41L3 tests with non-I2 perfSONAR nodes that have a network path to I2’s network.
     42LAMP should therefore provide mechanisms to test against these perfSONAR nodes (i.e.
     43perfSONAR nodes external to a slice). This LAMP-perfSONAR integration removes the
     44need for GENI sites to acquire dedicated hardware for monitoring segments of the network
     45not directly connected to the OpenFlow backbone.
     46 
     47
     48== Ideal GENI Monitoring Slice ==
     49
     50The ideal GENI monitoring slice will contain resources from the dominant GENI control
     51frameworks, namely ProtoGENI, PlanetLab, and OpenFlow. The figure below provides a
     52conceptual view of such a slice. In particular, it comprises ProtoGENI nodes from the
     53respective testbeds in Utah and the GPO, a PlanetLab node from the PlanetLab
     54subaggregate at the GPO, and OpenFlow resources from Internet2’s and NLR’s aggregates.
     55
     56
     57The ProtoGENI resources will host the LAMP I&M services and data. Specifically, the LAMP
     58node at Utah will host the LAMP web portal, among other services, while the PlanetLab
     59node at the GPO lab will host the NOX controller for all OpenFlow switches in the core.
     60These switches will carry scheduled tests between the ProtoGENI nodes labeled PC UTAH 1,
     612, 3, and PC GPOLAB. A server at the University of Delaware hosts the Unified Network
     62Information Service (UNIS), a combined topology and lookup service that
     63stores the topology data. LAMP services are configured through the LAMP
     64web portal, and the updated topology is pushed to UNIS. The perfSONAR-PS pSConfig service
     65running on each node then fetches the updated configuration.
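
The configuration flow described above (the portal pushes the topology, each node fetches its configuration) can be sketched as a small push/poll model. This is an illustration only: `TopologyStore` and `NodeAgent` are hypothetical stand-ins for UNIS and the per-node configuration service, and the version counter is an assumption made for the sketch, not a documented UNIS feature.

```python
class TopologyStore:
    """Hypothetical stand-in for UNIS: holds the latest topology and a version counter."""

    def __init__(self):
        self.version = 0
        self.topology = {}

    def push(self, topology):
        """Called by the portal side whenever services are reconfigured."""
        self.topology = dict(topology)
        self.version += 1


class NodeAgent:
    """Hypothetical stand-in for the configuration service running on each node."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.seen_version = 0
        self.enabled_services = []

    def poll(self):
        """Fetch this node's configuration if the store changed; return True when updated."""
        if self.store.version == self.seen_version:
            return False
        self.enabled_services = list(self.store.topology.get(self.name, []))
        self.seen_version = self.store.version
        return True


store = TopologyStore()
agent = NodeAgent("PC UTAH 1", store)

first_poll = agent.poll()    # nothing has been pushed yet
store.push({"PC UTAH 1": ["bwctl", "owamp"], "PC UTAH 2": ["bwctl", "owamp"]})
second_poll = agent.poll()   # picks up the new configuration
third_poll = agent.poll()    # no change since the last fetch
```

Only the second poll reports an update, so a node reconfigures its services once per pushed topology rather than on every poll.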
     66
     67
     68[[ThumbImage(LAMPIdeal.jpg, thumb=600)]]
     69
     70
     71== Use Cases ==
     72
     73'''Example 1'''
     74
     75An operator of the GENI backbone network sets up a long-term backbone measurement slice to monitor service quality, and shares data with the GMOC via the LAMP (perfSONAR) API.
     76
     77'''Example 2'''
     78Experimenters complain to the GMOC about intermittent backbone connectivity outages between Sites X, Y and Z. The GMOC sets up a short-term reference measurement slice involving Sites X, Y and Z, and shares the data with the experimenters.
     79
     80'''Example 3'''
     81A GENI experimenter deploys a measurement slice or deploys LAMP I&M tools within a slice for monitoring the health of the network throughout the duration of an experiment.   
     82
     83== Initial Deployment of the Monitoring Slice ==
     84'''The figure below depicts the resources currently in the monitoring slice. In particular, it contains 3 ProtoGENI nodes, 4 OpenFlow switches, 2 ProtoGENI switches and 1 PlanetLab node. The LAMP node hosts the LAMP web portal, stores statistics in local round-robin databases, and exports the metrics collected by the host monitoring collector using the SNMP measurement archive (MA) interface. The 2 ProtoGENI hosts run scheduled performance and measurement tests, and the PlanetLab node at the GPO lab hosts the NOX controller for all Internet2 switches'''.
     85
     86[[ThumbImage(LAMPActual.jpg, thumb=600)]]
     87
     88
     89== Resources Allocated to the Slice ==
     90
     91The initial deployment encountered challenges on several fronts, which resulted in the use of
     92only a subset of the resources shown in the figure in section "'''Ideal GENI Monitoring Slice'''". These challenges are presented in the "'''Issues'''" section. The monitoring slice currently contains three ProtoGENI nodes, four OpenFlow switches, two ProtoGENI switches and one
     93PlanetLab node. The LAMP node hosts the LAMP web portal, stores statistics in local
     94round-robin databases, and exports the metrics collected by the host monitoring collector
     95using the SNMP measurement archive (MA) interface. The two ProtoGENI hosts run
     96scheduled performance and measurement tests, and the PlanetLab node at the GPO lab
     97hosts the NOX controller for all Internet2 switches.
     98
     99'''The table below lists the LAMP services enabled on each ProtoGENI node. Services such as perfSONAR-BUOY regular testing are enabled on both PC UTAH 1 and PC UTAH 2, which is necessary to conduct throughput tests between the two nodes. The same applies to the ping and delay tests.'''
     100
     101||'''Services'''||'''Nodes'''||
     102|| [[Color(lightgrey, Bandwidth Test Controller (BWCTL))]]                        || [[Color(lightgrey,-----n/a----- : PC UTAH 1 : PC UTAH 2)]] ||
     103||                    Host Monitoring Collector (Ganglia Meta Daemon)             ||                   LAMP host     : PC UTAH 1 : PC UTAH 2    ||
     104|| [[Color(lightgrey, Host Monitoring Daemon (Ganglia Monitoring Daemon))]]       || [[Color(lightgrey,-----n/a----- : PC UTAH 1 : PC UTAH 2)]] ||
     105||                    Ganglia Measurement Archive                                 ||                   LAMP host     : -----n/a----- : -----n/a-----    ||
     106|| [[Color(lightgrey, LAMP I&M System Web Portal)]]                               || [[Color(lightgrey,LAMP host     : -----n/a----- : -----n/a----- )]] ||
     107||                    NTP server                                                  ||                   LAMP host     : PC UTAH 1 : PC UTAH 2    ||
     108|| [[Color(lightgrey, One-Way Ping (OWAMP))]]                                     || [[Color(lightgrey,-----n/a----- : PC UTAH 1 : PC UTAH 2)]] ||
     109||                    perfSONAR-BUOY Regular Testing (throughput)                 ||                   -----n/a----- : PC UTAH 1 : PC UTAH 2    ||
     110|| [[Color(lightgrey, perfSONAR-BUOY Measurement Archive)]]                       || [[Color(lightgrey,-----n/a----- : PC UTAH 1 : PC UTAH 2)]] ||
     111||                    perfSONAR-BUOY Regular Testing (one-way latency)            ||                   -----n/a----- : PC UTAH 1 : PC UTAH 2    ||
     112|| [[Color(lightgrey, PingER Measurement Archive and Regular Tester)]]            || [[Color(lightgrey,-----n/a----- : PC UTAH 1 : PC UTAH 2)]] ||
     113
     114
     115
     116
     117
     118
     119== Results ==
     120
     121The initial results are available [http://groups.geni.net/syseng/wiki/AliSandbox/Results here].
     122
     123== Issues ==
     124
     125'''Stitching in GENI'''
     126
     127To date, GENI has yet to realize an efficient stitching mechanism between OpenFlow and ProtoGENI resources. In particular, the ProtoGENI software suite is unable to dynamically allocate VLANs across the ProtoGENI and OpenFlow switches via the respective cross-connects. For this reason, we experienced a minor broadcast storm in the initial stages of obtaining network connectivity in the OpenFlow core network. Ultimately, a combined, one-week, “manual” effort by operators at Utah and Internet2 was necessary to provide network connectivity between PC UTAH 1 and PC UTAH 2. This manual process was tolerable because few GENI users currently require such a complex configuration of resources. However, as GENI ramps up the scale of experiments, it will be crucial for ProtoGENI and OpenFlow operators to converge on a dynamic stitching mechanism.
     128
     129'''Fragility of LAMP'''
     130 * This project has challenged the robustness of LAMP and stirred considerable interest in the LAMP development community in revisiting, and perhaps re-engineering, mechanisms within the LAMP software suite. In the year between its deployment and this project, LAMP was not used significantly by experimenters. As a result, several bugs were encountered and resolved during this project, but a few issues remain. In particular, LAMP does not support version 2 rspecs, which GENI intends to support by GEC12. For this reason, the network topology from our initial experiments with rspec version 2 was not registered at the Unified Network Information Service (UNIS) controller.
     131 * At the onset of this project, Flack was used to generate the rspec for the ProtoGENI-LAMP resources. At that time, Flack supported only version 2 rspecs (Flack can currently create version 0.1 and 0.2 rspecs). Among other issues, this forced manual creation of the resource manifest submitted to UNIS. Another issue relates to throughput tests: for example, the results of a throughput test scheduled almost two weeks ago are unavailable through the web portal.
     132 * LAMP, like perfSONAR, depends on robust servers rather than VMs in the measurement slice for accurate network measurements. For the same reason, several services, such as the throughput daemons, depend on NTP synchronization before they initialize. A workaround exists that disables this NTP dependency, but the results obtained are then not guaranteed to be accurate.
     133 * Finally, there is a bug that surfaces when adding the LAMP Ubuntu 9.1 image to Emulab sites. This is one of the main reasons why we were unable to deploy LAMP on a ProtoGENI node at the GPO lab. The LAMP developers are currently addressing these issues, in addition to creating a new LAMP image that supports Ubuntu 10.04.
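
To see why the measurement daemons depend on NTP synchronization, consider how a one-way delay is computed from timestamps taken on two different clocks. The sketch below is a worked arithmetic example, not LAMP code: the function names are hypothetical, all values are in microseconds, and the offset formula is the standard NTP estimate from a single request/response exchange.

```python
def measured_one_way_delay_us(true_delay_us, sender_offset_us, receiver_offset_us):
    """Delay as computed from timestamps taken on two different clocks.

    Each offset is how far that host's clock runs ahead of true time.
    """
    send_stamp = 0 + sender_offset_us                # sender stamps at true time 0
    recv_stamp = true_delay_us + receiver_offset_us  # receiver stamps on arrival
    return recv_stamp - send_stamp


def ntp_offset_us(t0, t1, t2, t3):
    """Standard NTP clock-offset estimate from one exchange.

    t0: client send, t1: server receive, t2: server send, t3: client receive.
    """
    return ((t1 - t0) + (t2 - t3)) // 2


# True path delay is 10 ms each way; the receiver's clock runs 5 ms fast.
forward = measured_one_way_delay_us(10_000, 0, 5_000)
reverse = measured_one_way_delay_us(10_000, 5_000, 0)

# One NTP exchange against that same fast clock (100 us of server processing).
estimated_offset = ntp_offset_us(t0=0, t1=15_000, t2=15_100, t3=20_100)
```

In this example the unsynchronized clocks make a symmetric 10 ms path look like 15 ms in one direction and 5 ms in the other, and the NTP exchange recovers the 5 ms offset that causes the error. Disabling the NTP dependency leaves that offset uncorrected in the one-way results.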
     134
     135'''Future Work'''
     136 1. Follow up with the LAMP developers regarding the availability of the Ubuntu 10.04 image at Utah
     137 2. Deploy LAMP on Utah's ProtoGENI testbed, schedule all tests, and ensure the results are immediately available through the LAMP web portal
     138 3. Test the mechanism to allow other users access to the web portal
     139 4. Develop procedures to send data to the GMOC (From conversations with Chaos, this may be a matter of installing a script, provided by the GMOC folks, on the node that hosts the LAMP web portal. This script fetches data from the RRD databases 'on the LAMP node' and sends these files to the GMOC)
     140 5. Repeat a similar experiment (i.e. with the same subset of resources through the OpenFlow backbone core) with the 10.04 image and, again, ensure all network test results are immediately available through the LAMP web portal
     141 6. Ensure the LAMP image is available at the GPO's testbed
     142 7. Create the ideal slice as illustrated in the diagram in section "''Ideal GENI Monitoring Slice''".
     143 8. Ensure that the Ubuntu 10.04 LAMP image can be installed on the Wide Area ProtoGENI Nodes
     144 9. Coordinate with the various parties involved in the "ideal slice" to establish efficient stitching mechanisms
     145 10. Ensure that LAMP nodes are able to register services with the Global Lookup Service (gLS) to conduct tests between perfSONAR and LAMP nodes (i.e. LAMP-perfSONAR federation)
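
Item 4 above mentions a script that fetches the RRD files on the LAMP node and sends them to the GMOC. The actual GMOC script is not described on this page, so the following is only a sketch of the collection half under stated assumptions: the RRD files are assumed to carry a `.rrd` extension under a single directory, the function names are hypothetical, and the transfer step is left out entirely.

```python
import tarfile
from pathlib import Path


def find_rrd_files(rrd_dir):
    """Return all .rrd files under rrd_dir, sorted for a stable archive order."""
    return sorted(Path(rrd_dir).rglob("*.rrd"))


def bundle_rrd_files(rrd_dir, archive_path):
    """Pack the RRD files into a gzip-compressed tar archive; return the member names."""
    root = Path(rrd_dir)
    names = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in find_rrd_files(root):
            # Store paths relative to the RRD directory so the archive is relocatable.
            name = path.relative_to(root).as_posix()
            tar.add(path, arcname=name)
            names.append(name)
    return names
```

A caller would pass the directory that holds the RRD files and a destination path for the archive, then hand the archive to whatever transfer mechanism the GMOC specifies.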
     146
     147
     148'''References'''
     149
     150 * [http://groups.geni.net/syseng/wiki/AliSandbox/Create%20a%20ProtoGENI%20Slice Create a ProtoGENI Slice]
     151 * [http://groups.geni.net/syseng/wiki/AliSandbox/Create%20an%20OpenFlow%20Slice Create an OpenFlow Slice]
     152 * [http://groups.geni.net/syseng/wiki/AliSandbox/Access%20the%20LAMP%20Web%20Portal Access the LAMP Web Portal]
     153
     154 * [http://groups.geni.net/syseng/attachment/wiki/AliSandbox/081911%20%20GENI%20Monitoring%20Slice.docx  GENI Monitoring Slice  (document)]