wiki:LAMP/Summer2011Experiments

Version 2 (modified by chaos@bbn.com, 10 years ago)

--

GENI Monitoring Slice: Enabling Network Visibility in the GENI OpenFlow Core Network

Motivation

GENI is a highly distributed infrastructure comprising multiple aggregates at geographically diverse locations. To be useful, GENI slices must be available at all times: any outage must be detected promptly and resolved by aggregate operators and/or the GMOC. GENI therefore needs a flexible approach to monitoring network behavior and measuring network performance, one that also facilitates troubleshooting in such a highly distributed environment. We demonstrate one such flexible approach, based on principles developed by the perfSONAR-LAMP project. The approach is “GENI-centric” because it is deployed within a GENI slice.

NB: This project was experimental; the setup detailed below is not yet generally available.

Approach

  • An operator sets up a measurement slice to monitor infrastructure, e.g., the GENI backbone with OpenFlow.
  • The measurement slice may be long-term, for service-quality monitoring, or short-term, to assist with troubleshooting.
  • Researchers and other operators can use LAMP tools, or any tools compatible with the I&M architecture, to register data with the Measurement Information (Lookup) Service, so that others (when authorized) can discover and retrieve measurement data through a standardized API.
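The registration step above can be sketched in a few lines. This is an illustrative assumption, not the actual LAMP/UNIS schema: the record fields, hostnames, and the `register()` transport are invented for the sketch, and a real client would push such records to the lookup service over its API.

```python
# Illustrative sketch of registering measurement daemons with a lookup
# service so that authorized users can later discover them. Field names,
# hostnames, and register() are assumptions, not the real LAMP schema.

def build_service_record(service_type, address, domain, slice_name):
    """Assemble one lookup-service registration record for a daemon."""
    return {
        "type": service_type,   # e.g. "bwctl", "owamp", "snmp-ma"
        "address": address,     # host:port the daemon listens on
        "domain": domain,       # administrative domain of the node
        "slice": slice_name,    # GENI slice the node belongs to
    }

def register(records):
    """Stand-in for pushing records to the lookup service over its API."""
    # A real client would POST these records; here we index them by
    # address so callers can verify what would be registered.
    return {r["address"]: r for r in records}
```

Discovery then becomes a query against the same registry, filtered by service type or domain, which is what allows other (authorized) operators to find and retrieve the measurement data.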

Why Use a Slice?

  1. Convenience – the slice mechanism allows the seamless allocation of resources across multiple aggregates.
  2. Efficiency – a slice can quickly be reserved and removed as necessary.
  3. Realism – you can often evaluate the performance of a slice against another “identical” slice.

Issue: Robust servers (as opposed to VMs) are required to obtain accurate measurements.

Why LAMP?

  • Adapted from perfSONAR, LAMP inherits perfSONAR's suite of network measurement tools.
  • It registers daemons in a global lookup service so that other users can find the data.
  • LAMP provides authorization to restrict access to data.
  • LAMP daemons provide easy access to other participating LAMP nodes.
  • Provided that LAMP nodes register their services with a global lookup service, they can also run performance tests against other perfSONAR nodes.

Network Environments for LAMP

LAMP can be used in Layer 2 or Layer 3 networks; here, we used LAMP to monitor the OpenFlow L2 backbone core. To date, several perfSONAR nodes exist in Internet2's (I2) core network. These nodes are publicly available, through the Global Lookup Service, for L3 tests with non-I2 perfSONAR nodes that have a network path to I2's network. As such, LAMP should provide mechanisms to test against these perfSONAR nodes (i.e., perfSONAR nodes external to a slice). This LAMP-perfSONAR integration removes the need for GENI sites to acquire dedicated hardware for monitoring segments of the network not directly connected to the OpenFlow backbone.

Ideal GENI Monitoring Slice

The ideal GENI monitoring slice will contain resources from the dominant GENI control frameworks, namely ProtoGENI, PlanetLab, and OpenFlow. Figure 2-1 provides a conceptual view of such a slice. In particular, it comprises ProtoGENI nodes from the respective testbeds at Utah and the GPO, a PlanetLab node from the PlanetLab sub-aggregate at the GPO, and OpenFlow resources from Internet2's and NLR's aggregates.

The ProtoGENI resources will host the LAMP I&M services and data. Specifically, the LAMP node at Utah will host the LAMP web portal, among other services, while the PlanetLab node at the GPO lab will host the NOX controller for all OpenFlow switches in the core. These switches will facilitate scheduled tests between the ProtoGENI nodes labeled PC UTAH 1, 2, 3, and PC GPOLAB. A server at the University of Delaware hosts the Unified Network Information Service (UNIS), a combination of the topology and lookup services that stores the topology data. LAMP services are configured through the LAMP web portal, and the updated topology is pushed to UNIS; the perfSONAR-PS pSConfig service running on each node then fetches the updated configuration.

ThumbImage(LAMPIdeal.jpg, thumb=600)?
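The configuration flow just described (portal pushes to UNIS; each node's pSConfig service pulls) is a pull model: nodes periodically fetch their configuration and apply it only when it has changed. A minimal sketch, assuming an illustrative fetch callable and digest-based change detection; the real pSConfig/UNIS protocol differs in its details.

```python
# Sketch of a pull-model configuration agent: fetch the current config
# from a central service and apply it only when it differs from what is
# already running. Names here are illustrative, not pSConfig's API.

import hashlib

def config_digest(config_text):
    """Fingerprint a configuration so unchanged fetches are skipped."""
    return hashlib.sha256(config_text.encode()).hexdigest()

class ConfigAgent:
    def __init__(self, fetch):
        self.fetch = fetch          # callable returning current config text
        self.applied_digest = None  # digest of the last applied config
        self.applied = None

    def poll(self):
        """Fetch the config; apply it only if it changed."""
        text = self.fetch()
        digest = config_digest(text)
        if digest == self.applied_digest:
            return False              # nothing new; leave daemons alone
        self.applied = text           # a real agent would rewrite service
        self.applied_digest = digest  # configs and restart daemons here
        return True
```

One advantage of this design is that nodes need no inbound access from the portal: everything flows outward from each node to UNIS, which suits slices spread across aggregates with differing firewall policies.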

Use Cases

Example 1

An operator of the GENI backbone network sets up a long-term backbone measurement slice to monitor service quality, and shares data with the GMOC via the LAMP (perfSONAR) API.

Example 2

Experimenters complain to the GMOC about intermittent backbone connectivity outages between Sites X, Y, and Z. The GMOC sets up a short-term reference measurement slice involving Sites X, Y, and Z, and shares the data with the experimenters.

Example 3

A GENI experimenter deploys a measurement slice, or deploys LAMP I&M tools within a slice, to monitor the health of the network throughout the duration of an experiment.

Initial Deployment of the Monitoring Slice

The figure below depicts the resources currently in the monitoring slice: three ProtoGENI nodes, four OpenFlow switches, two ProtoGENI switches, and one PlanetLab node. The LAMP node hosts the LAMP web portal, stores statistics in local round-robin databases, and exports the metrics collected by the host monitoring collector through the SNMP measurement archive (MA) interface. The two ProtoGENI hosts run scheduled performance and measurement tests, and the PlanetLab node at the GPO lab hosts the NOX controller for all Internet2 switches.

ThumbImage(LAMPActual.jpg, thumb=600)?

Resources Allocated to the Slice

The initial deployment encountered challenges on all fronts, which limited us to a subset of the resources shown in the figure in section "Ideal GENI Monitoring Slice". These challenges are presented in the "Issues" section below.

The table below lists the LAMP services enabled on each ProtoGENI node. perfSONAR-BUOY regular testing, for example, is enabled on both PC UTAH 1 and PC UTAH 2, which is necessary to conduct throughput tests between the nodes; the same holds for the ping and delay tests.

|| '''Service''' || '''LAMP host''' || '''PC UTAH 1''' || '''PC UTAH 2''' ||
|| Bandwidth Test Controller (BWCTL) || n/a || yes || yes ||
|| Host Monitoring Collector (Ganglia Meta Daemon) || yes || yes || yes ||
|| Host Monitoring Daemon (Ganglia Monitoring Daemon) || n/a || yes || yes ||
|| Ganglia Measurement Archive || yes || n/a || n/a ||
|| LAMP I&M System Web Portal || yes || n/a || n/a ||
|| NTP server || yes || yes || yes ||
|| One-Way Ping (OWAMP) || n/a || yes || yes ||
|| perfSONAR-BUOY Regular Testing (throughput) || n/a || yes || yes ||
|| perfSONAR-BUOY Measurement Archive || n/a || yes || yes ||
|| perfSONAR-BUOY Regular Testing (one-way latency) || n/a || yes || yes ||
|| PingER Measurement Archive and Regular Tester || n/a || yes || yes ||

Results

The initial results can be accessed here

Issues

Stitching in GENI

To date, GENI has yet to realize an efficient stitching mechanism between OpenFlow and ProtoGENI resources. In particular, the ProtoGENI software suite is unable to dynamically allocate VLANs across the ProtoGENI and OpenFlow switches via the respective cross-connects. Because of this, we experienced a minor broadcast storm in the initial stages of obtaining network connectivity in the OpenFlow core network. Ultimately, a combined, one-week, "manual" effort by operators at Utah and Internet2 was needed to provide network connectivity between PC UTAH 1 and PC UTAH 2. This manual process was tolerable only because few GENI users currently require such complex configurations of resources. As GENI "ramps up" the scale of experiments, it will be crucial for ProtoGENI and OpenFlow operators to converge on a dynamic stitching mechanism.

Fragility of LAMP

  • This project has challenged the robustness of the LAMP software and stirred considerable interest in the LAMP development community in revisiting, and perhaps re-engineering, mechanisms within the LAMP software suite. Prior to this project, and over the past year since its deployment, LAMP was not used significantly by experimenters; as a result, several bugs were encountered and resolved, though a few issues remain. In particular, LAMP does not support version 2 RSpecs, which GENI intends to support by GEC12. Consequently, the network topology from our initial experiments with version 2 RSpecs was not registered at the Unified Network Information Service (UNIS).
  • At the onset of this project, Flack was used to generate the RSpec for the ProtoGENI-LAMP resources. At that time, Flack supported only version 2 RSpecs (currently, Flack can create version 0.1 and 0.2 RSpecs). Among other issues, this forced manual creation of the resource manifest submitted to UNIS. Another issue relates to throughput tests: for example, the results of a throughput test scheduled almost two weeks ago are still unavailable through the web portal.
  • LAMP, like perfSONAR, depends on dedicated servers, rather than VMs, in the measurement slice for accurate network measurements. Accordingly, several services, such as the throughput daemons, require NTP synchronization before they initialize. A workaround exists to disable this NTP dependency, but the results obtained are then not guaranteed to be accurate.
  • Finally, there is a bug that surfaces when adding the LAMP Ubuntu 9.10 image to Emulab sites; this is one of the main reasons we were unable to deploy LAMP on a ProtoGENI node at the GPO lab. The LAMP developers are currently addressing these issues, in addition to creating a new LAMP image that supports Ubuntu 10.04.
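The NTP dependency noted above can be thought of as a start-up gate: a measurement daemon refuses to start until the clock offset reported by NTP is within tolerance. A minimal sketch of that gate; the offset source, the 50 ms threshold, and the function names are assumptions for illustration, not LAMP's actual check.

```python
# Sketch of an NTP start-up gate for measurement daemons: start only
# when the reported clock offset is small enough to trust timestamps.
# The threshold and the offset source are illustrative assumptions.

def clock_is_synchronized(offset_seconds, tolerance=0.05):
    """True when the reported NTP offset is within tolerance."""
    return abs(offset_seconds) <= tolerance

def maybe_start_daemon(offset_seconds, start):
    """Start the measurement daemon only under a synchronized clock."""
    if not clock_is_synchronized(offset_seconds):
        return False  # refuse to start; timestamps would be suspect
    start()           # the documented workaround skips this gate, at
    return True       # the cost of measurements of unknown accuracy
```

This makes the trade-off in the bullet above explicit: disabling the gate lets daemons start on unsynchronized VMs, but one-way latency and throughput figures then carry unquantified clock error.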

Future Work

  1. Follow up with the LAMP developers regarding the availability of the Ubuntu 10.04 image at Utah
  2. Deploy LAMP on Utah's ProtoGENI testbed, schedule all tests, and ensure the results are immediately available through the LAMP web portal
  3. Test the mechanism that allows other users access to the web portal
  4. Develop procedures to send data to the GMOC (from conversations with Chaos, this may be a matter of installing a script, provided by the GMOC folks, on the node that hosts the LAMP web portal; the script fetches data from the RRD databases on the LAMP node and sends the files to the GMOC)
  5. Repeat a similar experiment (i.e., with the same subset of resources through the OpenFlow backbone core) using the 10.04 image and, again, ensure all network test results are immediately available through the LAMP web portal
  6. Ensure the LAMP image is available at the GPO's testbed
  7. Create the ideal slice as illustrated in the diagram in section "Ideal GENI Monitoring Slice"
  8. Ensure that the Ubuntu 10.04 LAMP image can be installed on the wide-area ProtoGENI nodes
  9. Coordinate with the various parties involved in the "ideal slice" to establish efficient stitching mechanisms
  10. Ensure that LAMP nodes are able to register their services with the Global Lookup Service (gLS), enabling tests between perfSONAR and LAMP nodes (i.e., LAMP-perfSONAR federation)
