wiki:GEC11MonitoringMiniWorkshop

Version 10 (modified by sedwards@bbn.com, 10 years ago)


Monitoring Mini-Workshop

Organizer

Sarah Edwards, GENI Project Office

Time

Wed 1:00 - 2:30 pm

Description

Aggregates, campuses, GENI meta-operations, and experimenters have an interest in sharing data, analysis, and techniques to support operations and experimentation at their location and also at entities they depend on. GENI monitoring includes the collection and analysis of data which supports this interoperability. GENI monitoring overlaps with Instrumentation & Measurement, but I&M will be discussed in a separate session.

Campuses, aggregates, and experimenters all have an interest in GENI monitoring capabilities as GENI participation increases over the coming months.

Participants in this workshop will discuss:

  • What resources do campuses and experimenters need monitored?
  • What resources are currently being monitored?
  • What monitoring data would be easy and useful to share?

The discussion will be very practical, covering monitoring of particular resources such as OpenFlow, MyPLC, and ProtoGENI nodes, and will attempt to apply lessons learned from the Plastic Slices project.

The outcome of the workshop will be a list of monitoring topics that are important to address in the next few months.

Agenda

All slides

Meeting notes

  • Introduction - Sarah Edwards, BBN/GPO [10 min]
    • Define scope of GENI monitoring discussion
  • Point of View Talks
    • Lessons learned from monitoring Plastic Slices - Chaos Golubitsky, BBN/GPO [10 min]
    • Sharing Data via GMOC - Camilo Viecco, Indiana University/GMOC [10 min]
    • OpenFlow Monitoring - Nick Bastin, Stanford/OpenFlow [10 min]
    • Campus and Experimenter Needs and Resources - Sarah Edwards, BBN/GPO [10 min]
      • What resources do campuses and experimenters need monitored?
      • What resources are currently being monitored?
      • What monitoring data would be easy and useful to share?
  • Open Discussion - All [30 min]
    • Possible topics:
      • What tools are missing?
      • What agreements do we need? APIs? data formats? etc.?
      • Are there concerns about sharing data?
  • Wrap Up/Conclusions [10 min]
    • Consensus on topics to discuss/resolve in coming months

Meeting Summary

The first half of the monitoring mini-workshop consisted of presentations on which tools are in use and what data is collected for monitoring today. The second half was a discussion of issues of interest to the community, including which OpenFlow statistics are available, the difficulty of current OpenFlow opt-in decisions, the importance of privacy and of incorporating it into the design now, and concerns about how campuses could share data without giving others admin access to their resources. The community will work to address these issues in the coming months.

Meeting Details

Sarah Edwards motivated the workshop:

  • Aggregates are trying to operate as if they are production, so GENI needs monitoring tools now.
  • Operators should share information about what tools each has and what tools are needed, and use those capabilities and requirements to set priorities.
  • Sites which have signed aggregate provider agreements need to meet operations requirements. Shared monitoring information may make their jobs easier. Sharing data on resources that multiple campuses use prevents campuses from needing to reinvent the wheel. Sharing data also makes joint debugging of shared infrastructure more feasible.

Camilo Viecco of GMOC described the GMOC API for pushing data to GMOC.

  • GMOC has three functions: to provide a unified view of GENI, to be an initial point of contact for operations, and to provide monitoring visualization and monitoring APIs. They try to be "the place for unified data".
  • GMOC's measurement tools are "production-ready", but notifications need more work before they can be deployed in production. GMOC's measurement tools focus on long-term trend analysis and include: SNAPP (SNMP collection), ganglia, and Measurement Manager (for OpenFlow).
  • To submit data to GMOC, either use their API and submit data, or provide SNMP access to GMOC so they can monitor devices directly.

Nick Bastin of Stanford University advocated the use of SNMP for OpenFlow monitoring.

  • However, OpenFlow uses custom monitoring heavily at present, because there aren't MIBs for many things which need them.
  • Dataplane monitoring should probably use SNMP, and they hope to add SNMP support to the FlowVisor soon (although there are limitations to this approach). More generally, SNMP should be used and encouraged by researchers, network operators, and protocol designers whenever possible.

Chaos Golubitsky described lessons learned from monitoring the Plastic Slices project.

  • Eight campuses and two backbones submitted data to GMOC’s staging database via the API.
  • GPO was able to download the data again with a success rate of about 96% over the course of several months.
  • Lessons learned include:
    • Clock synchronization is important for time-series data with self-reported timestamps.
    • Collecting, downloading, and processing data is a high-performance application that needs sufficient hardware.
    • There’s never enough data to debug all the problems.
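
The clock synchronization lesson can be illustrated with a minimal sketch. It assumes each sample carries both the timestamp reported by the submitting site and the time the collector received it. The function name, the tuple layout, and the 60-second tolerance are hypothetical and are not part of the GMOC API or the Plastic Slices tooling.

```python
from datetime import datetime, timedelta, timezone


def flag_clock_skew(samples, tolerance=timedelta(seconds=60)):
    """Return (source, offset) for samples whose self-reported timestamp
    differs from the collector's receive time by more than `tolerance`.

    `samples` is an iterable of (source, reported_at, received_at) tuples.
    This layout is an assumption made for illustration only.
    """
    flagged = []
    for source, reported_at, received_at in samples:
        offset = reported_at - received_at
        if abs(offset) > tolerance:
            flagged.append((source, offset))
    return flagged


# Usage with made-up data: the date and site names are arbitrary.
received = datetime(2011, 1, 1, 12, 0, 6, tzinfo=timezone.utc)
samples = [
    ("campus-a", received - timedelta(seconds=1), received),
    ("campus-b", received + timedelta(minutes=10), received),
]
for source, offset in flag_clock_skew(samples):
    print(source, "clock differs from collector by", offset)
```

A check like this only detects skew relative to the collector's own clock, so the collector itself would also need to be synchronized.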

Sarah Edwards characterized monitoring requests and current practices based on conversations with campuses, experimenter representatives, ProtoGENI, and PlanetLab.

  • A variety of requests from operators were raised at GEC10 BOF.
    • Suggestions included a "slice top" utility to identify slices which are using the most resources.
    • Operators were also keen to get data which could prove that a problem was *not* originating at their site.
  • She interviewed operators Russ Clark at Georgia Tech, Chaos Golubitsky at GPO, and Chris Small and John Meylor at Indiana University.
    • Russ Clark mentioned that, while live or recent statistics from remote switches would be helpful, remote SNMP access might be a non-starter at his site. Motivated sites could publish and link to local data sources as one way around this.
    • Indiana mentioned they would like visualizations to contain per-campus or per-aggregate views of collected data.
  • Sarah interviewed Mark Berman and Niky Riga at GPO to get an experimenter perspective.
    • They want information about what resources exist, and which are currently available, for GENI use.
    • They also want standardized information across sites, and, as a troubleshooting aid, they want a per-slice view into data.
    • In addition, more OpenFlow topology data would be helpful for debugging and analysis.
  • Rob Ricci of ProtoGENI and Tony Mack of PlanetLab gave some input on monitoring tools provided by their aggregates.
    • ProtoGENI does not provide a central monitoring solution, counting on slice owners to select their own monitoring options for their experiments. However, the graphical experimenter tool Flack has recently been integrated with Instools, a measurement and collection service, allowing some easy measurement access for experimenters.
    • PlanetLab provides three monitoring interfaces: CoMon reports per-node (and per-sliver-per-node) statistics. PlanetFlow reports on all traffic flows into and out of a node, and is useful for diagnosing abuse problems. The MyOps tool reports on node availability as seen by PLC, and is targeted at site operators.
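
The "slice top" suggestion from the GEC10 BOF could look roughly like the sketch below. It assumes per-sliver-per-node usage records are already available as dictionaries. The record keys ("slice", "node", "bytes") and the function name are hypothetical and do not correspond to CoMon or any other existing tool's output format.

```python
from collections import defaultdict


def slice_top(records, metric="bytes", limit=3):
    """Sum `metric` per slice across all nodes and return the
    `limit` heaviest slices as (slice_name, total) pairs.

    Ties are broken alphabetically by slice name so the output is stable.
    """
    totals = defaultdict(int)
    for rec in records:
        totals[rec["slice"]] += rec[metric]
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


# Usage with made-up records: slice and node names are arbitrary.
records = [
    {"slice": "slice-a", "node": "node1", "bytes": 10},
    {"slice": "slice-b", "node": "node1", "bytes": 50},
    {"slice": "slice-a", "node": "node2", "bytes": 70},
]
for name, total in slice_top(records):
    print(name, total)
```

Aggregating per slice rather than per sliver matches the per-slice view that experimenters asked for as a troubleshooting aid.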

The discussion that followed allowed the community to share best practices and identified some areas of common concern.

  • Actively measuring OpenFlow: Srini Seetharaman and Masa Kobayashi of Stanford discussed the active testing tool they have deployed for measuring OpenFlow performance internals.
  • FlowVisor Statistics: There was extensive discussion about what information FlowVisor currently provides for monitoring. See notes for list.
  • Privacy: Many people were interested in the question of how privacy and monitoring will interact in GENI. It was generally agreed that privacy is an issue that should be addressed sooner rather than later.
  • Sharing switch data: The question of how (non-OpenFlow) switch data could be shared was raised. A robust SNMP proxy might address this need.
  • OpenFlow Opt-In: Many operators agreed that OpenFlow experimenter opt-in decisions are very difficult, so monitoring tools that make it easier to determine whether an opt-in is safe would be useful.
  • Notifications: There was a request for notifications regarding which GENI things are not working.

Related activities at GEC

Background Reading

Items marked FYI are additional reading for interested parties.

Many of the pages below are links to monitoring data and graphs so that we all know what kinds of data are already being published.
