Changes between Version 9 and Version 10 of GEC11MonitoringMiniWorkshop

08/31/11 15:59:17 (13 years ago)



  • GEC11MonitoringMiniWorkshop

    v9 v10  
    3131        * Lessons learned from monitoring Plastic Slices - Chaos Golubitsky, BBN/GPO [10 min]
    3232        * Sharing Data via GMOC - Camilo Viecco, Indiana University/GMOC [10 min]
    33         * !OpenFlow Monitoring - Nick Bastin, Stanford/OpenFlow [10 min]
     33        * !OpenFlow Monitoring - Nick Bastin, Stanford/!OpenFlow [10 min]
    3434        * Campus and Experimenter Needs and Resources - Sarah Edwards, BBN/GPO [10 min]         
    3535         * What resources do campuses and experimenters need monitored?
    4343    * Wrap Up/Conclusions [10 min]
    4444        * Consensus on topics to discuss/resolve in coming months         
     47==== Meeting Summary ====
     48 The first half of the monitoring mini-workshop included presentations on the current status of what tools are used and what data is collected as part of monitoring today.  The second half of the workshop was a discussion about issues of interest to the community including: what !OpenFlow stats are available and the difficulty of current !OpenFlow Opt-In, the importance of privacy and incorporating privacy into the design now, and concerns about how campuses could share data without necessarily giving others admin access to their resources.  This community will be working to address these issues in the coming months.
     51==== Meeting Details ====
     53'''Sarah Edwards''' motivated the workshop:
     54 * Aggregates are trying to operate as if they are production, so
     55   GENI needs monitoring tools now.
     56 * Operators should share information about what tools each has and what
     57   tools are needed, and use those capabilities and requirements to
     58   set priorities.
     59 * Sites which have signed aggregate provider agreements need to
     60   meet operations requirements.  Shared monitoring information may
     61   make their jobs easier.  Sharing data on resources that multiple
     62   campuses use prevents campuses from needing to reinvent the
     63   wheel.  Sharing data also makes joint debugging of shared
     64   infrastructure more feasible.
     66'''Camilo Viecco''' of GMOC described the GMOC API for pushing data to GMOC.
     68 * GMOC has three functions: to provide a unified view of GENI, to be an initial point of contact for operations, and to provide monitoring visualization and monitoring APIs.  They try to be "the place for unified data". 
     69 * GMOC's measurement tools are "production-ready", but notifications need more work before they can be deployed in production. GMOC's measurement tools focus on long-term trend analysis and include SNAPP (SNMP collection), Ganglia, and Measurement Manager (for !OpenFlow).
     70 * To submit data to GMOC, either push it through their API or provide SNMP access to GMOC so they can monitor devices directly.
     72'''Nick Bastin''' of Stanford University advocated the use of SNMP for !OpenFlow monitoring. 
     73 * However, !OpenFlow uses custom monitoring heavily at present, because there aren't MIBs for many things which need them.
     74 * Dataplane monitoring should probably use SNMP, and they hope to add SNMP support to the !FlowVisor soon (although there are limitations to this approach). In general, SNMP should be used and encouraged by researchers, network operators, and protocol designers whenever possible.
     76'''Chaos Golubitsky''' described lessons learned from monitoring the Plastic Slices project.
     77 * Eight campuses and two backbones submitted data to GMOC’s staging database via the API.
     78 * GPO was able to download the data again with about a 96% success rate over the course of several months.
     79 * Lessons learned include:
     80  * Clock synchronization is important for time-series data with self-reported timestamps.
     81  * Collecting, downloading, and processing data is a high-performance application and needs sufficient hardware.
     82  * There’s never enough data to debug all the problems.
     85'''Sarah Edwards''' characterized monitoring requests and current practices based on conversations with campuses, experimenter representatives, ProtoGENI, and !PlanetLab.
     86 * A variety of requests from operators were raised at GEC10 BOF.
     87  * Suggestions included a "slice top" utility to identify slices which are using the most resources.
     88  * Operators were also keen to get data which could prove that a problem was *not* originating at their site.
     89 * She interviewed operators Russ Clark at Georgia Tech, Chaos Golubitsky at GPO, and Chris Small and John Meylor at Indiana University.
     90  * Russ Clark mentioned that, while live or recent statistics from remote switches would be helpful, remote SNMP access might be a non-starter at his site.  Motivated sites could publish and link to local data sources as one way around this. 
     91  * Indiana mentioned they would like visualizations to contain per-campus or per-aggregate views of collected data.
     92 * Sarah interviewed Mark Berman and Niky Riga at GPO to get an experimenter perspective. 
     93  * They want information about what resources exist, and which are currently available, for GENI use.
     94  * They also want standardized information across sites, and, as a troubleshooting aid, they want a per-slice view into data. 
     95  * In addition, more !OpenFlow topology data would be helpful for debugging and analysis.
     96 * Rob Ricci of ProtoGENI and Tony Mack of !PlanetLab gave some input on monitoring tools provided by their aggregates. 
     97  * ProtoGENI does not provide a central monitoring solution, counting on slice owners to select their own monitoring options for their experiments. However, the graphical experimenter tool Flack has recently been integrated with Instools, a measurement and collection service, allowing some easy measurement access for experimenters. 
     98  * !PlanetLab provides three monitoring interfaces: !CoMon reports per-node (and per-sliver-per-node) statistics.  !PlanetFlow reports on all traffic flows into and out of a node, and is useful for diagnosing abuse problems.  The MyOps tool reports on node availability as seen by PLC, and is targeted at site operators.
     100The '''discussion''' that followed allowed the community to share best practices and identified some areas of common concern.
     102 * Actively measuring !OpenFlow: Srini Seetharaman and Masa Kobayashi of Stanford discussed the active testing tool they have deployed for measuring !OpenFlow performance internals.
     103 * !FlowVisor Statistics: There was extensive discussion about what information !FlowVisor currently provides for monitoring.  See notes for list.
     104 * Privacy: Many people were interested in the question of how privacy and monitoring will interact in GENI. It was generally agreed that privacy is an issue that should be addressed sooner rather than later.
     105 * Sharing switch data: The question of how (non-!OpenFlow) switch data could be shared was raised.  A robust SNMP proxy might address this need.
     106 * !OpenFlow Opt-In: Many operators agreed that !OpenFlow experimenter opt-in decisions are very difficult, so monitoring tools that make it easier to determine whether an opt-in is safe would be useful.
     107 * Notifications: There was a request for notifications about which parts of GENI are not working.
    46113==== Related activities at GEC ====