Changes between Version 4 and Version 5 of GEC13Agenda/Monitoring


Timestamp:
03/22/12 11:01:10
Author:
sedwards@bbn.com
Comment:

Added notes

  • GEC13Agenda/Monitoring

    v4 v5  
    1111Sarah Edwards and Chaos Golubitsky, GENI Project Office
    1212
     13== Original Session Description ==
     14
     15At this working session, we will review and aim to approve an updated GENI monitoring architecture that satisfies the monitoring requirements discussed at the last GEC.  In addition, we will discuss extensions to the design of the monitoring API for relational, event, and time series data.  Finally, we will review the status of solutions to previously identified "pain points" of concern to the monitoring and operations community. Campus IT staff, aggregate operators and software developers are encouraged to attend.
     16
    1317== Agenda / Details ==
    1418
    15 At this working session, we will review and aim to approve an updated GENI monitoring architecture that satisfies the monitoring requirements discussed at the last GEC.  In addition, we will discuss extensions to the design of the monitoring API for relational, event, and time series data.  Finally, we will review the status of solutions to previously identified "pain points" of concern to the monitoring and operations community. Campus IT staff, aggregate operators and software developers are encouraged to attend.
     19 * Monitoring Architecture
     20   * Sarah Edwards, GPO
     21 * Monitoring Privacy
     22   * Heidi Picher Dempsey, GPO
     23 * Monitoring APIs + Experimenter/Operator Portal
     24   * Chaos Golubitsky, GPO
     25 * Status of Monitoring Pains
     26   * Sarah Edwards, GPO
     27
     28== Notes ==
     29
     30'''Sarah Edwards''' described the major elements of a proposed monitoring architecture, including its actors, the interfaces between them, use cases, and the data to be collected. (See slides 4-22)
     31
     32The major actors are:
     33 * ''Meta-operations'', which collects and makes available operational data and also generates cross-GENI monitoring data
     34 * The future ''GENI Clearinghouse'', which is the authoritative repository for project, user, and slice information
     35 * ''Aggregates'', which contain resources
     36 * ''Campuses'', which host resources
     37 * ''Experimenters'', which under limited circumstances might provide some monitoring information to meta-operations
     38 * ''Regional Networks'' and ''Backbone Networks'' (I2, NLR)
     39
     40The major monitoring interfaces allow:
     41 * Aggregates/Campuses/Experimenters/Meta-operations to submit data to Meta-operations
     42 * Anyone to query data from Meta-operations
     43 * Out-of-band communication to experimenters, aggregates and campuses
     44
     45In addition, GENI monitoring relies on other interfaces which allow:
     46 * Access to definitive data about slices, projects, and users from the future GENI Clearinghouse (via interfaces defined by the GENI Architecture group)
     47 * Resolution of problems via out-of-band access between Campuses and their Regional Networks
     48 * Resolution of problems via out-of-band access between GMOC and GENI backbone networks (I2, NLR)
     49
     50The structure of the monitoring data falls into one of three categories:
     51 * ''Time series'' data is a particular piece of data measured over time.
     52 * ''Relational'' data conveys entities and the relationship between them as observed at time T.
     53 * ''Event'' data denotes something happening or being noticed at a particular moment in time.
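
As a rough illustration of the three categories above, the following Python sketch models each one as a small record type. All class names, field names, and sample values are hypothetical; the notes do not specify the actual GMOC schemas.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TimeSeries:
    """One piece of data measured repeatedly over time (hypothetical layout)."""
    metric: str                        # name of the measured quantity
    source: str                        # the resource being measured
    samples: List[Tuple[int, float]]   # (unix_timestamp, value) pairs


@dataclass
class Relation:
    """Two entities and the relationship between them, as observed at time T."""
    subject: str       # e.g. a slice identifier
    relation: str      # e.g. "has_sliver_on"
    obj: str           # e.g. an aggregate identifier
    observed_at: int   # the time T of the observation


@dataclass
class Event:
    """Something that happened or was noticed at one moment in time."""
    timestamp: int
    source: str
    description: str


# Made-up example values, one per category.
load = TimeSeries("cpu_load", "node-1", [(1000, 0.25), (1060, 0.5)])
link = Relation("slice-a", "has_sliver_on", "aggregate-x", observed_at=1000)
alarm = Event(1030, "aggregate-x", "AM stopped responding")
```

The main difference is how time is attached: a time series carries many timestamps for one metric, a relation carries the single time at which it was observed, and an event carries the moment it occurred.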
     54
     55The precision of the data can be described as either:
     56 * ''Definitive'' data is authoritative because it comes directly from the source that generated it. The future GENI clearinghouse will contain a lot of definitive data.
     57 * ''Sampled'' data is collected by measuring the state of the world periodically. The Meta-operations database contains a lot of sampled data.
     58
     59Several use cases were reviewed with notes about the data required to support them:
     60 * An operator responds to a request from an experimenter who is unable to create a sliver at an aggregate.  The operator needs to see the current and recent status of that aggregate (e.g., is the AM responding to queries?)
     61 * Assess the availability/utilization of GENI resources over a four month period.
     62 * A core !OpenFlow switch misbehaves, leading to intermittent failures. Various operators need information about whether L2 network resources are usable and the utilization of relevant network resources in order to debug the failure.
     63 * Assess the historic usage of network resources in order to plan for future expansion of those resources.
     64 * An experimenter wants to access basic information about their sliver to assess the general health and activity of their experiment.
     65 * An aggregate operator receives a complaint from campus IT about unusual traffic on the campus network.  The aggregate operator needs to look at the current state of resources at their campus in order to demonstrate the problem is not caused by their aggregate.
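
To make the availability use case above concrete, here is a minimal Python sketch that estimates availability from periodic (sampled) status checks. The function name, the sample layout, and the values are assumptions made for illustration; the notes do not describe how availability is actually computed.

```python
from typing import List, Tuple


def availability(samples: List[Tuple[int, bool]], start: int, end: int) -> float:
    """Fraction of status checks in [start, end) that found the resource up.

    `samples` holds hypothetical (unix_timestamp, responded) pairs, such as
    the result of periodically asking whether an AM answers queries.
    """
    in_window = [ok for ts, ok in samples if start <= ts < end]
    if not in_window:
        raise ValueError("no samples in the requested window")
    return sum(in_window) / len(in_window)


# Four made-up checks taken five minutes apart; one of them failed.
checks = [(0, True), (300, True), (600, False), (900, True)]
result = availability(checks, 0, 1200)  # 3 of 4 checks succeeded -> 0.75
```

Because the input is sampled rather than definitive, the result is only an estimate: an outage that begins and ends between two checks would go unnoticed.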
     66
     67The data needed to be collected for the above use cases was also reviewed.
     68
     69Discussion:
     70
     71Heidi Picher Dempsey asked about the scope of the use cases. Sarah Edwards responded that use cases aren't meant to be exhaustive, but are based on real things we wanted to know over the fairly short time we've been monitoring GENI.  If people know of data or use cases that they think are important, talk to Sarah about adding them.
     72
     73Victor Orlikowsky noted that I&M overlaps with monitoring.  The group discussed the distinction for a while.  In particular, Heidi noted that "we don't want every experimenter to go out and collect all of the same data."
     74
     75'''Heidi Picher Dempsey''' described a proposal for privacy of monitoring data in GENI. (Slides 24-31)
     76
     77Heidi said that the goal of the proposal is to make some decisions.  We've been collecting data, but we've given no guidance about what to do about it.  This proposal is based on input from Adam Slagell.
     78
     79General Recommendation:
     80 * Data that CAN be shared publicly: existence of slice (urn, UUID); slice is active (has resources); slice name
     81 * Some data CAN be shared IF you have access controls.  Aggregates should make sure they are not sharing user data. LLR requests are a special case.
     82 * Data that should NOT be shared: opt-in user data; data that identifies experimenters by username/real name; data that identifies experimenter contact info
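
One way to picture the first and last recommendations above is an allow-list filter applied before a record is published. This Python sketch is only an illustration: the field names, the sample record, and the `public_view` helper are hypothetical and are not part of any GENI or GMOC interface.

```python
# Fields the proposal treats as publicly shareable: the slice's existence
# (URN, UUID), its name, and whether it is active. Names are hypothetical.
PUBLIC_FIELDS = {"slice_urn", "slice_uuid", "slice_name", "is_active"}


def public_view(record: dict) -> dict:
    """Return only the allow-listed fields of a slice record."""
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}


# Made-up record mixing public fields with experimenter-identifying ones.
record = {
    "slice_urn": "urn:example:slice:demo",
    "slice_uuid": "00000000-0000-0000-0000-000000000000",
    "slice_name": "demo",
    "is_active": True,
    "owner_username": "alice",
    "owner_email": "alice@example.org",
}

shared = public_view(record)
```

An allow-list is used here rather than a block-list so that any field added later stays private until someone decides it can be shared. Data that may be shared only behind access controls would need a separate mechanism not shown here.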
     83
     84Privacy recommendations to experimenters:
     85 * GENI needs to warn experimenters that the slice name is public.
     86 * The slice credential will (soon) include a slice email. If you don't want it shared, then don't use a personal email.
     87 * Your personal email is part of your experimenter credential and will be shared with the aggregate and GMOC.  It should not be shared outside GMOC.
     88
     89Discussion:
     90
     91Victor suggested that we create two documents:
     92 1. An operator "Code of Ethics": a privacy policy stating what we will or won't do with your data.  (Chaos Golubitsky suggested looking at the [https://www.usenix.org/lisa/system-administrators-code-ethics sysadmin code of ethics] for guidance).
     93 2. "Experimenter's privacy recommendations": Here's what you should do/not do as an experimenter.
     94
     95One person expressed surprise that the slice name is public and suggested that the warning about this be made front and center.
     96
     97Sarah noted that basic information about an experiment (e.g., nodes used, interface stats) will be publicly available.
     98
     99'''Chaos Golubitsky''' presented on monitoring interfaces and the experimenter portal including updates done in conjunction with Mitch !McCracken. (Slides 33-43)
     100
     101GMOC has been collecting slice metadata, measurement data, and operational data from GENI aggregates and slice authorities.  Starting in Spiral 4, they will run the Service Desk, which will detect and respond to operational problems in GENI.  Chaos introduced '''Mitch !McCracken''', GMOC's primary software developer for GENI monitoring; Mitch started working on monitoring data submission and interfaces in January.
     102
     103GMOC tools support collecting, displaying, and alerting on operational slice metadata and measurement data.
     104
     105Work to improve these tools has focused on:
     106 * Standardizing data submission to make it easy for new projects (like racks) to submit data to GMOC and to provide consistent data naming
     107 * Improving user interfaces so that experimenters and operators can easily find data of interest.  In particular, data should be tied together in standard ways and operational health data should be easy to find.
     108
     109Currently there are two APIs for submitting data: a measurement API for submitting time series data, and a relational API for submitting metadata.  In Spiral 3, GMOC and GPO tested the measurement API and collected a year of data, which improved the reliability of data submission and the tools for it.
     110
     111Chaos demonstrated progress with the GMOC Experimenter web interface, which supports finding and viewing time series data for slices and nodes.  In addition, the GPO's SA now submits slice data using the relational API.
     112
     113In Spiral 4, work will continue to test and improve the relational data API, refine the time-series API, and collect data from the new racks.  Work will also continue on the user interfaces, including improved documentation, improved health reports, tying relational data together with time-series data, use of URNs and UUIDs (consistent with other GENI entities) to distinguish slices over time, and support for specific use cases.
     114
     115Chaos then invited participation by others on the following items:
     116 * early adopters for the experimenter web portal
     117 * ideas for GENI health tests that people are running or would like to see run
     118 * feedback
     119
     120'''Sarah Edwards''' reviewed the status of the ongoing list of "monitoring pains". (Slides 45-47)
     121
     122Items which have made progress since the last GEC are listed on the slides with check marks:
     123 * Chaos and Mitch have made progress on packaging the plastic slices monitoring software
     124 * Nick Bastin and Josh Smift have released FOAM (the !OpenFlow Aggregate Manager) and installed it throughout the meso-scale, which supports improved monitoring
     125 * The experimenter web interface improves the ability to find data by slice
     126 * At this session, Heidi (based on input from Adam Slagell) outlined a proposal regarding privacy of monitoring data
     127 * At this session, Sarah presented some of the monitoring data that should be collected
     128
     129
     130For more information please contact one of the following people or attend the biweekly monitoring call...
     131
     132Contacts:
     133 * GENI operational monitoring list: monitoring@geni.net
     134 * GMOC: Mitch !McCracken (mrmccrac@grnoc.iu.edu)
     135 * GPO: Chaos Golubitsky (chaos@bbn.com), Sarah Edwards (sedwards@bbn.com)
     136
     137Biweekly monitoring conference call:
     138 * Every other Friday at 2pm Eastern
     139 * Information on GENI monitoring mailing list
     140
     141GENI operational monitoring list: monitoring@geni.net (Sign up at http://lists.geni.net)