Changes between Initial Version and Version 1 of GEC12MesoScaleMonitoring/DetailedNotes

Timestamp: 11/22/11 12:14:40
Author: sedwards@bbn.com
= [http://groups.geni.net/geni/wiki/GEC12MesoScaleMonitoring GEC12 Monitoring Session] Detailed Notes =

[[PageOutline]]

The following notes include references to the many helpful pictures in the [http://groups.geni.net/geni/attachment/wiki/GEC12MesoScaleMonitoring/GEC12_MonitoringReqts_FINAL.pdf slides].

== Introduction ==
'''Sarah Edwards''' of the GPO introduced the session:

There were two goals of the session:
 1. keep talking about day-to-day issues and problems, as we did at GEC11
 2. look ahead to where GENI is headed, so we can provide inputs
    to the future of GENI which represent the things we need, so
    we don't ignore any important pieces

Sarah presented a list of "pain points" from GEC11 that form the basis for a running list of tasks that members of the GENI monitoring community feel need to be addressed.

== FOAM: New OpenFlow Aggregate Manager ==
'''Josh Smift''' of the GPO spoke about FOAM, the new !OpenFlow Aggregate Manager (AM).
 * FOAM acts as an AM (but does not yet implement !OpenFlow "opt-in").
   * Supports:
     * AM API interface for experimenters
     * JSON API (and tools) for management
   * Works on Ubuntu 10.04 now.  Will ensure it works on Ubuntu 12.04 LTS when that comes out.
 * BBN has been running it for the past few weeks.  It runs fine and we transitioned our experimenters to it.
 * Plan is to keep things stable until after SC11, then migrate campuses over to FOAM 0.6, an upcoming release, on November 21.

Josh then talked about how FOAM addressed several pain points identified at GEC11.
 * Pain point: Mapping data among !FlowVisor slices, GENI slivers, GENI slices, and experimenter identities is hard. (See slides 8-11 for pictures of the following.)
  * FOAM shows the GENI slice name of the FOAM sliver
  * FOAM shows an e-mail address, which is user-specified (not provided by the slice authority).  FOAM 0.5 includes the user URN (note that the URN is not shown in the picture on the included slides).
  * !FlowVisor slices are now identified by UUIDs (nonmeaningful on their own, so you can use FOAM's mapping to get a GENI slice name)
  * Using the above features, monitoring can now map between !FlowVisor slice and FOAM sliver, can show the state of a FOAM sliver, and can show both !OpenFlow and !PlanetLab resources in a slice
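
The mapping described above can be pictured as a simple lookup: given an opaque !FlowVisor slice UUID, return the GENI slice name, the user-specified e-mail, and (as of FOAM 0.5) the user URN. The sketch below is illustrative only, with made-up record fields and example data; it does not show FOAM's actual JSON API.

```python
import uuid

# Hypothetical snapshot of FOAM sliver records, keyed by the opaque
# FlowVisor slice UUID.  Field names and values are illustrative,
# not FOAM's actual API.
EXAMPLE_UUID = str(uuid.uuid4())
foam_slivers = {
    EXAMPLE_UUID: {
        "geni_slice": "urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+tut01",
        "email": "experimenter@example.edu",  # user-specified, not from the slice authority
        "user_urn": "urn:publicid:IDN+pgeni.gpolab.bbn.com+user+alice",
    },
}

def flowvisor_to_geni_slice(fv_uuid):
    """Map a FlowVisor slice UUID (meaningless on its own) to its GENI slice name."""
    sliver = foam_slivers.get(fv_uuid)
    return sliver["geni_slice"] if sliver else None

print(flowvisor_to_geni_slice(EXAMPLE_UUID))
```

With a table like this, a monitoring tool can join !FlowVisor state to FOAM slivers and GENI slices, which is exactly the mapping that was called out as hard at GEC11.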

 * Pain point: Approval (aka !OpenFlow Opt-In) is hard
  * FOAM helps by having an engine for automating approval decisions
  * FOAM 0.4 has only a default policy setting (approve or deny all)
  * FOAM 0.6 will have hooks for other policies, which may include some of: slice owner,
   flowspace characteristics, other fields in the sliver record, "a sliver like the one I approved before" (which is a popular request but pretty fuzzy)
  * Not all of these will be in FOAM 0.6 on Nov 21

 * Pain point: Flowspace explosion
  * FOAM's algorithm is more succinct.
  * BBN's production !FlowVisor went from 6000 rules to 200!

== Using SNAPP to Find and Visualize GENI Monitoring Data ==
'''Camilo Viecco''' (of Indiana U/GMOC) demonstrated how to use SNAPP to find and visualize monitoring data from the GMOC database:
 * SNAPP was originally designed for SNMP collection and display, and to store time-series based data
 * GMOC expanded the UI to support lots of non-SNMP data including:
  * data API for data submission
  * !CoMon data (from !PlanetLab)
 * Some terms useful for understanding SNAPP:
  * collections: closely-related timeseries data (data of the same type, using the same units).  For example, interface stats like in/out pps and in/out bytes.
  * categories: More flexible.  A group of collections that are related in some manner.  Think iTunes listings by artist or by album.
 * There are three ways to use SNAPP:
  * browsing (for deep categories),
  * search (wide categories),
  * the portal
 * Search (not browse) is the recommended way to find data given the large amount of data (50,000 categories)
  * Google-like syntax supports "and"/"or" logical operators
 * "Portal" is new!
  * Portal interface to SNAPP is found at: http://gmoc-db.grnoc.iu.edu/measurement/portal.cgi
  * Browse by mesoscale participant.  For example, showing !FlowVisors and MyPLCs at Clemson.
 * Use SNAPP click-and-drag tools to get more detail about things of interest
  * double click on a graph for more detail
  * click-and-drag horizontally to zoom into a time period of interest
 * "Browse" view in SNAPP
  * SNAPP at: http://gmoc-db.grnoc.iu.edu/measurement
  * Drill down by campus or tag
 * "Search" view in SNAPP (Pictures on slides 24-25)
  * SNAPP at same location as above
  * Search for 'tut01' to get all data related to the tutorial slice 'tut01'
 * In addition to the three ways of accessing SNAPP mentioned above, there is a "GMOC protected UI": you can search on a slice and get the nodes which are in that slice
  * Current policy is that anyone can see anything about slices or nodes, but access to organizations and contacts is more restricted.
  * However, once you're part of an organization, you start having access to these things.
 * Other useful features of SNAPP:
  * Save views of interest, so that you or other people can view them later.
  * You can get URLs of custom collections, or embed them in other webpages
  * Parts of the tool are open source
  * There is a Help button (in the lower right hand corner)
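
The "and"/"or" search syntax mentioned above can be illustrated with a toy filter over category names. This is a sketch of the search concept only, not SNAPP's actual implementation, and the category names are made up:

```python
# Toy illustration of a Google-like "and"/"or" search over category
# names.  Not SNAPP's real search code; category names are invented.
categories = [
    "clemson flowvisor interface stats",
    "clemson myplc node cpu",
    "bbn flowvisor interface stats",
]

def search(query, names):
    """'or' splits the query into alternatives; the terms within each
    alternative (joined by 'and' or plain whitespace) must all appear
    in a matching name."""
    alternatives = [alt.split() for alt in query.lower().split(" or ")]
    return [
        n for n in names
        if any(all(t in n for t in alt if t != "and") for alt in alternatives)
    ]

print(search("clemson and flowvisor", categories))
print(search("myplc or bbn", categories))
```

With tens of thousands of categories, this kind of term-matching search is far more practical than drilling down through the browse hierarchy.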

 * Chris Heermann of RENCI asked:
  * Is GMOC accessing people's aggregates?
   * Yes, SNMP for I2/NLR/IU/ProtoGENI/CRON
   * And mesoscale aggregates push data to GMOC.
  * How frequently is data collected?
   * Every 10 seconds for SNMP data;
   * 5 minute frequency for mesoscale data, but the granularity is 30 seconds
  * How can data be downloaded?
   * Recent data can be retrieved from:
     http://gmoc-db.grnoc.iu.edu/web-services/gen_api.pl

== Topology ==
'''Chaos Golubitsky''' of the GPO spoke about topology in the mesoscale.

Chaos first spoke about the ways in which the Layer 2 meso-scale network is different from commercial networks:
 * topology intentionally contains loops for use by experimenters
 * experimenters know how packets are switched through the network
 * experimenters modify switching and routing for their experiments

In the year of meso-scale deployment, there have been some network problems which have implications for future monitoring and debugging.

 * Problem: Uncontrolled VLAN bridging
  * The two meso-scale VLANs are bridged at each campus by an !OpenFlow-controlled VLAN.
  * If this fails open, traffic leaks between core VLANs
  * Chaos listed four ways this can happen, ranging from interface misconfiguration, to various !OpenFlow switch failures, to experiment misconfiguration
  * Detection of this is easy:  Alert if you see traffic on another VLAN that you should never see there (e.g. traffic from the IP monitoring subnet of another VLAN). We are doing this at: http://groups.geni.net/geni/wiki/NetworkCore/TrafficLeaks
  * GMOC and GPO have tracked down many '''severe''' failures, but because the leaking traffic is not the traffic causing the leak, intermittent failures are much harder to find.
  * How do we find leaks affecting a subset of (non-monitoring) traffic?
 * Problem: Shared MAC addresses
  * Why does this happen?
   * Physical interfaces are expensive
   * Subinterfacing puts one physical interface on many networks
   * Happens with mesoscale MyPLC and monitoring nodes
  * When does this cause problems?
   * Different topology using the same MACs on the same devices (or even the same VLAN)
   * Made worse by non-!OpenFlow devices in an !OpenFlow path
   * Chaos explained what happens here in detail (see slides 38-39 and [#Footnote:SharedMACAddressExplanation Shared MAC Address Explanation])
  * NoX detected this problem via a syslog message noting port flapping
  * In the short term, we can track MAC addresses of known hosts and alert if we see them somewhere unexpected.  Also need to ask regionals to watch for problem log messages on non-OF devices connecting OF networks.
 * What do these problems have in common?
  * Both are ways in which the topology can fail to match experimenter expectations (which could lead to experimental results not matching expectations)
  * We've seen this in the wild
  * We need to accurately detect these problems
   * In the long term, stitching and I&M software efforts may help here.
   * Currently, we define something we expect not to see and alert if we see it.
    * Easy; false positives are low
   * In the future, we need tools to improve detection and response.
    * Write and share "known bad symptom" tests to run at sites, regionals and backbones
    * Monitor MAC addresses learned by experimental switches and alert on unexpected paths when feasible
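
The "define something we expect not to see and alert" approach boils down to a small check: keep a map of where each known host MAC is allowed to appear, and flag any observation elsewhere. A minimal sketch with made-up MACs, device names, and VLAN IDs; a real monitor would consume switch MAC tables or packet captures:

```python
# Minimal sketch of the "alert on traffic we should never see" check.
# The MACs, device names, and VLAN IDs are invented for illustration.
EXPECTED = {
    # known host MAC -> set of (device, vlan) pairs where it may appear
    "00:16:3e:aa:bb:01": {("campus-sw1", 3715)},
    "00:16:3e:aa:bb:02": {("campus-sw1", 3716)},
}

def check_observation(mac, device, vlan):
    """Return an alert string if a known MAC shows up somewhere
    unexpected (e.g. leaked onto the other core VLAN), else None."""
    expected = EXPECTED.get(mac)
    if expected is None:
        return None  # unknown host: outside the scope of this check
    if (device, vlan) in expected:
        return None
    return f"ALERT: {mac} seen on {device} VLAN {vlan}, expected {sorted(expected)}"

# A host homed on VLAN 3715 leaking onto VLAN 3716 raises an alert:
print(check_observation("00:16:3e:aa:bb:01", "campus-sw1", 3716))
```

As noted above, this style of check is easy and has few false positives, but it only catches leaks involving traffic (like the monitoring subnets) that we already know to watch for.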

Some questions from the audience:

Q: What is the largest topology that can be generated in GENI?  Can you do a random graph?

A: E-mail help@geni.net for guidance on setting up your experiment.

== Monitoring and Management Requirements and Discussion ==

'''Sarah Edwards''' of the GPO discussed top-down monitoring requirements. (Slides start at 48.)

 * Why are we here?  To ensure we:
  1. don't ignore anything important
  2. build for what's coming
  3. give input to software developers and rack vendors based on our monitoring interests
  4. and also: decide what we want to work on next

 * Define the players:
  * GMOC
  * aggregates (including racks, if each rack is a single aggregate)
  * campuses: host and run aggregates, or just host aggregates
  * networks: GENI participants, or just carry GENI traffic
  * experimenters
 * Once there's a GENI clearinghouse, aggregates can outsource some responsibilities to the clearinghouse
 * Define monitoring and management:
  * Monitoring: collecting data
  * Management: using that data to solve problems
 * Matt Zekauskas: Monitoring also includes configuration, but that's not part of the problem-solving part of monitoring

 * Sarah provided an overview of top-level aspects of monitoring and management.
  * Top-level aspects of GENI monitoring (slide 56):
   * Information must be sharable
   * Information must be collected
   * Information must be available when needed
   * Cross-GENI operational statistics collected and synthesized to indicate GENI as a whole is working
   * Preserve privacy of users (opt-in users, experimenters, other users of resources)
  * Discussion:
   * Chris Heermann Q: What needs to be made private here?  Do we need to anonymize MAC addresses?
   * Sarah: We don't have a solid answer for that, and there are probably some issues
   * Chris Heermann: Maybe generating unique per-experiment MACs might help... or hurt
   * Is it possible to anonymize per-experiment MAC addresses?
   * Camilo Viecco: We only collect what we need.
   * Victor Orlikowski: Collect only what we need and what's not legally prohibited.
   * There's a balance between privacy and introspection, and we can't neglect the latter either
   * Privacy is a major point we need to work on.
  * Top-level aspects of GENI management (slide 57):
   * For both debugging and security problems:
     * Must be possible to escalate events
     * Meta-operations and aggregate operators must work together to resolve problems in a timely manner
   * Must be possible to do an emergency stop in case of a problem
   * Organizations must manage GENI resources consistent with local policy and best practices
   * Develop policies for monitoring
   * All parties should implement agreed-upon policies
   * Security of GENI as a whole and its pieces

Sarah then went through each of the top-level items one at a time.

 * Cross-GENI Monitoring (slide 59)
 * Privacy (slide 60)
  * Major area we need to do work on
 * Troubleshooting & event escalation (slide 61):
  * Who do I notify when there's an outage?
   * Josh Smift: GMOC has data that they've been gathering, and they can better detect who's been using it
   * Heidi Picher Dempsey: Some aggregates, which have a huge number of resources, aren't interested in who is down.
   * Maybe it depends what kind of rack you have: If you're hosting
     a rack, how do you even know what is failing?  So notifying
     GMOC makes sense because there's a big scaling issue for
     reporting to experimenters.
   * The proposal at yesterday's session would be for something a little more
     automated.
   * The rack proposals have a monitoring component as part of their design
   * Chris Heermann: Internally, they've talked about using Nagios.  In Cluster D, they want to implement resource accounting.
   * Victor Orlikowski: Things may be down that the Aggregate Manager doesn't know about, and that information can be communicated back
   * Chris Heermann: Is there a date for advertising resources via the AM API?
   * Sarah Edwards: There is no required date.
   * Chris Heermann: You need to assign priorities to tasks.
   * Josh Smift: To loop back, we don't think campuses are going to need to put in a lot of effort to learn how to monitor their racks.
   * Chris Heermann: But what about monitoring within the AM API?
   * Camilo Viecco: They're using that data to find out what exists, but
     neither the AM API nor the ORCA API has a monitoring or
     measurement component.
 * Emergency Stop (slide 62):
  * Campuses need to maintain a Point of Contact (POC)
  * Jim: Both the campus and GMOC need the ability to remotely
   withdraw a rack or a node from the environment.  They need the
   ability to act quickly off-hours, and remote management helped.
  * Sarah Edwards: Right now, emergency stop is manual (but remote).
  * Camilo Viecco: Right now, GMOC has access to the backbone,
   so the lever they can pull is to disconnect the campus.  So the
   plan is to work with the campus, and, if that fails, to isolate
   the campus from the backbone.
  * Josh Smift: But there's also a need for emergency stop for control
    interface issues.
  * Victor Orlikowski: What is emergency stop?
  * Josh Smift: There's a chain of events, which starts with someone complaining
  * There's a notion of shutdown inside the API (of a sliver or
   node), but it is not well-tested.
 * Policy (slide 63):
  * Victor Orlikowski: Best practices are social norms.
  * Josh Smift: There is the aggregate provider agreement, which does say
   things you need to do (but not very many).
 * Security (slide 64):
  * Victor Orlikowski: campuses need means of interfacing with campus IT to respond to strange things
 * Monitoring requirement: info must be shareable/collected/available (slide 65)
 * Data definitions (slides 77-78)
  * Statements of belief: there are 3 kinds of data: time-series data, relational data, events

(Note: The session ended without reaching consensus on these items (per slides 72-74).  However, the discussion continued at the November 18, 2011 monitoring call.  [http://lists.geni.net/pipermail/monitoring/2011-November/000019.html Notes from that meeting])

'''Monitoring telecons''' occur every other Friday at 2PM.  For more information, sign up for the [http://lists.geni.net/mailman/listinfo/monitoring monitoring mailing list] or e-mail Sarah Edwards (sedwards@bbn.com) or Chaos Golubitsky (chaos@bbn.com).

== Footnote: Shared MAC Address Explanation ==
   * (See slide 38) GPO has a mildly complicated experimental topology, where our lab resources are connected to the mesoscale core via both Internet2 and NLR, to give experimenters more choice.
   * We get this connection via our regional, NoX, through which we connect to mesoscale core VLAN 3716 twice.
   * This uses two separate VLANs (A and B) in our regional and in our lab, one for each backbone provider.
   * VLAN A leaves our lab, goes to our regional, and connects to the backbone VLAN 3716 via Internet2 in NEWY (New York).
   * And VLAN B does the same, but connects to backbone VLAN 3716 via NLR in Chicago.
   * This is great for some things, but we found a problem: if two experimenters are using a resource with a shared MAC (e.g. the !PlanetLab node pictured in the diagram), AND want to use different paths, it doesn't work
   * (See slide 39)  When experiment 1 sends traffic from the !PlanetLab node, their traffic traverses all of the NoX, I2, and NLR switches, populating MAC address forwarding tables on each switch, on VLAN A, and then on VLAN 3716.  Then the traffic goes through the core through CHIC NLR (in Chicago), and comes back out on the NoX switch again, now on VLAN B.
   * This is wrong!
   * Now the NoX switch thinks that our !PlanetLab node, on VLAN B, can be found at CHIC NLR. So if experiment 1 ever causes NoX to think we're at CHIC, then experiment 2 can't *receive* traffic at that node.
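
The failure above can be reproduced with a toy learning-switch model: a forwarding table keyed by (VLAN, MAC) remembers the port where each source MAC was last seen, so experiment 1's traffic re-entering on VLAN B teaches the switch the wrong port for the shared MAC. The sketch below models only the NoX switch, with invented port names and MAC:

```python
# Toy model of MAC learning on the NoX switch only.  The port names
# and the MAC address are invented; this just reproduces the failure
# described in the footnote.
table = {}  # (vlan, src_mac) -> port where that MAC was last seen

def learn(vlan, src_mac, in_port):
    """Standard learning-switch behavior: remember the last port a
    source MAC was seen on, per VLAN."""
    table[(vlan, src_mac)] = in_port

PLNODE = "00:16:3e:00:00:99"  # shared MAC of the PlanetLab node

# Normally the node is learned on the lab-facing port of VLAN B:
learn("B", PLNODE, "lab-port")

# Experiment 1's traffic loops through the core via CHIC NLR and
# re-enters the NoX switch on VLAN B from the backbone-facing port:
learn("B", PLNODE, "chic-port")

# Experiment 2's traffic toward the node is now sent the wrong way:
print(table[("B", PLNODE)])  # "chic-port", no longer "lab-port"
```

Since the shared MAC is the table key, the two experiments' paths fight over a single entry, which is why experiment 2 can no longer receive traffic at the node once experiment 1's traffic has looped through Chicago.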