Version 4 (modified by sedwards@bbn.com, 8 years ago)

--

GEC12 Monitoring Session Detailed Notes

The following notes about the GEC12 Monitoring Session include references to the many helpful pictures in the slides.

Introduction

Sarah Edwards of the GPO introduced the session:

There were two goals of the session:

  1. keep talking about day-to-day issues and problems, as we did at GEC11
  2. look ahead to where GENI is headed, so that we can provide input on what we need for the future of GENI and don't overlook any important pieces

Sarah presented a list of "pain points" from GEC11 that form the basis for a running list of tasks that members of the GENI monitoring community feel need to be addressed.

FOAM: New OpenFlow Aggregate Manager

Josh Smift of the GPO spoke about FOAM, the new OpenFlow Aggregate Manager (AM).

  • FOAM acts as an AM (but does not yet implement OpenFlow "opt-in").
    • Supports:
      • AM API interface for experimenters
      • JSON API (and tools) for management
    • Works on Ubuntu 10.04 now. Will ensure it works on Ubuntu 12.04 LTS when that comes out.
  • BBN has been running it for the past few weeks. It runs fine and we transitioned our experimenters to it.
  • Plan is to keep things stable until after SC11, then migrate campuses over to FOAM 0.6, an upcoming release, on November 21.

Josh then talked about how FOAM addressed several pain points identified at GEC11.

  • Pain point: Mapping data among FlowVisor slices, GENI slivers, GENI slices, and experimenter identities is hard. (See slides 8-11 for pictures of the following.)
    • FOAM shows the GENI slice name of the FOAM sliver
    • FOAM shows an e-mail address, which is user-specified (not provided by the slice authority). FOAM 0.5 includes the user URN (note that the URN is not shown in the picture on the included slides).
    • FlowVisor slices are now identified by UUIDs (nonmeaningful on their own, so you can use FOAM's mapping to get a GENI slice name)
    • Using the above features, monitoring can now map between FlowVisor slice and FOAM sliver, can show the state of a FOAM sliver, and can show both OpenFlow and PlanetLab resources in a slice
  • Pain point: Approval (aka OpenFlow Opt-In) is hard
    • FOAM helps by having an engine for automating approval decisions
    • FOAM 0.4 has only a default policy setting (approve or deny all)
    • FOAM 0.6 will have hooks for other policies, which may include some of: slice owner, flowspace characteristics, other fields in the sliver record, "a sliver like the one I approved before" (which is a popular request but pretty fuzzy)
    • not all of these will be in FOAM 0.6 on Nov 21
  • Pain point: Flowspace explosion
    • FOAM's algorithm is more succinct.
    • BBN's production FlowVisor went from 6000 rules to 200!
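
The approval engine described above (a default policy, plus hooks for other policies) might be pictured with the following minimal sketch. This is not FOAM's code or API: the class names, the record fields, and the example URN are all hypothetical, chosen only to illustrate how hooks could override a default approve-or-deny setting.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SliverRequest:
    """Hypothetical sliver record; field names are illustrative, not FOAM's schema."""
    slice_name: str
    owner_urn: str
    flowspace: List[str] = field(default_factory=list)


# A policy hook returns True (approve), False (deny), or None (no opinion).
PolicyHook = Callable[[SliverRequest], Optional[bool]]


class ApprovalEngine:
    """Asks each hook in order; falls back to the default policy if none decides."""

    def __init__(self, default_approve: bool = False) -> None:
        self.default_approve = default_approve
        self.hooks: List[PolicyHook] = []

    def add_hook(self, hook: PolicyHook) -> None:
        self.hooks.append(hook)

    def decide(self, request: SliverRequest) -> bool:
        for hook in self.hooks:
            verdict = hook(request)
            if verdict is not None:
                return verdict
        return self.default_approve


def trusted_owner_hook(trusted_urns: set) -> PolicyHook:
    """Example 'slice owner' policy: approve requests from known owners."""
    def hook(request: SliverRequest) -> Optional[bool]:
        return True if request.owner_urn in trusted_urns else None
    return hook


# Placeholder URN, for illustration only.
ALICE = "urn:publicid:IDN+example.net+user+alice"
engine = ApprovalEngine(default_approve=False)  # "deny all" default
engine.add_hook(trusted_owner_hook({ALICE}))
```

With only a default policy and no hooks, `decide` behaves like the approve-all or deny-all setting described for FOAM 0.4; each added hook narrows the cases that fall through to the default.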

Using SNAPP to Find and Visualize GENI Monitoring Data

Camilo Viecco (of Indiana U/GMOC) did a demonstration of how to use SNAPP to find and visualize monitoring data from the GMOC database:

  • SNAPP was originally designed for SNMP collection and display, and to store time-series based data
  • GMOC expanded the UI to support lots of non-SNMP data including:
    • data API for data submission
    • CoMon data (from PlanetLab)
  • Some terms useful for understanding SNAPP:
    • collections: closely-related timeseries data (data of the same type, using the same units). For example, interface stats like in/out pps and in/out bytes.
    • categories: More flexible. A group of collections that are related in some manner. Think iTunes listings by artist or by album.
  • There are three ways to use SNAPP:
    • browsing (for deep categories),
    • search (wide categories),
    • the portal
  • Search (not browse) is the recommended way to find data given the large amount of data (50000 categories)
    • google-like syntax supports "and"/"or" logical operators
  • "Portal" is new!
  • Use SNAPP click-and-drag tools to get more detail about things of interest
    • double click on a graph for more detail
    • click-and-drag horizontally to zoom into a time period of interest
  • "Browse" view in SNAPP
  • "Search" view in SNAPP (Pictures on slides 24-25)
    • SNAPP at same location as above
    • Search for 'tut01' to get all data related to the tutorial slice 'tut01'
  • In addition to the 3 ways of accessing SNAPP mentioned above there is a "GMOC protected UI": you can search on a slice, and get nodes which are in that slice
    • Current policy is that anyone can see anything about slices or nodes, but it's more restricted who can see organizations and contacts.
    • However, once you're part of an organization, you start having access to these things.
  • Other useful features of SNAPP:
    • Save views of interest, so that you or other people can view them later.
    • You can get URLs of custom collections, or embed them in other webpages
    • Parts of the tool are open source
    • There is a Help button (in lower right hand corner)
  • Chris Heermann of RENCI asked:
    • Is GMOC accessing people's aggregates?
      • Yes, SNMP for I2/NLR/IU/ProtoGENI/CRON
      • And mesoscale aggregates push data to GMOC.
    • How frequently is data collected?
      • Every 10 seconds for SNMP data;
      • 5 minute frequency for mesoscale data, but the granularity is 30 seconds
    • How can data be downloaded?
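
As a rough illustration of the "collection" and "category" terms defined earlier in this section, the sketch below models a collection as a set of time series sharing one unit, and a category as a looser grouping of collections. It does not reflect SNAPP's actual schema or search syntax; all names are hypothetical and the search is a plain case-insensitive substring match.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp, value)


@dataclass
class Collection:
    """Closely related time series of the same type, using the same units."""
    name: str
    units: str
    series: Dict[str, List[Sample]] = field(default_factory=dict)


@dataclass
class Category:
    """A more flexible grouping of collections related in some manner."""
    name: str
    collections: List[Collection] = field(default_factory=list)


def find_categories(categories: List[Category], term: str) -> List[str]:
    """Return the names of categories whose name contains the search term."""
    needle = term.lower()
    return [c.name for c in categories if needle in c.name.lower()]


# Interface stats: in/out packets per second share one unit, so one collection.
pps = Collection("eth0 pps", "packets/s",
                 {"in": [(0, 10.0)], "out": [(0, 12.0)]})
tutorial = Category("tut01", [pps])
```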

Topology

Chaos Golubitsky of GPO spoke about Topology in the Mesoscale.

Chaos first spoke about the ways in which the Layer 2 meso-scale network is different from commercial networks:

  • topology intentionally contains loops for use by experimenters
  • experimenters know how packets are switched through the network
  • experimenters modify switching and routing for their experiments

In the year of meso-scale deployment there have been some network problems which have implications for future monitoring and debugging.

  • Problem: Uncontrolled VLAN bridging
    • The two meso-scale VLANs are bridged at each campus by an OpenFlow-controlled VLAN.
    • If this fails open, traffic leaks between core VLANs
    • Chaos listed 4 ways this can happen, covering interface misconfiguration, various OpenFlow switch failures, and experiment misconfiguration
    • Detection of this is easy: Alert if you see traffic on a VLAN that you should never see there (e.g., the IP monitoring subnet of another VLAN). We are doing this at: http://groups.geni.net/geni/wiki/NetworkCore/TrafficLeaks
    • GMOC and GPO have tracked down many severe failures, but intermittent failures are much harder to find because the leaking traffic is not the traffic causing the leak.
    • How do we find leaks affecting a subset of (non-monitoring) traffic?
  • Problem: Shared MAC addresses
    • Why does this happen?
      • Physical interfaces are expensive
      • Subinterfacing puts one physical interface on many networks
      • Happens with mesoscale MyPLC and monitoring nodes
    • When does this cause problems?
      • Different topology using the same MACs on the same devices (or even the same VLAN)
      • Worse due to non-OpenFlow devices in an OpenFlow path
      • Chaos explained what happens here in detail (see slides 38-39 and Footnote: Shared MAC Address Explanation)
    • NoX detected this problem via syslog messages noting port flapping
    • In the short term, we can track MAC addresses of known hosts and alert if we see them somewhere unexpected. We also need to ask regionals to watch for problem log messages on non-OF devices connecting OF networks.
  • What do these problems have in common?
    • Ways in which the topology can fail to match experimenter expectations (which could lead to experimental results not matching expectations)
    • We've seen this in the wild
    • We need to accurately detect these problems
      • In the long term, stitching and I&M software efforts may help here.
      • Currently, we define something we expect not to see and alert if we see it.
        • Easy. False positives are low
      • In the future, we need tools to improve detection and response.
        • Write and share "known bad symptom" tests to run at sites, regionals and backbones
        • Monitor MAC addresses learned by experimental switches and alert on unexpected paths when feasible
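
The "define something we expect not to see and alert if we see it" approach to VLAN leaks can be sketched as follows. This is a minimal illustration, not the check actually run by GPO or GMOC: the VLAN labels, the subnets (documentation-only address ranges), and the function name are assumptions.

```python
import ipaddress

# Hypothetical monitoring subnet for each VLAN; the values are placeholders.
MONITORING_SUBNETS = {
    "vlan_a": ipaddress.ip_network("192.0.2.0/24"),
    "vlan_b": ipaddress.ip_network("198.51.100.0/24"),
}


def find_leaks(observations):
    """Flag traffic whose source IP belongs to another VLAN's monitoring subnet.

    observations: iterable of (vlan, source_ip) pairs seen on the wire.
    Returns a list of (vlan_seen_on, source_ip, vlan_it_belongs_to) tuples.
    """
    leaks = []
    for vlan, src in observations:
        addr = ipaddress.ip_address(src)
        for other_vlan, subnet in MONITORING_SUBNETS.items():
            if other_vlan != vlan and addr in subnet:
                leaks.append((vlan, src, other_vlan))
    return leaks


observed = [
    ("vlan_a", "192.0.2.10"),    # expected: vlan_a's own monitoring subnet
    ("vlan_a", "198.51.100.7"),  # unexpected: vlan_b's subnet seen on vlan_a
    ("vlan_b", "203.0.113.5"),   # not a monitoring address; ignored
]
leaks = find_leaks(observed)
```

Because the check only fires on addresses that should never appear on a given VLAN, false positives stay low. As noted above, it cannot see leaks that affect only non-monitoring traffic.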

Some questions from the audience:

Q: What is the largest topology that can be generated in GENI? Can you do a random graph?

A: E-mail help@geni.net for guidance on setting up your experiment.

Monitoring and Management Requirements and Discussion

Sarah Edwards of the GPO discussed top-down monitoring requirements. (Slides start at 48.)

  • Why are we here? To ensure we:
    1. don't ignore anything important
    2. build for what's coming
    3. give input to software developers and rack vendors based on our monitoring interests
  • Also: 4. What do we want to work on next?
  • Define the players:
    • GMOC
    • aggregates (including racks, if each rack is a single aggregate)
    • campuses: host and run aggregates, or just host aggregates
    • networks: GENI participants, or just carry GENI traffic
    • experimenters
  • Once there's a GENI clearinghouse, aggregates can outsource some responsibilities to the clearinghouse
  • Define monitoring and management:
    • Monitoring: collecting data
    • Management: using that data to solve problems
  • Matt Zekauskas: Monitoring also includes configuration, but that's not part of the problem-solving part of monitoring
  • Sarah provided an overview of top-level aspects of monitoring and management.
    • Top-level aspects of GENI monitoring (slide 56):
      • Information must be sharable
      • Information must be collected
      • Information must be available when needed
      • Cross-GENI operational statistics collected and synthesized to indicate GENI as a whole is working
      • Preserve privacy of users (opt-in, experimenters, other users of resources)
    • Discussion:
      • Chris Heermann Q: What needs to be made private here? Do we need to anonymize MAC addresses?
      • Sarah: We don't have a solid answer for that, and there are probably some issues
      • Chris Heermann: Maybe generating unique per-experiment MACs might help... or hurt
      • Is it possible to anonymize per-experiment MAC addresses?
      • Camilo Viecco: We only collect what we need.
      • Victor Orlikowski: Collect only what we need and what's not legally prohibited.
      • There's a balance between privacy and introspection, and we can't neglect the latter either
      • Privacy is a major point we need to work on.
    • Top-level aspects of GENI management (slide 57):
      • For both debugging and security problems:
        • Must be possible to escalate events
        • Meta-operations and aggregate operators must work together to resolve problems in a timely manner
      • Must be possible to do an emergency stop in case of a problem
      • Organizations must manage GENI resources consistent with local policy and best practices
      • Develop policies for monitoring
      • All parties should implement agreed upon policies
      • Security of GENI as a whole and its pieces

Sarah then went through each of the top-level items one at a time.

  • Cross-GENI Monitoring (slide 59)
  • Privacy (slide 60)
    • Major area we need to do work on
  • Troubleshooting & event escalation (slide 61):
    • Who do I notify when there's an outage?
      • Josh Smift: GMOC has data that they've been gathering, and they can better detect who's been using it
      • Heidi Picher Dempsey: Some aggregates that have a huge number of resources aren't interested in who is down.
      • Maybe it depends what kind of rack you have: If you're hosting a rack, how do you even know what is failing? So notifying GMOC makes sense because there's a big scaling issue for reporting to experimenters.
      • The proposal at yesterday's session would be for something a little more automated.
      • The rack proposals have a monitoring component as part of their design
      • Chris Heermann: Internally, they've talked about using nagios. In Cluster D, they want to implement resource accounting.
      • Victor Orlikowski: Things may be down that the Aggregate Manager doesn't know about, and that information can be communicated back
      • Chris Heermann: Is there a date for advertising resources via AM API?
      • Sarah Edwards: There is no required date.
      • Chris Heermann: You need to assign priorities to tasks.
      • Josh Smift: To loop back, we don't think campuses are going to need to do a lot of effort to learn how to monitor their racks.
      • Chris Heermann: But what about monitoring within AM API?
      • Camilo Viecco: They're using that data to find out what exists, but neither the AM API nor the ORCA API has a monitoring or measurement component.
  • Emergency Stop (slide 62):
    • Campuses need to maintain a Point of Contact (POC)
    • Jim: Both the campus and GMOC need the ability to remotely withdraw a rack or a node from the environment. They need the ability to act quickly off-hours, and remote management helped.
    • Sarah Edwards: Right now, emergency stop is manual (but remote).
    • Camilo Viecco: Right now, GMOC has access to the backbone, so the lever they can pull is to disconnect the campus. So the plan is to work with the campus, and, if that fails, to isolate the campus from the backbone.
    • Josh Smift: But there's also a need for emergency stop for control interface issues.
    • Victor Orlikowski: What is emergency stop?
    • Josh Smift: There's a chain of events, which starts with someone complaining
    • There's a notion of shutdown inside the API (of a sliver or node), but it is not well-tested.
  • Policy (slide 63):
    • Victor Orlikowski: Best practices are social norms.
    • Josh Smift: There is the aggregate provider agreement, which does say things you need to do (but not very many).
  • Security (slide 64):
    • Victor Orlikowski: campuses need means of interfacing with campus IT to respond to strange things
  • Monitoring requirement: info must be shareable/collected/available (slide 65)
  • Data definitions (slides 77-78)
    • Statements of belief: 3 kinds of data: time-series data, relational data, events

(Note: The session ended without reaching consensus on these items (per slides 72-74). However, the discussion continued at the November 18, 2011 monitoring call; see the notes from that meeting.)

Monitoring telecons occur every other Friday at 2PM. For more information, sign up for the monitoring mailing list or e-mail Sarah Edwards (sedwards@bbn.com) or Chaos Golubitsky (chaos@bbn.com).

Footnote: Shared MAC Address Explanation

  • (See slide 38) GPO has a mildly complicated experimental topology, where our lab resources are connected to the mesoscale core via both Internet2 and NLR, to give experimenters more choice.
  • We get this connection via our regional, NoX, through which we connect to mesoscale core VLAN 3716 twice.
  • This uses two separate VLANs (A and B) in our regional and in our lab, one for each backbone provider.
  • VLAN A leaves our lab, goes to our regional, and connects to the backbone VLAN 3716 via Internet2 in NEWY (New York).
  • And VLAN B does the same, but connects to backbone VLAN 3716 via NLR in Chicago.
  • This is great for some things, but we found a problem: if two experimenters are using a resource with a shared MAC (e.g., the PlanetLab node pictured in the diagram) AND want to use different paths, it doesn't work.
  • (See slide 39) If experiment 1 sends traffic from the PlanetLab node, their traffic traverses all of the NoX, I2, and NLR switches, populating MAC address forwarding tables on each switch, on VLAN A, and then on VLAN 3716. Then the traffic goes through the core through CHIC NLR (in Chicago), and comes back out on the NoX switch again, now on VLAN B.
  • This is wrong!
  • Now the NoX switch thinks that our PlanetLab node, on VLAN B, can be found at CHIC NLR. So if experiment 1 ever causes NoX to think we're at CHIC, then experiment 2 can't *receive* traffic at that node.
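
The failure described in this footnote can be reproduced with a toy model of per-VLAN MAC learning. This is a simplified sketch, not a model of any real switch: the class, the port labels, and the MAC address are hypothetical. Recording each move also hints at how the "port flapping" symptom could be detected.

```python
class LearningSwitch:
    """Toy MAC learning table keyed by (vlan, mac), as on a non-OpenFlow switch."""

    def __init__(self):
        self.table = {}  # (vlan, mac) -> port where the MAC was last seen
        self.moves = []  # (vlan, mac, old_port, new_port) for each relearn

    def learn(self, vlan, mac, port):
        """Record the port a source MAC arrived on, noting any move."""
        key = (vlan, mac)
        old_port = self.table.get(key)
        if old_port is not None and old_port != port:
            self.moves.append((vlan, mac, old_port, port))
        self.table[key] = port

    def lookup(self, vlan, mac):
        """Return the port that traffic for this MAC would be forwarded to."""
        return self.table.get((vlan, mac))


NODE_MAC = "02:00:00:00:00:01"  # placeholder for the shared PlanetLab node MAC
nox = LearningSwitch()

# Experiment 2 uses the node on VLAN B; the switch learns it toward the lab.
nox.learn("B", NODE_MAC, "lab")
# Experiment 1 sends from the same MAC on VLAN A...
nox.learn("A", NODE_MAC, "lab")
# ...and that traffic crosses the core and returns from Chicago on VLAN B.
nox.learn("B", NODE_MAC, "chic-nlr")

# Traffic for the node on VLAN B is now sent toward Chicago, not the lab.
wrong_port = nox.lookup("B", NODE_MAC)
```

After the third `learn` call, `wrong_port` is `"chic-nlr"` and `nox.moves` holds one entry for VLAN B, which is the kind of unexpected relocation a known-host MAC tracker could alert on.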