
Campus/Experiment Topics in Monitoring and I&M

Schedule

Tuesday, 10:30-Noon

Session Leaders

Jeanne Ohren, Chaos Golubitsky, Sarah Edwards, GPO

Session Description

This session addresses how current tools provide visibility into the state of active slices/experiments and into operational GENI resources/campuses/aggregates.

As racks start to come online in the coming months, more experimenters and operations personnel will have interest in seeing the state of GENI experiments, slices and resources. In this session, each of the Spiral 4 I&M projects and the GENI Meta-operations Center (GMOC) will show the current state of their tools with an emphasis on both the experimenter and operations perspectives. Then we will discuss current, open issues of interest to both communities.

Agenda

Summary

At this session, the GIMI and GEMINI I&M projects and the GMOC monitoring project summarized their status and raised some issues including:

  • Does always-on infrastructure monitoring of slices raise opt-in issues? Performance issues?
  • Can we have standardized images with embedded monitoring?
  • Do we need global naming for entities and measurements?

We then discussed "what can monitoring and I&M provide to help with the cross-aggregate stitching effort?"

Topics raised included:

  • Will we have consistent naming of links among all participants? Probably, but we must make sure we can handle non-GENI-aware devices in the path.
  • VLAN mappings/translations in an experiment are visible to the experimenter via the manifest. We should provide a way for operators to get this information as well.
  • We need a substrate measurement infrastructure which experimenters can use to perform active measurements they wouldn't normally have access to (e.g., query a hypervisor, or get a pcap of sliver traffic).

Detailed Notes

Introduction

by Chaos Golubitsky, GPO

Chaos introduced the session. Both I&M and monitoring tools provide visibility into the state of:

  • GENI experiments and slices
  • Operational GENI resources, campuses, and aggregates
  • GENI-controlled networks, and non-GENI networks which GENI uses

The audience for these tools is both:

  • Experimenters who use GENI (and the staffers, instructors, and operators who help them)
  • Operators who run GENI infrastructure (especially when GENI runs on shared networks)

GIMI I&M project update

by Mike Zink, UMass Amherst (slides)

Mike gave an overview of the GIMI project and explained how GIMI can be used to monitor slices.

GIMI supports experimenters in running experiments, collecting data, and analyzing measurements.

GIMI currently works on ExoGENI, and the team is working on WiMAX support.

The team currently publishes images preinstalled with measurement tools. The GEC15 VirtualBox VM contains all of the I&M tools, including GIMI, iRODS, and IREEL, and will be a major part of Thursday's tutorial.

GIMI supports passive monitoring of slices.

Mike raised the following issues:

1) Monitoring an experimenter slice raises experimenter opt-in issues.

2) What will be monitored?

3) Does monitoring traffic create enough load to interfere with the experiment? (On ExoGENI, monitoring traffic runs on a separate network, so it is not a problem there.)

GEMINI I&M project update

by Martin Swany, Indiana University (slides)

Martin gave an overview of GEMINI.

There is a tutorial today which shows the two steps for installing GEMINI (a rough sketch of the first step follows):

1) Run the instrumentize.py script with the appropriate credentials and the slice name

2) Configure activities through the portal
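A minimal sketch of driving step 1 from Python; the flag names are assumptions for illustration, since the script's actual options are documented in the tutorial materials:

    import subprocess

    # Run the GEMINI instrumentize step for a slice (step 1 above).
    # Flag names are hypothetical; consult the tutorial for the real ones.
    subprocess.run(
        ["python", "instrumentize.py",
         "--credential", "/path/to/geni_credential.xml",  # experimenter credential
         "--slice", "myslice"],                           # slice to instrument
        check=True,  # raise an error if instrumentation fails
    )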

Martin showed a variety of screenshots.

GEMINI includes:

  • Portal
  • Measurement Store (like the old MA) -- controls summarization and puts data reduction inline
  • Measurement Point (for perfSONAR, using BLiPP)
  • Integration with the I&M archive
  • Work in progress on integrating with the GIMI archive
  • GEMINI Global Registry (UNIS)
  • Event Messaging Service (a high-rate notification service backed by NoSQL) -- gets lots of data off the systems
  • perfSONAR support

What do you get?

  • "warm feeling that you know what's going on in your slice"
  • passive metrics
  • active network measurements
  • archiving

Martin provided a long list of capabilities (shown in the slides).

New Features include:

  • New Measurement Store
    • NoSQL data store
    • JSON/REST interface (see the sketch after this list)
  • New UNIS
  • BLiPP
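Since the new Measurement Store speaks JSON over REST, pushing one datapoint might look roughly like the sketch below; the endpoint URL and record schema are assumptions for illustration, not the actual GEMINI interface:

    import json
    import time
    import urllib.request

    # A hypothetical measurement record: subject URN, metric, timestamp, value.
    record = {
        "subject": "urn:publicid:IDN+example.net+interface+pc1:eth0",
        "eventType": "rx_bytes",
        "ts": time.time(),
        "value": 123456,
    }

    # POST the record to a placeholder measurement-store endpoint.
    req = urllib.request.Request(
        "https://ms.example.net/measurements",
        data=json.dumps(record).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)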

Campus Infrastructure Monitoring includes:

  • OpenFlow monitoring
  • active probes for dynamic resources -- circuit "acceptance testing"

Next Steps:

  1. Integrate with ABAC support (has been prototyped)
  2. GENI OneStop interface -- a new portal
    • experiment and measurement orchestration and management
  3. Application metrics with NetLogger
  4. Configurable, in-line, on-line summarization and data reduction
  5. Cooperation with Control Frameworks/rack operators/GMOC
    • common metrics across infrastructures
    • reduction of duplicate measurements
    • common collection tools

Discussion Topics:

  • What do users and operators want?
  • Authentication/authorization
  • Measurements should be a first-class entity
    • standard images should have embedded monitoring
  • Ongoing measurements need to be addressed
    • access to the hypervisor, or provide code
    • AuthN/Z to initiate new measurement activities
  • Global naming and a global information service
  • Coordinate more on measuring infrastructure and monitoring applications

GMOC operational monitoring project update

by Kevin Bohan, GRNOC (slides)

Kevin described the new monitoring client and how it's used to report data to GMOC.

Goals are to:

  • Help experimenters see resources before they start using them
  • Let campuses see what's happening in their resources

An issue leading up to GEC14 was that the existing tools were very hard to use; the team set about fixing that.

GMOC Objects API:

  • model the state of things in GENI (as a site thinks of it) and submit that to GMOC
  • integrate metadata with time-series data
  • For example: "I have this aggregate, which has these resources, which have these interfaces, and those have the following stats..."
  • Previously, everything had to be submitted at once (even if redundant)
  • Now that is split up, so a site can submit what it knows at any point in time
  • A new Python module supports modeling infrastructure and sending data to GMOC (sketched after this list)
  • "easy to use correctly and hard to do wrong"

"easy to use correctly":

  • previously, users had to generate hard-to-write XML
  • no need to remember arcane metric names, which could previously be quite long
  • users previously had to call multiple scripts; now there is a single gmoc module

"hard to do wrong":

  • data is validated before submission by the client (illustrated after this list)
    • IDs are correctly formatted and globally unique (URNs for everything but aggregates)
    • the object hierarchy is enforced
    • each measurement class can only be set on appropriate objects
  • Partial submissions are supported
  • Backwards compatible
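The client-side ID checks might conceptually resemble the sketch below; the URN pattern and rules are assumptions in the spirit of the list above, not GMOC's actual validation code:

    import re

    # Loose URN shape: urn:publicid:IDN+<authority>+<type>+<name>
    URN_RE = re.compile(r"^urn:publicid:IDN\+[^+]+\+\w+\+.+$")

    def validate_id(obj_type, obj_id):
        """Reject malformed IDs before submission: URNs for everything
        except aggregates, which use short names."""
        if obj_type == "aggregate":
            return bool(re.fullmatch(r"[\w.-]+", obj_id))
        return bool(URN_RE.match(obj_id))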

Kevin showed a long list of Modeled Network Elements and time series objects used to report statistics.

Kevin showed a number of code examples:

  1. Add a measurement manually
    • previously several hundred lines of Perl
    • now just a few lines
  2. Parse RRD files
    • now easier than before
    • working on specifying column headers
  3. Change aggregate state
    • 5 lines of Python code (see the sketch after this list)
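Reusing the assumed names from the earlier sketch, "changing aggregate state in 5 lines" might look like this (again, an illustration, not the actual gmoc API):

    import gmoc  # assumed module and names, as in the earlier sketch

    agg = gmoc.Aggregate("gpo-exogeni")
    agg.state = "degraded"  # hypothetical state attribute
    gmoc.submit(agg, url="https://gmoc-db.example.net/api")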

Who's Reporting?

  • 20 FOAM aggregates
  • GPO SA and GENI CH
  • ExoGENI metadata/time series
  • InstaGENI

Kevin demoed the GMOC DB interface.

Future Directions:

  • Populate the API with data from GMOC (not only submissions to it)
  • more measurements
  • parse RSpecs
  • support additional languages? (currently Python only)
  • better visualization of data within the UI/map
  • integration with other projects
  • use circuit data operationally

Discussion: I&M/monitoring and GENI stitching

Discussion led by Chaos Golubitsky, GPO (slides)

Naming

Martin Swany: How do we expect links to be named? (e.g., segment A and segment B have different names; segment C is composed of A and B)

Aaron Helsinger: Who picks the name when a link spans two different networks?

Martin S: The sender names the link. (In DCN, links are unidirectional, and each path has distinct properties. The port and TX link are owned by one side; the port and RX link are owned by the other side.)
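To make the ownership convention concrete, here is a small illustrative data model (names and fields invented for the example, not a GENI schema):

    from dataclasses import dataclass

    @dataclass
    class UniLink:
        name: str      # assigned by the sending side, which owns the TX port
        sender: str
        receiver: str

    # Segments A and B have different names; the composed segment C is the
    # ordered pair, and who names C across two networks is the open question.
    segA = UniLink("urn:publicid:IDN+net1+link+segA", "net1", "net2")
    segB = UniLink("urn:publicid:IDN+net2+link+segB", "net2", "net3")
    segC = [segA, segB]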

Tom Lehman: Is there a common link component_id to name the link?

Aaron H: A dynamic circuit across physical links may have a different identifier.

Aaron H: There is probably a straightforward mapping between URNs and real-world names, so hopefully this is largely a translation problem. When we name things in the stitching extension, we should make sure those names map to operational names.

Chaos: Where are non-GENI URNs coming from?

Aaron H/Tom L: ION, DYNES, etc

VLANs under translation

How do we know what VLANs are bridged?

Tom L: The manifest will have everything in it.

Chaos G: The manifest may only be available to experimenters.

Someone: Collecting information from various places into a common format will help a lot.

Diverse resource types

Different nodes are running different OSes and environments. Can we have tools that work across these? Can we help operators understand the network properties of adjacent networks?

Martin S: Yes, but we need a common substrate measurement system. (For active measurements, the hypervisor on the machine could run them for you; passive measurement might look different from inside the host.)

Chaos: There is an analogous OpenFlow scenario; one may need a pcap of what the controller is seeing.

Someone: We need the ability for experimenters to run active debugging services they don't have permission to run themselves.
