[[PageOutline]]

= Campus/Experiment Topics in Monitoring and I&M =

== Schedule ==

Tuesday, 10:30-Noon

== Session Leaders ==

Jeanne Ohren, Chaos Golubitsky, Sarah Edwards, GPO

== Session Description ==

This session addresses how current tools provide visibility into the state of active slices/experiments and into operational GENI resources/campuses/aggregates. As racks start to come online in the coming months, more experimenters and operations personnel will want to see the state of GENI experiments, slices, and resources. In this session, each of the Spiral 4 I&M projects and the GENI Meta-Operations Center (GMOC) will show the current state of their tools, with an emphasis on both the experimenter and operations perspectives. We will then discuss current open issues of interest to both communities.

== Agenda ==

 * [#GIMIIMprojectupdate GIMI I&M project update] [attachment:"GIMI GEC 15 I&M Session.pptx" slides] - Mike Zink, UMass Amherst
 * [#GEMINIIMprojectupdate GEMINI I&M project update] [attachment:"GEMINI Overview and Direction-GEC15.pdf" slides] - Martin Swany, Indiana University
 * [#GMOCoperationalmonitoringprojectupdate GMOC operational monitoring project update] [attachment:"GEC15_IM.pptx" slides] - Kevin Bohan, GRNOC
 * [#Discussion:IMmonitoringandGENIstitching Discussion: I&M/monitoring and GENI stitching] [attachment:"GEC15_monitoring_stitching_discussion.pptx" slides] - Chaos Golubitsky, GPO

== Summary ==

At this session, the GIMI and GEMINI I&M projects and the GMOC monitoring project summarized their status and raised several issues, including:
 * Does always-on infrastructure monitoring of slices raise opt-in issues? Performance issues?
 * Can we have standardized images with embedded monitoring?
 * Do we need global naming for entities and measurements?

We then discussed "what can monitoring and I&M provide to help with the cross-aggregate stitching effort?" Topics raised included:
 * Will we have consistent naming of links among all participants? Probably, but we must make sure we can handle non-GENI-aware devices in the path.
 * VLAN mappings/translations in an experiment are visible to the experimenter via the manifest. We should provide a way for operators to get this information as well.
 * We need a substrate measurement infrastructure which experimenters can use to perform active measurements they wouldn't normally have access to (e.g., query a hypervisor, get a pcap of sliver traffic).

== Detailed Notes ==

=== Introduction ===

by Chaos Golubitsky, GPO

Chaos introduced the session. Both I&M and monitoring tools provide visibility into the state of:
 * GENI experiments and slices
 * Operational GENI resources, campuses, and aggregates
 * GENI-controlled networks, and non-GENI networks which GENI uses

The audience for these tools is both:
 * Experimenters who use GENI (and the staffers, instructors, and operators who help them)
 * Operators who run GENI infrastructure (especially when GENI runs on shared networks)

=== GIMI I&M project update ===

by Mike Zink, UMass Amherst

Mike gave an overview of the GIMI project and explained how GIMI can be used to monitor slices. GIMI supports experimenters in running experiments, collecting data, and analyzing measurements. GIMI currently works on ExoGENI, and !WiMax support is in progress. The project currently publishes images preinstalled with measurement tools. The GEC15 Virtual Box VM contains all of the I&M tools, including GIMI, iRODS, and IREEL, and will be a major part of Thursday's tutorial. GIMI also supports passive monitoring of slices.
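GIMI's measurement tooling is built on OML, so a slice-side probe typically defines a measurement point and injects samples toward a collection server. The following is a minimal sketch assuming the oml4py bindings; the application name, schema, and server address are placeholders, not values from the session.

{{{#!python
import oml4py

# Placeholder names: application "demo-probe" in measurement domain
# "demo-slice", reporting as "node1" to an OML server at oml-server:3003.
oml = oml4py.OMLBase("demo-probe", "demo-slice", "node1",
                     "tcp:oml-server:3003")

# Define a measurement point with a one-field schema, then start reporting.
oml.addmp("cpu", "load:double")
oml.start()

# Inject a single sample; a real probe would sample in a loop.
oml.inject("cpu", [0.42])
oml.close()
}}}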
Mike raised the following issues:
 1. Monitoring an experimenter's slice raises experimenter opt-in issues.
 2. What will be monitored?
 3. Does monitoring traffic cause enough load to interfere with the experiment? (On ExoGENI, for example, monitoring traffic is carried on a separate network, so it is not a problem there.)

=== GEMINI I&M project update ===

by Martin Swany, Indiana University

Martin gave an overview of GEMINI. There is a tutorial today which shows the two steps for installing GEMINI:
 1. Run the instrumentize.py script with appropriate credentials and the slice name
 2. Configure activities through the portal

Martin showed a variety of screenshots. GEMINI includes:
 * Portal
 * Measurement Store (like the old MA) -- control summarization, put data reduction inline
 * Measurement Point (for perfSONAR, using BLiPP)
 * Integration with the I&M archive
 * Integration with the GIMI archive (in progress)
 * GEMINI Global Registry (UNIS)
 * Event Messaging Service (high-rate notification service backed by NoSQL) - for getting large volumes of data off systems
 * perfSONAR support

What do you get?
 * a "warm feeling that you know what's going on in your slice"
 * passive metrics
 * active network measurements
 * archiving

Martin provided a long list of capabilities (shown in the slides). New features include:
 * New Measurement Store - NoSQL data store - JSON/REST
 * New UNIS
 * BLiPP

Campus infrastructure monitoring includes:
 * !OpenFlow monitoring
 * active probes for dynamic resources -- circuit "acceptance testing"

Next steps:
 1. Integrate with ABAC support (has been prototyped)
 2. GENI !OneStop interface -- new cool portal
   * experiment and measurement orchestration and management
 3. Application metrics with !NetLogger
 4. Configurable, in-line, on-line summarization and data reduction
 5. Cooperation with control frameworks, rack operators, and GMOC
   * common metrics across infrastructures
   * reduction of duplicate measurements
   * common collection tools

Discussion topics:
 * What do users and operators want?
 * Authentication/authorization
 * Measurements should be a first-class entity - standard images should have embedded monitoring
 * Ongoing measurements need to be addressed - access to the hypervisor, or provide code - AuthN/Z to initiate new measurement activities
 * Global naming, global info service
 * Coordinate more on measuring infrastructure and monitoring applications

=== GMOC operational monitoring project update ===

by Kevin Bohan, GRNOC

Kevin described the new monitoring client and how it is used to report data to GMOC. The goals are to:
 * Help experimenters see resources before they start using them
 * Let campuses see what's happening in their resources

An issue leading up to GEC14 was that the existing tools were very hard to use, so the project set about fixing that.

GMOC Objects API:
 * Model the state of things in GENI (as a site thinks of it) and submit that to GMOC
 * Integrate metadata with time series data
 * For example: "I have this aggregate, which has these resources, which have these interfaces, and those have the following stats..." (a sketch of what such a submission might look like appears below)
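As an illustration of the object model just described, a submission with the new gmoc Python module takes only a few lines. The sketch below changes an aggregate's state; the class, argument, and constant names (`GMOCClient`, `Aggregate`, `AGGREGATE_STATES`, `store`) are reconstructed from memory and should be checked against the actual gmoc module documentation.

{{{#!python
import gmoc  # GRNOC's GMOC reporting module

# Placeholder service URL and credentials (assumed parameter names).
client = gmoc.GMOCClient(serviceURL='https://gmoc-db.grnoc.iu.edu/',
                         username='reporter',
                         password='CHANGE_ME')

# Model an aggregate -- aggregate IDs are short names rather than URNs --
# and mark it up. The client validates IDs and the object hierarchy
# before anything is sent to GMOC.
agg = gmoc.Aggregate('example-foam', type='foam')
agg.state = gmoc.AGGREGATE_STATES['up']

result = client.store(agg)  # submit just this piece of the model
}}}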
Improvements over the previous submission process:
 * Previously, everything had to be submitted at once (even if redundant); now the submission can be split up, so a site can submit what it knows at any point in time
 * A new Python module supports modeling infrastructure and sending data to GMOC
 * The module aims to be "easy to use correctly and hard to do wrong"

"Easy to use correctly":
 * Previously, sites had to generate hard-to-write XML
 * No need to remember arcane metric names, which could previously be quite long
 * Previously, multiple scripts had to be called; now there is a single gmoc module

"Hard to do wrong":
 * Data is validated by the client before submission:
   * IDs are in the correct format and globally unique (URNs for everything but aggregates)
   * The object hierarchy is enforced
   * The various measurement classes can only be set on appropriate objects
 * Partial submissions are supported
 * Backwards compatible

Kevin showed a long list of modeled network elements and the time series objects used to report statistics.

Kevin showed a number of code examples:
 1. Adding a measurement manually - previously several hundred lines of Perl, now just a few lines
 2. Parsing RRD files - now easier; work is ongoing on specifying column headers
 3. Changing aggregate state - 5 lines of Python code

Who's Reporting?
 * 20 FOAM aggregates
 * GPO SA and GENI CH
 * ExoGENI metadata/time series
 * InstaGENI

Kevin demoed the GMOC DB interface.

Future Directions:
 * Populate the API with data ''from'' GMOC
 * More measurements
 * Parse RSpecs
 * Support additional languages? (currently Python)
 * Better visualization of data within the UI/map
 * Integration with other projects
 * Use circuit data operationally

=== Discussion: I&M/monitoring and GENI stitching ===

Discussion led by Chaos Golubitsky, GPO

==== Naming ====

Martin Swany: How do we expect links to be named? (e.g., segments A and B have different names; segment C is composed of A and B)

Aaron Helsinger: Who picks the name between two different networks?

Martin S: The sender names the link. (In DCN, links are uni-directional. Each path has distinct properties. The port and TX link are owned by one side; the port and RX link are owned by the other side.)

Tom Lehman: Is there a common link component_id to name the link?

Aaron H: A dynamic circuit across physical links may have a different identifier.

Aaron H: There is probably a straightforward mapping between URNs and the real world; we hope it is largely a translation problem. When we name things in the stitching extension, we must make sure those names map to operational names.

Chaos: Where are non-GENI URNs coming from?

Aaron H/Tom L: ION, DYNES, etc.

==== VLANs under translation ====

How do we know what VLANs are bridged?

Tom L: The manifest will have everything in it.

Chaos G: That may only be available to experimenters.

Someone: Collecting info from various places into a common format will help a lot.

==== Diverse resource types ====

Different nodes are running different OSes and environments. Can we have tools that work across these? Can we help operators understand the network properties of adjacent networks?

Martin S: Yes, but we need a common substrate measurement system. (For active measurements, have the hypervisor on the machine run them for you; passive measurement might look different from inside the host.)

Chaos: There is an analogous !OpenFlow scenario. We may need a pcap of what the controller is seeing.

Someone: Experimenters need the ability to run active debugging services they don't have permission to run themselves.
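The substrate measurement system discussed here does not exist yet, so purely as an illustration of the last two points: an aggregate-side service might accept an authorized request to capture a sliver's traffic at the hypervisor and hand back a pcap. Every URL, endpoint, and field below is hypothetical.

{{{#!python
import requests

# Hypothetical request to an aggregate-run measurement service: capture
# 30 seconds of this sliver's traffic at the hypervisor -- something the
# experimenter cannot do from inside the sliver. Authorization (e.g. via
# the experimenter's certificate) would gate such a request.
resp = requests.post(
    "https://am.example.edu/measurement/capture",   # invented endpoint
    json={
        "sliver_urn": "urn:publicid:IDN+example.edu+sliver+42",
        "duration_s": 30,
        "format": "pcap",
    },
    cert=("experimenter-cert.pem", "experimenter-key.pem"),
)
with open("sliver-traffic.pcap", "wb") as f:
    f.write(resp.content)
}}}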