Changes between Version 1 and Version 2 of GEC15Agenda/IMMonitoring

10/31/12 17:41:38




    v1 v2  
    12= Campus/Experiment Topics in Monitoring and I&M =
    1314As racks start to come online in the coming months, more experimenters and operations personnel will have interest in seeing the state of GENI experiments, slices and resources.  In this session, each of the Spiral 4 I&M projects and the GENI Meta-operations Center (GMOC) will show the current state of their tools with an emphasis on both the experimenter and operations perspectives.  Then we will discuss current, open issues of interest to both communities.
     16== Agenda ==
     17 * [#GIMIIMprojectupdate GIMI I&M project update]
     18   - Mike Zink, UMass Amherst
     19 * [#GEMINIIMprojectupdate GEMINI I&M project update]
     20   - Martin Swany, Indiana University
     21 * [#GMOCoperationalmonitoringprojectupdate GMOC operational monitoring project update]
     22   - Kevin Bohan, GRNOC
     23 * [#Discussion:IMmonitoringandGENIstitching Discussion: I&M/monitoring and GENI stitching]
     24   - Chaos Golubitsky, GPO
     26== Summary ==
     28At this session, the GIMI and GEMINI I&M projects and the GMOC monitoring project summarized their status and raised some issues including:
     29 * Does always-on infrastructure monitoring of slices raise opt-in issues?  Performance issues?
     30 * Can we have standardized images with embedded monitoring?
     31 * Do we need global naming for entities and measurements?
     33We then discussed "what can monitoring and I&M provide to help with the cross-aggregate stitching effort?"
     35Topics raised include:
     36 * Will we have consistent naming of links among all participants?  Probably, but make sure we can handle non-GENI-aware devices in the path.
     37 * VLAN mappings/translations in an experiment are visible to the experimenter via manifest.  Provide a way for operators to get this information.
     38 * We need a substrate measurement infrastructure which experimenters can use to perform active measurements they wouldn’t normally have access to (query a hypervisor, get a pcap of sliver traffic)
     40== Detailed Notes ==
     42=== Introduction ===
     44by Chaos Golubitsky, GPO
     46Chaos introduced the session. Both I&M and monitoring tools provide visibility into the state of:
     47 * GENI experiments and slices
     48 * Operational GENI resources, campuses, and aggregates
     49 * GENI-controlled networks, and non-GENI networks which GENI uses
     50The audience for these tools is both:
     51 * Experimenters who use GENI (and the staffers, instructors, and operators who help them)
     52 * Operators who run GENI infrastructure (especially when GENI runs on shared networks)
     54=== GIMI I&M project update ===
     55by Mike Zink, UMass Amherst
     57Mike gave an overview of the GIMI project and explained how GIMI can be used to monitor slices.
     59GIMI supports experimenters in running experiments, collecting data, and analyzing measurements.
     61GIMI currently works on ExoGENI, and the team is working on !WiMax support.
     63The project currently publishes images that are preinstalled with measurement tools. The GEC15 !VirtualBox VM contains all of the I&M tools, including GIMI, iRODS, and IREEL, and will be a major part of Thursday's tutorial.
     65GIMI supports passive monitoring of slices.
     67Mike raised the following issues:
     69 1) Monitoring an experimenter slice raises experimenter opt-in issues. 
     71 2) What will be monitored?
     73 3) Does monitoring traffic create load that interferes with the experiment? For example, on ExoGENI, monitoring traffic is on a separate network, so it's not a problem.
     75=== GEMINI I&M project update ===
     76by Martin Swany, Indiana University
     78Martin gave an overview of GEMINI.
     80There is a tutorial today which shows the two steps for installing GEMINI...
     82 1) Run script with appropriate credentials and slice name
     84 2) Configure activities through portal
     86Martin showed a variety of screenshots.
     88GEMINI includes:
     89 * Portal
     90 * Measurement Store (like old MA) -- control summarization, put data reduction inline
     91 * Measurement Point (for perfSONAR using BLiPP)
     92 * Integrate with I&M archive
     93 * Working on integrating with GIMI archive
     94 * GEMINI Global Registry (UNIS)
     95 * Event Messaging Service (high-rate notification service in NoSQL) - getting lots of data off systems
     96 * supports perfSONAR
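The Measurement Store item above mentions controlling summarization and putting data reduction inline. A minimal sketch of what inline reduction could look like, assuming samples arrive as time-ordered (timestamp, value) pairs. `WindowSummary` and `summarize` are hypothetical names used for illustration only; they are not part of GEMINI.

```python
from dataclasses import dataclass


@dataclass
class WindowSummary:
    start: float    # start time of the window
    count: int      # number of raw samples reduced
    minimum: float
    maximum: float
    mean: float


def summarize(samples, window_seconds):
    """Reduce (timestamp, value) samples to one summary per fixed window.

    Samples are assumed to arrive in time order, so a window is emitted
    as soon as a sample from a later window shows up.
    """
    current_start = None
    count = 0
    total = 0.0
    low = high = 0.0
    for timestamp, value in samples:
        start = timestamp - (timestamp % window_seconds)
        if current_start is not None and start != current_start:
            yield WindowSummary(current_start, count, low, high, total / count)
            count = 0
        if count == 0:
            # First sample of a new window: reset the running statistics.
            current_start, total, low, high = start, 0.0, value, value
        count += 1
        total += value
        low = min(low, value)
        high = max(high, value)
    if count:
        # Flush the last, possibly partial, window.
        yield WindowSummary(current_start, count, low, high, total / count)
```

Because the function is a generator, only the running statistics for the current window are held in memory, which is the point of doing the reduction inline rather than after storage.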
     98What do you get?
     99 * "warm feeling that you know what's going on in your slice"
     100 * passive metrics
     101 * active network measurements
     102 * archiving
     104Martin provided a long list of capabilities (shown in the slides).
     106New Features include:
     107 * New Measurement Store
     108   - NoSQL data store
     109   - JSON/REST
     110 * New UNIS
     111 * BLiPP
     113Campus Infrastructure Monitoring includes:
     114 * !OpenFlow monitoring
     115 * active probes for dynamic resources
     116   -- circuit "acceptance testing"
     118Next Steps:
     119 1. Integrate with ABAC support (has been prototyped)
     120 2. GENI !OneStop interface -- new cool portal
     121     * experiment and measurement orchestration and management
     122 3. Application metrics with !NetLogger
     123 4. Configurable, in-line, on-line summarization and data reduction
     124 5. Cooperation with Control Frameworks/rack operators/GMOC
     125     * common metrics across infrastructures
     126     * reduction of duplicate measurements
     127     * common collection tools
     129Discussion Topics:
     130 * What do users and operators want?
     131 * Authentication/Authorization
     132 * measurements should be a first class entity
     133   - standard images should have embedded monitoring
     134 * Ongoing measurements need to be addressed
     135   - access to hypervisor or provide code
     136   - AuthN/Z to initiate new measurement activities
     137 * Global naming, global info service
     138 * Coordinate more on measuring of infra and monitoring applications
     140=== GMOC operational monitoring project update ===
     141by Kevin Bohan, GRNOC
     143Kevin described the new monitoring client and how it's used to report data to GMOC.
     145Goals are to:
     146 * Help experimenters see resources before they start using them
     147 * Let campuses see what's happening in their resources
     149An issue leading up to GEC14: existing tools were very hard to use. Set about fixing that!
     151GMOC Objects API:
     152 * model the state of things in GENI (as a site thinks of it) and submit that to GMOC
     153 * integrate meta-data with time series data
     154 * For example, say I have this aggregate which has these resources which have these interfaces, etc and those have the following stats...
     155 * Previously, had to submit everything at once (even if redundant)
     156 * Now, split that up, so can submit what you know at any point in time
     157 * New Python module supports modeling infrastructure and sending data to GMOC
     158 * "easy to use correctly and hard to do wrong"
     160"easy to use correctly":
     161 - previously had to generate hard-to-write XML
     162 - Don't have to remember arcane metrics names which could previously be quite long
     163 - had to call multiple scripts, but now there is a single gmoc module
     164"hard to do wrong":
     165 - data is validated before submission BY CLIENT
     166    * IDs are in the correct format and globally unique (URNs for everything but aggregates)
     167    * object hierarchy is enforced
     168    * various measurement classes, can only set them on appropriate objects
     169 - Partial submissions supported
     170 - Backwards compatible
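The client-side checks listed above (ID format, uniqueness, object hierarchy, measurement classes restricted to appropriate objects) can be illustrated with a small sketch. This is not the real gmoc module API: `ModelObject`, `validate`, the hierarchy table, the measurement names, and the simplified URN pattern are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Simplified pattern for GENI-style URNs (assumed; real rules may be stricter).
URN_PATTERN = re.compile(r"^urn:publicid:IDN\+[^+\s]+\+[^+\s]+\+[^+\s]+$")

# Which child types each parent type may contain (assumed hierarchy).
ALLOWED_CHILDREN = {
    "aggregate": {"resource"},
    "resource": {"interface"},
    "interface": set(),
}

# Which measurement classes may be set on which object types (assumed).
ALLOWED_MEASUREMENTS = {
    "aggregate": {"state"},
    "resource": {"cpu_utilization"},
    "interface": {"rx_bytes", "tx_bytes"},
}


@dataclass
class ModelObject:
    kind: str
    ident: str
    children: list = field(default_factory=list)
    measurements: dict = field(default_factory=dict)


def validate(obj, seen=None):
    """Return a list of error strings; an empty list means OK to submit."""
    seen = set() if seen is None else seen
    errors = []
    # The notes say URNs are used for everything but aggregates.
    if obj.kind != "aggregate" and not URN_PATTERN.match(obj.ident):
        errors.append(f"{obj.ident}: not a URN")
    if obj.ident in seen:
        errors.append(f"{obj.ident}: duplicate ID")
    seen.add(obj.ident)
    for name in obj.measurements:
        if name not in ALLOWED_MEASUREMENTS.get(obj.kind, set()):
            errors.append(f"{obj.ident}: {name} not valid on {obj.kind}")
    for child in obj.children:
        if child.kind not in ALLOWED_CHILDREN.get(obj.kind, set()):
            errors.append(f"{obj.ident}: cannot contain {child.kind}")
        errors.extend(validate(child, seen))
    return errors
```

Since `validate` accepts any subtree, a site could check and submit only the part of the model it currently knows about, which matches the partial-submission idea in the notes.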
     172Kevin showed a long list of Modeled Network Elements and time series objects used to report statistics.
     174Kevin showed a number of code examples:
     176 1. Add a measurement manually
     177    - previously several hundred lines in perl
     178    - now just a few lines
     179 2. Can also parse RRD files.
     180    - Now easier
     181    - working on specifying column headers
     182 3. Changing aggregate state
     183    - 5 lines of python code
     185Who's Reporting?
     186 * 20 FOAM aggregates
     187 * GPO SA and GENI CH
     188 * ExoGENI metadata/time series
     189 * InstaGENI
     191Kevin demoed the GMOC DB interface.
     193Future Directions:
     194 * Populate the API with data _from_ GMOC
     195 * more measurements
     196 * parse RSpecs
     197 * support additional languages? currently python
     198 * better visualization of data within the UI/ map
     199 * integration with other projects
     200 * use circuit data operationally
     202=== Discussion: I&M/monitoring and GENI stitching ===
     203Discussion led by Chaos Golubitsky, GPO
     205==== Naming ====
     207Martin Swany: How do we expect links to be named? (e.g., seg A and seg B have different names; seg C is composed of A and B)
     209Aaron Helsinger: Who picks the name between two different networks?
     211Martin S: Sender names the link.  (In DCN, links are uni-directional. Each path has distinct properties.  The port and TX link are owned by one side.  The port and RX link are owned by another side.)
     213Tom Lehman: Is there a common link component_id to name the link?
     215Aaron H: Dynamic circuit across physical links may have a different identifier.
     217Aaron H: There is probably a straightforward mapping between URNs and the real world; hopefully this is largely a translation problem.  When we name things in the stitching extension, we should make sure they map to the names used operationally.
     219Chaos: Where are non-GENI URNs coming from?
     221Aaron H/Tom L: ION, DYNES, etc
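The naming points raised above (the sender names the link, links are uni-directional, and a segment can be composed of other segments) can be sketched as a small data structure. The `Link` class, its fields, and the example names are hypothetical; nothing here reflects an agreed GENI naming scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str             # name chosen by the sending side
    sender: str           # side that owns the port and TX link
    receiver: str         # side that owns the port and RX link
    segments: tuple = ()  # empty for a physical segment


def leaf_segments(link):
    """Expand a composite link into the ordered physical segments it uses."""
    if not link.segments:
        return [link]
    leaves = []
    for segment in link.segments:
        leaves.extend(leaf_segments(segment))
    return leaves


def is_contiguous(link):
    """Check that each segment starts where the previous one ends."""
    leaves = leaf_segments(link)
    return all(a.receiver == b.sender for a, b in zip(leaves, leaves[1:]))


# seg C is composed of seg A and seg B, each named by its own sender.
seg_a = Link("seg-a", sender="site1", receiver="regional")
seg_b = Link("seg-b", sender="regional", receiver="site2")
seg_c = Link("seg-c", sender="site1", receiver="site2", segments=(seg_a, seg_b))
```

Because links are uni-directional, the return path from site2 to site1 would be a separate `Link` with its own name and possibly different properties.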
     223==== VLANs under translation ====
     225How do we know what VLANs are bridged?
     227Tom L: The manifest will have everything in it.
     229Chaos G: May only be available to experimenters.
     231Someone: Collecting info from various places into a common format will help a lot
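As one illustration of a common format, suppose the per-hop VLAN tags for a stitched path have already been extracted from the manifests into a list of dictionaries. That input shape, the hop names, and the function name `vlan_translations` are assumptions for this sketch, not an existing tool or schema.

```python
def vlan_translations(hops):
    """Return (from_hop, to_hop, from_vlan, to_vlan) wherever the tag changes.

    `hops` is an ordered list of {"hop": name, "vlan": tag} dictionaries
    describing one path end to end.
    """
    changes = []
    for previous, current in zip(hops, hops[1:]):
        if previous["vlan"] != current["vlan"]:
            changes.append(
                (previous["hop"], current["hop"], previous["vlan"], current["vlan"])
            )
    return changes


# Hypothetical path: the tag is translated between the regional and backbone hops.
example_path = [
    {"hop": "rack-a", "vlan": 1750},
    {"hop": "regional", "vlan": 1750},
    {"hop": "backbone", "vlan": 3001},
    {"hop": "rack-b", "vlan": 3001},
]
```

An operator-facing view could then list only the translation points instead of requiring operators to read each experimenter's manifest.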
     233==== Diverse resource types ====
     235Different nodes are running different OSes and environments.  Can we have tools that work across these? Can we help operators understand the network properties of adjacent networks?
     237Martin S: Yes, but we need a common substrate measurement system (active measurements -- have the hypervisor on the machine run them for you; passive measurements might look different from inside the host)
     239Chaos: There is an analogous !OpenFlow scenario.  May need pcap of what the controller is seeing.
     241Someone: Need ability for experimenters to run active debugging services they don't have permission to run themselves.