| 15 | |
| 16 | == Agenda == |
| 17 | * [#GIMIIMprojectupdate GIMI I&M project update] |
| 18 | - Mike Zink, UMass Amherst |
| 19 | * [#GEMINIIMprojectupdate GEMINI I&M project update] |
| 20 | - Martin Swany, Indiana University |
| 21 | * [#GMOCoperationalmonitoringprojectupdate GMOC operational monitoring project update] |
| 22 | - Kevin Bohan, GRNOC |
| 23 | * [#Discussion:IMmonitoringandGENIstitching Discussion: I&M/monitoring and GENI stitching] |
| 24 | - Chaos Golubitsky, GPO |
| 25 | |
| 26 | == Summary == |
| 27 | |
| 28 | At this session, the GIMI and GEMINI I&M projects and the GMOC monitoring project summarized their status and raised some issues including: |
| 29 | * Does always-on infrastructure monitoring of slices raise opt-in issues? Performance issues? |
| 30 | * Can we have standardized images with embedded monitoring? |
| 31 | * Do we need global naming for entities and measurements? |
| 32 | |
| 33 | We then discussed "what can monitoring and I&M provide to help with the cross-aggregate stitching effort?" |
| 34 | |
| 35 | Topics raised include: |
| 36 | * Will we have consistent naming of links among all participants? Probably, but make sure we can handle non-GENI-aware devices in the path. |
| 37 | * VLAN mappings/translations in an experiment are visible to the experimenter via manifest. Provide a way for operators to get this information. |
| 38 | * We need a substrate measurement infrastructure which experimenters can use to perform active measurements they wouldn’t normally have access to (e.g., query a hypervisor, get a pcap of sliver traffic). |
| 39 | |
| 40 | == Detailed Notes == |
| 41 | |
| 42 | === Introduction === |
| 43 | |
| 44 | by Chaos Golubitsky, GPO |
| 45 | |
| 46 | Chaos introduced the session. Both I&M and monitoring tools provide visibility into the state of: |
| 47 | * GENI experiments and slices |
| 48 | * Operational GENI resources, campuses, and aggregates |
| 49 | * GENI-controlled networks, and non-GENI networks which GENI uses |
| 50 | The audience for these tools is both: |
| 51 | * Experimenters who use GENI (and the staffers, instructors, and operators who help them) |
| 52 | * Operators who run GENI infrastructure (especially when GENI runs on shared networks) |
| 53 | |
| 54 | === GIMI I&M project update === |
| 55 | by Mike Zink, UMass Amherst |
| 56 | |
| 57 | Mike gave an overview of the GIMI project and explained how GIMI can be used to monitor slices. |
| 58 | |
| 59 | GIMI supports experimenters in running experiments, collecting data, and analyzing measurements. |
| 60 | |
| 61 | GIMI currently works on ExoGENI, and the team is working on !WiMax support. |
| 62 | |
| 63 | The project currently publishes images with measurement tools preinstalled. The GEC15 !VirtualBox VM contains all of the I&M tools, including GIMI, iRODS, and IREEL, and will be a major part of Thursday's tutorial. |
| 64 | |
| 65 | GIMI supports passive monitoring of slices. |
| 66 | |
| 67 | Mike raised the following issues: |
| 68 | |
| 69 | 1) Monitoring an experimenter slice raises experimenter opt-in issues. |
| 70 | |
| 71 | 2) What will be monitored? |
| 72 | |
| 73 | 3) Does monitoring traffic create load that interferes with the experiment? On ExoGENI, for example, monitoring traffic is on a separate network, so it's not a problem. |
| 74 | |
| 75 | === GEMINI I&M project update === |
| 76 | by Martin Swany, Indiana University |
| 77 | |
| 78 | Martin gave an overview of GEMINI. |
| 79 | |
| 80 | There is a tutorial today which shows the two steps for installing GEMINI: |
| 81 | |
| 82 | 1) Run the instrumentize.py script with the appropriate credentials and slice name |
| 83 | |
| 84 | 2) Configure activities through the portal |
| 85 | |
| 86 | Martin showed a variety of screenshots. |
| 87 | |
| 88 | GEMINI includes: |
| 89 | * Portal |
| 90 | * Measurement Store (like old MA) -- control summarization, put data reduction inline |
| 91 | * Measurement Point (for perfSONAR using BLiPP) |
| 92 | * Integrate with I&M archive |
| 93 | * Working on integrating with GIMI archive |
| 94 | * GEMINI Global Registry (UNIS) |
| 95 | * Event Messaging Service (high-rate notification service in NoSQL) - getting lots of data off systems |
| 96 | * supports perfSONAR |
| 97 | |
| 98 | What do you get? |
| 99 | * "warm feeling that you know what's going on in your slice" |
| 100 | * passive metrics |
| 101 | * active network measurements |
| 102 | * archiving |
| 103 | |
| 104 | Martin provided a long list of capabilities (shown in the slides). |
| 105 | |
| 106 | New Features include: |
| 107 | * New Measurement Store |
| 108 | - NoSQL data store |
| 109 | - JSON/REST |
| 110 | * New UNIS |
| 111 | * BLiPP |
| 112 | |
| 113 | Campus Infrastructure Monitoring includes: |
| 114 | * !OpenFlow monitoring |
| 115 | * active probes for dynamic resources |
| 116 | -- circuit "acceptance testing" |
| 117 | |
| 118 | Next Steps: |
| 119 | 1. Integrate with ABAC support (has been prototyped) |
| 120 | 2. GENI !OneStop interface -- new cool portal |
| 121 | * experiment and measurement orchestration and management |
| 122 | 3. Application metrics with !NetLogger |
| 123 | 4. Configurable, in-line, on-line summarization and data reduction |
| 124 | 5. Cooperation with Control Frameworks/rack operators/GMOC |
| 125 | * common metrics across infrastructures |
| 126 | * reduction of duplicate measurements |
| 127 | * common collection tools |
| 128 | |
| 129 | Discussion Topics: |
| 130 | * What do users and operators want? |
| 131 | * Authentication/Authorization |
| 132 | * Measurements should be a first-class entity |
| 133 | - standard images should have embedded monitoring |
| 134 | * Ongoing measurements need to be addressed |
| 135 | - access to hypervisor or provide code |
| 136 | - AuthN/Z to initiate new measurement activities |
| 137 | * Global naming, global info service |
| 138 | * Coordinate more on measuring infrastructure and monitoring applications |
| 139 | |
| 140 | === GMOC operational monitoring project update === |
| 141 | by Kevin Bohan, GRNOC |
| 142 | |
| 143 | Kevin described the new monitoring client and how it's used to report data to GMOC. |
| 144 | |
| 145 | Goals are to: |
| 146 | * Help experimenters see resources before they start using them |
| 147 | * Let campuses see what's happening in their resources |
| 148 | |
| 149 | An issue leading up to GEC14: existing tools were very hard to use. Set about fixing that! |
| 150 | |
| 151 | GMOC Objects API: |
| 152 | * model the state of things in GENI (as a site thinks of it) and submit that to GMOC |
| 153 | * integrate meta-data with time series data |
| 154 | * For example: "I have this aggregate, which has these resources, which have these interfaces, etc., and those have the following stats..." |
| 155 | * Previously, had to submit everything at once (even if redundant) |
| 156 | * Now, split that up, so can submit what you know at any point in time |
| 157 | * New Python module supports modeling infrastructure and sending data to GMOC |
| 158 | * "easy to use correctly and hard to do wrong" |
| 159 | |
| 160 | "easy to use correctly": |
| 161 | - previously had to generate hard-to-write XML |
| 162 | - no longer have to remember arcane metric names, which could previously be quite long |
| 163 | - previously had to call multiple scripts; now there is a single gmoc module |
| 164 | "hard to do wrong": |
| 165 | - data is validated before submission BY THE CLIENT |
| 166 | * IDs are in the correct format and globally unique (URNs for everything but aggregates) |
| 167 | * object hierarchy is enforced |
| 168 | * various measurement classes, can only set them on appropriate objects |
| 169 | - Partial submissions supported |
| 170 | - Backwards compatible |
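The client-side checks listed above (ID format, enforced object hierarchy, measurement classes restricted to appropriate objects, partial submissions) can be illustrated with a minimal Python sketch. This is not the gmoc module's API: every name here (`ModelObject`, `build_submission`, the URN pattern, the hierarchy table, and the measurement names) is a hypothetical stand-in chosen for illustration, and uniqueness is only checked within a single submission.

```python
import re

# Hypothetical URN pattern, for illustration only (not taken from the GMOC client).
URN_RE = re.compile(r"^urn:publicid:IDN\+[^+\s]+\+[a-z]+\+[^+\s]+$")

# Hypothetical hierarchy: the parent type each object type must attach to.
ALLOWED_PARENT = {"aggregate": None, "resource": "aggregate", "interface": "resource"}

# Hypothetical measurement classes and the object types they may be set on.
ALLOWED_MEASUREMENTS = {
    "state": {"aggregate"},
    "cpu_utilization": {"resource"},
    "traffic_bps": {"interface"},
}


class ModelObject:
    """One modeled element; all checks run before anything is submitted."""

    def __init__(self, kind, ident, parent=None):
        if kind not in ALLOWED_PARENT:
            raise ValueError(f"unknown object type: {kind}")
        # The notes say URNs are used for everything but aggregates.
        if kind != "aggregate" and not URN_RE.match(ident):
            raise ValueError(f"ID is not a well-formed URN: {ident}")
        expected = ALLOWED_PARENT[kind]
        actual = parent.kind if parent is not None else None
        if expected != actual:
            raise ValueError(f"{kind} must have parent type {expected}, got {actual}")
        self.kind, self.ident, self.parent = kind, ident, parent
        self.measurements = {}

    def set_measurement(self, name, value):
        # A measurement class can only be set on an appropriate object type.
        if self.kind not in ALLOWED_MEASUREMENTS.get(name, set()):
            raise ValueError(f"{name} cannot be set on a {self.kind}")
        self.measurements[name] = value


def build_submission(objects):
    """Serialize only the objects passed in, i.e. a partial submission."""
    seen, payload = set(), []
    for obj in objects:
        if obj.ident in seen:
            raise ValueError(f"duplicate ID in submission: {obj.ident}")
        seen.add(obj.ident)
        payload.append({
            "kind": obj.kind,
            "id": obj.ident,
            "parent": obj.parent.ident if obj.parent else None,
            "measurements": dict(obj.measurements),
        })
    return payload


# Example: report one node's statistic without resubmitting the whole aggregate.
agg = ModelObject("aggregate", "example-aggregate")
node = ModelObject("resource", "urn:publicid:IDN+example.net+node+pc1", parent=agg)
node.set_measurement("cpu_utilization", 0.42)
payload = build_submission([node])
```

Raising errors at construction time, rather than at submission, is one way to get the "hard to do wrong" property: an invalid model can never be built, so it can never be sent.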
| 171 | |
| 172 | Kevin showed a long list of Modeled Network Elements and time series objects used to report statistics. |
| 173 | |
| 174 | Kevin showed a number of code examples: |
| 175 | |
| 176 | 1. Add a measurement manually |
| 177 | - previously several hundred lines in Perl |
| 178 | - now just a few lines |
| 179 | 2. Can also parse RRD files. |
| 180 | - Now easier |
| 181 | - working on specifying column headers |
| 182 | 3. Changing aggregate state |
| 183 | - 5 lines of Python code |
| 184 | |
| 185 | Who's Reporting? |
| 186 | * 20 FOAM aggregates |
| 187 | * GPO SA and GENI CH |
| 188 | * ExoGENI metadata/time series |
| 189 | * InstaGENI |
| 190 | |
| 191 | Kevin demoed the GMOC DB interface. |
| 192 | |
| 193 | Future Directions: |
| 194 | * Populate the API with data _from_ GMOC |
| 195 | * more measurements |
| 196 | * parse RSpecs |
| 197 | * support additional languages? (currently Python only) |
| 198 | * better visualization of data within the UI/map |
| 199 | * integration with other projects |
| 200 | * use circuit data operationally |
| 201 | |
| 202 | === Discussion: I&M/monitoring and GENI stitching === |
| 203 | Discussion led by Chaos Golubitsky, GPO |
| 204 | |
| 205 | ==== Naming ==== |
| 206 | |
| 207 | Martin Swany: How do we expect links to be named? (e.g., segments A and B have different names; segment C is composed of A and B) |
| 208 | |
| 209 | Aaron Helsinger: Who picks the name of a link between two different networks? |
| 210 | |
| 211 | Martin S: Sender names the link. (In DCN, links are uni-directional. Each path has distinct properties. The port and TX link are owned by one side. The port and RX link are owned by another side.) |
| 212 | |
| 213 | Tom Lehman: Is there a common link component_id to name the link? |
| 214 | |
| 215 | Aaron H: Dynamic circuit across physical links may have a different identifier. |
| 216 | |
| 217 | Aaron H: There is probably a straightforward mapping between URNs and the real world; hopefully this is largely just a translation problem. When we name things in the stitching extension, we should make sure they map to the operational names. |
| 218 | |
| 219 | Chaos: Where are non-GENI URNs coming from? |
| 220 | |
| 221 | Aaron H/Tom L: ION, DYNES, etc. |
| 222 | |
| 223 | ==== VLANs under translation ==== |
| 224 | |
| 225 | How do we know what VLANs are bridged? |
| 226 | |
| 227 | Tom L: The manifest will have everything in it. |
| 228 | |
| 229 | Chaos G: May only be available to experimenters. |
| 230 | |
| 231 | Someone: Collecting info from various places into a common format will help a lot. |
| 232 | |
| 233 | ==== Diverse resource types ==== |
| 234 | |
| 235 | Different nodes are running different OSes and environments. Can we have tools that work across these? Can we help operators understand the network properties of adjacent networks? |
| 236 | |
| 237 | Martin S: Yes, but we need a common substrate measurement system (active measurements -- have the hypervisor on the machine run them for you; passive measurements might look different from inside the host) |
| 238 | |
| 239 | Chaos: There is an analogous !OpenFlow scenario. May need pcap of what the controller is seeing. |
| 240 | |
| 241 | Someone: Need ability for experimenters to run active debugging services they don't have permission to run themselves. |
| 242 | |
| 243 | |
| 244 | |