Changes between Version 1 and Version 2 of InstrumentationTools/INSTOOLS_final_report
- Timestamp: 06/09/16 15:53:15
…
aggregate with the ProtoGENI Clearinghouse to become a ProtoGENI edge cluster.

* We implemented the instrumentation and measurement system using the ProtoGENI API.
We integrated our existing monitoring and measurement tools into ProtoGENI by
designing a measurement system architecture which provides a separate measurement
controller for each experiment.

* We continuously maintained and updated the instrumentation and measurement system.
We developed a new web interface to the measurement system based on the
Drupal content management system and implemented a new monitoring tool
to capture NetFlow data. We developed the feature
to support virtual nodes and provide a VNC interface to experiment nodes.
We developed a portal that provides a single point
…
We used ProtoGENI and INSTOOLS in the networking and operating systems classes.

=== B. Deliverables made ===
…
==== 1. Building the Kentucky Aggregate. ====

We upgraded our existing Edulab facility and then transformed it into a ProtoGENI
edge cluster (aggregate). We began by moving our Edulab cluster into a larger room that would better
accommodate the additional machines that were to be incorporated into the system. We then fixed and
…
and Kentucky resources.
==== 2. Designing and Implementing the instrumentation and measurement system using the ProtoGENI API. ====

Having developed a better understanding of the ProtoGENI architecture and aggregate API, we developed
an instrumentation and measurement architecture for ProtoGENI that incorporates measurement
capabilities from our earlier Edulab system. However, because of the differences between Emulab and ProtoGENI,
…
not available via the ProtoGENI APIs. After discussions with Utah, the ProtoGENI team decided to add a
new abstraction to the APIs called a manifest that would contain detailed information about resources that
should not be in an RSPEC. We then designed, and used, the manifests that Utah added to their APIs
to find the information we needed.

Because we were learning about the (ever-changing) control framework at the same time that we were
designing an instrumentation and measurement system to be built on top of it, our design has been evolving
as we discover the various features and limitations of the ProtoGENI control framework.
We published a report describing our instrumentation and measurement system (the INSTOOLS system), which
can be found on the INSTOOLS wiki page at http://groups.geni.net/geni/wiki/InstrumentationTools. Our
…
data on demand. Users interact with the measurement system through a web interface that allows them to
select the information they want to see. That information is then dynamically created and updated every five
seconds to give users a “live” look at what is occurring in their slice.

Our initial implementation uses
…
data that is then collected by the MC. The raw data collected is typically stored in RRD database files and
then converted using rrdtools into graphs that show traffic levels or utilization levels. The MC also houses
a web server that provides users with (visual) access to the graphs and charts of measurement data.

This earlier implementation required placing hooks in the ProtoGENI
…
made it difficult to keep pace with the frequent updates and changes being
rolled out by the ProtoGENI team. Each time a new version of the ProtoGENI code was released, we had to
re-implement our changes in the new version.
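The raw-data-to-graph path described above can be sketched with a few rrdtool invocations. The file names, data-source name, and intervals below are illustrative assumptions, not the actual INSTOOLS configuration; a small Python helper simply assembles the command lines an MC-style collector might run:

```python
# Sketch of the rrdtool pipeline described above: an RRD file holds the raw
# counters shipped to the MC, and "rrdtool graph" renders them as images.
# File names, data-source names, and step sizes are illustrative assumptions.

def rrdtool_commands(link="eth0"):
    rrd = f"{link}.rrd"
    return [
        # Create a round-robin database sampled every 5 seconds (the report's
        # refresh interval), keeping one day of 5-second averages.
        ["rrdtool", "create", rrd, "--step", "5",
         "DS:octets:COUNTER:10:U:U",
         "RRA:AVERAGE:0.5:1:17280"],
        # Feed it a counter sample collected from an SNMP agent.
        ["rrdtool", "update", rrd, "N:123456"],
        # Render the graph that the MC's web server exposes to the user.
        ["rrdtool", "graph", f"{link}.png",
         f"DEF:traffic={rrd}:octets:AVERAGE",
         f"LINE1:traffic#0000FF:{link} traffic"],
    ]

for cmd in rrdtool_commands():
    print(" ".join(cmd))
```

A real collector would run these via subprocess on the MC; the point here is only the create/update/graph flow.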
To achieve a level of separation from the ProtoGENI code, we
redesigned our code so that it only interacts with ProtoGENI via the ProtoGENI API calls. In other words,
we abandoned the modifications that we had made to the ProtoGENI code itself, and instead developed ways to
…
capture the RSPEC before it goes to the ProtoGENI API, adds a measurement controller (MC) node along
with the code needed on the MC, and then makes the API calls needed to initialize the experiment, the MC,
and all the measurement points (nodes). As a result, our code is only dependent on the ProtoGENI API,
allowing the ProtoGENI implementation to change without affecting our code.

Another enhancement we made was to make the instrumentation and measurement deployment and
teardown independent of the slice setup and teardown. In our original version of the code, we modified the
Utah slice creation scripts to set up the instrumentation system at the same time. In our most recent version
of the code, users can create a slice using any of the ProtoGENI-approved ways (e.g., via scripts or via the
Utah flash GUI). After the slice has been established, our new script will discover the topology of the slice
…
longer needed.

Another important feature is the ability to instrument slices that span aggregates.
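The capture-and-augment step described above, in which the RSPEC is intercepted and an MC node is added before the ProtoGENI API calls are made, can be sketched as follows. The element and attribute names are simplified stand-ins of our own invention, not the real ProtoGENI RSpec schema:

```python
import xml.etree.ElementTree as ET

# Sketch of the "capture the RSPEC, add an MC node" step described above.
# The element/attribute names are simplified stand-ins for the real
# ProtoGENI RSpec schema; only the overall flow is the point.

def add_measurement_controller(rspec_xml: str) -> str:
    root = ET.fromstring(rspec_xml)
    # Append one extra node to act as the measurement controller (MC),
    # before the RSpec is handed to the ProtoGENI API calls.
    mc = ET.SubElement(root, "node", {"virtual_id": "MC", "role": "measurement"})
    # Record a link from the MC to every experimental node so the MC
    # can collect measurement data from each of them.
    for node in root.findall("node"):
        if node is not mc:
            ET.SubElement(mc, "link", {"to": node.get("virtual_id")})
    return ET.tostring(root, encoding="unicode")

request = '<rspec><node virtual_id="n1"/><node virtual_id="n2"/></rspec>'
augmented = add_measurement_controller(request)
print(augmented)
```

The augmented RSpec would then be submitted through the normal API calls, keeping the instrumentation code outside the control framework itself.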
Our code identifies all the aggregates that comprise a slice, locates the component managers for those
aggregates, discovers the resources used in each aggregate, and then proceeds to set up an MC in each of
the aggregates, where the slice's results will be collected and made available via the web interface. Finally,
…
We integrated the Drupal content management system for displaying the collected measurement
information. We load Drupal nodes into the database that render the graphs. Because each graph is a
distinct Drupal node, users can define their own views of the measurement data, allowing them to display
precisely the information they are interested in. It also allows users to define the theme/look-and-feel of the
web interface to meet their needs.
…
to setting up the instrumentation and measurement infrastructure to collect packet data rates, we now set up
the NetFlow services needed to capture (and categorize) data on a per-flow basis. Similar to our SNMP
services, the NetFlow capture services send their data to the measurement controller (MC) for processing
and viewing.
The flows to be monitored are preconfigured for the user so that they can simply
go to the web interface to see the most common types of flows (e.g., to well-known ports).

The experience of adding a new type of information source (NetFlow) to our measurement system
caused us to think about the problem of simplifying the process of adding new information sources in the
future. In particular, we wanted it to be easy for users to modify the web interface to display the data they
collect on each node or link. To make this possible, we created a web page on the MC where a user can enter
a parameterized command that is then used to (automatically) generate all the web pages needed to view
that type of data. As a result, it is relatively easy to incorporate new types of (collected) information
into the web interface.
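The page-generation idea described above can be sketched as follows. The template syntax, command, and node names are our own examples, not the actual MC implementation:

```python
# Sketch of the parameterized-command feature described above: the user
# supplies one command template on the MC's web page, and a viewing page is
# generated for each node in the slice. Template syntax and node names are
# illustrative assumptions.
from string import Template

def generate_pages(command_template: str, nodes):
    pages = {}
    for node in nodes:
        # Expand the template once per node (e.g., "$node" -> "pc1").
        cmd = Template(command_template).substitute(node=node)
        # Each generated page wraps the per-node command for display.
        pages[f"{node}.html"] = f"<h1>{node}</h1><pre>{cmd}</pre>"
    return pages

pages = generate_pages("netstat -i $node", ["pc1", "pc2", "mc"])
print(sorted(pages))
```

One template entered once thus yields a full set of per-node pages, which is what makes new information sources cheap to add.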
As more experiments start to use the ProtoGENI facilities, making efficient use of the resources has become one of our priorities. To that end, we worked with the Utah group to add support for virtual nodes based on OpenVZ so that each physical PC can be used as multiple experimental nodes.
Adding support for virtual nodes turned out to be fraught with problems, and so we had to go through several rounds of bug fixes and upgrades to get the latest development version of the code working. In addition, we had to manually fix the OpenVZ image for the shared nodes because the change had not been pushed out to the image running at Utah. Unlike BSD jails, we found that the OpenVZ image supports the ability to run independent SNMP daemons on each OpenVZ node (i.e., one SNMP daemon per VM). This feature allows us to perform per-sliver monitoring without the need to write new monitoring software to understand the OpenVZ virtual interfaces.

Another addition to our software was the ability to access X window software running on the experimental nodes (slivers) via the MC web interface. Our goal was to leverage existing network monitoring tools, such as Wireshark and EtherApe, in order to observe the behavior of experimental nodes. These tools are helpful for collecting node statistics and visualizing link traffic. However, they need X window support. To support such access, we added the ability to dynamically load X-window software onto the experimental nodes and then provide indirect access through the MC web browser and the VNC protocol. It has been added to the MC's Drupal content management system as one of the menu options. The only way for a user to access the programs running on the experimental nodes is to first log in to the Drupal CMS running on the MC. This way, we can assure the slice owner that they are the only ones allowed to access/run these programs. Our Drupal interface has two VNC templates that are preconfigured to run xterm and Wireshark, respectively, on the nodes in the slice via VNC. The MC runs a Java-based VNC client (in the CMS) that mirrors the VNC connections from the nodes.
The VNC communication between the nodes and the MC is protected by a system-generated random password that is unique for every slice and invisible to the user.

To simplify access to a user's measurement data, we developed a "portal" system. The portal is a one-stop shop that allows users to access all their measurement data via a single interface. Instead of requiring the user to visit multiple URLs, where each URL refers to one MC, the user visits a single URL representing the portal. The portal presents a visualization of the topology and allows users to select the measurement data they would like to see. Given the longitude and latitude information for each resource in the slice, the portal can show each resource's actual location on a map with links connecting the resources. If nodes within an aggregate are too close to be distinguished, the portal provides an expanded view to show the logical links among them. By clicking on a node or a link, users can get a list of the measurement information available. They can then choose what data to view. The portal will then present the corresponding tables and/or graphs on the screen. The links can also be color-coded to show the current levels of traffic over the links.
…
had when viewing the live data. In particular, we
use OpenVZ containers to provide a virtualized environment in which
to run the Drupal content management system that was running on the MC
at the time the archive was taken. As a result, the user can visit
the same web pages that were offered by the MC at the time the archive
…
storage. The files can then be accessed via the CNRI web interface.
==== 4. Collaborating with other teams and integrating the INSTOOLS software into the FLACK client/GUI. ====

We worked with the Utah group and the GENI security team while designing
…
instrumentize button in the FLACK GUI. The GUI will then talk with the
backend instrumentation servers to instrumentize the slice. While this is
occurring, information about the progress/status of the instrumentize process
is sent back to the GUI and can be viewed by the user.
After the slice has been instrumentized, the Go-to-portal button
…
measurement data he/she is interested in.

The integration of the INSTOOLS with the FLACK interface greatly
simplifies the instrumentation process for users, making it much easier
to use the instrumentation tools.
…
our Python scripts as flash code making ProtoGENI API calls as
well as XML-RPC calls to our instrumentation manager (IM).
We run an instrumentation manager (IM) at each aggregate
in concert with the component manager at each aggregate.
The IM is responsible for sshing to experimental nodes, running
…
We demonstrated the INSTOOLS system at the GEC6, GEC7 and GEC10 conferences
and gave tutorials on using the system at the GEC8 and GEC11 conferences.
We had several good discussions with other measurement
groups regarding ways to incorporate their measurement data into
…

=== B. Project participants ===
The following individuals have helped with the project in one way or another:
* Jim Griffioen - Project PI
* Zongming Fei - Project Co-PI
…
the Internet2 Joint Techs conference held at Clemson University.
We demonstrated the INSTOOLS system at the GEC6, GEC7 and GEC10 conferences
and gave tutorials on using the system at the GEC8 and GEC11 conferences.
We have been providing support for the users of our tools.

=== E. Collaborations ===