Changes between Version 1 and Version 2 of InstrumentationTools/INSTOOLS_final_report


Timestamp:
06/09/16 15:53:15 (8 years ago)
Author:
fei@netlab.uky.edu
Comment:


Legend:

Unmodified
Added
Removed
Modified
  • InstrumentationTools/INSTOOLS_final_report

aggregate with the ProtoGENI Clearinghouse to become a ProtoGENI edge cluster.

- * We implemented the measurement and instrumentation system using the ProtoGENI API.
+ * We implemented the instrumentation and measurement system using the ProtoGENI API.
  We integrated our existing monitoring and measurement tools into ProtoGENI by
  designing a measurement system architecture which provides a separate measurement
  controller for each experiment.

- * We continued maintaining and updating the instrumentation and measurement systems.
+ * We continuously maintained and updated the instrumentation and measurement system.
  We developed a new web interface to the measurement system based on the
  Drupal content management system and implemented a new monitoring tool
- to capture Netflow data.  We developed the feature
+ to capture NetFlow data.  We developed the feature
  to support virtual nodes and provide a VNC interface to experiment nodes.
  We developed a portal that provides a single point
     
We used ProtoGENI and INSTOOLS in networking and operating systems classes.

- * We made the stable version of our software
-   available through the ProtoGENI GIT repository.

=== B. Deliverables made ===
     
==== 1. Building the Kentucky Aggregate. ====

- We upgraded our existing Edulab facility and then transforming it into a ProtoGENI
+ We upgraded our existing Edulab facility and then transformed it into a ProtoGENI
edge cluster (aggregate). We began by moving our Edulab cluster into a larger room that would better
accommodate the additional machines that were to be incorporated into the system. We then fixed and
     
and Kentucky resources.

- ==== 2. Designing and Implementing the instrumentation and measurement system using GENI API. ====
-
- Having developed a better understanding of the ProtoGENI architecture and aggregate API, we began developing
+ ==== 2. Designing and Implementing the instrumentation and measurement system using ProtoGENI API. ====
+
+ Having developed a better understanding of the ProtoGENI architecture and aggregate API, we developed
an instrumentation and measurement architecture for ProtoGENI that incorporates measurement
capabilities from our earlier Edulab system. However, because of the differences between Emulab and ProtoGENI,
     
not available via the ProtoGENI APIs. After discussions with Utah, the ProtoGENI team decided to add a
new abstraction to the APIs called a manifest that would contain detailed information about resources that
- should not be in an RSPEC. We then designed, and began using, the manifests that Utah added to their APIs
- to find the information we needed. While the manifest abstraction was a welcome addition, the manifest
- implementation and API calls still have not solved all of our issues and have caused us to rethink how and
- when topology information should be obtained by our measurement nodes. We have added code to work
- around these problems, but the issue is worth returning to in the future and may require additional changes
- to the control framework and its API.
+ should not be in an RSPEC. We then designed, and used, the manifests that Utah added to their APIs
+ to find the information we needed.

Because we were learning about the (ever-changing) control framework at the same time that we were
designing an instrumentation and measurement system to be built on top of it, our design has been evolving
- as we discover the various features and limitations of the ProtoGENI control framework. We recently
+ as we discover the various features and limitations of the ProtoGENI control framework. We
published a report describing our instrumentation and measurement system (the INSTOOLS system) which
can be found on the INSTOOLS wiki page at http://groups.geni.net/geni/wiki/InstrumentationTools. Our
     
data on demand. Users interact with the measurement system through a web interface that allows them to
select the information they want to see. That information is then dynamically created and updated every five
- seconds to give users a “live” look at what is occurring in their slice. Ultimately, we plan to use a content
- management system to give the user even greater control over the information displayed by the measurement
- system.
+ seconds to give users a “live” look at what is occurring in their slice.
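The five-second "live" update described above can be achieved with a plain HTML refresh directive. The sketch below is a hypothetical illustration (the function name, page title, and image file are invented; the actual system uses a PHP interface), showing how a measurement page can ask the browser to re-fetch itself every five seconds:

```python
# Hypothetical sketch of a self-refreshing measurement page.
REFRESH_SECONDS = 5

def live_page(title: str, body_html: str) -> str:
    """Wrap graph markup in a page that auto-refreshes every few seconds."""
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{REFRESH_SECONDS}">'
        f"<title>{title}</title></head>"
        f"<body>{body_html}</body></html>"
    )

page = live_page("Sliver eth0 traffic", '<img src="eth0-traffic.png">')
```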

Our initial implementation uses
     
data that is then collected by the MC. The raw data collected is typically stored in RRD database files and
then converted using rrdtools into graphs that show traffic levels or utilization levels. The MC also houses
- a web server that provides users with (visual) access to the graphs and charts of measurement data. Our
- current user interface is a simple PHP-based interface that allows users to select the sliver or link for which
- they want to see measurement data. The data is then displayed and automatically refreshed every five seconds
- to give the impression of a “live” view of the running system.
+ a web server that provides users with (visual) access to the graphs and charts of measurement data.

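The RRD-to-graph step above might be sketched as follows. The `rrdtool graph` subcommand and its `DEF`/`LINE` syntax are real, but the file names and the data-source name (`traffic`) are illustrative assumptions, not the actual INSTOOLS configuration:

```python
# Sketch: build an rrdtool invocation that renders a traffic graph as a PNG.
# File names and the "traffic" data-source name are hypothetical examples.
def rrd_graph_cmd(rrd_file: str, png_out: str, title: str) -> list:
    return [
        "rrdtool", "graph", png_out,
        "--title", title,
        "--start", "end-1h",                      # last hour of samples
        f"DEF:in={rrd_file}:traffic:AVERAGE",     # read the stored series
        "LINE2:in#0000FF:bytes/sec",              # draw it as a blue line
    ]

cmd = rrd_graph_cmd("node1-eth0.rrd", "node1-eth0.png", "node1 eth0 traffic")
# subprocess.run(cmd) would produce the PNG served by the MC's web server.
```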
This earlier implementation required placing hooks in the ProtoGENI
     
made it difficult to keep pace with the frequent updates and changes being
rolled out by the ProtoGENI team. Each time a new version of the ProtoGENI code was released, we had to
- reimplement our changes in the new version. To achieve a level of separation from the ProtoGENI code, we
+ re-implement our changes in the new version. To achieve a level of separation from the ProtoGENI code, we
redesigned our code so that it only interacts with ProtoGENI via the ProtoGENI API calls. In other words,
we abandoned the modifications that we had made to ProtoGENI code itself, and instead developed ways to
     
capture the RSPEC before it goes to the ProtoGENI API, adds a measurement controller (MC) node along
with the code needed on the MC, and then makes the API calls needed to initialize the experiment, the MC,
- and all the measurement points (nodes). As a result, our code is now only dependent on the ProtoGENI API,
+ and all the measurement points (nodes). As a result, our code is only dependent on the ProtoGENI API,
allowing the ProtoGENI implementation to change without affecting our code.
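The capture-and-augment flow described above can be sketched in miniature: intercept the experiment's RSPEC before it reaches the ProtoGENI API and splice in an extra node for the measurement controller. The element and attribute names below are simplified assumptions, not the exact RSPEC schema:

```python
# Sketch: add an MC node to an RSPEC before submitting it.
# The <node>/<node_type> names are simplified stand-ins for the real schema.
import xml.etree.ElementTree as ET

def add_mc_node(rspec_xml: str, mc_name: str = "MC") -> str:
    root = ET.fromstring(rspec_xml)
    mc = ET.SubElement(root, "node", {"virtual_id": mc_name})
    ET.SubElement(mc, "node_type", {"type_name": "pc"})
    return ET.tostring(root, encoding="unicode")

original = '<rspec><node virtual_id="node1"/></rspec>'
augmented = add_mc_node(original)
# augmented now contains a second <node> entry for the MC
```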

Another enhancement we made was to make the instrumentation and measurement deployment and
teardown independent of the slice setup and teardown. In our original version of the code, we modified the
- Utah slice creation scripts to set up the intstrumentation system at the same time. In our most recent version
+ Utah slice creation scripts to set up the instrumentation system at the same time. In our most recent version
of the code, users can create a slice using any of the ProtoGENI approved ways (e.g., via scripts or via the
Utah flash GUI). After the slice has been established, our new script will discover the topology of the slice
     
longer needed.

- Perhaps the most important advance we made is the ability to instrument slices that span aggregates.
- Our new code identifies all the aggregates that comprise a slice, locates the component managers for those
+ Another important feature is the ability to instrument slices that span aggregates.
+ Our code identifies all the aggregates that comprise a slice, locates the component managers for those
aggregates, discovers the resources used in each aggregate, and then proceeds to set up an MC in each of
the aggregates where the slice's results will be collected and made available via the web interface. Finally,
     

We integrated the Drupal content management system for purposes of displaying the collected measurement
- information. We now load Drupal nodes into the database that render the graphs. Because each graph is a
- distinct drupal node, users can define their own views of the measurement data, allowing them to display
+ information. We load Drupal nodes into the database that render the graphs. Because each graph is a
+ distinct Drupal node, users can define their own views of the measurement data, allowing them to display
precisely the information they are interested in. It also allows users to define the theme/look-and-feel of the
web interface to meet their needs.
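The one-graph-per-node design above can be illustrated with a language-agnostic sketch (plain Python records standing in for Drupal nodes; the field names are invented): because each graph is an independent record, a user-defined view is simply a selection over those records.

```python
# Sketch: each graph is an independent record ("node"), so a user view is
# just a filter over the records. Field names (nid, sliver, metric) are
# hypothetical, not Drupal's actual schema.
graphs = [
    {"nid": 1, "sliver": "node1", "metric": "eth0 traffic"},
    {"nid": 2, "sliver": "node1", "metric": "cpu load"},
    {"nid": 3, "sliver": "node2", "metric": "eth0 traffic"},
]

def build_view(graphs, wanted_metric):
    """Select only the graphs the user asked to display."""
    return [g for g in graphs if g["metric"] == wanted_metric]

traffic_view = build_view(graphs, "eth0 traffic")   # graphs 1 and 3
```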
     
to setting up the instrumentation and measurement infrastructure to collect packet data rates, we now set up
the NetFlow services needed to capture (and categorize) data on a per-flow basis. Similar to our SNMP
- services, the Netflow capture services send their data to the measurement controller (MC) for processing
- and viewing. Currently, the flows to be monitored are preconfigured for the user so that they can simply
- go to the web interface to see most common types of flows (e.g., to well known ports). Changing what is
- monitored is still a manual task, but we plan to modify that in a future release.
-
- The experience of adding in a new type of information source (netflow) to our measurement system
+ services, the NetFlow capture services send their data to the measurement controller (MC) for processing
+ and viewing. The flows to be monitored are preconfigured for the user so that they can simply
+ go to the web interface to see the most common types of flows (e.g., to well-known ports).
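The preconfigured flow categories mentioned above amount to grouping captured flows by well-known destination port. A minimal sketch, with an illustrative (not actual) port list:

```python
# Sketch: group captured flows by well-known destination port.
# The port-to-label table is an illustrative assumption.
WELL_KNOWN = {22: "ssh", 53: "dns", 80: "http", 443: "https"}

def categorize(flows):
    """flows: iterable of (dst_port, byte_count) tuples from the collector."""
    totals = {}
    for port, nbytes in flows:
        label = WELL_KNOWN.get(port, "other")
        totals[label] = totals.get(label, 0) + nbytes
    return totals

print(categorize([(80, 1200), (443, 800), (80, 300), (6000, 50)]))
# {'http': 1500, 'https': 800, 'other': 50}
```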
+
+ The experience of adding in a new type of information source (NetFlow) to our measurement system
caused us to think about the problem of simplifying the process of adding new information sources in the
future. In particular, we wanted it to be easy for users to modify the web interface to display the data they
- collect on each node or link. To make this possible, we created a web page on theMC where a user can enter
+ collect on each node or link. To make this possible, we created a web page on the MC where a user can enter
a parameterized command that is then used to (automatically) generate all the web pages needed to view
- that type of data. As a result, it is now relatively easy to incorporate new types of (collected) information
+ that type of data. As a result, it is relatively easy to incorporate new types of (collected) information
into the web interface.
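The "parameterized command" idea above can be sketched simply: the user supplies one command template on the MC, and a page entry is generated for every node by substituting the parameters. The command and parameter name below are hypothetical:

```python
# Sketch: expand one user-supplied command template into a per-node entry.
# "netstat -i $node" and the $node parameter are illustrative examples.
from string import Template

def generate_pages(command_template, nodes):
    tpl = Template(command_template)
    return {node: tpl.substitute(node=node) for node in nodes}

pages = generate_pages("netstat -i $node", ["node1", "node2"])
# {'node1': 'netstat -i node1', 'node2': 'netstat -i node2'}
```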

- As more experiments start to use the ProtoGENI facilities, making efficient use of the resources has become one of our priorities. To that end, we worked with the Utah group to add support for virtual nodes based on OpenVZ so that each physical PC can be used as multiple experimental nodes. Adding support for virtual nodes turned out to be fraught with problems, and so we had to go through several rounds of bug fixes and upgrades to get the latest development version of the code working. In addition we had to manually fix the OPENVZ image for the shared nodes because the change had not been pushed out to the image running at Utah.
- Unlike BSD jails, we found that the OpenVZ image supports the abilty to run independent SNMP daemons on each OpenVZ node (i.e., one SNMP daemon per VM). This feature allows us to perform per-sliver monitoring without the need to write new monitoring software to understand the OpenVZ virtual interfaces.
-
- Another addition to our software was the ability to access X window software running on the experimental nodes (slivers) via the MC web interace. Our goal was to leverage existing network monitoring tools, such as Wireshark and EtherApe?, in order to observe the behavior of experimental nodes. These tools are helpful to collect node statistics and visualize the link traffic. However, they need X window support. To support such access, we added the ability to dynamically load X-window software onto the experimental nodes and then provide indirect access through the MC Web browser and the VNC protocol. It has been added to the the MC's Drupal content management system as one of the menu options. The only way for a user to access the programs running on the experimental nodes is to first login to the Drupal CMS running on the MC. This way, we can assure the slice owner that they are the only ones allowed to access/run these programs. Our Drupal interface currently has two VNC templates that are preconfigured to run xterm and wireshark respectively on the nodes in the slice via VNC. The MC runs a JAVA-based VNC client (in the CMS) that mirrors the VNC connections from the nodes. The VNC communication between the nodes and the MC are protected by a system generated random password that is unique for every slice and invisible to the user.
+ As more experiments start to use the ProtoGENI facilities, making efficient use of the resources has become one of our priorities. To that end, we worked with the Utah group to add support for virtual nodes based on OpenVZ so that each physical PC can be used as multiple experimental nodes. Adding support for virtual nodes turned out to be fraught with problems, and so we had to go through several rounds of bug fixes and upgrades to get the latest development version of the code working. In addition, we had to manually fix the OpenVZ image for the shared nodes because the change had not been pushed out to the image running at Utah.
+ Unlike BSD jails, we found that the OpenVZ image supports the ability to run independent SNMP daemons on each OpenVZ node (i.e., one SNMP daemon per VM). This feature allows us to perform per-sliver monitoring without the need to write new monitoring software to understand the OpenVZ virtual interfaces.
+
+ Another addition to our software was the ability to access X window software running on the experimental nodes (slivers) via the MC web interface. Our goal was to leverage existing network monitoring tools, such as Wireshark and EtherApe, in order to observe the behavior of experimental nodes. These tools are helpful to collect node statistics and visualize the link traffic. However, they need X window support. To support such access, we added the ability to dynamically load X-window software onto the experimental nodes and then provide indirect access through the MC Web browser and the VNC protocol. It has been added to the MC's Drupal content management system as one of the menu options. The only way for a user to access the programs running on the experimental nodes is to first log in to the Drupal CMS running on the MC. This way, we can assure the slice owner that they are the only ones allowed to access/run these programs. Our Drupal interface has two VNC templates that are preconfigured to run xterm and Wireshark respectively on the nodes in the slice via VNC. The MC runs a Java-based VNC client (in the CMS) that mirrors the VNC connections from the nodes. The VNC communication between the nodes and the MC is protected by a system-generated random password that is unique for every slice and invisible to the user.
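The per-slice VNC protection described above implies generating a fresh random password when a slice is instrumented, shared only between the MC and the nodes. A minimal sketch (function name and password length are illustrative assumptions):

```python
# Sketch: generate a random per-slice password for the MC<->node VNC link.
# The length and alphabet are illustrative choices, not the actual scheme.
import secrets
import string

def slice_vnc_password(length=16):
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

pw = slice_vnc_password()   # different for every slice, never shown to the user
```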

To simplify access to a user's measurement data we developed a "portal" system. The portal is a one-stop shop that allows users to access all their measurement data via a single interface. Instead of requiring the user to visit multiple URLs, where each URL refers to one MC, the user visits a single URL representing the portal. The portal presents a visualization of the topology and allows users to select the measurement data they would like to see. Given the longitude and latitude information for each resource in the slice, the portal can show each resource's actual location on a map with links connecting the resources. If nodes within an aggregate are too close to be distinguished, the portal provides an expanded view to show the logical links among them. By clicking on a node or a link, users can get a list of measurement information available. They can then choose what data to view. The portal will then present the corresponding tables and/or graphs on the screen. The links can also be color-coded to show the current levels of traffic over the links.
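The color-coded links mentioned above come down to mapping a link's measured utilization to a display color on the topology map. The thresholds and color names in this sketch are illustrative assumptions:

```python
# Sketch: map a link's utilization (fraction of capacity) to a map color.
# The 0.3/0.7 thresholds are hypothetical, not the portal's actual values.
def link_color(utilization):
    """utilization in [0, 1] -> color name for the topology map."""
    if utilization < 0.3:
        return "green"
    elif utilization < 0.7:
        return "yellow"
    return "red"

print(link_color(0.85))
# red
```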
     
had when viewing the live data.  In particular, we
use OpenVZ containers to provide a virtualized environment in which
- to run the drupal content management systems that was running on the MC
+ to run the Drupal content management system that was running on the MC
at the time the archive was taken.  As a result, the user can visit
the same web pages that were offered by the MC at the time the archive
     
storage.  The files can then be accessed via the CNRI web interface.

- ==== 4. Collaborating with other teams and integrating the INSTOOLS software into the FLACK clien/GUI. ====
+ ==== 4. Collaborating with other teams and integrating the INSTOOLS software into the FLACK client/GUI. ====

We worked with the Utah group and the GENI security team while designing
     
instrumentize button in the FLACK GUI.  The GUI will then talk with the
backend instrumentation servers to instrumentize the slice.  While this is
- occuring, information about the progress/status of the instrumentize process
+ occurring, information about the progress/status of the instrumentize process
is sent back to the GUI and can be viewed by the user.
After the slice has been instrumentized, the Go-to-portal button
     
measurement data he/she is interested in.

- The integration of the INTOOLS with the FLACK interface greatly
+ The integration of the INSTOOLS with the FLACK interface greatly
simplifies the instrumentation process for users, making it much easier
to use the instrumentation tools.
     
our Python scripts as flash code making ProtoGENI API calls as
well as XML-RPC calls to our instrumentation manager (IM).
- We currently run an instrumentation manager (IM) at each aggregate
+ We run an instrumentation manager (IM) at each aggregate
in concert with the component manager at each aggregate.
The IM is responsible for sshing to experimental nodes, running
     

We demonstrated the INSTOOLS system at the GEC6, GEC7 and GEC10 conferences
- and gave a tutorial of using the system at the GEC8 and GEC11 conferences.
+ and gave tutorials on using the system at the GEC8 and GEC11 conferences.
We had several good discussions with other measurement
groups regarding ways to incorporate their measurement data into
     
=== B. Project participants ===

- The following individuals have helped with the project in one way or another
- during the last quarter:
+ The following individuals have helped with the project in one way or another:
 * Jim Griffioen - Project PI
 * Zongming Fei - Project Co-PI
     
the Internet2 Joint Techs conference held at Clemson University.
We demonstrated the INSTOOLS system at the GEC6, GEC7 and GEC10 conferences
- and gave a tutorial of using the system at the GEC8 and GEC11 conferences.
- We have been providing support for the early adopters of our tools.
+ and gave tutorials on using the system at the GEC8 and GEC11 conferences.
+ We have been providing support for the users of our tools.

=== E. Collaborations ===