Changes between Initial Version and Version 1 of InstrumentationTools/INSTOOLS_final_report

06/09/16 14:41:07
= INSTOOLS Project Final Report =

June 2016

== I. Major accomplishments ==

We finished all the milestones stated in the contract and its modifications.
The following highlights our accomplishments for the project.

=== A. Milestones achieved ===

During the project period we achieved the following milestones:
 * We upgraded our Edulab facility, including installation and incorporation of
   new hardware into the Edulab system. We also upgraded the University’s
   Internet2 network connection. We installed the latest copy of the Utah
   ProtoGENI code (i.e., the aggregate manager) on the Kentucky Edulab system,
   resulting in a local aggregate implementation. We integrated the Kentucky
   aggregate with the ProtoGENI Clearinghouse to become a ProtoGENI edge cluster.
 * We implemented the measurement and instrumentation system using the ProtoGENI API.
   We integrated our existing monitoring and measurement tools into ProtoGENI by
   designing a measurement system architecture which provides a separate measurement
   controller for each experiment.
 * We continued maintaining and updating the instrumentation and measurement systems.
   We developed a new web interface to the measurement system based on the
   Drupal content management system and implemented a new monitoring tool
   to capture NetFlow data. We added support for virtual nodes and provided
   a VNC interface to experiment nodes. We developed a portal that provides
   a single point of entry/control across various aggregates, and an archival
   system with features to archive data for future use and analysis.
 * We worked with the Utah group and the GENI security team while designing
   our instrumentation and measurement system, discussing methods for
   authentication and secure access to slice and measurement resources.
   In the process we provided Utah with feedback and changes needed to
   make the code work on our aggregate. We also worked with the Utah group
   to integrate the INSTOOLS software into the FLACK client/GUI.
 * We provided continual support for the operation of the Kentucky aggregate.
 * We gave tutorials and demos of the instrumentation tools at GECs to
   experimenters and educators. We used ProtoGENI and INSTOOLS in our
   networking and operating systems classes.
 * We made the stable version of our software
   available through the ProtoGENI GIT repository.
=== B. Deliverables made ===

 * We made the stable version of our software available via the
   ProtoGENI GIT repository.

 * We posted the documentation about INSTOOLS at
   []

 * We posted a tutorial providing step-by-step instructions
   on how to use INSTOOLS at
   []
== II. Description of work performed during the project period ==

The following provides a description of our activities and results for the project.

=== A. Activities and findings ===

Our activities and findings fall into the following six areas.
==== 1. Building the Kentucky Aggregate. ====

We upgraded our existing Edulab facility and then transformed it into a ProtoGENI
edge cluster (aggregate). We began by moving our Edulab cluster into a larger room that would better
accommodate the additional machines that were to be incorporated into the system. We then fixed and
upgraded the existing Edulab hardware in preparation for its conversion to the ProtoGENI system. We also
purchased and installed 24 new PCs and integrated them with the existing 47 machines to bring the total
number of PCs to just over 70. We installed extra network interfaces in each PC so that each PC has one
interface on the control network and an additional 4 interfaces on the experimental network. We installed
5 existing and newly purchased switches to provide the ports that we needed for the new machines on the
control and experimental networks. We also installed two power controllers to enable remote power cycling
of the machines, and we upgraded the University’s pathway to Internet2 to a 10 Gbps connection.
In preparation for the conversion to the ProtoGENI code, we began by upgrading our software to the
latest release of the Emulab software. Having upgraded to the latest Emulab code, we were ready to convert
to the ProtoGENI code. The conversion to the ProtoGENI code and its aggregate managers (i.e., the component
and slice managers) went relatively smoothly, but was not without problems. The process was an
iterative one: installing and trying the code, finding problems, working with Utah to address the issues, and
then re-testing the code.
We originally planned to set up the Kentucky aggregate as an independent ProtoGENI cluster, and then
connect to the ProtoGENI clearinghouse at a later date. However, after discussing this with the Utah group,
we decided against this approach and instead connected to the ProtoGENI clearinghouse from the start. As
a result, the Kentucky aggregate has been connected to, and registered with, the ProtoGENI clearinghouse
from the start, so that Kentucky certificates and resources are known to the ProtoGENI clearinghouse and
can be verified/looked up by clients. We operate a local slice authority and component manager that are
known to the clearinghouse and can be contacted to obtain details of the current resources that have been
allocated. Because our system is integrated with the Utah clearinghouse, slices can be allocated across Utah
and Kentucky resources.
==== 2. Designing and implementing the instrumentation and measurement system using the GENI API. ====

Having developed a better understanding of the ProtoGENI architecture and aggregate API, we began developing
an instrumentation and measurement architecture for ProtoGENI that incorporates measurement
capabilities from our earlier Edulab system. However, because of the differences between Emulab and ProtoGENI,
we had to take a different approach to the design of the instrumentation and measurement system
than we did with Edulab. In particular, we decided to use experiment-specific (i.e., slice-specific) measurement
nodes, creating a local measurement system within each experiment (slice). Slice-specific monitoring
matches the standard usage model in which users only work with their own experiments and thus are interested
in collecting measurement information for their own experiment, not the network as a whole. It also
allows users to keep measurement data private and local, while still allowing data to be made public if desired.
Another design decision that we made was to build our instrumentation and measurement system using
the ProtoGENI API instead of tapping directly into the Emulab database system as we had done in our
previous Edulab implementation. Mapping our system onto the ProtoGENI APIs and infrastructure was
rather challenging, in part because ProtoGENI was itself under development and did not yet have a
set of guidelines or best practices describing the “approved” or “correct” way to hook into, and interoperate
with, the ProtoGENI infrastructure. In order to flesh out the details of our design, we had to build components,
connect them into ProtoGENI via the API, and try them out, often finding problems/limitations of the
API that in turn required redesigning our architecture (and software). In that sense, fleshing out a detailed
design for our measurement system was an iterative process. As part of this work, we realized that some
of the information we needed in order to set up the instrumentation and measurement infrastructure was
not available via the ProtoGENI APIs. After discussions with Utah, the ProtoGENI team decided to add a
new abstraction to the APIs called a manifest that would contain detailed information about resources that
should not be in an RSPEC. We then designed, and began using, the manifests that Utah added to their APIs
to find the information we needed. While the manifest abstraction was a welcome addition, the manifest
implementation and API calls still have not solved all of our issues and have caused us to rethink how and
when topology information should be obtained by our measurement nodes. We have added code to work
around these problems, but the issue is worth returning to in the future and may require additional changes
to the control framework and its API.
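A manifest is returned as XML, and our setup code walks it to recover the topology information the measurement nodes need. The following sketch shows the idea; the element and attribute names here are simplified illustrations, not the exact (namespaced) ProtoGENI manifest schema:

```python
import xml.etree.ElementTree as ET

# Illustrative manifest snippet; the real ProtoGENI manifest schema differs.
MANIFEST = """
<rspec>
  <node virtual_id="node0" hostname="pc155.uky.emulab.net">
    <interface virtual_id="if0" ip="10.1.1.2"/>
  </node>
  <node virtual_id="node1" hostname="pc161.uky.emulab.net">
    <interface virtual_id="if0" ip="10.1.1.3"/>
  </node>
</rspec>
"""

def topology_from_manifest(xml_text):
    """Map each node's virtual id to its control hostname and interface IPs."""
    topo = {}
    for node in ET.fromstring(xml_text).iter("node"):
        topo[node.get("virtual_id")] = {
            "hostname": node.get("hostname"),
            "ips": [i.get("ip") for i in node.iter("interface")],
        }
    return topo
```

A measurement controller can use such a map to decide which hosts to contact and which interfaces to monitor.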
Because we were learning about the (ever-changing) control framework at the same time that we were
designing an instrumentation and measurement system to be built on top of it, our design has been evolving
as we discover the various features and limitations of the ProtoGENI control framework. We recently
published a report describing our instrumentation and measurement system (the INSTOOLS system), which
can be found on the INSTOOLS wiki page at []. Our
architectural design divides the system into the following components: measurement setup, data capture,
data collection, data storage, data processing, measurement control, access, and presentation. The design
attempts to separate the instrumentation and measurement functionality from the control framework functionality
as much as possible so that the code for both can evolve independently. As a result, our INSTOOLS
system primarily interacts with the control framework via the ProtoGENI API. We limited the number of
modifications we had to make to the control framework to a handful of places in the setup code where we
insert calls to invoke our INSTOOLS measurement setup code. Instead of deploying our own INSTOOLS
servers and services, our setup code automatically creates the necessary services by transparently adding additional
GENI resources to a user’s slice and then configuring those resources to carry out the INSTOOLS
measurement tasks. In other words, the measurement infrastructure is itself part of the slice/experiment.
Regarding the design of the user interface and display of data, we decided to take a lazy-evaluation
approach. The raw data collected by the system is saved as RRD database files. Instead of generating graphs
and image files immediately for potential use in the future, our system generates graphs from the RRD
data on demand. Users interact with the measurement system through a web interface that allows them to
select the information they want to see. That information is then dynamically created and updated every five
seconds to give users a “live” look at what is occurring in their slice. Ultimately, we plan to use a content
management system to give the user even greater control over the information displayed by the measurement
system.
Our initial implementation uses
hooks into the ProtoGENI control framework to invoke our code that sets up a measurement controller (MC)
node for each slice. Measurement controller nodes form the heart of our measurement system, controlling
software on the Measurement Points (MPs), collecting data from MPs, and even making the data available
to users via a web server. As part of the setup, each sliver (MP) launches software to capture measurement
data that is then collected by the MC. The raw data collected is typically stored in RRD database files and
then converted using rrdtools into graphs that show traffic levels or utilization levels. The MC also houses
a web server that provides users with (visual) access to the graphs and charts of measurement data. Our
current user interface is a simple PHP-based interface that allows users to select the sliver or link for which
they want to see measurement data. The data is then displayed and automatically refreshed every five seconds
to give the impression of a “live” view of the running system.
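The on-demand graph step above amounts to building an rrdtool invocation each time a page is requested. A minimal sketch of that command construction follows; the RRD path, data-source name, and output path are illustrative assumptions, not the actual INSTOOLS file layout:

```python
# Sketch of on-demand graph generation from an RRD file. The command list is
# built here and would be handed to rrdtool only when a user requests a graph.
def build_graph_cmd(rrd_path, out_png, ds="traffic_in", minutes=10):
    """Build an rrdtool invocation rendering the last `minutes` of one data source."""
    return [
        "rrdtool", "graph", out_png,
        "--start", "-%dm" % minutes,              # graph only the recent window
        "DEF:v=%s:%s:AVERAGE" % (rrd_path, ds),   # read the averaged data source
        "LINE1:v#0000ff:%s" % ds,                 # draw it as a single line
    ]

cmd = build_graph_cmd("/var/rrd/node0_eth0.rrd", "/tmp/node0_eth0.png")
# subprocess.run(cmd, check=True)  # executed per request, never ahead of time
```

Generating images lazily like this avoids wasting cycles on graphs nobody views, at the cost of a small per-request delay.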
This earlier implementation required placing hooks in the ProtoGENI
code (i.e., rewriting and enhancing parts of the ProtoGENI source code),
which made it difficult to keep pace with the frequent updates and changes being
rolled out by the ProtoGENI team. Each time a new version of the ProtoGENI code was released, we had to
reimplement our changes in the new version. To achieve a level of separation from the ProtoGENI code, we
redesigned our code so that it only interacts with ProtoGENI via the ProtoGENI API calls. In other words,
we abandoned the modifications that we had made to the ProtoGENI code itself, and instead developed ways to
achieve the same results simply by making calls to the ProtoGENI API. In particular, we wrote scripts that
capture the RSPEC before it goes to the ProtoGENI API, add a measurement controller (MC) node along
with the code needed on the MC, and then make the API calls needed to initialize the experiment, the MC,
and all the measurement points (nodes). As a result, our code is now only dependent on the ProtoGENI API,
allowing the ProtoGENI implementation to change without affecting our code.
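The RSPEC-rewriting step described above can be sketched as a small XML transformation. The element names and the install-tarball URL below are simplified illustrations (real ProtoGENI RSpecs are namespaced and richer), but the shape of the operation is the same: intercept the request, append an MC node, and pass the result on to the API:

```python
import xml.etree.ElementTree as ET

def add_measurement_controller(rspec_xml, mc_id="MC"):
    """Insert an extra node (the measurement controller) into a request RSpec.
    Element names are simplified; real ProtoGENI RSpecs use XML namespaces."""
    root = ET.fromstring(rspec_xml)
    mc = ET.SubElement(root, "node", {"virtual_id": mc_id})
    # Hypothetical tarball of MC software to be installed on the new node.
    ET.SubElement(mc, "install", {"url": "http://example.org/instools-mc.tar.gz"})
    return ET.tostring(root, encoding="unicode")

request = '<rspec><node virtual_id="node0"/></rspec>'
augmented = add_measurement_controller(request)
```

Because only the request document is modified, the control framework itself never needs to know the extra node is a measurement controller.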
Another enhancement we made was to make the instrumentation and measurement deployment and
teardown independent of the slice setup and teardown. In our original version of the code, we modified the
Utah slice creation scripts to set up the instrumentation system at the same time. In our most recent version
of the code, users can create a slice using any of the ProtoGENI approved ways (e.g., via scripts or via the
Utah Flash GUI). After the slice has been established, our new script will discover the topology of the slice
and instrument it appropriately. We also created scripts to remove instrumentation from a slice when it is no
longer needed.
Perhaps the most important advance we made is the ability to instrument slices that span aggregates.
Our new code identifies all the aggregates that comprise a slice, locates the component managers for those
aggregates, discovers the resources used in each aggregate, and then proceeds to set up an MC in each of
the aggregates, where the slice’s results will be collected and made available via the web interface. Finally,
our code installs and initiates the measurement software on the resources in each of the aggregates, with
measurement data from each resource being directed to the appropriate MC. Our scripts also allow a user
to instrument only portions of a slice by selecting which aggregates should be instrumented and which ones
should not.
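The key bookkeeping step in the multi-aggregate case is grouping slivers by their home aggregate so that each group reports to the MC created in that aggregate. A minimal sketch of that grouping (the URNs and node ids are made up for illustration):

```python
# Sketch of the per-aggregate grouping step: every sliver reports to the
# measurement controller created in its own aggregate.
def assign_mcs(slivers):
    """slivers: list of (aggregate_urn, node_id) pairs.
    Returns {aggregate_urn: [node_ids]}; each group gets its own MC."""
    groups = {}
    for aggregate, node in slivers:
        groups.setdefault(aggregate, []).append(node)
    return groups

slivers = [("urn:uky", "n0"), ("urn:utah", "n1"), ("urn:uky", "n2")]
groups = assign_mcs(slivers)   # {'urn:uky': ['n0', 'n2'], 'urn:utah': ['n1']}
```

Instrumenting only part of a slice then reduces to dropping the unwanted aggregates from this map before any MCs are created.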
==== 3. Maintaining and updating the instrumentation and measurement system. ====

After building the initial instrumentation and measurement system, we
continuously maintained and updated it. We describe
several major updates to the system here.
We integrated the Drupal content management system for purposes of displaying the collected measurement
information. We now load Drupal nodes into the database that render the graphs. Because each graph is a
distinct Drupal node, users can define their own views of the measurement data, allowing them to display
precisely the information they are interested in. It also allows users to define the theme/look-and-feel of the
web interface to meet their needs.
We completed our implementation of support for NetFlow data collected within a user’s slice. In addition
to setting up the instrumentation and measurement infrastructure to collect packet data rates, we now set up
the NetFlow services needed to capture (and categorize) data on a per-flow basis. Similar to our SNMP
services, the NetFlow capture services send their data to the measurement controller (MC) for processing
and viewing. Currently, the flows to be monitored are preconfigured for the user so that they can simply
go to the web interface to see the most common types of flows (e.g., to well-known ports). Changing what is
monitored is still a manual task, but we plan to improve that in a future release.
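The export records that a NetFlow capture service forwards to the MC follow a fixed binary layout. As a flavor of what processing on the MC side involves, here is a sketch of decoding the 24-byte NetFlow v5 export header (field order per the v5 format; this is not INSTOOLS code, just an illustration of the record structure):

```python
import struct

# NetFlow v5 export header: version, record count, sysUptime, unix secs/nsecs,
# flow sequence, engine type/id, sampling interval -- 24 bytes, big-endian.
V5_HEADER = struct.Struct("!HHIIIIBBH")

def parse_v5_header(datagram):
    (version, count, uptime_ms, unix_secs, unix_nsecs,
     seq, engine_type, engine_id, sampling) = V5_HEADER.unpack_from(datagram)
    return {"version": version, "count": count, "sequence": seq}

# Fabricated test datagram: version 5, 2 flow records, sequence number 99,
# padded with zero bytes where the two 48-byte flow records would sit.
pkt = V5_HEADER.pack(5, 2, 1000, 1600000000, 0, 99, 0, 0, 0) + b"\x00" * 96
```

Each flow record following the header carries the per-flow counters (src/dst address, ports, byte and packet counts) that the web interface categorizes for display.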
The experience of adding a new type of information source (NetFlow) to our measurement system
caused us to think about the problem of simplifying the process of adding new information sources in the
future. In particular, we wanted it to be easy for users to modify the web interface to display the data they
collect on each node or link. To make this possible, we created a web page on the MC where a user can enter
a parameterized command that is then used to (automatically) generate all the web pages needed to view
that type of data. As a result, it is now relatively easy to incorporate new types of (collected) information
into the web interface.
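The page-generation idea boils down to expanding one user-entered template once per target. The sketch below uses an invented `{node}` placeholder syntax and a made-up command name purely for illustration; the actual MC template syntax may differ:

```python
# Sketch of expanding a user-entered parameterized command into per-node
# entries, one generated page per target node.
def expand_command(template, targets):
    """Substitute each target name into the template."""
    return {t: template.replace("{node}", t) for t in targets}

pages = expand_command("show-netflow --node {node} --port 80", ["node0", "node1"])
```

Each expanded command then backs one automatically generated page in the MC web interface.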
As more experiments start to use the ProtoGENI facilities, making efficient use of the resources has become one of our priorities. To that end, we worked with the Utah group to add support for virtual nodes based on OpenVZ so that each physical PC can be used as multiple experimental nodes. Adding support for virtual nodes turned out to be fraught with problems, and so we had to go through several rounds of bug fixes and upgrades to get the latest development version of the code working. In addition, we had to manually fix the OpenVZ image for the shared nodes because the change had not been pushed out to the image running at Utah.
We found that, unlike BSD jails, the OpenVZ image supports the ability to run independent SNMP daemons on each OpenVZ node (i.e., one SNMP daemon per VM). This feature allows us to perform per-sliver monitoring without the need to write new monitoring software to understand the OpenVZ virtual interfaces.
Another addition to our software was the ability to access X window software running on the experimental nodes (slivers) via the MC web interface. Our goal was to leverage existing network monitoring tools, such as Wireshark and EtherApe, in order to observe the behavior of experimental nodes. These tools are helpful for collecting node statistics and visualizing link traffic. However, they need X window support. To support such access, we added the ability to dynamically load X window software onto the experimental nodes and then provide indirect access through the MC web browser and the VNC protocol. It has been added to the MC's Drupal content management system as one of the menu options. The only way for a user to access the programs running on the experimental nodes is to first log in to the Drupal CMS running on the MC. This way, we can assure the slice owner that they are the only ones allowed to access/run these programs. Our Drupal interface currently has two VNC templates that are preconfigured to run xterm and Wireshark, respectively, on the nodes in the slice via VNC. The MC runs a Java-based VNC client (in the CMS) that mirrors the VNC connections from the nodes. The VNC communication between the nodes and the MC is protected by a system-generated random password that is unique for every slice and invisible to the user.
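The per-slice VNC password scheme amounts to generating one random secret at instrumentation time and distributing it only to the MC and the nodes, never to the user. A sketch of such generation (the function name and length choice are our illustration, not the shipped code):

```python
import secrets

# Sketch of per-slice VNC password generation: a fresh random secret per
# slice, created at instrumentation time and never exposed to the user.
def make_vnc_password(nbytes=6):
    """Classic VNC authentication only honors the first 8 password characters,
    so 6 random bytes (8 urlsafe-base64 characters) use the full space."""
    return secrets.token_urlsafe(nbytes)[:8]
```

Because the password is random per slice and held only by the system, compromising one slice's VNC channel reveals nothing about any other slice.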
To simplify access to a user's measurement data, we developed a "portal" system. The portal is a one-stop shop that allows users to access all their measurement data via a single interface. Instead of requiring the user to visit multiple URLs, where each URL refers to one MC, the user visits a single URL representing the portal. The portal presents a visualization of the topology and allows users to select the measurement data they would like to see. Given the longitude and latitude information for each resource in the slice, the portal can show each resource's actual location on a map with links connecting the resources. If nodes within an aggregate are too close to be distinguished, the portal provides an expanded view to show the logical links among them. By clicking on a node or a link, users can get a list of the measurement information available. They can then choose what data to view. The portal will then present the corresponding tables and/or graphs on the screen. The links can also be color-coded to show the current levels of traffic over the links.
We also connected our INSTOOLS system with two
archival systems: (1) the University of Kentucky Archive Service (UKAS)
and (2) the CNRI Archive Service.
Our UKAS archive service not only
provides a repository for data, but it also provides a computational
environment where we can recreate the same look-and-feel the user
had when viewing the live data.  In particular, we
use OpenVZ containers to provide a virtualized environment in which
to run the Drupal content management system that was running on the MC
at the time the archive was taken.  As a result, the user can visit
the same web pages that were offered by the MC at the time the archive
was taken.
We have also incorporated support for the CNRI archive service and its
concept of workspaces.  In particular, our system is able to store
the data files containing measurement information in a CNRI workspace
associated with a slice.  After the data has been stored in the workspace,
the system adds the necessary metadata needed by the CNRI archive to move
the measurement files from the workspace to the CNRI archive for permanent
storage.  The files can then be accessed via the CNRI web interface.
==== 4. Collaborating with other teams and integrating the INSTOOLS software into the FLACK client/GUI. ====

We worked with the Utah group and the GENI security team while designing
our instrumentation and measurement system, discussing methods for
authentication and secure access to slice and measurement resources.
In particular, we collaborated with the Utah team to integrate
the INSTOOLS software with the FLACK client/GUI.
Instead of requiring users to run a set of scripts to instrumentize their
experiment, we modified the FLACK GUI so that users can instrumentize their
slice simply by clicking on a button in the GUI.
In particular, we added two new buttons to the FLACK GUI:
one to add instrumentation to a slice and the other to
access the portal site.
To instrumentize an existing slice, a user simply needs to click on the
instrumentize button in the FLACK GUI.  The GUI will then talk with the
backend instrumentation servers to instrumentize the slice.  While this is
occurring, information about the progress/status of the instrumentize process
is sent back to the GUI and can be viewed by the user.
After the slice has been instrumentized, the Go-to-portal button
is enabled, allowing the user to view the topology
and the traffic moving across various nodes and links.
The user can pick any node or link in the experiment to observe the
measurement data he/she is interested in.
The integration of INSTOOLS with the FLACK interface greatly
simplifies the instrumentation process for users, making it much easier
to use the instrumentation tools.
From a user's perspective, the integration also enables a sort of single
sign-on service in which the user authenticates to the FLACK client, and
then the FLACK client and the backend instrumentation manager handle
all the authentication on behalf of the user to the long list of
services that comprise the instrumentation system.
The portal button on the FLACK client takes the user
directly to our GENI Monitoring Portal (GMP) so that the user
does not need to remember URLs or log in to the GMP
on their own.
To integrate the instrumentation tools with the FLACK client,
we redesigned our software so that the installation and deployment
of the instrumentation software is done by a backend instrumentation
manager that, like a component manager, is responsible for allocating
and setting up instrumentation infrastructure.  As a result, the
user interface components of our system could be separated out
and integrated into the FLACK client.  This involved rewriting
our Python scripts as Flash code making ProtoGENI API calls as
well as XML-RPC calls to our instrumentation manager (IM).
We currently run an instrumentation manager (IM) at each aggregate,
in concert with the component manager at each aggregate.
The IM is responsible for sshing to experimental nodes, running
initialization scripts, setting up the MC nodes, etc., and, like a component
manager, has the authority needed to carry out these tasks.
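A client talks to the IM over XML-RPC, so each request is just a marshaled method call. As a flavor of that interface, the sketch below marshals a call body with Python's standard xmlrpc module; the method name `Instrumentize` and the URN argument are assumptions for illustration, not the IM's documented API:

```python
import xmlrpc.client

# Sketch of the XML-RPC request a client might send to the backend
# instrumentation manager (method name and argument are assumptions).
def im_request(slice_urn):
    """Marshal an Instrumentize call body for the IM endpoint."""
    return xmlrpc.client.dumps((slice_urn,), methodname="Instrumentize")

body = im_request("urn:publicid:IDN+uky.emulab.net+slice+demo")
```

In practice the FLACK client would POST such a body to the IM's HTTPS endpoint and relay the returned progress information to the user.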
==== 5. Supporting the operation of the Kentucky aggregate and the use of INSTOOLS. ====

We provided continual support for the operation of the Kentucky aggregate
and enabled experimentation through the ProtoGENI clearinghouse.
We identified multiple GENI experiments that will use our instrumentation tools.
We made contact with these experimenters and have been providing support as they
begin to use the tools.  We established a web site
with documentation/tutorials/examples to help experimenters get started.
==== 6. Giving tutorials and demos at GECs and using the tools in classes. ====

We demonstrated the INSTOOLS system at the GEC6, GEC7 and GEC10 conferences
and gave tutorials on using the system at the GEC8 and GEC11 conferences.
We had several good discussions with other measurement
groups regarding ways to incorporate their measurement data into
our measurement interface. We used GENI and INSTOOLS in several
of our networking and operating systems classes.
=== B. Project participants ===

The following individuals have helped with the project in one way or another
during the last quarter:

 * Jim Griffioen - Project PI
 * Zongming Fei - Project Co-PI
 * Hussamuddin Nasir - Technician/programmer
 * Xiongqi Wu - Research Assistant
 * Jeremy Reed - Research Assistant
 * Lowell Pike - Network administrator
 * Woody Marvel - Network administrator
=== C. Publications (individual and organizational) ===

 * James Griffioen and Zongming Fei, "Automatic Creation of Experiment-specific Measurement Infrastructure," in Proceedings of the First Workshop on Performance Evaluation of Next-Generation Networks (Neteval09), Boston, MA, April 2009.

 * GENI Report: J. Griffioen, Z. Fei, H. Nasir, "Architectural Design and Specification of the INSTOOLS Measurement System," December 2009.

 * Jonathon Duerig, Robert Ricci, Leigh Stoller, Matt Strum, Gary Wong, Charles Carpenter, Zongming Fei, James Griffioen, Hussamuddin Nasir, Jeremy Reed, Xiongqi Wu, "Getting started with GENI: A user tutorial," ACM SIGCOMM Computer Communication Review (CCR), vol. 42, no. 1, pp. 72-77, January 2012.

 * James Griffioen, Zongming Fei, Hussamuddin Nasir, Xiongqi Wu, Jeremy Reed, Charles Carpenter, "The Design of an Instrumentation System for Federated and Virtualized Network Testbeds," Proc. of the First IEEE Workshop on Algorithms and Operating Procedures of Federated Virtualized Networks (FEDNET), Maui, Hawaii, April 2012.

 * James Griffioen, Zongming Fei, Hussamuddin Nasir, Xiongqi Wu, Jeremy Reed, Charles Carpenter, "Measuring experiments in GENI," Computer Networks, vol. 63, pp. 17-32, 2014.
=== D. Outreach activities ===

We participated in the GENI Measurement conference and
were involved in the activities of the GENI measurement working group.
We gave talks about our work at Neteval 2009, held in Boston, and at
the Internet2 Joint Techs conference held at Clemson University.
We demonstrated the INSTOOLS system at the GEC6, GEC7 and GEC10 conferences
and gave tutorials on using the system at the GEC8 and GEC11 conferences.
We have been providing support for the early adopters of our tools.
=== E. Collaborations ===

Most of our collaborations were with the Utah ProtoGENI team. We were actively
involved in the bi-weekly meetings of the ProtoGENI cluster. We also had discussions with other measurement
groups, including the OnTimeMeasure group at Ohio State and the S3 Monitor group at Purdue.
We have also had conversations with the GENI security teams and
members of other clusters regarding the design of security aspects of the measurement system.
=== F. Other Contributions ===