
Version 42 (modified by scjensen@umail.iu.edu, 7 years ago)

--

NetKarma

Report for the period ending GEC14

Beth Plale, School of Informatics and Computing, Indiana University
Chris Small, InCNTRE, Indiana University

Summary

In this quarter we brought together the multiple Adaptors that have been developed for different types of GENI experiments within the easy-to-use web interface of the NetKarma Portal. At GEC14 we presented a tutorial to GENI experimenters that allowed them to see how they can capture provenance describing their experiments and how visualizing the provenance of their experiments using NetKarma can assist in both communicating the results of their experiments and analyzing their results.

A basic tenet of the Karma provenance system, and of NetKarma in particular, has been to capture provenance while minimizing the effort researchers must spend instrumenting their code or manually entering provenance or metadata. Our experience with capturing provenance and metadata is that if the process cannot be automated, capture will not be done, since it is an extra burden on researchers. The NetKarma project therefore uses an "Adaptor" approach that scavenges provenance and metadata annotations with the goal of minimizing the effort of GENI experimenters. Early efforts focused on experiments run using GUSH: with minor changes that Jeannie Albrecht's group at Williams College incorporated into GUSH, NetKarma can extract provenance from GUSH logs and generate provenance graphs based on the Open Provenance Model (OPM). To visualize the generated provenance graphs, we extended the widely used third-party Cytoscape visualization tool with NetKarma plug-ins that retrieve provenance from a NetKarma server and apply layouts that turn the retrieved OPM graphs into visualizations experimenters can use.

This approach was extended at GEC12 to extract additional annotations about an experiment from the GMOC database. Starting at GEC12, a NetKarma server at Indiana University was made available so that GENI experimenters could upload log files to it and visualize the resulting graphs in Cytoscape. To demonstrate the applicability of the NetKarma approach to a wider range of GENI experimenters, we worked with the GPO to identify experiments that could benefit from provenance. Starting with NS2 experiments performed by researchers at Clemson, which were presented in a demo and poster at GEC13, we have now extended that effort to include ORBIT experiments and provenance generated from manifest files, such as the XSP experiment used in the NetKarma tutorial at GEC14.

At GEC14 we brought all of the components of NetKarma together in the easy-to-use NetKarma Portal, which enables experimenters to define a collection for an experiment and add log files and other data to it using a drag-and-drop interface. On the back end, the Portal uses NetKarma Adaptors and a NetKarma server at Indiana University to extract provenance and metadata and generate OPM graphs. These graphs can be visualized either in the Portal using the web-based version of Cytoscape, or downloaded and visualized using the Cytoscape desktop version with the NetKarma plug-ins.

The ORBIT experiments shown at GEC14 illustrate the initial goals of the NetKarma project: experimenters using the ORBIT Traffic Generator (OTG) can capture detailed provenance, download provenance graphs from the Portal, and visualize them with no additional instrumentation. At GEC14 we also showed how the provenance graph generated by NetKarma can be incorporated into the MDOD for an experiment, and how the DOI generated for an experiment through the NetKarma Portal can be incorporated into the archived MDOD.

Milestones Delivered

S4.c Provenance System Demo; I&M Contributions

NetKarma Tutorial and Demo

At GEC14 we presented a tutorial on the NetKarma Portal, which is layered on top of the NetKarma server, and on the NetKarma plug-ins for visualizing large provenance graphs. After an overview of what provenance is and how it can benefit GENI experimenters, the tutorial continued with three hands-on components in which participants captured provenance, ingested and visualized provenance in the NetKarma Portal, and downloaded provenance graphs from the Portal for visualization in Cytoscape using the GENI NetKarma plug-ins.

The tutorial first walked participants through an experiment on Emulab ProtoGENI resources that performed a GridFTP transfer based on the eXtensible Session Protocol (XSP). Since the focus was on provenance generation, the resources had already been requested and configured. Participants ran scripts preloaded in the tutorial VM (running in VirtualBox) to execute the experiment, which generated a log from the XSP transfer and some basic bandwidth measurements using iperf, and to download the manifest for the resources they were using. In the second hands-on portion of the tutorial, the NetKarma adaptor incorporated into the NetKarma Portal used these two log files and the manifest to extract provenance and load it into NetKarma.
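The bandwidth measurements become annotations in the provenance graph. As an illustration only, the following is a minimal sketch of pulling bandwidth values from iperf 2 client output. It assumes the default human-readable summary lines (for example `[  3]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec`) rather than CSV mode. The function and annotation key names are hypothetical and are not taken from the NetKarma adaptor.

```python
import re

# Matches iperf 2 interval summaries such as
# "[  3]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec".
# Assumes the default human-readable output, not the "-y C" CSV mode.
_IPERF_LINE = re.compile(
    r"\[\s*\d+\]\s+(?P<start>[\d.]+)-\s*(?P<end>[\d.]+)\s+sec\s+"
    r"[\d.]+\s+\w?Bytes\s+(?P<rate>[\d.]+)\s+(?P<unit>[KMG]?)bits/sec"
)

# Decimal multipliers for the rate unit prefix iperf prints.
_SCALE = {"": 1, "K": 1e3, "M": 1e6, "G": 1e9}


def iperf_bandwidth_annotations(log_text):
    """Return one annotation dict per iperf interval line (hypothetical helper)."""
    annotations = []
    for line in log_text.splitlines():
        m = _IPERF_LINE.search(line)
        if m is None:
            continue  # skip banners, connection lines, and blank lines
        annotations.append({
            "interval_start_s": float(m.group("start")),
            "interval_end_s": float(m.group("end")),
            "bandwidth_bps": float(m.group("rate")) * _SCALE[m.group("unit")],
        })
    return annotations
```

Normalizing to bits per second keeps annotations from different runs comparable regardless of the unit iperf chose to print.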

The second part of the tutorial showed how data related to an experiment can be collected in one place so that the results and other information about the experiment are simple to store, reference, and analyze. Participants created an experiment on a demo version of the NetKarma Portal. Creating an experiment generated a DOI handle, providing a universal way to refer to all the data artifacts and information involved in the experiment. Participants then uploaded the sample results and the manifest file from the XSP experiment to the Portal, which analyzed the uploaded files and ingested them into the Karma provenance system. Information about the network resources used and the workflow of the experiment is gleaned from the uploaded data to construct a model of what happened in the experiment. Users could then use the Portal to visualize the new data products and download an Open Provenance Model (OPM) graph of the experiment.

The third hands-on portion of the tutorial showed participants how to download a provenance graph generated by NetKarma from the Portal. The downloaded graph is an XML file based on the Open Provenance Model (OPM) standard. Very large provenance graphs can be visualized and explored with the NetKarma plug-in in the desktop version of Cytoscape (the Portal uses the web version of Cytoscape). In this section participants downloaded provenance graphs for NS2 and ORBIT experiments that we had run and loaded into the Portal using the corresponding NetKarma adaptors. These graphs differ from the XSP graph. The XSP adaptor uses the manifest to incorporate the experiment's topology into the provenance graph, whereas the NS2 and ORBIT adaptors focus on capturing the success of each packet transferred. This packet-level focus produces large provenance graphs. Even so, by loading both graphs into the copy of Cytoscape included in the tutorial VM, participants could use the visualization capabilities of the NetKarma plug-in to quickly see how the NS2 and ORBIT experiments performed in terms of packets successfully transferred or dropped.

The full instructions, including step-by-step graphics and illustrations of the provenance generation, Portal, and visualization process are included in the tutorial instructions available at: http://d2i.indiana.edu/wiki/NetKarma_Tutorial

During the Demo session at GEC14 we also presented a live demo, following up on our tutorial, that showed the latest version of the NetKarma Portal, the visualization of an ORBIT experiment using the latest version of the NetKarma visualization plug-in for Cytoscape, and the proposed MDOD schema with NetKarma provenance.

NetKarma and the Measurement Data Object Descriptor (MDOD)

Following up on the MDOD changes we proposed at GEC13 and subsequent discussions, we shared an XML schema of our proposed MDOD with the Measurement & Instrumentation group to stimulate discussion. We also presented a poster at the GEC14 Demo session along with the schema and an example based on the XSP experiment used in the tutorial. The schema is discussed further in the section below on work performed this quarter. At GEC14 we also participated in the follow-on discussion on Wednesday afternoon regarding next steps for the MDOD, hosted by Harry Mussman and Jeanne Ohren of BBN. The proposed MDOD schema, supporting schemas, and the example based on the XSP experiment are included below in the documents section.

Work Performed this Quarter

NetKarma Portal

The NetKarma Portal simplifies the analysis of experiments and provides a way to visualize, save, and reference GENI experiments. Using the information captured about how an experiment was created, experimenters can use the Portal to recreate existing experiments more easily. The Portal also visualizes the workflow and topology of experiments to help experimenters understand the exact workflow and circumstances of an experiment.

A key part of the scientific method is that experiments must be reproducible. Reproducing an experiment requires knowing exactly the circumstances under which it was run: not only the results, but also the specification of the experiment. This is especially an issue in the large-scale network experiments that GENI makes possible, where the initial conditions may be complex and involve hundreds of resources. The NetKarma Portal strives to make experiments simple to reproduce by providing a place to store all the files used to create an experiment, logs from the process (such as GUSH logs), and the results themselves. The Portal also provides value for the original experimenter by producing data products from the raw files uploaded. Another possible use of the NetKarma Portal is as an archival mechanism for meeting the NSF Data Management requirements for all NSF-funded research.

Diagram

The first action when using the NetKarma Portal is to create a new experiment, so that all uploaded data is grouped together in that experiment. Creating the experiment also creates a DOI handle that can be used to reference the data, for example in the references of any papers written about the work. A reader of such a paper can then use the DOI to retrieve all relevant data about the experiment behind the work. The DOI also links published work to the GENI framework that helped produce it. The DOI handle can be used with any DOI resolver, such as CrossRef or the one at http://dx.doi.org/, to find the work. The NetKarma Portal uses the EZID web service and the DataCite organization to provide unique handles. The home page also lists all experiments previously created in the Portal, which other researchers can search and browse.
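Because a DOI is resolved simply by appending it to a resolver's base URL, a citation can be turned into a link mechanically. The following is a minimal sketch of that step. The helper name is hypothetical, the format check is deliberately loose (it only checks the `10.<registrant>/<suffix>` shape), and `10.5072/FK2TEST` is a made-up example DOI rather than a real NetKarma handle.

```python
import re
from urllib.parse import quote

# Loose shape check for "10.<registrant>/<suffix>"; it does not confirm
# that the DOI is actually registered with any agency.
_DOI = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_resolver_url(doi, resolver="http://dx.doi.org/"):
    """Build a resolver URL for a DOI (hypothetical helper)."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]  # accept the common "doi:" prefix form
    if not _DOI.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters unsafe in a URL path, but keep the '/'.
    return resolver + quote(doi, safe="/")
```

For example, `doi_resolver_url("doi:10.5072/FK2TEST")` yields `http://dx.doi.org/10.5072/FK2TEST`. Passing a different `resolver` prefix targets any other DOI resolver.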

Portal home page

Once an experiment is created, the experimenter can load files by dragging and dropping them onto the experiment page. Each uploaded file is analyzed for useful information, and if it is in a format the NetKarma Portal can parse, it is ingested into the Karma DB service. All files are archived as well as analyzed.
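The "archive everything, ingest what can be parsed" behavior can be pictured as a small dispatcher. The following is a minimal sketch under stated assumptions. The report does not describe how the Portal recognizes file formats, so the content-sniffing rules and adaptor names below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadResult:
    filename: str
    archived: bool = True            # every upload is archived
    adaptor: Optional[str] = None    # adaptor chosen for ingest, if any


# Hypothetical detection rules: (adaptor name, test on the file's first bytes).
# These are placeholders, not the Portal's actual detection logic.
_RULES = [
    ("rspec-manifest", lambda head: "<rspec" in head and 'type="manifest"' in head),
    ("opm-graph", lambda head: "opmGraph" in head),
    ("iperf-log", lambda head: "bits/sec" in head),
]


def route_upload(filename, data, head_bytes=4096):
    """Pick an adaptor for an upload by sniffing its first bytes."""
    head = data[:head_bytes].decode("utf-8", errors="replace")
    result = UploadResult(filename)
    for name, matches in _RULES:
        if matches(head):
            result.adaptor = name
            break  # first matching rule wins
    return result  # files with no matching adaptor are still archived
```

Sniffing content rather than trusting file extensions matters here because experimenters drag in logs and manifests whose names follow no convention.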

Portal Experiment Page

The Karma DB contains a model of the experiment's workflow, which can include which processes were run, in what order, and the resources they ran on. The Portal makes the information stored in the Karma DB available in two ways: (1) it renders a visual graph directly in the Portal using the Cytoscape Web tool, and (2) it makes the OPM graph XML representation available for download. The OPM file can be loaded into the Cytoscape desktop tool, which produces more elaborate and customizable visualizations than are available in the Portal.
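Because the downloaded OPM graph is plain XML, it can also be inspected with ordinary XML tools before being loaded into Cytoscape. The following is a minimal sketch that tallies nodes and causal edges. It assumes OPM v1.1 element names (`process`, `artifact`, `agent`, and the `used`/`wasGeneratedBy`/`wasControlledBy`/`wasTriggeredBy`/`wasDerivedFrom` dependencies), ignores namespaces, and uses a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Node and edge element names as assumed from OPM v1.1.
_NODE_TAGS = {"process", "artifact", "agent"}
_EDGE_TAGS = {"used", "wasGeneratedBy", "wasControlledBy",
              "wasTriggeredBy", "wasDerivedFrom"}


def _local(tag):
    # Drop an "{namespace}" prefix so the sketch works whatever
    # namespace URI the OPM file declares.
    return tag.rsplit("}", 1)[-1]


def summarize_opm(xml_text):
    """Count OPM nodes and causal edges by element name."""
    counts = {}
    for elem in ET.fromstring(xml_text).iter():
        name = _local(elem.tag)
        if name in _NODE_TAGS or name in _EDGE_TAGS:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

A quick tally like this gives a sense of graph size before choosing between the in-Portal Cytoscape Web view and the desktop plug-in.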

Portal Workflow

In addition to workflow visualization, the NetKarma Portal provides a topological view when files containing topology representations are uploaded. The Portal understands the GENI API, UNIS, and NMWG formats used in GEMINI and perfSONAR. Uploading such files produces a Cytoscape Web representation similar to the workflow representation, in which each element can be moved and selected to obtain more information about it.
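For users who prefer the desktop tool, one simple way to hand a topology to Cytoscape is its Simple Interaction Format (SIF), in which each line reads `source<TAB>interaction<TAB>target`. The following is a minimal sketch, assuming the node-to-node links have already been extracted from an uploaded topology file. That extraction is format-specific and not shown, and the helper name is hypothetical.

```python
def topology_to_sif(links, interaction="link"):
    """Render (node_a, node_b) pairs as Cytoscape SIF lines.

    Tab separators are used so that node names may contain spaces.
    Links are treated as undirected, so (a, b) and (b, a) are emitted once.
    """
    lines = []
    seen = set()
    for a, b in links:
        key = frozenset((a, b))
        if key in seen:
            continue  # already emitted in the other direction
        seen.add(key)
        lines.append(f"{a}\t{interaction}\t{b}")
    return "".join(f"{line}\n" for line in lines)
```

SIF carries only structure. Attributes such as interface addresses would need a separate node or edge attribute file.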

Portal Topology Page

The final portion of the NetKarma Portal is the Archive tab, which gives access to all files uploaded to the Portal. Anyone with access to the experiment can download these files and use them to help understand or recreate the experiment.

Portal Archive



NetKarma Visualization and Adaptor Enhancements

Building on the work we presented at the GEC13 demo session on visualizing the provenance of experiments run on NS2, for GEC14 we developed an Adaptor that ingests information from WiMAX experiments run on ORBIT and generates a provenance graph through NetKarma based on the Open Provenance Model (OPM) standard. Because the NS2 experiment runs in a simulated environment, we were able to capture the provenance of each packet, including whether it was successfully received or dropped. In the ORBIT environment such complete information is not available. We can determine which packets were sent, but not which individual packets were successfully received and which were dropped.
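The per-packet outcome available from NS2 comes from its trace file, which records every enqueue, dequeue, receive, and drop. The following is a minimal sketch of classifying packets from a classic NS2 wired trace. It assumes the standard 12-field layout (`event time from to type size flags fid src dst seq pkt_id`, with `src`/`dst` written as `node.port`). It is an illustration, not the NetKarma NS2 adaptor.

```python
from collections import Counter


def packet_outcomes(trace_lines):
    """Tally NS2 packets as delivered or dropped.

    A packet counts as delivered when an "r" (receive) event occurs at its
    destination node, and as dropped when any "d" (drop) event names it.
    Packets still in flight when the trace ends are not counted.
    """
    outcome = {}
    for line in trace_lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # not a packet event line
        event, to_node, dst, pkt_id = fields[0], fields[3], fields[9], fields[11]
        if event == "d":
            outcome[pkt_id] = "dropped"
        elif event == "r" and to_node == dst.split(".")[0]:
            outcome.setdefault(pkt_id, "delivered")
    return Counter(outcome.values())
```

An ORBIT run offers no equivalent of the `r` event at the receiver for each packet, which is why the same classification is not possible there.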

Two of the goals of NetKarma are (1) to provide experimenters with the provenance of their experiment based on the events that occurred in it, and (2) to capture that provenance while minimizing, and ideally eliminating, the additional instrumentation needed in the experiment. For ORBIT experiments run using the ORBIT Traffic Generator (OTG), the NetKarma adaptor can capture provenance with no additional instrumentation, as illustrated in the following diagram.

As seen in this diagram, the provenance of an experiment run on ORBIT can be captured in NetKarma using the ORBIT NetKarma adaptor that ingests data that is already generated as part of the experiment. This includes: (1) the script for the ORBIT experiment, (2) data retrieved using the OMF interface's "info" and "list clients" methods, and (3) data retrieved from the OML database. The adaptor combines these data sources to generate event notification messages that are sent to the NetKarma server, where they are combined to create the annotated OPM graph for the experiment.
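The step in which the adaptor combines the three data sources can be pictured as a merge into a stream of event notifications. The following is a minimal sketch under loud assumptions. The report does not describe the NetKarma notification schema or the exact fields returned by OMF and OML, so the `Notification` type, its `kind` values, and the input shapes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    # Hypothetical notification; the real NetKarma message schema is
    # not described in this report.
    kind: str
    subject: str
    annotations: tuple


def build_notifications(script_name, clients, oml_rows):
    """Merge the three ORBIT data sources into notifications.

    script_name: name of the experiment script (source 1)
    clients: {node_id: info_dict} from OMF "info"/"list clients" (source 2)
    oml_rows: iterable of (node_id, seq_no, size) from OML (source 3)
    """
    events = [Notification("experiment", script_name, ())]
    for node_id, info in sorted(clients.items()):
        # Each participating node becomes a process annotated with its info.
        events.append(Notification("process", node_id, tuple(sorted(info.items()))))
    for node_id, seq_no, size in oml_rows:
        # Each measured packet becomes an artifact generated by its sender.
        events.append(Notification(
            "artifactGenerated", f"{node_id}/pkt{seq_no}",
            (("generated_by", node_id), ("size", size))))
    return events
```

Keeping each source's contribution separate in this way means a missing OML table would only drop the packet artifacts, not the experiment or node records.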

The provenance graph displayed in the above image is based on the “Many-to-One Communication” example in the OTG tutorial. This experiment consists of seven senders and a single receiver. In this experiment we varied the size and the rate of the packets sent by each of the senders. The experiment was run on GENI ORBIT resources at Rutgers University and is displayed using the NetKarma visualization plug-in for the Cytoscape visualization tool. As with other provenance graphs captured in NetKarma, the plug-in tool loads an OPM graph from Karma and converts the OPM XML file into a Cytoscape visualization.

The plug-in has been enhanced to include an ORBIT visualization style that color-codes the packets sent and received by origin (in this example, which of the seven OTG nodes sent the packet). In this visualization we can see that the OTR receiver node in the upper left corner received packets from only two of the nodes, since all of the packets it received are colored either yellow or green, corresponding to the OTG nodes that sent them. Zooming in, experimenters can review the metadata annotations harvested by the NetKarma Adaptor for each packet or node, as shown in the following screen capture:
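The reasoning behind the origin-based style can be sketched in a few lines. Give each sender a fixed color, tally received packets by origin, and any sender with no received packets stands out. The palette order and helper name below are assumptions for illustration. The plug-in's actual colors and implementation are not described here.

```python
from collections import Counter

# Arbitrary palette, one color per OTG sender; not the plug-in's palette.
_PALETTE = ["yellow", "green", "red", "blue", "orange", "purple", "cyan"]


def color_by_origin(senders, received_origins):
    """Assign stable colors per sender and summarize what the receiver got.

    senders: ordered list of sender node ids
    received_origins: one origin id per packet seen at the receiver
    Returns (sender -> color, origin -> packet count, senders never heard from).
    """
    colors = {s: _PALETTE[i % len(_PALETTE)] for i, s in enumerate(senders)}
    counts = Counter(received_origins)
    silent = [s for s in senders if counts[s] == 0]
    return colors, counts, silent
```

With seven senders and packets arriving from only two of them, `silent` lists the other five, which matches the pattern visible in the screen capture.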

The provenance graph for the above illustration is available here or can be downloaded from the NetKarma Portal (as participants did in the NetKarma tutorial at GEC14). The ORBIT visualization style, which color-codes the nodes in the provenance graph by origin, is included in the latest NetKarma plug-in source code, available from the Karma SourceForge page: https://sourceforge.net/projects/karmatool/

Capturing the Provenance of an XSP Experiment

Network experiments based on the eXtensible Session Protocol (XSP) were identified at GEC13 as the second type of GENI experiment to be used for evaluating NetKarma. Unlike the NS2 experiments, where provenance was captured for each packet sent, provenance capture for the XSP experiment focused on two things: extracting provenance from the annotated RSpec returned as the manifest when resources are requested, and adding annotations based on performance measurements captured during the experiment. Following discussions with Martin Swany (Indiana University) and Ezra Kissel (University of Delaware) at GEC13 and subsequent follow-up conversations, provenance was captured for an XSP experiment executing a GridFTP transfer using Phoebus gateway nodes on Emulab resources. A NetKarma Adaptor was developed and incorporated into the NetKarma Portal that extracts provenance from the manifest, the XSP log, and network bandwidth measurements taken using iperf. Following is an illustration of the provenance captured from an XSP experiment:
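Incorporating topology from the manifest comes down to joining links to nodes through their interfaces. The following is a minimal sketch of that join. It assumes GENI RSpec v3-style manifest elements (`<node client_id>` containing `<interface client_id>`, and `<link client_id>` containing `<interface_ref client_id>`) and handles only point-to-point links. The helper and the sample node names are hypothetical, not the XSP adaptor's code.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    return tag.rsplit("}", 1)[-1]  # strip an "{namespace}" prefix


def manifest_links(rspec_xml):
    """Return (node_a, node_b, link_id) tuples from an RSpec manifest."""
    root = ET.fromstring(rspec_xml)
    # Map every interface client_id to the node that owns it.
    iface_to_node = {}
    for node in root.iter():
        if _local(node.tag) != "node":
            continue
        for child in node:
            if _local(child.tag) == "interface":
                iface_to_node[child.get("client_id")] = node.get("client_id")
    links = []
    for link in root.iter():
        if _local(link.tag) != "link":
            continue
        ends = [iface_to_node.get(ref.get("client_id"))
                for ref in link if _local(ref.tag) == "interface_ref"]
        ends = [e for e in ends if e is not None]
        if len(ends) == 2:  # point-to-point links only in this sketch
            links.append((ends[0], ends[1], link.get("client_id")))
    return links
```

Working from `client_id` values keeps the experimenter's own names in the provenance graph. The physical `component_id` values could be added as annotations.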

This experiment was also used in the NetKarma tutorial at GEC14. The Emulab resources were provisioned ahead of the tutorial. Participants then ran an XSP GridFTP transfer on those resources, generating the XSP log and iperf measurements, downloaded the manifest for the resources being used, uploaded these three files to an experiment they defined in the NetKarma Portal, and visualized the provenance through the Portal.

The provenance graph generated for the XSP experiment was also used as the sample provenance graph embedded in the proposed MDOD as described below.

Proposed MDOD Schema Integrating Provenance

At GEC13 we delivered a whitepaper proposing changes to the MDOD structure and the incorporation of provenance into the MDOD. In the I&M session at GEC13, Giridhar Manepalli of CNRI presented a summary of the MDOD and some open issues. Based on his presentation, subsequent discussions with Giridhar and with Harry Mussman and Jeanne Ohren of BBN, and NetKarma meetings after GEC13, we revised our proposal and completed a draft of the schema, which is included below in the publications and documents section.

Some of the changes that are reflected are:

  • The MDOD is broadened to describe the experiment as well as the measurements relating to it. The original vision of the MDOD as presented by Harry Mussman was to describe all of the measurements related to an experiment; the proposed schema extends this to include the provenance of the experiment based on the OPM graph generated by NetKarma. Since the provenance graph for an experiment can be large, the annotations are stripped out of the graph, leaving the structure of actors, processes, and artifacts. The example included below embeds the provenance for the XSP experiment; removing the annotations reduced its size by 75%.
  • An open issue discussed at GEC13 was how the identifier for the MDOD should be generated and whether any semantics should be embedded in it. The revised proposal uses a DOI generated by NetKarma that can be used to link back to the data captured for an experiment in the Portal. The DOI would be used when an MDOD is archived, while relative identifiers (the path within the experiment) or other assigned IDs remain possible before an experiment is archived. Using the DOI leverages existing technology that can track and update the ownership and custody of the measurement data.
  • Since the MDOD needs to be dynamic and handle new measurement capabilities as they are added to GENI, we propose an approach adopted in other metadata schemata, such as the FGDC schema long used for spatial data, in which keywords are defined by an external source such as a controlled vocabulary. This allows terms to be precisely defined while avoiding continual updates to the underlying schema.
  • Nesting of MDODs, but not a strict hierarchy. An experiment may use measurement data created by an operator or aggregate provider, or may incorporate measurements from multiple runs done over time. Allowing MDODs to nest or reference other existing MDODs results in a more flexible and lighter weight structure.
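The annotation-stripping step in the first item above, and the measurement of the resulting size reduction (75% for the XSP example), can be sketched as follows. This is a minimal sketch, assuming annotations are serialized as elements whose local name is `annotation`. The tag set is a parameter precisely because the exact OPM serialization used by NetKarma is not shown here, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    return tag.rsplit("}", 1)[-1]  # drop an "{namespace}" prefix


def strip_annotations(xml_text, annotation_tags=("annotation",)):
    """Remove annotation elements, keeping actors, processes, and artifacts.

    Returns (stripped_xml, percent_size_reduction).
    """
    root = ET.fromstring(xml_text)
    # Baseline uses the same serializer, so the comparison is like-for-like.
    before = len(ET.tostring(root, encoding="unicode").encode("utf-8"))
    wanted = set(annotation_tags)
    # Collect (parent, child) pairs first: removing while iterating is unsafe.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if _local(child.tag) in wanted]
    for parent, child in doomed:
        parent.remove(child)
    stripped = ET.tostring(root, encoding="unicode")
    after = len(stripped.encode("utf-8"))
    return stripped, 100.0 * (before - after) / before
```

The full annotated graph stays in the Portal, so nothing is lost. The MDOD simply carries the lighter structural copy.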

At GEC14 we presented a poster that outlined this structure and a proposed lifecycle for the MDOD as reflected in the following diagram:

Both the GEMINI and GIMI Instrumentation & Measurement projects are working towards a "GENI Experimenter Portal Service" that would allow an experimenter to set up measurements for an experiment and initialize the collection of measurements. This is a natural source for the initial generation of the MDOD. In some cases the measurement data and MDOD may be registered directly in the GENI Measurement Data Archive Service. Alternatively, the measurement data and MDOD could be uploaded with other logs, the manifest, and other data to the NetKarma Portal to capture the provenance, metadata, and data related to an experiment. The NetKarma Portal generates a provenance graph through NetKarma based on the OPM standard, saves the full provenance graph in the Portal, and can incorporate the provenance without annotations into the MDOD.
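The proposed features can be combined into a rough picture of the MDOD the Portal might emit: a DOI or relative-path identifier, controlled-vocabulary keywords, embedded annotation-free provenance, and references to other MDODs. The following is a minimal sketch of that assembly. Every element and attribute name below is hypothetical. The actual proposed schema is in the attached documents and is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_mdod(identifier, keywords, provenance_xml, referenced_mdods=()):
    """Assemble a skeletal MDOD (element names are hypothetical).

    identifier: a DOI once archived, or a relative path before archiving
    keywords: (vocabulary, term) pairs drawn from a controlled vocabulary
    provenance_xml: annotation-stripped OPM graph to embed
    referenced_mdods: identifiers of other MDODs this one nests or references
    """
    mdod = ET.Element("mdod", {"identifier": identifier})
    kw = ET.SubElement(mdod, "keywords")
    for vocabulary, term in keywords:
        # Terms are defined by the external vocabulary, so new measurement
        # types need no change to this structure.
        ET.SubElement(kw, "keyword", {"vocabulary": vocabulary}).text = term
    prov = ET.SubElement(mdod, "provenance")
    prov.append(ET.fromstring(provenance_xml))
    for ref in referenced_mdods:
        # References rather than containment: one MDOD (e.g. an operator's
        # measurements) may be shared by several experiments.
        ET.SubElement(mdod, "mdodRef", {"identifier": ref})
    return ET.tostring(mdod, encoding="unicode")
```

Referencing other MDODs by identifier, rather than copying them inline, is what keeps the structure a lightweight graph rather than a strict hierarchy.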

Attachments (22)