


Report for the period ending GEC13

Beth Plale, School of Informatics and Computing, Indiana University
Chris Small, InCNTRE, Indiana University


In this quarter we brought together the Adaptors that have been developed for different types of GENI experiments behind the easy-to-use web interface of the NetKarma Portal. We also presented a tutorial that showed GENI experimenters how to capture provenance describing their experiments, and how visualizing that provenance with the NetKarma visualization tools can help them both analyze and communicate their results.

A basic tenet of the Karma provenance system and NetKarma in particular has been the capture of provenance while minimizing the effort required of scientists and researchers to instrument their code or manually enter metadata. From our experience with capturing provenance and metadata, if the process cannot be automated, the capture will not be done since it is an extra burden on researchers. The NetKarma project developed an "Adaptor" approach that scavenges provenance and metadata annotations with a goal of minimizing the effort of GENI experimenters. Early efforts focused first on experiments run using GUSH, and with minor changes that Jeannie Albrecht's group at Williams College incorporated into GUSH, we were able to extract provenance from GUSH logs and generate provenance graphs through NetKarma using the Open Provenance Model (OPM). To enable the visualization of the provenance graphs being generated, we extended the widely used third-party Cytoscape visualization tool with NetKarma plug-ins for retrieving provenance graphs from a NetKarma server and applying layout and formatting capabilities that convert the OPM graphs retrieved into visualizations experimenters can use.

This approach was extended at GEC12 to also extract additional annotations about an experiment from the GMOC database. Starting at GEC12 we also made a NetKarma server available at Indiana University so that GENI experimenters could upload log files to it and visualize the resulting graphs in Cytoscape. To demonstrate the applicability of the NetKarma approach to a wider range of GENI experimenters, we worked with the GPO to identify experiments that could benefit from provenance. We started with NS2 experiments performed by researchers at Clemson, which were presented in a demo and poster at GEC13. We have now extended that effort to include ORBIT experiments and provenance generated from manifest files, as in the XSP experiment used in the NetKarma tutorial at GEC14.

At GEC14 we brought all of the components of NetKarma together in an easy-to-use NetKarma Portal that enables experimenters to define a collection for an experiment and add log files and other data to the Portal using a drag-and-drop interface. On the back end, the Portal uses the NetKarma Adaptors and a NetKarma server at Indiana University to extract provenance and metadata, generating OPM graphs that can be visualized either through the Portal using the web-based version of Cytoscape or downloaded and visualized using the Cytoscape desktop version with the NetKarma plug-ins.

The ORBIT experiments shown at GEC14 illustrate the initial goals of the NetKarma project: experimenters using the ORBIT Traffic Generator (OTG) can capture detailed provenance, download the graph from the Portal, and visualize it with no additional instrumentation. At GEC14, we also presented how the provenance graph generated by NetKarma can be incorporated into the MDOD descriptor for an experiment, and how the DOI generated for an experiment through the NetKarma Portal can be incorporated into the MDOD that is archived.

Milestones Delivered

S4.c Provenance System Demo; I&M Contributions

NetKarma Tutorial and Demo

At GEC14 we presented a tutorial on the NetKarma Portal, which is layered on top of the NetKarma server, and on the NetKarma plug-ins for visualizing large provenance graphs. After an overview of what provenance is and how it can benefit GENI experimenters, the tutorial continued with three hands-on components in which participants captured provenance, ingested and visualized provenance in the NetKarma Portal, and downloaded provenance graphs from the Portal for visualization in Cytoscape using the GENI NetKarma plug-ins.

The tutorial first walked the participants through an experiment that used Emulab ProtoGENI resources to perform a GridFTP transfer based on the eXtensible Session Protocol (XSP). Since the focus was on provenance generation, the resources had already been requested and configured. Participants ran scripts loaded in the tutorial VM on VirtualBox to run the experiment, which generated a log from the XSP experiment, took some basic bandwidth measurements using iperf, and downloaded the manifest for the resources being used. In the second hands-on portion of the tutorial, the NetKarma adaptor incorporated into the NetKarma Portal used these two log files and the manifest to extract provenance and load it into NetKarma through the Portal.

<Portal section>

The third hands-on portion of the tutorial showed participants how they can easily download a provenance graph generated by NetKarma from the Portal. The downloaded graph is an XML file based on the Open Provenance Model (OPM) standard for provenance. Very large provenance graphs can be visualized and explored using the NetKarma plug-in with the desktop version of Cytoscape (the web version of Cytoscape is used in the Portal). In this section participants downloaded provenance graphs for experiments we had run on both NS2 and ORBIT and loaded into the Portal using the corresponding NetKarma adaptors. The provenance graphs for these experiments differ from that of the XSP experiment: whereas the adaptor for the XSP experiment uses the manifest to incorporate the topology of the experiment into the provenance graph, the NS2 and ORBIT adaptors focus on capturing the success of each packet transferred. This packet-level focus results in large provenance graphs. By loading the graphs for both experiments into the copy of Cytoscape included in the tutorial VM, however, participants could quickly see how the NS2 and ORBIT experiments performed in terms of packets successfully transferred or dropped, using the visualization capabilities included in the NetKarma plug-in.

The full instructions, including step-by-step graphics and illustrations of the provenance generation, Portal, and visualization process are included in the tutorial instructions available at:

During the Demo session at GEC14 we also presented a live demo that followed up on our tutorial to show the latest version of the NetKarma Portal, the visualization of an ORBIT experiment using the latest version of the NetKarma visualization plug-in for Cytoscape, and the proposed MDOD schema with NetKarma provenance.

NetKarma and the Measurement Data Object Descriptor (MDOD)

Following up on our proposed changes to the MDOD at GEC13 and subsequent discussions, we prepared an XML schema of our proposed MDOD and shared it with the Measurement & Instrumentation group to stimulate discussion. We presented a poster at the GEC14 Demo session along with the schema and an example based on the XSP experiment used in the tutorial. The schema is discussed further in the section below on work performed this quarter. At GEC14, we also participated in the follow-on discussion on Wednesday afternoon regarding next steps for the MDOD, which was hosted by Harry Mussman and Jeanne Ohren of BBN. The proposed MDOD schema, supporting schemas, and the example based on the XSP experiment are included below in the documents section.

Work Performed this Quarter

NetKarma Portal

<add Portal Details>

NetKarma Visualization and Adaptor Enhancements

Building on the work we presented at the GEC13 demo session on visualizing the provenance of experiments run on NS2, for GEC14 we developed an Adaptor that ingests information from WiMAX experiments run on ORBIT and generates a provenance graph through NetKarma based on the Open Provenance Model (OPM) standard. Since the NS2 experiment runs in a simulated environment, we were able to capture the provenance of each packet and whether it was successfully received or dropped. In the ORBIT environment, such complete information is not available: we can determine which packets were sent, but not, at the packet level, which were successfully received and which were dropped.

Two of the goals of NetKarma are to: (1) provide experimenters with the provenance of their experiment based on the events that occurred in their experiment, and (2) capture provenance while minimizing the additional instrumentation needed in an experiment, ideally requiring none. For ORBIT experiments run using the ORBIT Traffic Generator (OTG), the NetKarma adaptor can capture provenance with no additional instrumentation, as illustrated in the following diagram.

As seen in this diagram, the provenance of an experiment run on ORBIT can be captured in NetKarma using the ORBIT NetKarma adaptor that ingests data that is already generated as part of the experiment. This includes: (1) the script for the ORBIT experiment, (2) data retrieved using the OMF interface's "info" and "list clients" methods, and (3) data retrieved from the OML database. The adaptor combines these data sources to generate event notification messages that are sent to the NetKarma server, where they are combined to create the annotated OPM graph for the experiment.
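The combining step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the actual ORBIT adaptor: the `Notification` class, the `build_notifications` function, and the field names (`sender`, `seq`, `size`) are all hypothetical, and the real adaptor's message format is not shown in this report.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Notification:
    """One provenance event destined for the server (hypothetical shape)."""
    kind: str                      # e.g. "process" or "artifact"
    subject: str                   # identifier of the entity the event describes
    annotations: Dict[str, str] = field(default_factory=dict)


def build_notifications(script_name: str,
                        clients: List[str],
                        oml_rows: List[dict]) -> List[Notification]:
    """Merge the three data sources into a single list of events."""
    # (1) The experiment script becomes the top-level process.
    events = [Notification("process", script_name, {"source": "experiment-script"})]
    # (2) Each client reported by the OMF interface becomes a process.
    for node in clients:
        events.append(Notification("process", node, {"source": "omf"}))
    # (3) Each measurement row from the OML database becomes an artifact.
    for row in oml_rows:
        events.append(Notification(
            "artifact",
            f"{row['sender']}:pkt{row['seq']}",
            {"source": "oml", "size": str(row["size"])},
        ))
    return events


# Example with made-up inputs (assumed names, not real experiment data).
events = build_notifications(
    "many_to_one.rb",
    ["node1", "node2"],
    [{"sender": "node1", "seq": 1, "size": 512}],
)
```

In this sketch every event carries a `source` annotation, so a server could tell which of the three inputs produced it when assembling the annotated graph.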

The provenance graph displayed in the above image is based on the “Many-to-One Communication” example in the OTG tutorial. This experiment consists of seven senders and a single receiver. In this experiment we varied the size and the rate of the packets sent by each of the senders. The experiment was run on GENI ORBIT resources at Rutgers University and is displayed using the NetKarma visualization plug-in for the Cytoscape visualization tool. As with other provenance graphs captured in NetKarma, the plug-in tool loads an OPM graph from Karma and converts the OPM XML file into a Cytoscape visualization.
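As a rough illustration of the conversion from an OPM XML file into nodes and edges that a graph viewer can lay out, the following Python sketch parses a toy document. The element and attribute names (`opmGraph`, `causalDependencies`, `effect`, `cause`, `ref`) are simplified assumptions made for this example, with no XML namespaces; this is not the NetKarma plug-in's code, and real OPM files may differ.

```python
import xml.etree.ElementTree as ET

# Toy OPM-like document; the structure is an assumption for illustration only.
SAMPLE = """
<opmGraph>
  <processes>
    <process id="p_otg1"/>
    <process id="p_otr"/>
  </processes>
  <artifacts>
    <artifact id="a_pkt1"/>
  </artifacts>
  <causalDependencies>
    <wasGeneratedBy><effect ref="a_pkt1"/><cause ref="p_otg1"/></wasGeneratedBy>
    <used><effect ref="p_otr"/><cause ref="a_pkt1"/></used>
  </causalDependencies>
</opmGraph>
"""


def opm_to_graph(xml_text):
    """Return (nodes, edges) where nodes maps id -> type and
    edges is a list of (effect_id, relation, cause_id) tuples."""
    root = ET.fromstring(xml_text)
    nodes = {}
    for tag in ("process", "artifact", "agent"):
        for element in root.iter(tag):
            nodes[element.get("id")] = tag
    edges = []
    dependencies = root.find("causalDependencies")
    if dependencies is not None:
        for dep in dependencies:
            effect = dep.find("effect").get("ref")
            cause = dep.find("cause").get("ref")
            edges.append((effect, dep.tag, cause))
    return nodes, edges


nodes, edges = opm_to_graph(SAMPLE)
```

The resulting node and edge lists are the kind of input a visualization tool needs before it can apply a layout or a style.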

The Plug-in has been enhanced to include an ORBIT visualization style that can be applied to color-code the packets sent and received based on the origin (in this example, which of the seven OTG nodes sent the packet). In this visualization we can see that the OTR receiver node in the upper left corner received packets only from two of the nodes since all of the packets received are colored either yellow or green to correspond with the OTG node that sent them. Zooming in, experimenters can review the metadata annotations for each packet or node that had been harvested by the NetKarma Adaptor as shown in the following screen capture:

The provenance graph for the above illustration is available here or can be downloaded from the NetKarma Portal (as participants did in the NetKarma tutorial at GEC14). The ORBIT visualization style, which color-codes the nodes in the provenance graph based on their origin, is included in the latest NetKarma plug-in source code, which is available from the Karma SourceForge page:

Capturing the Provenance of an XSP Experiment

Network experiments based on the eXtensible Session Protocol (XSP) were identified at GEC13 as the second type of GENI experiment to be used for evaluating NetKarma. Unlike the NS2 experiments, where provenance was captured for each packet sent, provenance capture in the XSP experiment focused on extracting provenance from the annotated RSpec returned as the manifest when resources are requested, and on adding annotations based on performance measurements captured during the experiment. Following discussions with Martin Swany (Indiana University) and Ezra Kissel (University of Delaware) at GEC13 and subsequent follow-up conversations, provenance was captured for an XSP experiment executing a GridFTP transfer using Phoebus gateway nodes run on Emulab resources. A NetKarma Adaptor was developed and incorporated into the NetKarma Portal that extracts provenance from the manifest, the log generated by XSP, and network bandwidth measurements taken using iperf. Following is an illustration of the provenance captured from an XSP experiment:

This experiment was also used in the NetKarma tutorial at GEC14. The Emulab resources were provisioned ahead of the tutorial. Participants then ran an XSP GridFTP transfer on those resources, generated the XSP log and iperf measurements, downloaded the manifest for the resources being used, uploaded these three files to an experiment they defined in the NetKarma Portal, and visualized the provenance through the Portal.

The provenance graph generated for the XSP experiment was also used as the sample provenance graph embedded in the proposed MDOD as described below.

Proposed MDOD Schema Integrating Provenance

At GEC13 we delivered a whitepaper on proposed changes to the MDOD structure and on incorporating provenance into the MDOD. In the I&M session at GEC13, Giridhar Manepalli of CNRI presented a summary of the MDOD and some open issues. Based on his presentation, subsequent discussions with Giridhar and with Harry Mussman and Jeanne Ohren of BBN, and NetKarma meetings after GEC13, we revised our proposal to reflect these discussions and completed a draft of the schema, which is included below in the publications and documents section.

Some of the changes that are reflected are:

  • The MDOD is broadened to describe the experiment and measurements relating to the experiment. The original vision of the MDOD as presented by Harry Mussman was to describe all of the measurements related to an experiment. The proposed schema is extended to include the provenance of the experiment based on the OPM graph generated by NetKarma. Since the provenance graph for an experiment can be large, the annotations are stripped out of the provenance graph, leaving the structure of actors, processes, and artifacts. The example included below embeds the provenance for the XSP experiment. Removing the annotations reduced the size by 75%.
  • An open issue discussed at GEC13 was how the identifier for the MDOD should be generated and whether any semantics should be embedded in the identifier. The revised proposal uses a DOI generated by NetKarma that can be used to link back to the data captured for an experiment in the Portal. The DOI would be used when an MDOD is being archived, but the proposal also allows for relative identifiers (path within the experiment) or other assigned IDs prior to an experiment being archived. Using the DOI leverages existing technology that can be used to track and update the ownership and custody of the measurement data.
  • Since the MDOD needs to be dynamic and handle new measurement capabilities being added in GENI, we propose using an approach adopted in other metadata schemata such as the FGDC schema which has long been used in spatial data where keywords are defined based on an external source such as a controlled vocabulary. This allows terms to be precisely defined, but avoids continual updates to the underlying schema.
  • Nesting of MDODs, but not a strict hierarchy. An experiment may use measurement data created by an operator or aggregate provider, or may incorporate measurements from multiple runs done over time. Allowing MDODs to nest or reference other existing MDODs results in a more flexible and lighter weight structure.
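The annotation stripping mentioned in the first item above can be sketched as follows. This is a minimal illustration under stated assumptions: the element name `annotation` and the toy document are hypothetical, XML namespaces are omitted, and this is not the code used by the NetKarma Portal.

```python
import xml.etree.ElementTree as ET


def strip_annotations(xml_text, tag="annotation"):
    """Remove every element named `tag`, keeping the graph structure.

    The tag name is an assumption; adjust it to match the actual schema.
    """
    root = ET.fromstring(xml_text)
    # Snapshot the element list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Toy input: one artifact carrying a single annotation (made-up values).
SAMPLE = (
    '<opmGraph><artifacts>'
    '<artifact id="a1"><annotation>'
    '<property key="size" value="1024"/>'
    '</annotation></artifact>'
    '</artifacts></opmGraph>'
)

stripped = strip_annotations(SAMPLE)
```

The stripped document keeps the artifact and its identifier, so it can still be matched against the full graph stored in the Portal; how much smaller it is depends entirely on how heavily the original graph was annotated.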

At GEC14 we presented a poster that outlined this structure and a proposed lifecycle for the MDOD as reflected in the following diagram:

Both the GEMINI and GIMI Instrumentation & Measurement projects are working towards a "GENI Experimenter Portal Service" that would allow an experimenter to set up measurements for an experiment and initialize the collection of measurements. This is a natural source for the initial generation of the MDOD. In some cases the measurement data and MDOD may be registered directly in the GENI Measurement Data Archive Service. Alternatively, the measurement data and MDOD could be uploaded with other logs, the manifest, and other data to the NetKarma Portal to capture the provenance, metadata, and data related to an experiment. The NetKarma Portal generates a provenance graph through NetKarma based on the OPM standard, saves the full provenance graph in the Portal, and can incorporate the provenance without annotations in the MDOD. The NetKarma Portal provides a means to create a hierarchy of the data, measurements, and provenance related to an experiment. When this collection is bundled and submitted to the archive, the entire bundle can be described by an MDOD that uses the Portal-generated DOI as an ID for the bundle and includes other MDODs describing measurement data by reference. This MDOD bundle is depicted graphically as follows:

Project Participants

During this period, active participants in the NetKarma project included: Beth Plale and Chris Small, as well as Scott Jensen, Postdoctoral Fellow in the Data To Insight Center, and students Peng Chen, Devarshi Ghoshal, and Yuan Luo. Robert Ping provided project management for the project.


  • Katherine Cameron (Clemson University) – ORBIT experiments on DDoS attacks and ORBIT configuration
  • Nilanjan Paul (Rutgers) – ORBIT experiment configuration
  • Fraida Fund (NYU-Poly WITest Lab) – ORBIT reservation system
  • Martin Swany (Indiana University) and Ezra Kissel (University of Delaware) – Discussion of measurement metadata capture in the MDOD. XSP-based experiment used in NetKarma tutorial
  • Harry Mussman (BBN) – Metadata capture for measurement data in the MDOD
  • Jeanne Ohren (BBN) – tutorial coordination and MDOD
  • Ahmet Babaoglu (North Carolina State University) – capturing NetKarma notifications as events in IMF Messaging

Publications & Documents

Posters at the GEC14 Poster and Demo Session:

  • GENI Provenance, Instrumentation and Measurement: Integrating Provenance into the GENI Measurement Data Object Descriptor PDF

Step-by-Step Instructions for the NetKarma Tutorial at GEC14:

Slides from Presentation on NetKarma at the GEC14 Experimenter's Roundtable: slides

Proposed MDOD Schema for capturing the Metadata and Provenance of GENI Experiments: MDOD_Provenance_0.2.xsd

Supporting schemas imported into the proposed MDOD schema:

Sample MDOD example based on capturing the metadata and provenance for an XSP experiment as used in the NetKarma tutorial: SampleMDOD-XSP.xml

GENI Documents

