
Version 44 (modified by hmussman@bbn.com, 15 years ago)


I&M WG meeting at GEC7: Agenda and Notes

Wednesday, March 17, 2010, 3:30pm - 5:30pm
Room: Presidents 2

Introductions

3:30pm All

Major WG issues and goals

3:35pm Paul Barford (University of Wisconsin)
slides (15min)

Goals:
+ An infrastructure for gathering, analyzing and archiving measurements in GENI

  • Support for broad range of experiments
  • Tight integration with control framework(s)
  • Efficient, easy to use, broadly deployed, diverse capabilities, secure, privacy-aware, etc.

+ WG facilitates communication and coordination between I&M projects

  • Toward reaching overall goals quickly and effectively
  • Includes architectures, specifications, partnerships

Issues:
+ Architectural (v0.1 on wiki)

  • Use cases
  • Schema
  • Sensors, components and protocols
  • UI and integration
  • Authentication and privacy
  • Storage and analysis

+ Practical

  • Test and evaluation
  • Deployment, configuration and support

Project Reviews

Each of the following speakers was asked to include in their talk a review of how they address the following GENI I&M architecture priority topics:
+ Common terminology; best granularity of functions
+ Measurement data schema; common after MPs, before and within MCs, MDAs; what is included in meta-data?
+ Measurement plane; options; expect nodes with 3 or 2 NICs

Instrumentation Tools Project (1642)

3:50pm Jim Griffioen (University of Kentucky)
slides

Goals:
+ Integrate Univ of Kentucky Emulab into ProtoGENI (completed in year 1)
+ Reimplement Univ of Kentucky Edulab instrumentation and measurement tools to work in the ProtoGENI environment.
+ Support automatic generation of instrumentation and measurement infrastructure on a per-slice basis.

Structure:
+ Measurement Points
+ Measurement Controller

Functional components:

  1. Setup: deploy and initialize topology-specific software and services
  2. Capture: capture measurement data
  3. Collection: move data to processing/storage environments
  4. Storage: store data on a temporary, short term, long term, and archival basis
  5. Processing: filter, convert, aggregate, summarize, etc., data
  6. Presentation: present data to users in meaningful ways
  7. Access Protection: protect resources and data
  8. Measurement Control: Dynamically control the above components

Note: Conventional network management solutions exist for 2. through 8.

Mapping functions to proposed services:

  1. Setup – MO?
  2. Capture - MP
  3. Control – MO?
  4. Collection - MC
  5. Storage - MDA
  6. Processing - MAP
  7. Access Control – MO?
  8. Presentation - MAP

Measurement data schema:
+ Data-specific formats will exist at multiple times and places but should be invisible to the generic measurement infrastructure
+ Should leverage the expertise of those with experience in this area (DatCat, perfSONAR, etc.).
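One way to keep data-specific formats invisible to the generic infrastructure is to wrap each payload in a small generic envelope that the infrastructure routes and stores without ever parsing the payload. The sketch below assumes a hypothetical framing (4-byte header length, JSON header, raw payload) and hypothetical field names; it is not a format proposed by the project.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Generic wrapper that the infrastructure can route and store without parsing."""
    slice_name: str    # hypothetical field used for routing/authorization
    source: str        # e.g. the measurement point that produced the data
    content_type: str  # label naming the data-specific format, e.g. "text/csv"
    payload: bytes     # opaque to the generic infrastructure


def to_wire(env: Envelope) -> bytes:
    """Frame as: 4-byte header length, JSON header, then the raw payload."""
    header = json.dumps({
        "slice": env.slice_name,
        "source": env.source,
        "content_type": env.content_type,
        "length": len(env.payload),
    }).encode("utf-8")
    return len(header).to_bytes(4, "big") + header + env.payload


def from_wire(blob: bytes) -> Envelope:
    """Recover the envelope; only the header is interpreted."""
    hlen = int.from_bytes(blob[:4], "big")
    header = json.loads(blob[4:4 + hlen].decode("utf-8"))
    start = 4 + hlen
    payload = blob[start:start + header["length"]]
    return Envelope(header["slice"], header["source"], header["content_type"], payload)


env = Envelope("slice-a", "mp-1", "text/csv", b"time,rtt_ms\n1,0.52\n")
assert from_wire(to_wire(env)) == env
```

Only tools that understand `content_type` ever look inside `payload`; everything in between handles bytes.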

Measurement plane:
+ Virtual interfaces abound in GENI.

  • Unknown how virtual interfaces map to physical NICs.

+ Shared paths are OK, but require QoS.

  • Setting up QoS planes is challenging, particularly if planes are per-slice.

OMF/OML Project (1660)

4:05pm Max Ott (NICTA)
slides

Goals:

  • All experiment output in one place
  • Capturing everything – most importantly metadata
  • Separation of concerns

– Instrumenting
– Collecting

  • Minimizing measurement collection overhead

– Time
– Traffic interference

  • Support for steerable experiments

– Access to data in different places

Concepts:
+ MPoints
+ Filters
– Stddev
– Average
– First
– Histogram
+ Processing/Caching
+ Steering or Feedback
+ MStreams
+ Dynamic Schema
+ Visualization
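The filters listed above (Stddev, Average, First, Histogram) reduce a stream of samples at the measurement point, which is how collection overhead is cut before data leaves the node. The sketch below shows the idea with one-pass filters; the class names and `apply_filters` helper are hypothetical and are not the OML API. Stddev uses Welford's online algorithm so no samples need to be buffered.

```python
import math
from collections import Counter


class First:
    """Keeps the first sample of the interval."""
    def __init__(self):
        self.value = None

    def add(self, x: float) -> None:
        if self.value is None:
            self.value = x

    def result(self):
        return self.value


class Average:
    def __init__(self):
        self.n, self.total = 0, 0.0

    def add(self, x: float) -> None:
        self.n += 1
        self.total += x

    def result(self):
        return self.total / self.n if self.n else None


class Stddev:
    """Population standard deviation via Welford's one-pass algorithm."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def result(self):
        return math.sqrt(self.m2 / self.n) if self.n else None


class Histogram:
    """Counts samples in fixed-width bins keyed by each bin's lower edge."""
    def __init__(self, bin_width: float):
        self.width = bin_width
        self.bins = Counter()

    def add(self, x: float) -> None:
        self.bins[math.floor(x / self.width)] += 1

    def result(self):
        return {k * self.width: c for k, c in sorted(self.bins.items())}


def apply_filters(samples, filters):
    """Feed one interval's samples through every filter and collect the summaries."""
    for x in samples:
        for f in filters.values():
            f.add(x)
    return {name: f.result() for name, f in filters.items()}


out = apply_filters([2, 4, 4, 4, 5, 5, 7, 9],
                    {"first": First(), "avg": Average(),
                     "stddev": Stddev(), "hist": Histogram(2)})
```

Each filter emits a few numbers per reporting interval instead of every raw sample, which trades detail for lower traffic interference.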

Supported applications:

  • Traffic Generation/Measurements

– OTG … Traffic Generator
– Iperf

  • Monitoring

– Libtrace
– Libsigar
– Spectrum Analyzer
– GPS
– (Weather)

  • Components

– TinyOS/Motes
– (GnuRadio)

LAMP using perfSONAR (1788)

4:20pm Guilherme Fernandes (for Martin Swany) (University of Delaware)
slides
+ perfSONAR is a multi-domain performance monitoring framework, which defines a set of protocol standards for sharing data between measurement and monitoring systems

Architecture:
+ Interoperable network measurement middleware designed as a Service Oriented Architecture (SOA):

  • Components are Web Services (WS) based

+ Several unique components and design considerations, all of which operate in a cooperative yet independent manner

  • Each piece of functionality is implemented as a separate, specific service
  • Clients and servers interact through scripted, XML-based protocols
  • Measurement data is encoded in expressive XML formats

Components:
+ Infrastructure

  • Lookup Service
  • Topology Service
  • Authentication Service

+ Services

  • Measurement Point (MP) Service
  • Measurement Archive (MA) Service
  • Resource Protector

+ Analysis and visualization

Open protocols and schema:
+ Base network measurement schema

  • OGF Network Measurement Working Group

+ Topology Schema

  • OGF Network Markup Language (NML-)WG
  • Includes Topology Network ID

+ perfSONAR Protocol Documents

  • OGF Network Measurement and Control (NMC-)WG

Base network measurement schema:
+ Measurement Data, a set of measurement events that have some value or values at a particular time
+ Measurement Metadata, the details about the set of measurement data
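The split between Metadata (described once) and Data (many events) can be illustrated with a small XML message in the NM-WG base namespace quoted below. In this sketch the data element points back to its metadata through an `id`/`metadataIdRef` pair, following the general NM-WG message pattern as we understand it; the subject text, the eventType URI and the `time`/`value` datum attributes are simplified placeholders and the output is not validated against the real schema.

```python
import xml.etree.ElementTree as ET

NMWG = "http://ggf.org/ns/nmwg/base/2.0/"
ET.register_namespace("nmwg", NMWG)


def q(tag: str) -> str:
    """Qualify a tag name with the NM-WG base namespace (ElementTree syntax)."""
    return f"{{{NMWG}}}{tag}"


def build_message(subject: str, event_type: str, params: dict, samples) -> ET.Element:
    msg = ET.Element(q("message"))
    # Metadata: describes the measurement once (subject, eventType, parameters).
    meta = ET.SubElement(msg, q("metadata"), {"id": "meta1"})
    ET.SubElement(meta, q("subject")).text = subject
    ET.SubElement(meta, q("eventType")).text = event_type
    plist = ET.SubElement(meta, q("parameters"))
    for name, value in params.items():
        ET.SubElement(plist, q("parameter"), {"name": name}).text = str(value)
    # Data: the measurement events, pointing back at their metadata by id.
    data = ET.SubElement(msg, q("data"), {"id": "data1", "metadataIdRef": "meta1"})
    for t, v in samples:
        ET.SubElement(data, q("datum"), {"time": str(t), "value": str(v)})
    return msg


msg = build_message("hostA -> hostB", "http://example.org/ns/latency",  # hypothetical URI
                    {"protocol": "ICMP"}, [(1268841600, 0.52), (1268841601, 0.49)])
xml_text = ET.tostring(msg, encoding="unicode")
```

Because every datum references `meta1`, the subject and parameters are stated once no matter how many events follow.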

Measurement metadata:
+ Subject (Noun)

  • The measured/tested entity (who)
  • E.g. A pair of hosts (end-point-pair), or a Layer 3 interface

+ EventType (Verb)

  • What type of measurement, value, or event occurred
  • Characteristic, tool output, or generic event
  • E.g. latency, bandwidth, utilization, or simply iperf

+ Parameters (Adjectives and Adverbs)

  • How, or under what conditions, did this event occur?
  • E.g. buffer sizes used, TCP vs ICMP packets

+ Key

  • Shortcut substituted in place of previous three items
  • No predefined format
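Since the Key has no predefined format, one possible choice is a short hash of a canonical encoding of Subject, EventType and Parameters, with the archive keeping a table to resolve the key back. The hashing scheme, `metadata_key` and `KeyRegistry` below are hypothetical illustrations, not perfSONAR behavior.

```python
import hashlib
import json


def metadata_key(subject: dict, event_type: str, parameters: dict) -> str:
    """Derive a short opaque key from subject, eventType and parameters."""
    canonical = json.dumps(
        {"subject": subject, "eventType": event_type, "parameters": parameters},
        sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class KeyRegistry:
    """Server-side table that resolves a key back to the full metadata."""
    def __init__(self):
        self._by_key = {}

    def register(self, subject: dict, event_type: str, parameters: dict) -> str:
        key = metadata_key(subject, event_type, parameters)
        self._by_key[key] = (subject, event_type, parameters)
        return key

    def resolve(self, key: str):
        return self._by_key[key]


reg = KeyRegistry()
k1 = reg.register({"src": "hostA", "dst": "hostB"}, "latency", {"protocol": "ICMP"})
k2 = metadata_key({"dst": "hostB", "src": "hostA"}, "latency", {"protocol": "ICMP"})
assert k1 == k2  # dict ordering does not change the key
```

Clients can then send `k1` in later requests instead of repeating the three metadata items.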

Measurement data:
+ Datum: The actual result (values) of a measurement.

  • Can contain time (e.g. a Time element or attribute).
  • The existence of an event may itself be the result, with no additional value
  • As in “Link up/down” or threshold events

+ Time: Representation of a time stamp or time range in a specified format.

  • Must be extensible since even agreement about the right structure is not easy, e.g. UNIX timestamp vs NTP time
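The UNIX vs NTP example above comes down to a different epoch and encoding: UNIX time counts seconds since 1970-01-01, while the NTP 64-bit timestamp counts seconds since 1900-01-01 as a 32.32 fixed-point number. The offset between the two epochs is 2,208,988,800 seconds. A minimal conversion sketch follows; it ignores NTP era rollover (2036) and leap-second subtleties.

```python
NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (UNIX epoch)


def unix_to_ntp64(unix_seconds: float) -> int:
    """Encode a UNIX time as a 64-bit NTP timestamp (32-bit seconds, 32-bit fraction)."""
    ntp = unix_seconds + NTP_UNIX_OFFSET
    seconds = int(ntp)
    fraction = int((ntp - seconds) * (1 << 32))
    return (seconds << 32) | fraction


def ntp64_to_unix(ntp64: int) -> float:
    """Decode a 64-bit NTP timestamp back to UNIX seconds."""
    seconds = ntp64 >> 32
    fraction = ntp64 & 0xFFFFFFFF
    return seconds - NTP_UNIX_OFFSET + fraction / (1 << 32)
```

A schema that records which of these representations a Time element uses lets both kinds of tools coexist.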

Schema namespaces and extensibility:
+ A namespace: http://ggf.org/ns/nmwg/base/2.0/

  • MAY NOT be a resolvable URL (a namespace is an identifier, not necessarily a web location)

+ We encode the measurement/event type in the namespace (and as a standalone element)
+ We use Data and Metadata elements and vary the namespaces of the specific elements
+ Extensibility achieved through hierarchy with delegation

  • Similar to OIDs in the IETF management world

+ The NM-WG has a hierarchy of network characteristics

+ However, not all tools are cleanly mapped onto the Characteristic space

  • Often a matter of some debate

+ Organization-rooted tools namespace addresses this
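Hierarchy with delegation works like OID subtrees: whoever owns a prefix decides what goes beneath it, so classifying an eventType means finding the longest delegated prefix that covers it. The delegation table below is illustrative only (the example.edu entry is hypothetical), and matching is done on whole path segments so that ".../characteristic" does not accidentally match ".../characteristics-extra".

```python
# Illustrative delegation table; prefixes and owners are examples, not a real registry.
DELEGATIONS = {
    "http://ggf.org/ns/nmwg/characteristic": "NM-WG characteristic hierarchy",
    "http://ggf.org/ns/nmwg/tools": "NM-WG tools hierarchy",
    "http://example.edu/ns/tools": "hypothetical organization-rooted tools namespace",
}


def segments(uri: str) -> list:
    """Split on '/' so matching respects whole path segments."""
    return uri.rstrip("/").split("/")


def delegated_owner(event_type: str):
    """Return the owner of the longest delegated prefix covering event_type, or None."""
    target = segments(event_type)
    best_len, best_owner = -1, None
    for prefix, owner in DELEGATIONS.items():
        p = segments(prefix)
        if target[:len(p)] == p and len(p) > best_len:
            best_len, best_owner = len(p), owner
    return best_owner
```

A tool that does not fit the characteristic space cleanly can still be described under its organization's own tools subtree.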

Topology Schema:
+ Topology schema grew from network measurement description

  • Reusable “Subject” elements for common cases
  • Also reduces redundancy
  • Relationships between measurement Subjects

+ Structured by layers (Base, L2, L3, L4), with the same elements recurring at each layer

  • Networks are modeled as graphs

+ Elements:

  • Domain
  • Node
  • Port
  • Link
  • Network
  • Path
  • Service

+ Varied by namespaces (extensibility)

  • Reuse visualization logic, etc.
  • Validate layer- or technology-specific attributes

+ Used by perfSONAR, IDC Protocol (ION, OSCARS, AutoBAHN), Phoebus

  • Currently calling it the UNIS Topology Schema

+ OGF NML-WG is working to unify the NDL and UNIS topology schemas

  • Happening as we speak at OGF28
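Treating the network as a graph of Domain, Node, Port and Link elements, with hierarchical identifiers, is what lets measurement Subjects reference topology instead of repeating it. The sketch below builds identifiers in the style of the `urn:ogf:network:domain=...:node=...:port=...` IDs used with NM-WG topology (treat the exact format as an assumption) and finds a node-level Path with breadth-first search; `urn`, `node_of` and `find_path` are hypothetical helpers.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


def urn(domain: str, node: Optional[str] = None, port: Optional[str] = None) -> str:
    """Build a hierarchical identifier: domain, then node, then port."""
    parts = [f"urn:ogf:network:domain={domain}"]
    if node:
        parts.append(f"node={node}")
        if port:
            parts.append(f"port={port}")
    return ":".join(parts)


@dataclass(frozen=True)
class Link:
    a: str  # port identifier at one end
    b: str  # port identifier at the other end


def node_of(port_id: str) -> str:
    """Strip the port component to get the owning node's identifier."""
    return port_id.rsplit(":port=", 1)[0]


def find_path(links, src: str, dst: str):
    """Breadth-first search over nodes joined by links; returns a list of node ids."""
    adj = {}
    for link in links:
        u, v = node_of(link.a), node_of(link.b)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    prev = {src: None}
    frontier = deque([src])
    while frontier:
        n = frontier.popleft()
        if n == dst:
            path = []
            while n is not None:
                path.append(n)
                n = prev[n]
            return path[::-1]
        for m in adj.get(n, []):
            if m not in prev:
                prev[m] = n
                frontier.append(m)
    return None


links = [Link(urn("a.example", "r1", "p1"), urn("a.example", "r2", "p1")),
         Link(urn("a.example", "r2", "p2"), urn("a.example", "r3", "p1"))]
path = find_path(links, urn("a.example", "r1"), urn("a.example", "r3"))
```

A measurement Subject such as an end-point pair can then simply carry two of these identifiers.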

LAMP objectives:
+ Collaborate on defining a common but extensible format for data storage and exchange for GENI I&M systems

  • Use perfSONAR NM-WG schema as starting point
  • Identify new characteristics/tools namespaces

+ Develop a representation of GENI topology to be used to describe measurements and experiment configuration

  • UNIS topology schema can be easily extended

+ Collaborate with related GENI measurement and security projects on a common GENI I&M architecture

  • The new GENI I&M Arch. Draft defines very similar services (MP, MC, MDA, MAP), and new ones (MO)
  • perfSONAR is a good starting point, but not currently a final solution for GENI
  • Use cases have been different, but much can be reused and the framework can be extended

DatCat Project

4:35pm Brad Huffaker (CAIDA, UC San Diego Supercomputer Center)
slides

Goal:
+ DatCat was designed to improve data sharing by providing a unified metadata database for Internet data.
+ Make it easy for users to:

  • find data sets of interest
  • add new data sets to the catalog
  • annotate data sets in the catalog

+ DOES NOT store data

Database schema:
+ Collection

  • logical group of files (paper, project, ...)

+ Data files

  • raw data files (traceroutes, log dumps, ...)

+ Packages

  • downloadable files (single file, tarball, ...)

+ Locations

  • how to get the packages (URL, contact address, ...)
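The four object types form a lookup chain from a collection down to where its bytes can be fetched. The sketch below models that chain with plain dataclasses; the exact relationships (for example, whether packages hang off data files or the reverse) are assumptions for illustration, not DatCat's actual database layout.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    url: str                 # how to get the package
    contact: str = ""


@dataclass
class Package:
    name: str                # e.g. a single file or a tarball
    locations: list = field(default_factory=list)


@dataclass
class DataFile:
    name: str                # e.g. a traceroute or log dump
    packages: list = field(default_factory=list)


@dataclass
class Collection:
    name: str                # e.g. the data behind a paper or project
    files: list = field(default_factory=list)


def download_urls(collection: Collection) -> list:
    """Walk collection -> data files -> packages -> locations, de-duplicating URLs."""
    return sorted({loc.url
                   for f in collection.files
                   for pkg in f.packages
                   for loc in pkg.locations})


tarball = Package("traces.tar.gz", [Location("https://example.org/traces.tar.gz")])
coll = Collection("study", [DataFile("trace-1", [tarball]), DataFile("trace-2", [tarball])])
```

The length of this chain is what the later "stand-alone collections" lesson proposes to shorten.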

Annotations:
+ Provide an extensible namespace for assigning domain-specific values to files.
+ Each user has their own hierarchical namespace

  • passive.IPv4.packet_count
  • active.RTT_95th_percentile

+ both data contributors and general DatCat users may attach annotations
+ any user may assign “note” annotations to any object
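Per-user hierarchical annotation names such as `passive.IPv4.packet_count` can be stored as (user, dotted name) pairs and queried by subtree. The `AnnotationStore` class below is a hypothetical sketch of that idea, not DatCat's implementation.

```python
from collections import defaultdict


class AnnotationStore:
    """Annotations keyed by object, then by (user, dotted name)."""
    def __init__(self):
        self._ann = defaultdict(dict)

    def annotate(self, obj_id: str, user: str, name: str, value) -> None:
        # Reject empty path components such as "passive..count".
        if not all(name.split(".")):
            raise ValueError(f"bad annotation name: {name!r}")
        self._ann[obj_id][(user, name)] = value

    def query(self, obj_id: str, user: str, prefix: str = "") -> dict:
        """Annotations by one user, optionally restricted to a subtree like 'passive.IPv4'."""
        out = {}
        for (u, name), value in self._ann[obj_id].items():
            if u == user and (not prefix or name == prefix or name.startswith(prefix + ".")):
                out[name] = value
        return out


store = AnnotationStore()
store.annotate("file-1", "contributor", "passive.IPv4.packet_count", 1000)
store.annotate("file-1", "contributor", "passive.IPv4.TCP.dst.port_count", 42)
store.annotate("file-1", "any-user", "note", "capture starts mid-day")
```

Because names are scoped per user, two contributors can define `passive.IPv4.packet_count` differently without clashing.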

Metadata fields:
+ collection

  • fields: name, contents, summary, motivation, creators/primary contact/contributor, start/end time, keywords, short description/description/description URL
  • annotations: note

+ data

  • fields: name, creators/primary contact/contributor, keywords, format, file size, start/end time, duration, geographic/network location, time zone, MD5, description, creation process
  • annotations: passive.IPv4.packet_count, passive.IPv4.TCP.dst.port_count, cfg.passive.capture_len, AS_count, active.trace_count, active.RTT_10th_percentile, .....

+ location

  • fields: package, creators, primary contact, status, download procedure, download URL, geographic/logistic location, availability

Submission tools:
+ Perl API

  • useful for integrating into existing data management systems
  • flexible, but requires writing code

+ subcat

  • different approach (declarative)
  • describe metadata in human-friendly text files (YAML)
  • CAIDA provides tools to extract additional metadata (data-to-yaml)
  • subcat intuitively joins information together

DatCat web portal:
+ Browse collections
+ Search collections
+ Search data

Lessons learned:
+ file-level metadata hard

  • hard to fix errors across thousands of files
  • hard to display thousands of files
  • hard to generate

+ submission process too cumbersome for most users

  • the majority of metadata (creator, creation process, location, etc.) is shared between files
  • many researchers are not programmers
  • researchers have limited time and motivation

+ Lots of redundant information:

  • For a single contribution, a majority of data objects have identical metadata shared across a large number of data objects.
  • could be solved by pushing subcat-type categories into the database

+ Move to stand-alone collections

  • contributors will only need to fill in the collection information
  • shorten search path from collection to locations

+ better to have lots of collections, than lots of files
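The redundancy lesson suggests storing shared metadata once at the collection level and letting each file record only what differs. A minimal sketch of that inheritance, using `collections.ChainMap` so file-level fields override collection-level ones; the field names and values are hypothetical.

```python
from collections import ChainMap


def effective_metadata(collection_meta: dict, file_meta: dict) -> dict:
    """File-level fields override the shared collection-level fields."""
    return dict(ChainMap(file_meta, collection_meta))


collection_meta = {"creator": "Example Lab",
                   "creation_process": "tcpdump capture",
                   "location": "site-1"}
# Each file records only what is unique to it.
files = [{"name": f"trace-{i:03d}"} for i in range(1000)]
files[7]["location"] = "site-2"  # the rare exception is stored on the file itself

records = [effective_metadata(collection_meta, f) for f in files]
```

Fixing an error in a shared field then means editing one collection record rather than a thousand file records.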

GENI I&M Architecture

4:55pm Harry Mussman (GPO) slides
GENI I&M Architecture document (15min)
v0.1 DRAFT includes proposed I&M services and proposed configuration

Purpose:
+ Provide a comprehensive and ordered list of topics that must be addressed for a complete architecture
+ Identify the priority topics that the WG needs to address first
+ Pull together contributions by the WG through Spiral 2

Plan:
+ Now: v0.1 DRAFT completed, by GPO; see http://groups.geni.net/geni/wiki/GeniInstrumentationandMeasurementsArchitecture
+ By GEC8: v0.5 draft, by GPO, with contributions from WG
+ By GEC9: v1.0 draft, reviewed by WG

Document outline:

  1. Document Scope
  2. Introduction
  3. Definition and configuration of I&M services
  4. Interfaces, protocols and schema for Measurement Data (MD)
  5. Ownership of MD and privacy of owners
  6. Interfaces, protocols and APIs for using I&M services
  7. Basic GENI I&M use cases
  8. MD transport via the GENI Measurement Plane
  9. Discovery, authorization, assignment and binding of GENI I&M services
  10. Measurement Orchestration (MO) service
  11. Measurement Point (MP)
  12. Time-stamping MD
  13. Measurement Collection (MC) service
  14. Measurement Analysis and Presentation (MAP) service
  15. Measurement Data Archive (MDA) service
  16. Additional GENI I&M use cases

Based on the GENI I&M Capabilities Catalog (v0.1), these GENI projects have comprehensive, end-to-end capabilities:
+ OML (ORBIT Measurement Library) in OMF (ORBIT Mgmt Framework)

  • (Ott, NICTA and Gruteser, WINLAB/Rutgers, 1660)

+ Instrumentation Tools

  • (Griffioen, Univ Kentucky, 1642)

+ perfSONAR for network measurements

  • (Zekauskas, I2 and Swany, Univ Delaware, 1788)

+ Scalable Sensing Service

  • (Fahmy, Purdue and Sharma, HP Labs, 1723)

+ OnTimeMeasure

  • (Calyam, Ohio Super Ctr, 1764)

After considering the projects with comprehensive, end-to-end capabilities, here are five services they have in common:
+ Measurement Orchestration (MO) service

  • (p/o Experiment Control service, uses a language to orchestrate I&M services)

+ Measurement Point (MP) service

  • (instrumentation that taps into a network and/or systems, links and/or nodes, to capture measurement data and format it using a standardized schema)

+ Measurement Collection (MC) service

  • (programmable systems that collect, combine, transform and cache measurement data)

+ Measurement Analysis and Presentation (MAP) service

  • (programmable systems that analyze and then present measurement data)

+ Measurement Data Archive (MDA) service

  • (measurement data repository, index and portal)
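To make the division of labor among the five services concrete, here is a small in-process sketch: MPs capture and format samples, an MC combines them, an MDA archives them, a MAP summarizes them, and a fixed function stands in for MO orchestration. All class and method names are hypothetical; the draft architecture does not define these interfaces.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Sample:
    source: str
    time: float
    value: float


class MeasurementPoint:
    """MP: captures raw readings and formats them as common Sample records."""
    def __init__(self, name, readings):
        self.name, self.readings = name, readings

    def capture(self):
        return [Sample(self.name, t, v) for t, v in self.readings]


class MeasurementCollection:
    """MC: collects and combines samples from several MPs, ordered by time."""
    def collect(self, mps):
        return sorted((s for mp in mps for s in mp.capture()), key=lambda s: s.time)


class MeasurementDataArchive:
    """MDA: stores collected samples, indexed by source."""
    def __init__(self):
        self.store = {}

    def archive(self, samples):
        for s in samples:
            self.store.setdefault(s.source, []).append(s)


class MeasurementAnalysisPresentation:
    """MAP: analyzes archived data and presents a per-source mean."""
    def summarize(self, mda):
        return {src: round(statistics.mean(s.value for s in ss), 3)
                for src, ss in mda.store.items()}


def orchestrate(mps):
    """MO: a fixed script standing in for an orchestration language."""
    mda = MeasurementDataArchive()
    mda.archive(MeasurementCollection().collect(mps))
    return MeasurementAnalysisPresentation().summarize(mda)


summary = orchestrate([MeasurementPoint("mp1", [(0, 1.0), (1, 3.0)]),
                       MeasurementPoint("mp2", [(0.5, 10.0)])])
```

Here every interface is a direct method call, which corresponds to the small-scale case described next.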

Expected range of implementations: + Small-scale implementations might put all I&M services within one aggregate, and even in one server

  • interfaces between services would be internal to the aggregate, or even internal to the server

+ Large-scale implementations might have I&M services distributed over many aggregates

  • with measurement data flowing between services
  • with orchestration mechanisms based upon message exchanges
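In the large-scale case, measurement data moves between services as messages rather than function calls. The sketch below uses a tiny in-memory publish/subscribe broker (a stand-in for a real message bus, which is an assumption here) so that one MP's stream reaches both an MC and an MDA; the topic name and message fields are hypothetical.

```python
import queue
import threading
from collections import defaultdict


class Broker:
    """Minimal in-memory publish/subscribe broker (stand-in for a real message bus)."""
    def __init__(self):
        self._subs = defaultdict(list)
        self._lock = threading.Lock()

    def subscribe(self, topic):
        inbox = queue.Queue()
        with self._lock:
            self._subs[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        with self._lock:
            subscribers = list(self._subs[topic])
        for inbox in subscribers:
            inbox.put(message)


def drain(inbox):
    """Read messages until the end-of-stream marker (None)."""
    out = []
    while (msg := inbox.get()) is not None:
        out.append(msg)
    return out


def run_demo():
    broker = Broker()
    topic = "slice1/mp1"                   # hypothetical topic naming
    mc_inbox = broker.subscribe(topic)     # subscribe before publishing starts
    mda_inbox = broker.subscribe(topic)

    def mp():
        for i in range(3):
            broker.publish(topic, {"seq": i, "rtt_ms": 10 + i})
        broker.publish(topic, None)        # end-of-stream marker

    t = threading.Thread(target=mp)
    t.start()
    received = (drain(mc_inbox), drain(mda_inbox))
    t.join()
    return received


mc_msgs, mda_msgs = run_demo()
```

Point-to-multipoint delivery means the MP does not need to know how many downstream services consume its data.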

Discussion topics:
+ Are these five services a complete group of I&M services?
+ Are these good names for the five I&M services?
+ Are these five services the right granularity for I&M services?
+ Is this a complete and flexible configuration for I&M services?
+ Can this configuration accommodate the range from small-scale to large-scale implementations?
+ How can we obtain a consensus, so that we can set a firm foundation for the other topics?

Interfaces, protocols and schema for measurement data:
+ Issues:

  • This topic was suggested at the GEC6 meeting: common schema for MD
  • Can we identify a common set of interfaces, protocols and schema for MD, or at least a limited number of types?
  • What needs to be included in the MD schema?

+ Approach:

  • Assume all MD after MPs follows this common set of interfaces, protocols and schema
  • Start with definition of MD schema
  • Next, understand [8. MD Transport via GENI Measurement Plane]
  • Then, complete first set of interfaces and protocols

From the GENI I&M Capabilities Catalog (v0.1), these GENI projects (and others) are working on data schema and/or data archives:
+ perfSONAR for network measurements (Swany, Univ Delaware, 1788)
+ IMF project (Dutta, NC State, 1718)
+ Embedded Real-Time Measurements (Bergman, Columbia, 1631)
+ GENI Meta-Operations Center (Herron, Indiana Univ, 1604)
+ netKarma: GENI Provenance Registry (Plale and Small, Indiana Univ, 1706)
+ DatCat project at http://www.datcat.org/ (Claffy, CAIDA)
+ Crawdad project at http://crawdad.cs.dartmouth.edu/ (Kotz, Dartmouth)
+ Amazon Simple Storage Service
+ Data-Intensive Cloud Control (Zink and Cecchet, UMass Amherst, 1709)
+ Experiment Mgmt System (Lannom and Manepalli, CNRI, 1663)
+ others?

  • What can we learn from these projects?

Discussion topics:
+ Standardized interfaces between measurement services

  • Pt-to-pt vs pt-to-multipoint (e.g., pub/sub)
  • Stream vs bulk transfer
  • Disconnected operation expected, or not?

+ Protocols for moving measurement data

  • Streaming data
  • Bulk-transfer of data

+ Schema for measurement data

  • Data record identifier
  • Annotation, or meta data
  • Data types and values, with timestamps

+ How can we obtain a consensus on the first set of interfaces/protocols/schema for MD?
+ What is the process for extending the set?
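The three schema elements listed above (a data record identifier, annotation/metadata, and typed values with timestamps) can be shown as one record type with a serialization round trip. The `MDRecord` class, its JSON encoding and the example identifier are hypothetical; they illustrate what such a schema has to carry, not a format the WG has agreed on.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MDRecord:
    record_id: str                              # data record identifier
    metadata: dict                              # annotation, or metadata
    unit: str                                   # declared type/unit of the values
    values: list = field(default_factory=list)  # (timestamp, value) pairs

    def add(self, timestamp: float, value: float) -> None:
        self.values.append((timestamp, value))

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "MDRecord":
        d = json.loads(text)
        d["values"] = [tuple(v) for v in d["values"]]  # JSON arrays back to tuples
        return MDRecord(**d)


rec = MDRecord("urn:example:md:1", {"slice": "s1", "mp": "mp1"}, "ms")  # hypothetical id
rec.add(1.0, 0.52)
rec.add(2.0, 0.49)
```

The same record could be sent as a stream (one value at a time) or in bulk (the whole `values` list), which is the stream vs bulk question above.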

GENI measurement plane:
+ Issue:

  • Need to understand how MD traffic flows are transported by the GENI Measurement Plane before the interfaces and protocols for MD can be fully defined

+ Approach:

  • Understand current view of GENI Control Plane and Experiment Plane
  • Consider options for GENI Measurement Plane to transport MD flows, using networks that implement GENI Control and Experiment Planes

Next Steps for WG

5:10pm Bruce Maggs (Duke University)

Wrap-up, review of action items and issues for plenary

5:25pm Harry Mussman (GPO)
slides

5:30pm Adjourn

6:30pm BoF dinner, organized by Harry Mussman, at Parizade Restaurant
