
Version 10 (modified by chaos@bbn.com, 13 years ago)

--

T3) I&M Use Cases for Infrastructure Measurement, and Support for Operators

1) Goals

From Sec. 2 of the GENI I&M Architecture document:

In addition, the GENI operations staff require extensive and reliable instrumentation and measurement capabilities to monitor and troubleshoot the GENI suite and its constituent entities. Some of this data will be made available to experimenters, to help them conduct useful and repeatable experiments.

The GMOC, providing GENI-wide operator services, needs to monitor essentially all GENI infrastructure on a 24x7 basis. In this case, the GMOC Operator will gather, analyze and present MD that monitors hundreds of infrastructure elements.

2) Tasks

Provide a concise but complete definition of I&M Use Cases for Infrastructure Measurement

Identify the support that should be available to operators

Update the GENI I&M Architecture document:

Sec. 3.3. I&M Use cases for Central Operators (i.e., GMOC)
Sec. 3.4. I&M Use cases for Aggregate Providers and Operators
Sec. 4.2.2 Typical Arrangements of I&M Services: For Operator Gathering MD from GENI Infrastructure
Sec. 4.2.3 Typical Arrangements of I&M Services: For Experimenters Gathering MD from their Slice and from GENI Infrastructure
Sec. 4.3.3 Type 3 I&M Service: Common Service with MD for Multiple Slices

Use as guidance in the design of GENI I&M tools, particularly for the GEMINI and GIMI projects

3) Team

LEAD Martin Swany (Indiana U)
Guilherme Fernandes (?)
Eric Boyd (Internet2)
Jason Zurawski (Internet2)
Prasad Calyam (Ohio Supercomputer Center)
Chris Small, for NetKarma (Indiana U)
Ilia Baldine, for ExoGENI racks (RENCI)
Jonathan Mills (RENCI)
?, for InstaGENI racks (HP)
?, for GMOC
Sarah Edwards (GPO)
Chaos Golubitsky (GPO)
Harry Mussman (GPO)

4) Meetings

(organized calls or meetings before GEC13?)

Review conclusions in pre-meeting at GEC13

Review with working team at GEC13

Review with operators, monitoring team at GEC13

5) Open Issues

6) Definition

Definition of infrastructure measurements:

1) Passive measurements and monitoring of clusters/racks, including transport switches, etc.
2) Event monitoring, which provides log entries
3) Active measurements of IP networks, of Layer 2 and OpenFlow paths

7) Passive Measurement and Monitoring

1) Supported Options

1) Aggregate operator establishes MP to gather MD via SNMP, organizes it into time-series data, and formulates MDOD

1) Directly from cluster/switch/etc.
2) Via Ganglia
3) Via Nagios and Check_MK (Open Monitoring Distribution, OMD)
4) Via Cacti

2) MD users:

1) Local aggregate operator
2) GMOC (when authorized)
3) Experimenter (when authorized)

3) MD interface and transfers:

1) Time-series data, gathered by perfSONAR MA, presented at perfSONAR interface, MDOD registered at global UNIS, can be pulled by authorized user, and presented using perfSONAR service
2) Time-series data, gathered by BLiPP, pushed using AMQP to Periscope service, and presented using Periscope service
3) Time-series data, pushed using OML protocol, to OML server, and presented using GIMI service
4) Time-series data, pushed using GMOC protocol, to GMOC server, and presented using ? service
5) Time-series data, published to XML messaging service, can be subscribed by authorized user, and presented using ? service
6) Time-series data, published to XML messaging service, subscribed to by a transfer service that pushes it using GMOC protocol, to GMOC server, and presented using ? service
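
As an illustration of option 1) above (MD gathered via SNMP and organized into time-series data), the following is a minimal sketch, not a definitive implementation. It assumes the MP has already polled a counter and holds (timestamp, value) pairs; the function `counters_to_rates` and the `Descriptor` class are hypothetical names, and `Descriptor` is only a stand-in for an MDOD, whose schema is still listed as an open issue.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

COUNTER32_MODULUS = 2 ** 32  # an SNMP Counter32 wraps back to 0 after 2**32 - 1


def counters_to_rates(samples: List[Tuple[float, int]],
                      modulus: int = COUNTER32_MODULUS) -> List[Tuple[float, float]]:
    """Turn (timestamp_s, counter_value) polls into (timestamp_s, units_per_s)."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order polls
        delta = (c1 - c0) % modulus  # corrects for at most one counter wrap
        rates.append((t1, delta / dt))
    return rates


@dataclass
class Descriptor:
    """Hypothetical stand-in for an MDOD; the real schema is not yet defined."""
    metric: str
    unit: str
    owner: str
    authorized_users: List[str] = field(default_factory=list)


# Example polls of an interface octet counter; the counter wraps between
# the first and second poll.
polls = [(0.0, 4294967000), (10.0, 704), (20.0, 1704)]
series = counters_to_rates(polls)
mdod = Descriptor("ifInOctets", "octets/s", "aggregate-operator", ["GMOC"])
print(series)  # [(10.0, 100.0), (20.0, 100.0)]
```

The resulting series could then be handed to any of the transfer options listed above; which of those options will be supported is an open issue.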

2) Open Issues

1) Identify which passive measurement and monitoring options are to be supported
2) Need final definition of MDOD schema, and MDOD creation software

8) Event Monitoring

1) Supported Options

1) Aggregate operator establishes MP to issue Event Records (ERs)

2) ER users:

1) Local aggregate operator
2) GMOC (when authorized)
3) Clearinghouse (when authorized)
4) Experimenter (when authorized)

3) ER interface and transfers:

1) ER schema, published to XML messaging service, can be subscribed by authorized user, logged using ? service, presented using ? service
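
The publish/subscribe arrangement above can be sketched as follows. This is an in-memory illustration only, under stated assumptions: the `EventRecord` fields and the `EventBus` class are hypothetical, since the Event Record schema and the messaging service are not yet defined, and a real deployment would use an actual XML messaging service rather than local callbacks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical ER fields; the real schema is still an open issue."""
    source: str
    timestamp: float
    kind: str      # for example "link_down" or "link_up"
    detail: str


class EventBus:
    """Minimal stand-in for a messaging service with authorized subscribers."""

    def __init__(self, authorized_users: Iterable[str]):
        self._authorized = set(authorized_users)
        self._subscribers: Dict[str, List[Callable[[EventRecord], None]]] = {}

    def subscribe(self, user: str, topic: str,
                  callback: Callable[[EventRecord], None]) -> None:
        if user not in self._authorized:
            raise PermissionError(f"{user} is not authorized to subscribe")
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, record: EventRecord) -> int:
        """Deliver the record to each subscriber; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(record)
        return len(callbacks)


bus = EventBus(authorized_users=["local-operator", "GMOC"])
gmoc_log: List[EventRecord] = []           # stands in for the logging service
bus.subscribe("GMOC", "rack1", gmoc_log.append)
delivered = bus.publish("rack1", EventRecord("rack1-switch", 0.0, "link_down", "port 3"))
```

Here an unauthorized subscriber is rejected at subscribe time, which matches the "(when authorized)" qualifier on the ER users listed above.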

2) Open Issues

1) Identify which event monitoring options are to be supported
2) Need final definition of Event Record schema (XML format defined by NetKarma, adapted from MDOD); include active up/down events

9) Active Measurements

1) Supported Options

1) MD owner:

1) Aggregate operator
2) GMOC
3) Experimenter

2) Owner establishes slice, includes active measurement services, configures measurements, and formulates MDOD

1) Persistent long-term slice
2) On-demand short-term slice, e.g., for troubleshooting

3) MD users:

1) Owner
2) Aggregate operator (when authorized)
3) GMOC (when authorized)
4) Experimenter (when authorized)

4) MD interface and transfers:

1) Time-series data, gathered by perfSONAR MA, presented at perfSONAR interface, MDOD registered at global UNIS, can be pulled by authorized user, and presented using perfSONAR service
2) Time-series data, pushed using OML protocol, to OML server, and presented using GIMI service

5) Active measurements:

1) For IP networks, e.g., ping and iperf
2) Specialized for L2 networks
3) Specialized for OF networks
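
A ping-style active measurement for IP networks can be sketched as below. This is a minimal sketch under stated assumptions, not the tooling the projects will use: `run_probes` and `summarize` are hypothetical helpers, and the probe itself is passed in as a callable that returns a round-trip time in milliseconds (or `None` for a lost probe), so the example runs without network access.

```python
import statistics
import time
from typing import Callable, Dict, List, Optional, Tuple

Sample = Tuple[float, Optional[float]]  # (timestamp_s, rtt_ms or None if lost)


def run_probes(probe: Callable[[], Optional[float]], count: int,
               clock: Callable[[], float] = time.time) -> List[Sample]:
    """Run the probe `count` times and timestamp each result."""
    return [(clock(), probe()) for _ in range(count)]


def summarize(samples: List[Sample]) -> Dict[str, Optional[float]]:
    """Reduce raw samples to loss and min/avg/max round-trip time."""
    rtts = [rtt for _, rtt in samples if rtt is not None]
    sent = len(samples)
    lost = sent - len(rtts)
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "min_ms": min(rtts) if rtts else None,
        "avg_ms": statistics.mean(rtts) if rtts else None,
        "max_ms": max(rtts) if rtts else None,
    }


# A canned probe stands in for a real ping; the second probe is "lost".
canned = iter([10.0, None, 20.0, 30.0])
samples = run_probes(lambda: next(canned), count=4)
summary = summarize(samples)
print(summary["loss_pct"], summary["avg_ms"])  # 25.0 20.0
```

The timestamped samples form the time-series data, while the summary is the kind of value an operator would typically view; strategies for L2 and OF networks remain to be defined.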

2) Open Issues

1) Identify which active measurement options are to be supported
2) Define active measurement strategy for L2 networks (Prasad Calyam)
3) Define active measurement strategy for OF networks (Prasad Calyam)

3) Baseline Process

Baseline infrastructure measurement process:

1) Set up a persistent or on-demand infrastructure measurement slice.
2) Configure and make active measurements.
3) Gather MD, and observe as it is gathered; formulate MDOD.
4) Store MD in collector, describe with MDOD, and register MDOD so that MD can be shared.
5) Typically share MD with Aggregate Operator, GMOC and/or Experimenters, per policy written into MDOD.
6) Pull MD out of collector, analyze and visualize.
7) Archive MD with MDOD.
8) Share archived MD with others, per policy included within MDOD.
9) Pull MD out of archive, to analyze and/or visualize.
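
Steps 3) through 9) of the baseline process can be sketched as a small in-memory pipeline. This is an illustration under stated assumptions rather than a definitive design: the `Store` class, the `archive` function and the descriptor keys (`id`, `owner`, `share_with`) are all hypothetical, because the MDOD schema and the collector and archive services are not yet specified.

```python
class Store:
    """In-memory stand-in for a collector or an archive (hypothetical)."""

    def __init__(self):
        self._items = {}     # MD id -> (descriptor, data points)
        self.registry = {}   # MD id -> descriptor, so others can discover the MD

    def put(self, descriptor, points):
        """Store MD alongside its descriptor and register the descriptor."""
        self._items[descriptor["id"]] = (dict(descriptor), list(points))
        self.registry[descriptor["id"]] = dict(descriptor)

    def pull(self, md_id, user):
        """Return the MD only if the descriptor's sharing policy allows it."""
        descriptor, points = self._items[md_id]
        allowed = {descriptor["owner"], *descriptor["share_with"]}
        if user not in allowed:
            raise PermissionError(f"{user} may not read {md_id}")
        return list(points)


def archive(collector, archive_store, md_id, user):
    """Copy MD and its descriptor from the collector into the archive."""
    points = collector.pull(md_id, user)
    archive_store.put(collector.registry[md_id], points)


# Descriptor standing in for an MDOD, with the sharing policy written into it.
mdod = {"id": "rack1-ping", "owner": "aggregate-operator", "share_with": ["GMOC"]}
collector, archive_store = Store(), Store()

collector.put(mdod, [(0.0, 12.5), (60.0, 13.1)])                       # steps 3-4
live = collector.pull("rack1-ping", "GMOC")                            # steps 5-6
archive(collector, archive_store, "rack1-ping", "aggregate-operator")  # step 7
archived = archive_store.pull("rack1-ping", "GMOC")                    # steps 8-9
```

Because the policy travels with the descriptor, the archive enforces the same sharing rules as the collector without any extra configuration.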

10) I&M Support for Operators

What support must be provided for operators, and how?