wiki:OperationalMonitoring/DataSchema


Data needed to meet operational monitoring use cases

This is a working page for the operational monitoring project. It is a draft representing work in progress.

This page will eventually document the schema or schemas that the datastore polling API will use to request data from datastores. Right now, it simply lists the pieces of data which are needed in order to meet the short list of use cases on which we are focusing for GEC19 and GEC20. As such, this page is closely tied to the use case component details pages at http://www.gpolab.bbn.com/monitoring/components/.

Table of all data

Types of data:

  • measurement: a time-series value which is collected frequently
  • state: an existing set of relations which may change as a result of an event
  • config: data which is unlikely to change frequently, but should be polled occasionally in case it has changed

To fix:

  • the schema below shows how to use metadata to get the list of control and dataplane interfaces, so we shouldn't need to use sums in the interface counters, but the table still says "sum"
Subject | Metric | Type | Units | Description | Use Cases
shared compute node | CPU utilization | measurement | percent |  | 3
shared compute node | swap free | measurement | percent | percent of total swap which is free | 3
shared compute node | memory total | config | bytes | total physical memory on the node | 3
shared compute node | memory used | measurement | bytes | total memory in active use on the node | 3
shared compute node | disk part max used | measurement | percent | highest percent utilization of any local partition | 3
shared compute node | ctrl net max bytes | config | integer | sum of maximum bytes per second available on all control interfaces | 3
shared compute node | ctrl net RX bytes | measurement | integer | sum of bytes received on all control interfaces since last reset | 3
shared compute node | ctrl net TX bytes | measurement | integer | sum of bytes transmitted on all control interfaces since last reset | 3
shared compute node | ctrl net max packets | config | integer | sum of maximum packets per second available on all control interfaces | 3
shared compute node | ctrl net RX packets | measurement | integer | sum of packets received on all control interfaces since last reset | 3
shared compute node | ctrl net TX packets | measurement | integer | sum of packets transmitted on all control interfaces since last reset | 3
shared compute node | ctrl net RX errs | measurement | integer | sum of receive errors on all control interfaces since last reset | 3
shared compute node | ctrl net TX errs | measurement | integer | sum of transmit errors on all control interfaces since last reset | 3
shared compute node | ctrl net RX drops | measurement | integer | sum of receive drops (how does it know?) on all control interfaces since last reset | 3
shared compute node | ctrl net TX drops | measurement | integer | sum of transmit drops on all control interfaces since last reset | 3
shared compute node | data net max bytes | config | integer | sum of maximum bytes per second available on all dataplane interfaces | 3
shared compute node | data net RX bytes | measurement | integer | sum of bytes received on all dataplane interfaces since last reset | 3
shared compute node | data net TX bytes | measurement | integer | sum of bytes transmitted on all dataplane interfaces since last reset | 3
shared compute node | data net max packets | config | integer | sum of maximum packets per second available on all dataplane interfaces | 3
shared compute node | data net RX packets | measurement | integer | sum of packets received on all dataplane interfaces since last reset | 3
shared compute node | data net TX packets | measurement | integer | sum of packets transmitted on all dataplane interfaces since last reset | 3
shared compute node | data net RX errs | measurement | integer | sum of receive errors on all dataplane interfaces since last reset | 3
shared compute node | data net TX errs | measurement | integer | sum of transmit errors on all dataplane interfaces since last reset | 3
shared compute node | data net RX drops | measurement | integer | sum of receive drops (how does it know?) on all dataplane interfaces since last reset | 3
shared compute node | data net TX drops | measurement | integer | sum of transmit drops on all dataplane interfaces since last reset | 3
shared compute node | is available | measurement | boolean | is the node considered to be online as the result of a simple check at the given time? | 3
aggregate | current sliver list | state | list | list of slivers (URN + UUID) currently existing or reserved on the aggregate | 6
sliver | slice URN/UUID | state | string | unique identifier of slice mapped to sliver (URN + UUID) | 6
sliver | creation time | state | timestamp | creation time of sliver | 6
sliver | expiration time | state | timestamp | current expiration time of sliver | 6
sliver | creator URN | state | string | URN of sliver creator | 6
sliver | resources | state | list | list of resource URNs on which the sliver has a current reservation | 6
slice | creator | state | string | URN of slice creator | 6
slice | participants | state | list | list of experimenters who have privileges on a slice | 6
experimenter | email | state | string | contact address for experimenter | 6
config datastore | current datastore list | config | list | list of local datastores to query for GENI monitoring data | 3, 6
local datastore | data types | config | list | list of data types which the datastore supports | 3, 6

Details of data needed to meet all use cases

We haven't yet specified the exact syntax of the information that aggregators will get from the config datastore to tell them which other datastores to query. It will need to include at least the location of each datastore, the data types that can be requested from that datastore, and presumably information about which aggregates that datastore supports. I put in a couple of placeholder items for this at the bottom of the table above, but they will need to be fleshed out.

Data needed to meet use case 3

Use case description: Track node compute utilization, interface, and health statistics for shared rack nodes, and allow operators to get notifications when they are out of bounds.

In general, for this use case, we want:

  • CPU utilization: it's pretty standard for this to be a percentage, so we'll do that too.
  • Memory utilization: there's not as much of a standard for this. Purely as an alert metric, "swap free" is a good indication of when the node is too busy, but it doesn't tell you much about how actively the node's memory is being used over time. I believe ganglia reports the difference of two stats from /proc/meminfo, "Active" - "Cached", and calls that "Memory Used". Is that a good/well-understood metric?
  • Disk utilization: I am partial to ganglia's "part max used" check, which looks at the utilization of every local partition on a node and reports the fullest (highest) utilization percentage it sees. It doesn't tell you what your problem is, but it tells you whether you have a problem, and it's a single metric regardless of the number of partitions on a node.
  • Network utilization: in order to measure utilization, I think we want metrics for control traffic and dataplane traffic, each of which is the sum of counters for all control or dataplane interfaces of the node (if there is more than one of either). Linux /proc/net/dev reports rx_bytes, rx_packets, rx_errs, rx_drops, and the same four items for tx. So that would be 16 pieces of data per node. Does that seem right, or does that seem like too much? Another thing I don't know is where in the system we want to translate a counter into a rate --- is it actually correct to just report these numbers as integers upstream and have the aggregator be responsible for generating a rate (see the sketch after this list), or is it better for a rate to be created locally?
  • Node availability: this is not intended as a detailed check of whether the node is usable for some particular experimental purpose --- that would be out of scope for this use case. It's more like a simple "is this thing on?" check. It would be fine for this to be reported as "OK" if any other data is received from the node at a given time, and "not okay" otherwise, or it would be fine for the aggregate to try to ping the node control plane and report that. This doesn't have to be consistent, and shouldn't be complicated.
  • Node health metrics: people suggested we might want to alert on RAID failures and on NTP sync issues. I'd like to keep track of those requests, but they're not part of the initial thin thread, so they won't be included here.
  • We probably also need some form of metadata about each node: not collected all the time, but available for periodic query. For instance, we probably need to know what type of VM server it is (for general information), and what the maximum values are for any metrics we're reporting as rates or counters (e.g. network utilization) rather than as percentages, because we can't tell if we're hitting the maximum if we don't know what the maximum is.
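
For the rate question raised above, here is a minimal sketch of the "aggregator computes the rate" option: the node reports raw counters, and the aggregator turns successive samples into per-second rates. It assumes timestamps are microseconds since the UNIX epoch (matching the example measurements later on this page) and that counters reset rather than wrap; counter_rate is an illustrative name, not part of any schema.

    # Minimal sketch of the "aggregator computes the rate" option discussed above.
    # Assumes 'ts' values are microseconds since the UNIX epoch and that counter
    # values only ever reset (never wrap) between samples; both are assumptions.
    def counter_rate(samples):
        """Turn [(ts_usec, counter), ...] into [(ts_usec, per_second_rate), ...]."""
        rates = []
        for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
            dt = (t1 - t0) / 1e6          # microseconds -> seconds
            dv = v1 - v0
            if dt <= 0 or dv < 0:         # skip reordered samples and counter resets
                continue
            rates.append((t1, dv / dt))
        return rates

    # Example: two "ctrl net RX bytes" samples taken 60 seconds apart
    print(counter_rate([(1391198716651283, 1000000), (1391198776651283, 4000000)]))
    # -> [(1391198776651283, 50000.0)], i.e. 50000 bytes per second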

Data needed to meet use case 6

Use case description: Find out what slivers will be affected by a maintenance or outage of some resource, and get contact information for the owners of those slivers so targeted notifications can be sent.

In general, for this use case, we want:

  • Sliver data:
    • What slivers exist on a GENI aggregate right now: I think we always want "right now" even if the outage isn't going to be right now. If reservations are implemented, so that there is a notion of known slivers which will exist in the future but don't exist yet, we'll want those too. A reporting tool might choose to omit slivers which expire before the time of interest, or it might choose not to on the grounds that slivers often get renewed --- that should be up to the tool, so the aggregate should always report the full set of slivers the AM knows about now or in the future.
    • Information about each sliver:
      • Sliver URN and UUID
      • Slice URN and UUID
      • Creation and expiration times
      • Creator (maybe this is optional because some AMs will always tell us to ask the SA? not sure)
      • Resources this sliver has reserved:
        • URN of each named resource of types: bare-metal host, shared host, VLAN, flowspace (what else?)
  • Slice experimenter data: for each relevant slice URN and UUID, find out from the authority:
    • Experimenters affiliated with the slice (creator, participants)
    • E-mail contact info for each of those experimenters
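
To make the lookup chain above concrete, here is a hypothetical sketch using in-memory records shaped like the rows of the data table; the query interfaces against the aggregate and the slice authority are not defined on this page, and every URN and address below is made up for illustration.

    # Hypothetical in-memory records, shaped like rows of the data table above,
    # used only to illustrate the sliver -> slice -> experimenter lookup chain.
    slivers = [
        {'urn': 'urn:publicid:IDN+example+sliver+s1',
         'slice_urn': 'urn:publicid:IDN+example+slice+demo',
         'resources': ['urn:publicid:IDN+example+node+pc1']},
    ]
    slices = {
        'urn:publicid:IDN+example+slice+demo': {
            'creator': 'urn:publicid:IDN+example+user+alice',
            'participants': ['urn:publicid:IDN+example+user+bob']},
    }
    experimenters = {
        'urn:publicid:IDN+example+user+alice': {'email': 'alice@example.net'},
        'urn:publicid:IDN+example+user+bob': {'email': 'bob@example.net'},
    }

    def contacts_for_outage(resource_urn):
        """E-mail addresses of everyone on a slice with a sliver using the resource."""
        emails = set()
        for sliver in slivers:
            if resource_urn in sliver['resources']:
                slice_rec = slices[sliver['slice_urn']]
                for user in [slice_rec['creator']] + slice_rec['participants']:
                    emails.add(experimenters[user]['email'])
        return sorted(emails)

    print(contacts_for_outage('urn:publicid:IDN+example+node+pc1'))
    # -> ['alice@example.net', 'bob@example.net']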

Proposed data schema

I propose a schema based on, and partially compatible with, http://unis.incntre.iu.edu/schema/20120709/, for measurement data and metadata. In particular, I suggest using:

  • http://unis.incntre.iu.edu/schema/20120709/domain: metadata about an aggregate (including a list of resources at that aggregate)
  • http://unis.incntre.iu.edu/schema/20120709/node: metadata about a resource (including resource properties termed "config" above, and a list of relevant ports (or of all ports, whichever is easier) on that resource)
  • http://unis.incntre.iu.edu/schema/20120709/port: metadata about a network interface (including resource properties termed "config" above)
  • http://www.gpolab.bbn.com/monitoring/schema/20140131/data (not yet posted): novel schema based on a combination of http://unis.incntre.iu.edu/schema/20120709/metadata and http://unis.incntre.iu.edu/schema/20120709/tsdatum:
    {
      "$schema": "http://json-schema.org/draft-03/hyper-schema#",
      "id": "http://www.gpolab.bbn.com/monitoring/schema/20140131/data#",
      "description": "Operational monitoring data",
      "name": "Data",
      "type": "object",
      "additionalProperties": false,
      "extends": {
        "$ref": "http://unis.incntre.iu.edu/schema/20120709/metadata#"
      },
      "properties": {
        "units": {
          "units": "Valid units for the values in this metric",
          "type": "string",
          "required": true
        },
        "description": {
          "description": "Description of this metric",
          "type": "string",
          "required": false
        },
        "tsdata": {
          "description": "Time-series data",
          "name": "tsdata",
          "type": "array",
          "required": false
        }
      }
    }
    
  • For now, we would use the ops_monitoring namespace for operations monitoring, meaning:
    • When we add monitoring-relevant optional properties to objects, we'll put them in an ops_monitoring dictionary
    • When we set up operations monitoring measurements, we'll give them the eventType ops_monitoring:<something>

Data schema usage example

Some example usages of the above schemas to encode the metadata and data needed for use cases 3 and 6 follow.

These examples assume the following (fictitious) local datastore URLs. These are arbitrary; anywhere they appear, they can be replaced by whatever name the deployers prefer. For simplicity, I've shown one local datastore per aggregate here, but the architecture does not require that --- it would be perfectly fine to have one datastore for relational metadata and one for the data itself, or one for certain types of relational metadata and one for others.

  • https://datastore.instageni.gpolab.bbn.com/: datastore for gpo-ig

Data about an aggregate

Aggregates are indexed by GENI-agreed short name and described using the domain schema. Examples:

  • gpo-ig:
    {
      'id': 'gpo-ig',
      'selfRef': 'https://datastore.instageni.gpolab.bbn.com/domains/gpo-ig',
      'urn': 'urn:publicid:IDN+instageni.gpolab.bbn.com+authority+cm',
      'ts': 1391192685740849,
      'nodes': [
        { 
          'href': 'https://datastore.instageni.gpolab.bbn.com/nodes/instageni.gpolab.bbn.com_node_pc1',
        },
        { 
          'href': 'https://datastore.instageni.gpolab.bbn.com/nodes/instageni.gpolab.bbn.com_node_pc2',
        },
      ],
    }
    

Data about a node

Nodes have an ID which is a URL-sanitized version of their URN and are described using the node schema. Examples:

  • pc1 node at gpo-ig:
    {
      'id': 'instageni.gpolab.bbn.com_node_pc1',
      'ts': 1391192705275101,
      'selfRef': 'https://datastore.instageni.gpolab.bbn.com/nodes/instageni.gpolab.bbn.com_node_pc1',
      'urn': 'urn:publicid:IDN+instageni.gpolab.bbn.com+node+pc1',
      'properties': {
        'ops_monitoring': {
          'mem_total_kb': 50331648,
        },
      },
      'ports': [
        {
          'href': 'https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1%3Aeth0',
        },
        {
          'href': 'https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1%3Aeth1',
        },
        {
          'href': 'https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1%3Aeth2',
        },
        {
          'href': 'https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1%3Aeth3',
        },
      ],
    }
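
Given records like the ones above, an aggregator or reporting tool could discover an aggregate's nodes (and, via the port hrefs, its interfaces) by following the embedded links. The sketch below is hypothetical: the datastore polling API hasn't been specified yet, so it simply assumes each selfRef/href can be fetched with a plain HTTPS GET returning the JSON record shown, and get_json is an illustrative helper rather than part of any defined API.

    # Hypothetical traversal of the metadata links above; assumes each href can
    # be fetched with a plain HTTPS GET that returns the JSON record as shown.
    import requests

    def get_json(url):
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        return resp.json()

    domain = get_json('https://datastore.instageni.gpolab.bbn.com/domains/gpo-ig')
    for node_ref in domain['nodes']:
        node = get_json(node_ref['href'])
        mem_kb = node['properties']['ops_monitoring']['mem_total_kb']
        port_hrefs = [port['href'] for port in node['ports']]
        print(node['id'], 'mem_total_kb:', mem_kb, 'ports:', len(port_hrefs))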
    

Data about an interface

Interfaces have an ID which is a URL-sanitized version of their URN and are described using the port schema. Notes:

  • I adopted the control/experimental terminology for interface roles from ProtoGENI listresources output. We could also use control/data; at any rate, we should be consistent among all monitoring uses.
  • All bandwidths are total fiction. I didn't even count the zeroes.

Examples:

  • pc1:eth0 (control) interface at gpo-ig:
    {
      'selfRef': 'https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1:eth0',
      'urn': 'urn:publicid:IDN+instageni.gpolab.bbn.com+interface+pc1:eth0',
      'ts': 1391194147100678,
      'id': 'instageni.gpolab.bbn.com_interface_pc1:eth0',
      'properties': {
        'ops_monitoring': {
          'role': 'control',
          'max_bps': 10000000,
          'max_pps': 1000000,
        },
      },
    }
    
  • pc1:eth1 (dataplane) interface at gpo-ig:
    {
      'selfRef': 'https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1:eth1',
      'urn': 'urn:publicid:IDN+instageni.gpolab.bbn.com+interface+pc1:eth1',
      'ts': 1391194147100678,
      'id': 'instageni.gpolab.bbn.com_interface_pc1:eth1',
      'properties': {
        'ops_monitoring': {
          'role': 'experimental',
          'max_bps': 10000000,
          'max_pps': 1000000,
        },
      },
    }
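
As a small illustration of why the max_bps metadata matters for use case 3, the sketch below combines a port record like the ones above with an already-computed traffic rate to get the percent utilization an operator might alert on. The rate is assumed to be in bits per second (for example, a byte-counter rate from the counter_rate() sketch under use case 3, multiplied by 8); the numbers are as fictitious as the bandwidths above.

    # Sketch: percent utilization of a dataplane interface, assuming rate_bps is
    # an already-computed rate in bits per second and max_bps comes from the
    # port metadata shown above.
    def pct_utilization(rate_bps, port_record):
        max_bps = port_record['properties']['ops_monitoring']['max_bps']
        return 100.0 * rate_bps / max_bps

    eth1 = {'properties': {'ops_monitoring': {'role': 'experimental',
                                              'max_bps': 10000000,
                                              'max_pps': 1000000}}}
    print(pct_utilization(400000, eth1))   # -> 4.0 (percent of the fictitious 10 Mbps)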
    

Measurements used for the use cases

Measurements have an opaque ID which is generated by the local datastore that serves them; the ID must be persistent, so that the caller has the option of asking for the measurement by ID. They are described using the data schema outlined above. Examples:

  • CPU utilization metric on pc1:
    {
      'id': '1',
      'subject': 'https://datastore.instageni.gpolab.bbn.com/nodes/instageni.gpolab.bbn.com_node_pc1',
      'eventType': 'ops_monitoring:cpu_utilization',
      'description': 'CPU utilization percentage',
      'units': 'percent',
      'tsdata': [
        { 'ts': 1391198716651283, 'val': 0.45, },
        { 'ts': 1391198776651284, 'val': 0.44, },
        { 'ts': 1391198836651284, 'val': 0.44, },
        { 'ts': 1391198896651284, 'val': 0.47, },
        { 'ts': 1391198956651284, 'val': 0.46, },
        { 'ts': 1391199016651285, 'val': 0.47, },
      ],
    }
    
  • Percentage of swap available on pc1:
    # Metadata and data for the swap_free metric on pc1 at gpo-ig.
    {
      'id': '2',
      'subject': 'https://datastore.instageni.gpolab.bbn.com/nodes/instageni.gpolab.bbn.com_node_pc1',
      'eventType': 'ops_monitoring:swap_free',
      'description': 'Percentage of swap available',
      'units': 'percent',
      'tsdata': [
        { 'ts': 1391198716651283, 'val': 0.95, },
        { 'ts': 1391198776651284, 'val': 0.95, },
        { 'ts': 1391198836651284, 'val': 0.95, },
        { 'ts': 1391198896651284, 'val': 0.95, },
        { 'ts': 1391198956651284, 'val': 0.95, },
        { 'ts': 1391199016651285, 'val': 0.95, },
      ],
    }
    
  • Memory in active use on pc1:
    {
      'id': '3',
      'subject': 'https://datastore.instageni.gpolab.bbn.com/nodes/instageni.gpolab.bbn.com_node_pc1',
      'eventType': 'ops_monitoring:mem_active_kb',
      'description': 'Memory in active use',
      'units': 'kilobytes',
      'tsdata': [
        { 'ts': 1391198716651283, 'val': 20030048, },
        { 'ts': 1391198776651284, 'val': 20031148, },
        { 'ts': 1391198836651284, 'val': 20031148, },
        { 'ts': 1391198896651284, 'val': 22222222, },
        { 'ts': 1391198956651284, 'val': 22222222, },
        { 'ts': 1391199016651285, 'val': 22222222, },
      ],
    }
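
Finally, a small sketch of how a consumer of these records might implement the alerting half of use case 3. It assumes (this is not stated on this page) that ts values are microseconds since the UNIX epoch, which matches their magnitude in the examples above; the threshold and the out_of_bounds name are purely illustrative.

    # Sketch: flag tsdata points at or above a threshold, converting the assumed
    # microseconds-since-epoch timestamps to UTC datetimes for a notification.
    from datetime import datetime, timezone

    def out_of_bounds(measurement, threshold):
        """Yield (UTC time, value) for tsdata points at or above the threshold."""
        for point in measurement['tsdata']:
            if point['val'] >= threshold:
                when = datetime.fromtimestamp(point['ts'] / 1e6, tz=timezone.utc)
                yield when, point['val']

    cpu = {
        'eventType': 'ops_monitoring:cpu_utilization',
        'units': 'percent',
        'tsdata': [
            {'ts': 1391198716651283, 'val': 0.45},
            {'ts': 1391198896651284, 'val': 0.47},
        ],
    }
    for when, val in out_of_bounds(cpu, 0.46):
        print(when.isoformat(), val)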