Changes between Version 10 and Version 11 of OperationalMonitoring/DataSchema


Timestamp: 01/31/14 17:26:32
Author: chaos@bbn.com

|| local datastore     || data types             || config      || list        || list of data types which the datastore supports                                        || 3, 6            ||

== Details of data needed to meet all use cases ==

We haven't specced out the exact syntax of what information aggregators will get from the config datastore to tell them which other datastores to query.  It will need to include at least the location of each datastore, the data types you can ask that datastore for, and presumably information about which aggregates that datastore supports.  I put in a couple of placeholder items for this at the bottom of the list, but they will need to be fleshed out.

=== Data needed to meet use case 3 ===

Use case description: Track node compute utilization, interface, and health statistics for shared rack nodes, and allow operators to get notifications when they are out of bounds.
     
 * We probably also need some form of metadata about each node: not collected all the time, but available for periodic query.  For instance, we probably need to know what type of VM server it is (for general information), and what the maximum values are for any metrics we're reporting as rates or counters (e.g. network utilization) rather than as percentages, because we can't tell whether we're hitting the maximum if we don't know what the maximum is.

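As a minimal sketch of why the maximum values matter: a rate such as bits per second can only be judged "out of bounds" against a known ceiling.  The helper names, the 0.9 threshold, and passing in a maximum like the `max_bps` interface property are assumptions for illustration, not part of any agreed interface.

{{{
#!python
# Hypothetical helpers: judge a rate metric against a known maximum.


def utilization(rate, max_rate):
    """Return `rate` as a fraction of `max_rate`."""
    if max_rate <= 0:
        raise ValueError("max_rate must be positive")
    return rate / max_rate


def out_of_bounds(rate, max_rate, threshold=0.9):
    """True if `rate` is above `threshold` (a fraction) of `max_rate`.

    Without `max_rate` this decision cannot be made, which is why the
    maximum has to be available as node or interface metadata.
    """
    return utilization(rate, max_rate) > threshold


if __name__ == "__main__":
    # 9.5 Mbit/s on an interface whose (fictional) maximum is 10 Mbit/s
    print(out_of_bounds(9500000, 10000000))  # True
    print(out_of_bounds(2000000, 10000000))  # False
}}}
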
=== Data needed to meet use case 6 ===

Use case description: Find out which slivers will be affected by maintenance or an outage of some resource, and get contact information for the owners of those slivers so that targeted notifications can be sent.
     
   * Experimenters affiliated with the slice (creator, participants)
   * E-mail contact info for each of those experimenters

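A minimal sketch of the use case 6 lookup, assuming the sliver, slice-membership, and e-mail data have already been fetched into plain Python structures.  The function name, the tuple layout, and the sample names are hypothetical; the real data would come from whichever datastores hold it.

{{{
#!python
from collections import defaultdict


def outage_report(resource, slivers, slice_members, emails):
    """Map each affected slice to its slivers on `resource` and its contacts.

    slivers:       list of (sliver_id, resource, slice_name) tuples
    slice_members: {slice_name: [experimenter, ...]}  (creator, participants)
    emails:        {experimenter: e-mail address}
    """
    slivers_by_slice = defaultdict(list)
    for sliver_id, sliver_resource, slice_name in slivers:
        if sliver_resource == resource:
            slivers_by_slice[slice_name].append(sliver_id)

    report = {}
    for slice_name, sliver_ids in sorted(slivers_by_slice.items()):
        members = slice_members.get(slice_name, [])
        report[slice_name] = {
            "slivers": sorted(sliver_ids),
            # de-duplicate addresses; skip experimenters with no known e-mail
            "contacts": sorted({emails[m] for m in members if m in emails}),
        }
    return report


if __name__ == "__main__":
    slivers = [("sliver1", "pc1", "sliceA"),
               ("sliver2", "pc2", "sliceB"),
               ("sliver3", "pc1", "sliceC")]
    members = {"sliceA": ["alice", "bob"], "sliceB": ["carol"],
               "sliceC": ["alice"]}
    emails = {"alice": "alice@example.org", "bob": "bob@example.org",
              "carol": "carol@example.org"}
    print(outage_report("pc1", slivers, members, emails))
}}}

Grouping by slice means each slice's owners get one notification listing all of their affected slivers, rather than one message per sliver.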
== Proposed data schema ==

I propose a schema based on, and partially compatible with, [http://unis.incntre.iu.edu/schema/20120709/] for measurement data and metadata.  In particular, I suggest using:
 * [http://unis.incntre.iu.edu/schema/20120709/domain]: metadata about an aggregate (including a list of resources at that aggregate)
 * [http://unis.incntre.iu.edu/schema/20120709/node]: metadata about a resource (including the resource properties termed "config" above, and a list of relevant ports (or of all ports, whichever is easier) on that resource)
 * [http://unis.incntre.iu.edu/schema/20120709/port]: metadata about a network interface (including the resource properties termed "config" above)
 * http://www.gpolab.bbn.com/monitoring/schema/20140131/data (not yet posted): a novel schema based on a combination of [http://unis.incntre.iu.edu/schema/20120709/metadata] and [http://unis.incntre.iu.edu/schema/20120709/tsdatum]:
{{{
{
  "$schema": "http://json-schema.org/draft-03/hyper-schema#",
  "id": "http://www.gpolab.bbn.com/monitoring/schema/20140131/data#",
  "description": "Operational monitoring data",
  "name": "Data",
  "type": "object",
  "additionalProperties": false,
  "extends": {
    "$ref": "http://unis.incntre.iu.edu/schema/20120709/metadata#"
  },
  "properties": {
    "units": {
      "description": "Valid units for the values in this metric",
      "type": "string",
      "required": true
    },
    "description": {
      "description": "Description of this metric",
      "type": "string",
      "required": false
    },
    "tsdata": {
      "description": "Time-series data",
      "name": "tsdata",
      "type": "array",
      "required": false
    }
  }
}
}}}
 * For now, we would use the `ops_monitoring` namespace for operations monitoring, meaning:
   * When we add monitoring-relevant optional properties to objects, we'll put them in an `ops_monitoring` dictionary
   * When we set up operations monitoring measurements, we'll give them the eventType `ops_monitoring:<something>`

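As a rough illustration of what the Data schema and the namespace convention require, here is a hand-written check in Python.  It is not a JSON Schema validator: it only checks the three properties the Data schema declares (not those inherited from the UNIS metadata schema via `extends`) plus the `eventType` prefix, and the function name is hypothetical.

{{{
#!python
# Hypothetical check of the properties the Data schema declares, plus the
# ops_monitoring eventType convention.  Not a full JSON Schema validator.

NAMESPACE = "ops_monitoring"


def check_measurement(obj):
    """Return a list of problems with `obj`; an empty list means none found."""
    problems = []
    if not isinstance(obj.get("units"), str):
        problems.append("'units' is required and must be a string")
    if "description" in obj and not isinstance(obj["description"], str):
        problems.append("'description' must be a string")
    if "tsdata" in obj and not isinstance(obj["tsdata"], list):
        problems.append("'tsdata' must be an array")
    event_type = obj.get("eventType")
    if not (isinstance(event_type, str)
            and event_type.startswith(NAMESPACE + ":")):
        problems.append("'eventType' should be ops_monitoring:<something>")
    return problems


if __name__ == "__main__":
    good = {"eventType": "ops_monitoring:cpu_utilization",
            "units": "percent", "tsdata": []}
    bad = {"eventType": "cpu_utilization", "tsdata": {}}
    print(check_measurement(good))  # []
    print(check_measurement(bad))   # three problems
}}}
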
=== Data schema usage example ===

Some example usages of the above schemas, encoding the metadata and data needed for use cases 3 and 6, follow.

These examples assume the following (fictitious) local datastore URLs.  These are arbitrary: wherever they appear, they can be replaced by whatever name the deployers prefer.  For simplicity, I've shown one local datastore per aggregate here, but the architecture does ''not'' require that.  It would be perfectly fine to have one datastore for relational metadata and one for the data itself, or one for certain types of relational metadata and one for others.
 * `https://datastore.instageni.gpolab.bbn.com/`: datastore for gpo-ig

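The examples build `selfRef` and `href` values by appending a collection name (`domains`, `nodes`, `ports`) and an object ID to the datastore URL.  A minimal sketch of that convention follows, assuming this URL layout holds for every collection; because the base URL is a parameter, deployers can substitute their own datastore name.  Percent-encoding the whole ID reproduces the `%3A` form used in the node example's port hrefs; the interface examples' own `selfRef` values leave the colon unencoded, so one of the two forms would probably need to be chosen.

{{{
#!python
from urllib.parse import quote


def self_ref(datastore, collection, obj_id):
    """Return <datastore>/<collection>/<percent-encoded obj_id>.

    Hypothetical helper; the URL layout is inferred from the examples.
    """
    if not datastore.endswith("/"):
        datastore += "/"
    # safe="" also encodes ':' (as %3A) and '/', so any ID stays one path segment
    return datastore + collection + "/" + quote(obj_id, safe="")


if __name__ == "__main__":
    base = "https://datastore.instageni.gpolab.bbn.com/"
    print(self_ref(base, "domains", "gpo-ig"))
    print(self_ref(base, "ports", "instageni.gpolab.bbn.com_interface_pc1:eth0"))
}}}
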
==== Data about an aggregate ====

Aggregates are indexed by their GENI-agreed short name and described using [http://unis.incntre.iu.edu/schema/20120709/domain# the domain schema].  Examples:
 * gpo-ig:
{{{
{
  "id": "gpo-ig",
  "selfRef": "https://datastore.instageni.gpolab.bbn.com/domains/gpo-ig",
  "urn": "urn:publicid:IDN+instageni.gpolab.bbn.com+authority+cm",
  "ts": 1391192685740849,
  "nodes": [
    {
      "href": "https://datastore.instageni.gpolab.bbn.com/nodes/instageni.gpolab.bbn.com_node_pc1"
    },
    {
      "href": "https://datastore.instageni.gpolab.bbn.com/nodes/instageni.gpolab.bbn.com_node_pc2"
    }
  ]
}
}}}

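An aggregator holding a domain record can follow its `nodes` links to get the node records.  A minimal sketch, in which `fetch` is a hypothetical stand-in for the aggregator's HTTP GET against the local datastore (here just a dictionary lookup), and the records are trimmed to a few fields:

{{{
#!python
def node_hrefs(domain):
    """Return the href of every node listed in a domain record."""
    return [entry["href"] for entry in domain.get("nodes", [])]


def fetch_nodes(domain, fetch):
    """Fetch each node record listed in `domain`.

    `fetch` is a hypothetical callable mapping a URL to a decoded record;
    a real aggregator would issue an HTTP GET here.
    """
    return [fetch(href) for href in node_hrefs(domain)]


if __name__ == "__main__":
    ds = "https://datastore.instageni.gpolab.bbn.com/"
    # In-memory stand-in for the local datastore, keyed by URL
    store = {
        ds + "nodes/instageni.gpolab.bbn.com_node_pc1":
            {"id": "instageni.gpolab.bbn.com_node_pc1"},
        ds + "nodes/instageni.gpolab.bbn.com_node_pc2":
            {"id": "instageni.gpolab.bbn.com_node_pc2"},
    }
    domain = {"id": "gpo-ig", "nodes": [{"href": url} for url in store]}
    for node in fetch_nodes(domain, store.__getitem__):
        print(node["id"])
}}}

Because the domain record only carries links, the same walk works whether the node records live in the same datastore or a different one.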
==== Data about a node ====

Nodes have an ID which is a URL-sanitized version of their URN, and are described using [http://unis.incntre.iu.edu/schema/20120709/node# the node schema].  Examples:
 * pc1 node at gpo-ig:
{{{
{
  "id": "instageni.gpolab.bbn.com_node_pc1",
  "ts": 1391192705275101,
  "selfRef": "https://datastore.instageni.gpolab.bbn.com/nodes/instageni.gpolab.bbn.com_node_pc1",
  "urn": "urn:publicid:IDN+instageni.gpolab.bbn.com+node+pc1",
  "properties": {
    "ops_monitoring": {
      "mem_total_kb": 50331648
    }
  },
  "ports": [
    {
      "href": "https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1%3Aeth0"
    },
    {
      "href": "https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1%3Aeth1"
    },
    {
      "href": "https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1%3Aeth2"
    },
    {
      "href": "https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1%3Aeth3"
    }
  ]
}
}}}

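The node and interface examples appear to derive IDs from URNs by dropping the `urn:publicid:IDN+` prefix and replacing each `+` with `_`.  Assuming that is the intended rule (it is inferred from the examples, not specified here), a sketch:

{{{
#!python
URN_PREFIX = "urn:publicid:IDN+"


def urn_to_id(urn):
    """Derive a datastore ID from a GENI URN (rule inferred from the examples)."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError("not a GENI publicid URN: %r" % urn)
    # Other characters (e.g. the ':' in 'pc1:eth0') are left as they are
    return urn[len(URN_PREFIX):].replace("+", "_")


if __name__ == "__main__":
    print(urn_to_id("urn:publicid:IDN+instageni.gpolab.bbn.com+node+pc1"))
    # instageni.gpolab.bbn.com_node_pc1
    print(urn_to_id("urn:publicid:IDN+instageni.gpolab.bbn.com+interface+pc1:eth0"))
    # instageni.gpolab.bbn.com_interface_pc1:eth0
}}}
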
==== Data about an interface ====

Interfaces have an ID which is a URL-sanitized version of their URN, and are described using [http://unis.incntre.iu.edu/schema/20120709/port# the port schema].  Notes:
 * I adopted the control/experimental terminology for interface roles from ProtoGENI listresources output.  We could also use control/data; at any rate, we should be consistent across all monitoring uses.
 * All bandwidths are total fiction.  I didn't even count the zeroes.

Examples:
 * pc1:eth0 (control) interface at gpo-ig:
{{{
{
  "selfRef": "https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1:eth0",
  "urn": "urn:publicid:IDN+instageni.gpolab.bbn.com+interface+pc1:eth0",
  "ts": 1391194147100678,
  "id": "instageni.gpolab.bbn.com_interface_pc1:eth0",
  "properties": {
    "ops_monitoring": {
      "role": "control",
      "max_bps": 10000000,
      "max_pps": 1000000
    }
  }
}
}}}
 * pc1:eth1 (dataplane) interface at gpo-ig:
{{{
{
  "selfRef": "https://datastore.instageni.gpolab.bbn.com/ports/instageni.gpolab.bbn.com_interface_pc1:eth1",
  "urn": "urn:publicid:IDN+instageni.gpolab.bbn.com+interface+pc1:eth1",
  "ts": 1391194147100678,
  "id": "instageni.gpolab.bbn.com_interface_pc1:eth1",
  "properties": {
    "ops_monitoring": {
      "role": "experimental",
      "max_bps": 10000000,
      "max_pps": 1000000
    }
  }
}
}}}

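With `role` and `max_bps` stored under `ops_monitoring`, a consumer can, for example, pick out the dataplane interfaces and add up their (fictional) capacity.  A minimal sketch with hypothetical function names, using the two example records trimmed to the fields it reads:

{{{
#!python
def _ops(port):
    """Return the port's ops_monitoring properties (empty dict if absent)."""
    return port.get("properties", {}).get("ops_monitoring", {})


def ports_with_role(ports, role):
    """Return the IDs of ports whose ops_monitoring role equals `role`."""
    return [p["id"] for p in ports if _ops(p).get("role") == role]


def total_max_bps(ports, role):
    """Sum max_bps over ports with the given role (missing values count as 0)."""
    return sum(_ops(p).get("max_bps", 0) for p in ports
               if _ops(p).get("role") == role)


if __name__ == "__main__":
    ports = [
        {"id": "instageni.gpolab.bbn.com_interface_pc1:eth0",
         "properties": {"ops_monitoring": {"role": "control",
                                           "max_bps": 10000000}}},
        {"id": "instageni.gpolab.bbn.com_interface_pc1:eth1",
         "properties": {"ops_monitoring": {"role": "experimental",
                                           "max_bps": 10000000}}},
    ]
    print(ports_with_role(ports, "experimental"))
    print(total_max_bps(ports, "experimental"))  # 10000000
}}}
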
==== Measurements used for the use cases ====

Measurements have an opaque ID, generated by the local datastore which serves them.  The ID must be persistent, so that the caller has the option of asking for the measurement by ID.  Measurements are described using the data schema outlined above.  Examples:
 * CPU utilization metric on pc1:
{{{
{
  "id": "1",
  "subject": "https://datastore.instageni.gpolab.bbn.com/nodes/instageni.gpolab.bbn.com_node_pc1",
  "eventType": "ops_monitoring:cpu_utilization",
  "description": "CPU utilization percentage",
  "units": "percent",
  "tsdata": [
    { "ts": 1391198716651283, "val": 0.45 },
    { "ts": 1391198776651284, "val": 0.44 },
    { "ts": 1391198836651284, "val": 0.44 },
    { "ts": 1391198896651284, "val": 0.47 },
    { "ts": 1391198956651284, "val": 0.46 },
    { "ts": 1391199016651285, "val": 0.47 }
  ]
}
}}}
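
The `ts` values appear to be microseconds since the Unix epoch (16 digits, consistent with early 2014).  Assuming so, a consumer could summarize a measurement's `tsdata` as sketched here; the function names and the choice of summary fields are hypothetical.

{{{
#!python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ts_to_datetime(ts_us):
    """Convert integer microseconds since the Unix epoch to an aware datetime."""
    # Integer timedelta arithmetic keeps the microsecond digits exact
    return EPOCH + timedelta(microseconds=ts_us)


def summarize(measurement):
    """Return latest/mean/max of a measurement's tsdata, or None if empty."""
    points = sorted(measurement.get("tsdata", []), key=lambda p: p["ts"])
    if not points:
        return None
    values = [p["val"] for p in points]
    return {
        "latest_time": ts_to_datetime(points[-1]["ts"]),
        "latest": points[-1]["val"],
        "mean": sum(values) / len(values),
        "max": max(values),
    }


if __name__ == "__main__":
    cpu = {"tsdata": [
        {"ts": 1391198716651283, "val": 0.45},
        {"ts": 1391198776651284, "val": 0.44},
        {"ts": 1391198836651284, "val": 0.44},
        {"ts": 1391198896651284, "val": 0.47},
        {"ts": 1391198956651284, "val": 0.46},
        {"ts": 1391199016651285, "val": 0.47},
    ]}
    print(summarize(cpu))
}}}

Dividing `ts` by 1e6 as a float would also work, but at this magnitude the microsecond digit can be rounded off, which is why the sketch uses `timedelta` instead.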