Changes between Version 6 and Version 7 of OperationalMonitoring/DataSchema

Timestamp: 01/21/14 16:21:22
Author: chaos@bbn.com
Comment: --

OperationalMonitoring/DataSchema

-                      v6
+                      v7
  * '''Node availability''': this is ''not'' intended as a detailed check of whether the node is usable for some particular experimental purpose --- that would be out of scope for this use case.  It's more like a simple "is this thing on?" check.  It would be fine to report "OK" if any other data is received from the node at a given time and "not OK" otherwise, or for the aggregate to ping the node control plane and report the result.  This doesn't have to be consistent, and shouldn't be complicated.
  * '''Node health metrics''': people suggested we might want to alert on RAID failures and on NTP sync issues.  I'd like to keep track of those requests, but they're not part of the initial thin thread, so they won't be included here.
+ * We probably also need some form of metadata about each node: not collected all the time, but available for periodic query.  For instance, we probably need to know what type of VM server it is (for general information), and what the maximum values are for any metrics we're reporting as rates or counters (e.g. network utilization) rather than as percentages, because we can't tell if we're hitting the maximum if we don't know what the maximum is.
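
The availability rule and the reason for wanting per-node maximum values can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the 300-second freshness window, and the byte-counter example are made up here and are not part of the schema.

```python
from typing import Optional


def node_available(last_data_time: Optional[float], now: float,
                   max_age_s: float = 300.0) -> bool:
    """Simple "is this thing on?" check.

    Reports True if any data was received from the node within the last
    max_age_s seconds. The 300 s default is an assumed value.
    """
    if last_data_time is None:
        return False
    return (now - last_data_time) <= max_age_s


def utilization_fraction(counter_prev: int, counter_now: int,
                         interval_s: float, max_rate: float) -> float:
    """Turn two counter samples into a fraction of the known maximum rate.

    max_rate comes from the node metadata; without it the raw rate cannot
    be judged as "near the maximum" or not. Counter wrap-around is not
    handled in this sketch.
    """
    if interval_s <= 0 or max_rate <= 0:
        raise ValueError("interval_s and max_rate must be positive")
    rate = (counter_now - counter_prev) / interval_s
    return rate / max_rate


if __name__ == "__main__":
    # Hypothetical 1 Gb/s interface: maximum is 125,000,000 bytes/s.
    print(node_available(last_data_time=1000.0, now=1200.0))
    print(utilization_fraction(1_000_000, 126_000_000, 10.0, 125_000_000.0))
```

The second function is the point of the metadata bullet: the same counter delta means very different things on a 1 Gb/s and a 10 Gb/s interface, so the maximum has to be queryable somewhere.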
 Restating that in tabular form:
 …
  * [http://www.gpolab.bbn.com/monitoring/components/use_case_06.html Proposed components for this use case]
+In general, for this use case, we want:
+ * '''Sliver data''':
+   * '''What slivers exist on a GENI aggregate right now:''' I think we always want "right now", even if the outage of interest is in the future.  If reservations are implemented, so that the AM knows about slivers which will exist in the future but don't exist yet, we'll want those too.  A reporting tool might choose to omit slivers which expire before the time of interest, or it might keep them on the grounds that slivers often get renewed.  That should be up to the tool, so the AM should always report every sliver it knows about, now or in the future.
+   * '''Information about each sliver''':
+     * Sliver URN and UUID
+     * Slice URN and UUID
+     * Creation and expiration times
+     * Creator (maybe this is optional because some AMs will always tell us to ask the SA?  not sure)
+     * Resources this sliver has reserved:
+       * URN of each named resource of these types: bare-metal host, shared host, VLAN, flowspace (what else?)
+ * '''Slice user data''': for each relevant slice URN and UUID, find out from the authority:
+   * Users affiliated with the slice (creator, participants)
+   * E-mail contact info for each of those users
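
As a rough sketch of how the sliver and slice user data above might fit together, the Python below models one sliver record and leaves the expiry filtering to the reporting tool, as proposed. All class, field, and function names are hypothetical, the URNs and addresses are invented examples, and `lookup_users` merely stands in for whatever query the slice authority actually provides.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Sliver:
    """One sliver as reported by an aggregate (hypothetical field names)."""
    sliver_urn: str
    sliver_uuid: str
    slice_urn: str
    slice_uuid: str
    created: datetime
    expires: datetime
    creator: Optional[str] = None  # optional: some AMs may defer to the SA
    resource_urns: List[str] = field(default_factory=list)


def slivers_of_interest(slivers: Iterable[Sliver], when: datetime,
                        keep_expiring: bool = True) -> List[Sliver]:
    """Tool-side filter; the AM itself always reports everything it knows.

    With keep_expiring=True, slivers that expire before `when` are kept,
    since they may be renewed. With False, they are dropped.
    """
    if keep_expiring:
        return list(slivers)
    return [s for s in slivers if s.expires >= when]


def contact_emails(slivers: Iterable[Sliver],
                   lookup_users: Callable[[str], Dict[str, str]]) -> List[str]:
    """Collect unique e-mail contacts for the slices behind these slivers.

    lookup_users(slice_urn) is a placeholder for the authority query and
    is assumed to return a mapping of user name to e-mail address.
    """
    emails = set()
    for slice_urn in {s.slice_urn for s in slivers}:
        emails.update(lookup_users(slice_urn).values())
    return sorted(emails)


if __name__ == "__main__":
    a = Sliver("urn:example+sliver+a", "uuid-a", "urn:example+slice+x",
               "uuid-x", datetime(2014, 1, 1), datetime(2014, 1, 20))
    b = Sliver("urn:example+sliver+b", "uuid-b", "urn:example+slice+y",
               "uuid-y", datetime(2014, 1, 5), datetime(2014, 2, 15))
    outage = datetime(2014, 2, 1)
    fake_authority = {
        "urn:example+slice+x": {"alice": "alice@example.net"},
        "urn:example+slice+y": {"bob": "bob@example.net"},
    }
    kept = slivers_of_interest([a, b], outage, keep_expiring=False)
    print([s.sliver_urn for s in kept])
    print(contact_emails(kept, lambda urn: fake_authority[urn]))
```

Keeping the filter on the tool side means a tool that wants to warn owners of soon-to-expire slivers (in case they renew) and one that ignores them can both work from the same aggregate report.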