Version 9 (modified by rirwin@bbn.com, 10 years ago)

Proposed API to poll local datastores for monitoring data

This is a working page for the operational monitoring project. It is a draft representing work in progress.

API Basics

This page describes the API to be used to poll Local datastores for monitoring data. The API will be developed with a polling mechanism first. Overview:

  • Polls are done from the Aggregator to the Local datastore via a REST API which returns JSON text. All JSON text is
  • First, the Aggregator polls a Local datastore at '/info/' under the Local datastore's URL to learn what the datastore holds. This is done through multiple polls, outlined below.
  • Second, the Aggregator polls for time-series data at '/data/' to get event-based or measurement data. The query contains a set of event types, a set of object IDs, and timestamp filters, outlined below.

Info Queries

The aggregator retrieves, from a config local datastore (not yet developed; hardcoded for now), a set of URLs of local datastores that are associated with one or more aggregate managers (AMs). It can then poll each local datastore for information about the data associated with that datastore and AM at <datastore_url>/info/aggregate/<aggregate_id>. The information returned is a list of properties about the aggregate as well as a list of nodes, slivers, and VLANs (slivers and VLANs are not implemented as of 17-Feb). Here is a detailed example of an aggregate info query.

The aggregator will then query for information about particular objects one at a time (e.g., a node). The query returns a list of properties about the object as well as a list of associated objects (e.g., a node's interfaces). In this example, a node info query will be at <datastore_url>/info/node/<node_id>. Here is a detailed example of a node info query. The aggregator would repeat this for the other nodes it received in the list from the aggregate info query.

Depending on which monitoring applications the aggregator supports above it, it may continue querying the local datastore about other objects or resources it has data about. These queries are of similar form. Here are links to examples:

If the aggregator is tracking sliver and slice info, the aggregator may also be directed to a slice authority local datastore, which is a different datastore than one containing shared host nodes. Here is a detailed example slice query, and here is a detailed example of a user query. Both of these queries are of similar form with <datastore_url>/info/slice/<slice_id> and <datastore_url>/info/user/<user_id> respectively.

As of 17-Feb, querying multiple objects in a single info query is not supported. For example, <datastore_url>/info/node/ will not return info for all the nodes at that local datastore.
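
The info URL patterns described in this section can be sketched as a small helper. This is a minimal sketch, not the reference implementation: the function names and the set of accepted object types are assumptions chosen for illustration, and only the URL forms spelled out above are covered.

```python
from urllib.parse import quote

# Object types whose info URLs are spelled out on this page.
# The real set may be larger (assumption made for this sketch).
INFO_OBJECT_TYPES = {"aggregate", "node", "slice", "user"}


def info_url(datastore_url, obj_type, obj_id):
    """Build <datastore_url>/info/<obj_type>/<obj_id> for a single object."""
    if obj_type not in INFO_OBJECT_TYPES:
        raise ValueError("unknown object type: %r" % obj_type)
    if not obj_id:
        # Info queries without an id (multi-object queries) are not supported.
        raise ValueError("an object id is required")
    base = datastore_url.rstrip("/")
    return "%s/info/%s/%s" % (base, obj_type, quote(obj_id, safe=""))


def node_info_urls(datastore_url, node_ids):
    """Return one info URL per node, since objects are queried one at a time."""
    return [info_url(datastore_url, "node", node_id) for node_id in node_ids]


urls = node_info_urls("http://127.0.0.1:5000", ["404-ig-pc1", "404-ig-pc2"])
```

An aggregator would fetch each of these URLs in turn, starting from the node list returned by the aggregate info query.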

Data Queries

Once an aggregator has information about the objects it wants to query, it can begin querying at <datastore_url>/data/. Data queries require three filters, passed in Python dictionary format:

  • "eventType": is a list of events or measurements to query. In the reference implementation these equate to database tables. This page contains a list of eventTypes
      "eventType": ["mem_used","cpu_util"]
    
  • "ts": is a dictionary of timestamp filters. It is recommended to provide a ts dictionary that includes at least a greater than filter (gte or gt) to limit the amount of windowed data. The accepted filters are gt, gte, lt, and lte.
       "ts":{"gte":1391192225475202,"lt":1391192225480000}
    
  • "obj": is a dictionary of objects of a single type. The dictionary contains a type, which can be a node, interface, sliver, etc., and an id list, which lists the IDs to return data about.
        "obj":{"type":"node","id":["404-ig-pc1","404-ig-pc2"]}
    
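
The three filters can be assembled and checked before a query is sent. The following is a minimal sketch under stated assumptions: the function name build_filters and its validation rules are hypothetical, and rejecting a ts dictionary without gt or gte is a choice made here (the page only recommends a lower bound).

```python
ALLOWED_TS_OPS = {"gt", "gte", "lt", "lte"}


def build_filters(event_types, obj_type, obj_ids, ts):
    """Return the filters dictionary for a data query.

    ts maps an accepted operator (gt, gte, lt, lte) to an integer timestamp.
    """
    if not event_types:
        raise ValueError("eventType needs at least one entry")
    if not obj_ids:
        raise ValueError("obj needs at least one id")
    unknown = set(ts) - ALLOWED_TS_OPS
    if unknown:
        raise ValueError("unsupported ts filters: %s" % sorted(unknown))
    if not ({"gt", "gte"} & set(ts)):
        # Only recommended by the page; enforced in this sketch so that the
        # window of returned data is always bounded from below.
        raise ValueError("ts should include a gt or gte filter")
    return {
        "eventType": list(event_types),
        "ts": dict(ts),
        "obj": {"type": obj_type, "id": list(obj_ids)},
    }


filters = build_filters(
    ["mem_used", "cpu_util"],
    "node",
    ["404-ig-pc1", "404-ig-pc2"],
    {"gte": 1391192225475202, "lt": 1391192225480000},
)
```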

Here is a complete example:

   <datastore_url>/data/q?={"filters":{"eventType": ["mem_used","cpu_util"],"ts":{"gte":1391192225475202,"lt":1391192225480000},"obj":{"type":"node","id":["404-ig-pc1","404-ig-pc2"]}}}

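A query URL of this shape can be produced by serializing the filters with a JSON encoder rather than concatenating strings by hand, which avoids unbalanced braces. This is a sketch only: data_query_url is a hypothetical helper, and the "q?=" form is copied verbatim from the example on this page rather than from a confirmed specification.

```python
import json


def data_query_url(datastore_url, filters):
    """Append the JSON-encoded filters to <datastore_url>/data/."""
    # Compact separators keep the URL short. Some HTTP clients may also
    # require percent-encoding of the JSON text (not done in this sketch).
    payload = json.dumps({"filters": filters}, separators=(",", ":"))
    return "%s/data/q?=%s" % (datastore_url.rstrip("/"), payload)


filters = {
    "eventType": ["mem_used", "cpu_util"],
    "ts": {"gte": 1391192225475202, "lt": 1391192225480000},
    "obj": {"type": "node", "id": ["404-ig-pc1", "404-ig-pc2"]},
}
url = data_query_url("http://127.0.0.1:5000", filters)
```
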
The REST noun and JSON format include the following:

  • Noun describing what type of data it is
  • Metadata about the response, such as which local datastore produced it and the type of response.
  • Result groupings that collect all time-value pairs, keyed by each combination of the other attributes (e.g., aggregate-id and resource-id). See the example below for clarification.

The reference implementations for A - E from the component diagram leverage the schema for both the REST API and the table structure in the databases. Reusing a common schema for the datastores and the REST API has eased development thus far.
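
To illustrate how one schema can drive both the table structure and the response fields, here is a minimal sketch. It assumes schema entries are lists of (column name, column type) pairs, as in the prototype's mem_used entry; create_table_sql and column_names are hypothetical helpers, and the SQL dialect is a guess (int8 and float4 are PostgreSQL type names).

```python
# Schema entry in the form used by the prototype's config store.
schema = {"mem_used": [("id", "varchar"), ("ts", "int8"), ("v", "float4")]}


def create_table_sql(event_type, columns):
    """Render a CREATE TABLE statement from (name, type) column pairs."""
    cols = ", ".join("%s %s" % (name, col_type) for name, col_type in columns)
    return "CREATE TABLE %s (%s);" % (event_type, cols)


def column_names(columns):
    """Return the column names, which a REST response could reuse as keys."""
    return [name for name, _ in columns]


sql = create_table_sql("mem_used", schema["mem_used"])
keys = column_names(schema["mem_used"])
```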

The list of nouns and attributes associated with each noun are being formulated here.

Example: The current prototype has components get the following from the config store:

schema["mem_used"] = [("id","varchar"), ("ts", "int8"), ("v", "float4")]

http://127.0.0.1:5000/data/q?={"filters":{"eventType": ["mem_used","cpu_util"],"ts":{"gte":1391192225475202,"lt":1391192225480000},"obj":{"type":"node","id":["404-ig-pc1","404-ig-pc2"]}}}

{
"response_type": "data_poll",
"data_type": "memory_util",
"results":
  [{
   "aggregate_id": "404-ig",
   "resource_id": "compute_node_1",
   "measurements": [{"ts": 1391192225381372, "v": 27.3}, {"ts": 1391192225589189, "v": 27.3}, {"ts": 1391192225792371, "v": 27.3}]
   },
   {
   "aggregate_id": "404-ig",
   "resource_id": "compute_node_2",
   "measurements": [{"ts": 1391192225381987, "v": 29.5}, {"ts": 1391192225589468, "v": 29.5}]
  }]
}
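
On the aggregator side, a response of this kind can be flattened into rows for storage. The sketch below assumes that each result group's "measurements" field is a JSON list of {"ts", "v"} objects; flatten_results is a hypothetical name and the sample payload is abbreviated.

```python
import json


def flatten_results(response_text):
    """Return (aggregate_id, resource_id, ts, v) tuples from a data_poll response."""
    response = json.loads(response_text)
    if response.get("response_type") != "data_poll":
        raise ValueError("not a data_poll response")
    rows = []
    for group in response["results"]:
        # Every time-value pair inherits the attributes of its group.
        for point in group["measurements"]:
            rows.append(
                (group["aggregate_id"], group["resource_id"], point["ts"], point["v"])
            )
    return rows


sample = """
{"response_type": "data_poll",
 "data_type": "memory_util",
 "results": [
   {"aggregate_id": "404-ig", "resource_id": "compute_node_1",
    "measurements": [{"ts": 1391192225381372, "v": 27.3},
                     {"ts": 1391192225589189, "v": 27.3}]},
   {"aggregate_id": "404-ig", "resource_id": "compute_node_2",
    "measurements": [{"ts": 1391192225381987, "v": 29.5}]}
 ]}
"""
rows = flatten_results(sample)
```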

Simple API in Noun Hierarchy and Set of Filters

The goal of the aggregator is to gather data about a collection of resources, so a deep hierarchy such as /noun1/noun2/noun3/noun4/etc. may not be necessary, since /noun1 will gather all of the items whenever possible. For example, "/memory_util/<compute_node_id>" is not in the initial implementation.

The same thing applies to filters provided with '?' after the noun. The "?since=<seconds since epoch>" filter is the only required filter now. Other filters under consideration are data transformation filters like sampling_rate or average_since and simpler filters like greater_than or less_than.

Security

Each Local store authenticates and encrypts data to Aggregators using certificates.
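
The page does not say how certificates are exchanged, so the following is only a sketch of one possible arrangement on the Aggregator side using Python's standard ssl module. The helper name and its parameters are assumptions: the Aggregator verifies the Local store's certificate, and optionally presents its own client certificate if file paths are supplied.

```python
import ssl


def make_client_context(ca_file=None, cert_file=None, key_file=None):
    """Create a TLS context that verifies the Local store's certificate.

    ca_file: optional CA bundle used to verify the Local store (hypothetical path).
    cert_file, key_file: optional client certificate and key for the Aggregator.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        # Present a client certificate so the Local store can authenticate
        # the Aggregator as well.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


context = make_client_context()
```

Such a context could then be passed to an HTTPS client, for example urllib.request.urlopen(url, context=context).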

Configuration (OUT OF DATE, TO BE UPDATED SOON)

The Aggregator queries the Config Local datastore for a list of all Local stores, their connection information (URL/port), and the nouns each Local store possesses. This will also use a simple REST API called "/local_info" and potentially "/local_info/local_store_i".

The Aggregator can query the Local store using "/info/<noun>" to get information about the Local store's time-value pair collections. To follow the example above, the REST call

http://127.0.0.1:5000/info/memory_util

would yield

{
"response_type": "info_poll",
"data_type": "memory_util", 
"measured_components": 
  [ {"aggregate_id":"aggregate_1", "resource_id":"compute_node_1"} , 
    {"aggregate_id":"aggregate_1", "resource_id":"compute_node_2"} ]
}
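
A response like this can be turned into a per-aggregate index of measured resources. This is a minimal sketch: components_by_aggregate is a hypothetical helper, and the field names are taken from the example response above, which belongs to a section marked as out of date.

```python
import json


def components_by_aggregate(response_text):
    """Map each aggregate_id to the list of resource_ids it reports."""
    response = json.loads(response_text)
    if response.get("response_type") != "info_poll":
        raise ValueError("not an info_poll response")
    index = {}
    for component in response["measured_components"]:
        index.setdefault(component["aggregate_id"], []).append(
            component["resource_id"]
        )
    return index


sample = """
{"response_type": "info_poll",
 "data_type": "memory_util",
 "measured_components": [
   {"aggregate_id": "aggregate_1", "resource_id": "compute_node_1"},
   {"aggregate_id": "aggregate_1", "resource_id": "compute_node_2"}
 ]}
"""
index = components_by_aggregate(sample)
```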

The Aggregator is configured by its operator to query a set of local stores for a set of their data.

Reference Prototypes

The reference prototype includes components A - E to test end-to-end functionality of the entire system, which is beyond the scope of this page. The requirements cover the REST calls and JSON responses as well as security.

The reference prototype will be made available soon (as of 1/30/2014).

Additional Resources

See REST Overview for a nice introduction to REST, and JSON Overview for a nice introduction to JSON.