
Version 31 (modified by rirwin@bbn.com, 5 years ago)

--

Proposed API to poll local datastores for monitoring data

This is a working page for the operational monitoring project. It is a draft representing work in progress.

API Basics

This page describes the API used to poll local datastores for monitoring data. The API will be developed with a polling mechanism first. Overview:

  • The collector polls the local datastore through a REST API that returns JSON text.
  • First, the collector polls a local datastore at '/info/' of the local datastore URL to learn what the datastore holds. This takes multiple polls, outlined below.
  • Second, the collector polls '/data/' for time-series data (event-based or measurement data). The query contains a set of event types, a set of object IDs, and timestamp filters, outlined below.

Info Queries

The collector retrieves, from a config local datastore (still to be developed; hardcoded for now), a set of URLs of local datastores, each associated with one or more aggregate managers (AMs). The collector can then poll each local datastore for information about the data associated with that datastore and AM at <datastore_url>/info/aggregate/<aggregate_id>. The response is a list of properties of the aggregate as well as lists of its nodes, slivers, and VLANs. Here is a detailed example of an aggregate info query.

The collector will then query for information about particular objects one at a time (e.g., a node). The query returns a list of properties of the object as well as a list of associated objects (e.g., a node's interfaces). In this example, a node info query goes to <datastore_url>/info/node/<node_id>. Here is a detailed example of a node info query. The collector repeats this for the other nodes it received in the list from the aggregate info query.

Depending on which monitoring applications the collector supports above it, it may continue querying the local datastore about other objects or resources the datastore has data about. These queries have a similar form. Here are links to examples:

If the collector is tracking sliver and slice info, the collector may also be directed to a slice authority local datastore, which is a different datastore from the one containing shared host nodes. Here is a detailed example of a slice query, and here is a detailed example of a user query. Both queries have a similar form: <datastore_url>/info/slice/<slice_id> and <datastore_url>/info/user/<user_id>, respectively.

The API does not support querying multiple objects in a single info query. For example, <datastore_url>/info/node/ will not return all the nodes' info at that local datastore.
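
The info URL patterns above can be assembled with a small helper. This is only a minimal sketch: the function names info_url and node_info_urls are hypothetical, the host datastore.example.net is a placeholder, and percent-encoding the object ID is an assumption rather than something this proposal specifies.

```python
from urllib.parse import quote


def info_url(datastore_url: str, obj_type: str, obj_id: str) -> str:
    """Build <datastore_url>/info/<obj_type>/<obj_id> for exactly one object."""
    if not obj_id:
        # The API does not return all objects of a type, so an ID is required.
        raise ValueError("an info query needs exactly one object id")
    # Assumption: IDs are percent-encoded before being placed in the path.
    return f"{datastore_url.rstrip('/')}/info/{obj_type}/{quote(obj_id, safe='')}"


def node_info_urls(datastore_url: str, node_ids: list) -> list:
    """One info URL per node, since nodes must be queried one at a time."""
    return [info_url(datastore_url, "node", node_id) for node_id in node_ids]


if __name__ == "__main__":
    for url in node_info_urls(
        "https://datastore.example.net",
        ["instageni.gpolab.bbn.com_node_pc1", "instageni.gpolab.bbn.com_node_pc2"],
    ):
        print(url)
```

A collector would call info_url once for the aggregate, then once per node, sliver, or VLAN ID returned in the aggregate response.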

Data Queries

Once a collector has information about the objects it wants to query, it can begin querying at <datastore_url>/data/. A data query requires three filters, passed in a Python dictionary format:

  • "eventType": a list of events or measurements to query. In the reference implementation these correspond to database tables. This page contains a list of eventTypes.
      "eventType": ["ops_monitoring:mem_used_kb","ops_monitoring:cpu_util"]
    
  • "ts": a dictionary of timestamp filters. The accepted filters are gt, gte, lt, and lte. Every query to the local datastore must include a lower bound (gt or gte) together with an upper bound (lt or lte).
        "ts":{"gte":1391192225475202,"lt":1391192225480000}
    
  • "obj": a dictionary describing objects of a single type. It contains a type, which can be node, interface, sliver, etc., and an id list, which lists the IDs of the objects to return data about.
        "obj":{"type":"node","id":["instageni.gpolab.bbn.com_node_pc1","instageni.gpolab.bbn.com_node_pc2"]}
    
  • Here is the complete example, all together:
       <datastore_url>/data/?q={"filters":{"eventType": ["ops_monitoring:mem_used_kb","ops_monitoring:cpu_util"],"ts":{"gte":1391192225475202,"lt":1391192225480000},"obj":{"type":"node","id":["instageni.gpolab.bbn.com_node_pc1","instageni.gpolab.bbn.com_node_pc2"]}}}
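
The three filters described above can be validated and assembled into a query URL as in the sketch below. The function names ts_filter_is_valid and build_data_query_url are hypothetical, datastore.example.net is a placeholder host, and percent-encoding the JSON in the q parameter is an assumption about what an HTTP client would do, not a requirement stated in this proposal.

```python
import json
import urllib.parse

LOWER_BOUNDS = {"gt", "gte"}
UPPER_BOUNDS = {"lt", "lte"}
ALLOWED_TS_FILTERS = LOWER_BOUNDS | UPPER_BOUNDS


def ts_filter_is_valid(ts: dict) -> bool:
    """True if ts uses only gt/gte/lt/lte and has a lower and an upper bound."""
    keys = set(ts)
    return (
        keys <= ALLOWED_TS_FILTERS
        and bool(keys & LOWER_BOUNDS)
        and bool(keys & UPPER_BOUNDS)
    )


def build_data_query_url(datastore_url, event_types, ts, obj_type, obj_ids):
    """Build <datastore_url>/data/?q={"filters": {...}} for one object type."""
    if not ts_filter_is_valid(ts):
        raise ValueError("ts needs gt or gte together with lt or lte")
    query = {
        "filters": {
            "eventType": list(event_types),
            "ts": dict(ts),
            "obj": {"type": obj_type, "id": list(obj_ids)},
        }
    }
    # Compact JSON keeps the URL short; quoting is an assumption (see lead-in).
    encoded = urllib.parse.quote(json.dumps(query, separators=(",", ":")), safe="")
    return f"{datastore_url.rstrip('/')}/data/?q={encoded}"


if __name__ == "__main__":
    print(build_data_query_url(
        "https://datastore.example.net",
        ["ops_monitoring:mem_used_kb", "ops_monitoring:cpu_util"],
        {"gte": 1391192225475202, "lt": 1391192225480000},
        "node",
        ["instageni.gpolab.bbn.com_node_pc1", "instageni.gpolab.bbn.com_node_pc2"],
    ))
```

Building the query from a dictionary and serializing it with json.dumps avoids unbalanced braces, which are easy to introduce when the string is written by hand.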
    

The response to this REST data query is a list of data items. Here are a few example responses of a single data item. Here is an example of a bulk query response.

Security

Access to the local datastore is restricted through certificates that enable an SSL connection. Anyone running a collector, or testing whether their datastore responds properly to queries, will need a tool certificate. To get a tool certificate, follow these instructions. We suggest using a key without a passphrase for convenience; a passphrase-less key is secure enough for our purposes. For the tool name, use collector. When you get your certificate, concatenate it with your private key (for example, with cat) to create a new file that contains both parts. The resulting file is what you give your collector to enable it to access a datastore.
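
For those who prefer a script to cat, the concatenation step might look like the sketch below. The function names and file paths are hypothetical, the certificate-then-key order simply mirrors the wording above, and restricting the file mode to the owner is an added precaution of this sketch (the key has no passphrase), not something the proposal requires.

```python
from pathlib import Path


def combine_pem(cert_text: str, key_text: str) -> str:
    """Return the certificate followed by the private key, newline-separated."""
    if not cert_text.endswith("\n"):
        cert_text += "\n"
    if not key_text.endswith("\n"):
        key_text += "\n"
    return cert_text + key_text


def write_combined_file(cert_path: str, key_path: str, out_path: str) -> Path:
    """Write the combined certificate and key to out_path."""
    combined = combine_pem(Path(cert_path).read_text(), Path(key_path).read_text())
    out = Path(out_path)
    out.write_text(combined)
    # Precaution added by this sketch: owner-only access to the unencrypted key.
    out.chmod(0o600)
    return out
```

A hypothetical call would be write_combined_file("collector-cert.pem", "collector-key.pem", "collector-combined.pem"), with the resulting path passed to the collector.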

When a new request comes in, the following occurs:

  • The webserver makes sure the SSL certificate is signed by a GENI trust anchor. The local datastore webserver is configured to do this check.
  • The certificate is passed along to the application which parses out the Subject Alternative Name URN.
  • Next, the URN is checked against the whitelist (those with permission to access operational data).
  • Finally, the request is answered as outlined above on this page.

The whitelist is maintained centrally in our git repository, and the local datastores can poll it (infrequently) to update their copy of the whitelist.
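
The application-side part of the request handling above can be sketched as follows. It assumes the webserver has already verified the certificate chain and handed the application a flag plus the parsed Subject Alternative Name URN; the function name is_authorized and the example URNs are hypothetical, and the whitelist is modeled as a plain set of URN strings because its actual file format is not described here.

```python
from typing import Optional, Set


def is_authorized(cert_verified: bool, san_urn: Optional[str], whitelist: Set[str]) -> bool:
    """Decide whether a request may be answered.

    cert_verified: the webserver confirmed a GENI trust anchor signed the cert.
    san_urn: URN parsed from the Subject Alternative Name, or None if absent.
    whitelist: URNs permitted to access operational data.
    """
    if not cert_verified:
        return False
    if san_urn is None:
        return False
    return san_urn in whitelist


if __name__ == "__main__":
    # Hypothetical whitelist entry, for illustration only.
    allowed = {"urn:publicid:IDN+example.net+tool+collector"}
    print(is_authorized(True, "urn:publicid:IDN+example.net+tool+collector", allowed))
    print(is_authorized(True, "urn:publicid:IDN+example.net+tool+other", allowed))
```

Only when this check passes would the datastore go on to answer the info or data query.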

Here is a link to an email archive with slightly more detail on this security proposal.