Changes between Version 15 and Version 16 of OperationalMonitoring/Overview


Timestamp:
04/18/14 10:25:45
Author:
rirwin@bbn.com
Comment:

--

  • OperationalMonitoring/Overview

    v15 v16  
    30 30 }}}
    31 31
    32 Although there are multiple collectors, a single monitoring application uses only a single collector.
    33 
    34 == Use Cases ==
    35 
    36 === Simple Rack Health (use case 3) ===
    37 
    38 Use case description: Track compute utilization, interface, and health statistics for shared rack nodes, and allow operators to get notifications when those statistics are out of bounds.
    39 
    40 Use case implementation story: Node statistics are time-series data. They are either collected on the node and pushed to the compute aggregate, or polled from each node by the compute aggregate (which of the two does not matter for our purposes). Statistics end up in a local database on each rack. Any group of operators that wants to send notifications based on these statistics runs a collector, which polls all racks of interest to that group. The collector shares current values with an alerting service, which sends alerts.
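
The flow described above (poll each rack, gather current values, hand them to an alerting step) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Sample` type, the metric names, the `BOUNDS` thresholds, and the fake rack pollers are all hypothetical and are not part of the monitoring design on this page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One current value read from a rack's local database (hypothetical shape)."""
    rack: str
    node: str
    metric: str
    value: float


# Assumed per-metric bounds as (low, high), inclusive. Values are made up.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "cpu_util_pct": (0.0, 90.0),
    "mem_util_pct": (0.0, 95.0),
}


def collect(racks: Dict[str, Callable[[], Iterable[Sample]]]) -> List[Sample]:
    """Collector: poll every rack of interest and return the current values."""
    samples: List[Sample] = []
    for poll in racks.values():
        samples.extend(poll())
    return samples


def out_of_bounds(samples: Iterable[Sample],
                  bounds: Dict[str, Tuple[float, float]] = BOUNDS) -> List[Sample]:
    """Alerting step: keep only samples whose value falls outside its bounds."""
    flagged: List[Sample] = []
    for s in samples:
        limits = bounds.get(s.metric)
        if limits is None:
            continue  # no bounds configured for this metric
        low, high = limits
        if not (low <= s.value <= high):
            flagged.append(s)
    return flagged


# Stand-ins for querying each rack's local database.
def fake_rack_a() -> List[Sample]:
    return [Sample("rack-a", "node1", "cpu_util_pct", 97.5),
            Sample("rack-a", "node2", "cpu_util_pct", 12.0)]


def fake_rack_b() -> List[Sample]:
    return [Sample("rack-b", "node1", "mem_util_pct", 40.0)]


current = collect({"rack-a": fake_rack_a, "rack-b": fake_rack_b})
alerts = out_of_bounds(current)
for s in alerts:
    print(f"ALERT {s.rack}/{s.node}: {s.metric}={s.value}")
```

Keeping the collector and the alerting check as separate functions mirrors the separation in the story, where the collector only shares current values and the alerting service decides what to send.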
    41 
    42 {{{
    43 #!html
    44 
    45 <table border="0" style="float:center">
    46   <tr>
    47        <td> <img src="http://groups.geni.net/geni/attachment/wiki/OperationalMonitoring/Overview/use_case_3.png?format=raw" width="285" height="300"> </td>
    48   </tr>
    49   <caption align="bottom"> <b> Simple Rack Health Statistics </b></caption>
    50 </table>
    51 }}}
    52 
    53 === Sliver Usage (use case 6) ===
    54 
    55 Use case description: Find out which slivers will be affected by maintenance or an outage of some resource, and get contact information for the owners of those slivers so that targeted notifications can be sent.
    56 
    57 Use case implementation story: Aggregates collect up-to-date information about what slivers exist and what resources they have reserved (including sliver details such as expiration time), and make this information available via a local datastore. GENI trust authorities (e.g. clearinghouses) collect up-to-date information about experimenters and their contact information, and make this information available via a local datastore. Operators who want this information run a collector that can query the relevant datastores. Because this is an on-demand, real-time query, the collector does not need to be active all the time, though it may be. The collector data is used to run a report listing affected experimenters and their contact information.
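
The report step above amounts to joining sliver records from the aggregate datastores with contact records from the trust authority datastore. A minimal sketch of that join follows; the `Sliver` record, the field names, and the sample data are hypothetical and stand in for whatever the real datastores return.

```python
from typing import Dict, List, NamedTuple, Tuple


class Sliver(NamedTuple):
    """Sliver record as an aggregate datastore might expose it (assumed fields)."""
    sliver_id: str
    owner: str
    resources: List[str]
    expires: str


# Made-up data standing in for an aggregate datastore query result.
SLIVERS = [
    Sliver("sliver-1", "alice", ["node-3", "switch-1"], "2014-05-01"),
    Sliver("sliver-2", "bob", ["node-7"], "2014-04-30"),
    Sliver("sliver-3", "carol", ["switch-1"], "2014-06-15"),
]

# Made-up data standing in for a trust authority (clearinghouse) datastore.
CONTACTS: Dict[str, str] = {
    "alice": "alice@example.org",
    "bob": "bob@example.org",
}


def affected_report(resource: str,
                    slivers: List[Sliver],
                    contacts: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List (sliver id, owner, contact) for every sliver that reserves `resource`."""
    rows = []
    for s in slivers:
        if resource in s.resources:
            # Owners missing from the contact datastore are reported, not dropped.
            rows.append((s.sliver_id, s.owner, contacts.get(s.owner, "unknown")))
    return rows


report = affected_report("switch-1", SLIVERS, CONTACTS)
for sliver_id, owner, contact in report:
    print(f"{sliver_id}\t{owner}\t{contact}")
```

Since the query is on demand, nothing here runs continuously: the function is called once with the resource scheduled for maintenance.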
    58 {{{
    59 #!html
    60 
    61 <table border="0" style="float:center">
    62   <tr>
    63        <td> <img src="http://groups.geni.net/geni/attachment/wiki/OperationalMonitoring/Overview/use_case_6.png?format=raw" width="300" height="285"> </td>
    64   </tr>
    65   <caption align="bottom"> <b> Sliver Resource Allocations </b></caption>
    66 </table>
    67 }}}
    68 === External Checks Datastore for Control and Data Plane Monitoring ===
       32 Although there are multiple collectors, a single monitoring application uses only a single collector; however, this is not a strict requirement of the system.
    69 33
    70 34
    71 {{{
    72 #!html
    73 
    74 <table border="0" style="float:center">
    75   <tr>
    76        <td> <img src="http://groups.geni.net/geni/attachment/wiki/OperationalMonitoring/Overview/extck_store_cp_dp.png?format=raw" width="400" height="285"> </td>
    77   </tr>
    78   <caption align="bottom"> <b> External Checks Datastore as Part of Control and Data Plane Monitoring </b></caption>
    79 </table>
    80 }}}
    81 
    82 === Aggregate Datastores for Control and Data Plane Monitoring ===
    83 
    84 
    85 {{{
    86 #!html
    87 
    88 <table border="0" style="float:center">
    89   <tr>
    90        <td> <img src="http://groups.geni.net/geni/attachment/wiki/OperationalMonitoring/Overview/agg_store_cp_dp.png?format=raw" width="400" height="285"> </td>
    91   </tr>
    92   <caption align="bottom"> <b> Aggregate Datastores as Part of Control and Data Plane Monitoring </b></caption>
    93 </table>
    94 }}}
    95