[[PageOutline]]

= GENI Monitoring Alerts =

The GENI monitoring alerts system detects events in metric data that is polled from remote systems. Raw data is published to a queueing system, which allows multiple complex event queries to operate on the same data stream in parallel. The output of a complex event query can generate Nagios alerts, be logged to a database, or both.

== Poll to raw metric stream ==

As part of the polling process, raw data is both recorded in a database and pushed to a queue. The queue serves as a fanout interface for a one-to-many raw metric subscription service.

[[Image(https://www.rabbitmq.com/img/tutorials/python-three-overall.png)]]

''Figure: publish/subscribe fanout (image from the RabbitMQ tutorial, see References).''

In the figure above, ''P'' represents our polling agent, which publishes data to an exchange represented by ''X''. Clients, designated ''C1'' and ''C2'', subscribe to the exchange by binding their own queues to it. In the example, data published by ''P'' is replicated by ''X'' to the client queues ''amq.gen-RQ6..'' for client ''C1'' and ''amq.gen-As8...'' for client ''C2''.

== Stream query of metric stream ==

The publish/subscribe queueing system allows streams of raw metric data to be replicated to many processes in parallel. This allows us to instantiate one or more complex event processing engines (''CEPE'') per replicated data stream, and to run one or more queries inside each CEPE. We use the Esper [http://www.espertech.com/] CEPE.

==== Esper complex event processing engine ====

Esper allows us to analyze large volumes of incoming messages or events, whether they are historical or real-time in nature. Esper filters and analyzes events in various ways and responds to conditions of interest. An example of the Esper CEPE architecture is shown in the figure below.

[[Image(http://www.espertech.com/images/products_esp_cep.jpeg)]]

''Figure: Esper CEP architecture (image from Esper, see References).''

Simply put, ''CEPE queries'' are pattern-based (matching) subscriptions that describe a possible future event.
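The subscription idea can be illustrated without Esper. The following is a minimal plain-Java sketch under stated assumptions, not Esper's API (Esper compiles EPL text into statements rather than accepting Java predicates). ''SubscriptionSketch'', ''subscribe'' and ''send'' are hypothetical names, and ''LogTick'' here is a simplified copy of the class described later on this page. Each subscription pairs a condition describing a future event with the output to produce, and every event on the stream is offered to every subscription, much as one replicated stream feeds several queries.

{{{#!java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical illustration of "queries as subscriptions"; NOT the Esper API.
class SubscriptionSketch {

    // Simplified stand-in for the LogTick event class.
    static final class LogTick {
        final String urn;
        final String metric;
        final long ts;
        final double value;

        LogTick(String urn, String metric, long ts, double value) {
            this.urn = urn;
            this.metric = metric;
            this.ts = ts;
            this.value = value;
        }
    }

    // A subscription pairs a condition (the described future event)
    // with the output to emit when that event actually occurs.
    static final class Subscription {
        final Predicate<LogTick> condition;
        final Consumer<LogTick> output;

        Subscription(Predicate<LogTick> condition, Consumer<LogTick> output) {
            this.condition = condition;
            this.output = output;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    // Register a condition and the output to produce when it matches.
    void subscribe(Predicate<LogTick> condition, Consumer<LogTick> output) {
        subscriptions.add(new Subscription(condition, output));
    }

    // Offer each incoming event to every subscription on the stream;
    // only subscriptions whose condition matches produce output.
    void send(LogTick event) {
        for (Subscription s : subscriptions) {
            if (s.condition.test(event)) {
                s.output.accept(event);
            }
        }
    }

    public static void main(String[] args) {
        SubscriptionSketch engine = new SubscriptionSketch();
        List<String> alerts = new ArrayList<>();
        engine.subscribe(
            t -> t.metric.equals("gpo:is_available") && t.value == 0,
            t -> alerts.add("CRITICAL " + t.urn));
        engine.subscribe(
            t -> t.metric.equals("gpo:is_available") && t.value == 1,
            t -> alerts.add("OK " + t.urn));
        engine.send(new LogTick("urn:a", "gpo:is_available", 1000L, 1));
        engine.send(new LogTick("urn:b", "gpo:is_available", 1001L, 0));
        engine.send(new LogTick("urn:a", "ping_rtt_ms", 1002L, 42.0)); // matches nothing
        System.out.println(alerts); // prints [OK urn:a, CRITICAL urn:b]
    }
}
}}}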
If the described event occurs, the CEPE emits the described output.

==== Esper query format ====

In a typical database, we query existing data using a declarative language. An Esper query can be thought of as upside-down SQL: rather than searching data that already exists, the query is registered first, and results are emitted when matching events arrive in the future. Complex events are described using the Esper query language, ''EPL'', which is similar to SQL. The EPL language reference and examples can be found here: [[http://esper.sourceforge.net/esper-0.7.5/doc/reference/en/html/EQL.html]]

Consider the following EPL query:

{{{
select count(*) from MyEvent(somefield = 10).win:time(3 min) having count(*) >= 5
}}}

 * There exists a stream of events named ''MyEvent''.
 * Events in the ''MyEvent'' stream contain a field named ''somefield''.
 * If ''somefield'' = 10 occurs five or more times within a 3-minute window, emit data.

Just as relational databases apply type-specific operations to columns based on their declared data types, the data streams processed by Esper are defined by strongly typed object classes. In the previous EPL query, the ''somefield'' field would have to be defined as a numeric type for the comparison to work.

==== GENI monitoring stream data format ====

For GENI Monitoring alerts, we use the ''LogTick'' class shown in the code block below:

{{{
public static class LogTick {
    String urn;     // URN identifying the monitored resource
    String metric;  // metric name, e.g. gpo:is_available
    long ts;        // timestamp of the measurement
    double value;   // measured value

    public LogTick(String urn, String metric, long ts, double value) {
        this.urn = urn;
        this.metric = metric;
        this.ts = ts;
        this.value = value;
    }

    // Esper reads event properties (urn, metric, ts, value)
    // through these JavaBean-style getters.
    public String getUrn() { return urn; }
    public String getMetric() { return metric; }
    public long getTs() { return ts; }
    public double getValue() { return value; }

    @Override
    public String toString() {
        return "urn:" + urn + " metric:" + metric + " timestamp:" + ts + " value:" + value;
    }
}
}}}

==== Example GENI monitoring stream queries ====

Note how the following ''LogTick'' fields and data types are used in the example queries:

{{{
...
String urn;
String metric;
long ts;
double value;
...
}}}

 * If metric ''gpo:is_available'' is set to ''1'', emit ''OK'':
{{{
select urn, metric, ts, value, 'OK' AS alertlevel
from LogTick(metric='gpo:is_available')
where value = 1
}}}

 * If metric ''gpo:is_available'' is set to ''0'', emit ''CRITICAL'':
{{{
select urn, metric, ts, value, 'CRITICAL' AS alertlevel
from LogTick(metric='gpo:is_available')
where value = 0
}}}

 * If a URN reporting ''gpo:is_available'' is observed and then no further ''LogTick'' event from that URN arrives within 60 minutes, emit ''WARNING'':
{{{
select a.urn AS urn, a.metric AS metric, a.ts AS ts, 'WARNING' AS alertlevel
from pattern [
    every a=LogTick(metric='gpo:is_available')
    -> (timer:interval(60 min)) and not LogTick(urn=a.urn)
]
group by a
}}}

 * If the ping round-trip time (metric ''ping_rtt_ms'') exceeds 10,000 ms, emit the event:
{{{
select * from LogTick(metric='ping_rtt_ms') where value > 10000.0
}}}

 * If a URN is seen and then not seen again for 60 minutes:
{{{
select count(*)
from pattern [
    every a=LogTick
    -> (timer:interval(60 min)) and not LogTick(urn=a.urn)
]
group by a
}}}

== Creating stream queries ==

 1. Log in to the GENI Monitoring site: [[http://genimon.uky.edu]]
 2. Click ''Alerting System'' under the ''GENI Reporting'' tab, as shown in the figure below.
    [[Image(http://groups.geni.net/geni/raw-attachment/wiki/GENIMonitoring/Alerts/side_bar.png)]]
 3. On the Alert page, click ''Build New Alert'' at the top right of the screen, as shown in the figure below.
    [[Image(http://groups.geni.net/geni/raw-attachment/wiki/GENIMonitoring/Alerts/alert_page.png)]]
 4. You are now on the stream query builder page, shown in the figure below.
    [[Image(http://groups.geni.net/geni/raw-attachment/wiki/GENIMonitoring/Alerts/build_page.png)]]
 5. On the stream query builder page, click ''Query Node'' under ''Add Alert Node'', as shown in the figure below.
    [[Image(http://groups.geni.net/geni/raw-attachment/wiki/GENIMonitoring/Alerts/add_query.png)]]
 6. In the query node, fill in the ''Query Name'' and ''Query String'' fields. The query name should describe your query, and the query string must be a valid EPL query that uses the ''LogTick'' class.
 7. Click the left edge of your query node and connect it to the ''source node''. The source node is the source of ''LogTick'' events, which are derived from raw polling metrics. An example query is shown in the figure below.
    [[Image(http://groups.geni.net/geni/raw-attachment/wiki/GENIMonitoring/Alerts/query.png)]]
 8. You must now provide a destination for the query output. On the stream query builder page, click ''Destination Node'' under ''Add Alert Node'', as shown in the figure below.
    [[Image(http://groups.geni.net/geni/raw-attachment/wiki/GENIMonitoring/Alerts/add_query.png)]]
 9. Using the dropdown box on your destination node, select your query destination, then connect your destination node to your query node in the same way you connected your query node to the source node.
 10. Once a source, query, and destination have been configured, as shown in the figure below, click ''Submit Alert'' on the ''Alert Building Tools'' toolbar.
    [[Image(http://groups.geni.net/geni/raw-attachment/wiki/GENIMonitoring/Alerts/add_dest.png)]]

== References ==

 * RabbitMQ publish/subscribe tutorial (source of the fanout figure): [https://www.rabbitmq.com/tutorials/tutorial-three-python.html]
 * Esper (source of the CEP architecture figure): [http://www.espertech.com/]