Changes between Version 1 and Version 2 of GENI-Infrastructure-Portal/MonitoringReqts


Timestamp:
12/14/11 14:18:13
Author:
sedwards@bbn.com
Comment:

--

  • GENI-Infrastructure-Portal/MonitoringReqts

    v1 v2  
    1111   * Would like to ensure:
    1212     * We don’t ignore any important pieces
    13      * Architecture decisions reflect needs of monitoring so that GENI Clearinghouse, I&M, etc serve our needs
     13     * Architecture decisions reflect needs of monitoring so that GENI
     14     Clearinghouse, Instrumentation & Measurement, etc serve our needs
    1415     * Where possible, we build tools which can be adapted to new software when it becomes available
    1516 * Therefore we need to answer the following questions:
     
    2122
    2223 * GENI Requires Monitoring
     24The following are some relevant requirements from the
     25[attachment:SysReqDoc/GENI-SE-SY-RQ-02.0.pdf  GENI System Requirements Document] (July 7, 2009)
     26
    2327   * 10.2-3 Visible operational status
    2428     * The GENI system shall make sufficient data available that researchers and maintainers will be able to evaluate the availability and operational status of the system.
     
    3337     * Examples: ProtoGENI, !PlanetLab, Orca
    3438   * Campuses
    35      * Which host & run aggregates
     39     * Which host & operate aggregates
    3640     * Which only host aggregates
    3741   * Backbone & Regional Networks
     
    4044   * Experimenters
    4145
    42  * Aggregates, Racks and the GENI CH
    43    * Each GENI rack is a SINGLE aggregate
     46 * Aggregates, Racks and the GENI Clearinghouse
     47   * Each GENI rack is a ''single'' aggregate
    4448     * Therefore requirements are the same as for aggregates
    4549   * Aggregates can outsource (some of) their responsibilities to the GENI Clearinghouse
     
    5054   * Management
    5155     * Act of fixing problems and responding to requests
    52    * What does monitoring & management involve?
    53     * Observe unexpected events
    54       * THEN fix what’s wrong
    55     * Observe expected events
    56       * THEN develop policy for fixing what’s wrong
    57       * THEN fix what’s wrong (by responding to monitoring)
    58     * Plan for the future
    59       * Monitor long-term trends in resource usage
    60       * THEN provision resources to meet forecasted needs
     56 * What does monitoring & management involve?
     57   * Observe unexpected events
     58     * THEN fix what’s wrong
     59   * Observe expected events
     60     * THEN develop policy for fixing what’s wrong
     61     * THEN fix what’s wrong (by responding to monitoring)
     62   * Plan for the future
     63     * Monitor long-term trends in resource usage
     64     * THEN provision resources to meet forecasted needs
    6165
    6266 * What makes GENI different?
     
    6670   * Interactions between groups are governed by GENI federation agreements (e.g. aggregate provider agreement) and mutual understanding.
    6771
    68  * Motherhood & Apple Pie
    69    * We are not covering:
    70      * Monitoring and management which fits entirely within the purview of aggregates, campuses, etc
    71    * For example, we will do (but not discuss here)
     72 * The requirements below do not cover:
     73   * Monitoring and management which fits entirely within the purview of aggregates, campuses, etc
     74   * For example, we will do (but not discuss in these requirements) the following items
    7275     * Keeping logs
    7376     * Obeying local laws and policy
    7477     * Answering the phone when someone has a concern
    75      * … and tie your shoes and everything else.
    76    * These things do NOT make GENI different
    77 == Top-level aspects of GENI Monitoring and Management ==
    78 
    79 These are the top-level aspects of GENI monitoring and management...
    80 
    81  * Top-level Aspects of GENI Monitoring
     78     * … and everything else.
     79   * The above items are not included in the requirements below because they do NOT make GENI different
     80== GENI Monitoring and Management Requirements Summary ==
     81
     82These are the summary GENI monitoring and management requirements...
     83
     84 A. GENI Monitoring Requirements
    8285    1. Information must be shareable
    83     1. Information must be collected
    84     1. Information must be available when needed
    85     1. Cross-GENI operational statistics collected and synthesized to indicate GENI as a whole is working 
    86     1. Preserve privacy of users (opt-in, experimenters, other users
     86    2. Information must be collected
     87    3. Information must be available when needed
     88    4. Cross-GENI operational statistics must be collected and synthesized to indicate GENI as a whole is working 
     89    5. Preserve privacy of users (opt-in, experimenters, other users
    8790    of resources)
    8891
    89  * Top-level Aspects of GENI Management
     92 B. GENI Management Requirements
    9093   1. For both debugging and security problems:
    91    1. Must be possible to escalate events
    92    1. Meta-operations and aggregate operators must work together to resolve problems in a timely manner
    93    1. Must be possible to do an emergency stop in case of a problem
    94    1. Orgs must manage GENI resources consistent with local policy and best practices
     94       1. Must be possible to escalate events
     95       2. Meta-operations and aggregate operators must work together to resolve problems in a timely manner
     96   2. Must be possible to do an emergency stop in case of a problem
     97   3. Organizations must manage GENI resources consistent with local policy and best practices
    9598       * e.g. security procedures, logging, backups, etc.
    96    1. Develop policies for monitoring
    97    1. All parties should implement agreed upon policies
    98    1. Security of GENI as a whole and its pieces
    99 
    100 == Breakdown of top level requirements ==
    101 Each of the top-level aspects of GENI monitoring and management,
     99   4. Develop policies for monitoring
     100   5. All parties should implement agreed upon policies
     101   6. Secure GENI as a whole and secure the pieces of GENI
     102
     103== Monitoring & Management Requirements Details ==
     104Each of the above GENI monitoring and management requirements,
    102105is broken down in more detail below.
    103106
     
    106109[[Color(orange,partially implemented according to some plan or there exists an unimplemented plan)]], or [[Color(red,not implemented and no plan)]].
    107110
    108  * Requirement: Cross-GENI Monitoring
     111 * A.4 Requirement: Cross-GENI Monitoring
    109112    * ''GENI monitoring is more than the sum of the monitoring at GENI’s parts. In order to know if GENI is working properly, additional monitoring is required beyond that done by each of its constituent pieces.''
    110     * Collect and synthesize additional operational statistics which indicate whether GENI is working
     113    * Must collect and synthesize additional operational statistics which indicate whether GENI is working
    111114       * e.g. meso-scale ping tests, topology
    112     * Collect cross-GENI stats
    113     * Make cross-GENI stats available when needed
    114 
    115  * Requirement: Privacy
     115    * Must collect cross-GENI stats
     116    * Must make cross-GENI stats available when needed
     117
     118 * A.5 Requirement: Privacy
    116119   * Preserve privacy of users (opt-in, experimenters, other users of resources)
    117120      * [[Color(red,TBD – This is an area needing major discussion)]]
    118121
    119  * Requirement: Troubleshooting & Event Escalation
     122 * B.1 Requirement: Troubleshooting & Event Escalation
    120123   * For both debugging and security problems:
    121       * Meta-operations and aggregate operators must work together to resolve problems
     124      * B.1.1 Must be possible to escalate events
     125      * B.1.2 Meta-operations and aggregate operators must work together to resolve problems
    122126        * Aggregates must advertise resources accurately
    123127           * (threshold) statically --> [[Color(green,Fill out aggregate page)]]
     
    126130        * Aggregates cooperate with meta-operations on the resolution of security events
    127131        * Aggregates cooperate with LLR on the resolution of security events
    128       * Must be possible to escalate events
    129 
    130 
    131  * Requirement: Emergency Stop
     132
     133
     134
     135 * B.2 Requirement: Emergency Stop
    132136   * Must be possible to do an emergency stop in case of a problem
    133137   * Must maintain POC information at meta-operations
     
    140144
    141145 * Requirement: Policy
    142    * Orgs must manage GENI resources consistent with local policy and best practices (e.g security procedures, logging, backups, etc)
     146   * B.3 Organizations must manage GENI resources consistent with local policy and best practices (e.g. security procedures, logging, backups, etc.)
    143147   * In general, follow local policy and procedures
    144148     * Follow best practices which if not followed would affect other members of the GENI community
    145      * Develop policies for monitoring
    146    * All parties should implement agreed upon policies
     149     * B.4 Develop policies for monitoring
     150   * B.5 All parties should implement agreed upon policies
    147151     * Follow Aggregate Provider Agreement
    148152     * Follow LLR
    149153     * Follow other GENI policies as they come into effect
    150154
    151  * Requirement: Security
    152    * Security of GENI as a whole and its pieces
     155 * B.6 Requirement: Security
     156   * Secure GENI as a whole and secure the pieces of GENI
    153157   * Two things we want to prevent:
    154158      * Compromise of GENI resources
     
    162166
    163167 * Requirement: Info must be shareable/collected/available
    164    * Information must be shareable
     168   * A.1 Information must be shareable
    165169     * Consistent definitions of data
    166170     * Consistent data exchange format
     
    169173     * The following benefit from shared common processes:
    170174       * Accessing data, finding data, visualizing data
    171    * Information must be collected
     175   * A.2 Information must be collected
    172176     * Verify continued successful data collection
    173177     * Debug collection and reliability outages
    174    * Information must be available when needed
     178   * A.3 Information must be available when needed
    175179     * Privacy of data must be maintained
    176180
     
    178182 * Data Definitions
    179183   * Consistent definition of data
    180       * Relational data
     184      * Relational data -- data which explains the relationship
     185      between entities and resources
    181186        * Resources (incl. connectivity)
    182187        * List of aggregates
     
    184189        * List of users
    185190        * Aggregate contact information
    186       * Timeseries data
     191      * Timeseries data -- data collected repeatedly at a regular interval
    187192        * Examples: Host and network statistics
    188       * Events
     193      * Event data -- data with information about a unique event
     194      occurring at a single point in time
    189195        * Examples: SNMP Traps
    190196
     
    204210
    205211 * General: Using Data
    206     * Sharing Data
    207        * --> [[Color(green,publish to central DB at GMOC)]]
    208        * --> [[Color(green,publish locally via webpage or local API)]]
    209        * --> [[Color(red,TBD: publish via a distributed mechanism)]]
    210     * Accessing, Finding and Visualizing Data
    211        * --> [[Color(green,GMOC Portals)]]
    212        * --> [[Color(green,GMOC SNAPP Interface (with search))]]
    213        * --> [[Color(green,GMOC data available to interested consumers via API)]]
    214        * --> [[Color(red,TBD: More to do here)]]
     212    * Sharing Data is currently done in three ways
     213       * Sharing Data --> [[Color(green,publish to central DB at GMOC)]]
     214       * Sharing Data --> [[Color(green,publish locally via webpage or local API)]]
     215       * Sharing Data --> [[Color(red,TBD: publish via a distributed mechanism)]]
     216    * Accessing, Finding and Visualizing Data is currently done in four ways
     217       * Accessing, Finding and Visualizing Data --> [[Color(green,GMOC Portals)]]
     218       * Accessing, Finding and Visualizing Data --> [[Color(green,GMOC SNAPP Interface (with search))]]
     219       * Accessing, Finding and Visualizing Data --> [[Color(green,GMOC data available to interested consumers via API)]]
     220       * Accessing, Finding and Visualizing Data --> [[Color(red,TBD: More to do here)]]
    215221
    216222 * Other people who need data