Changes between Version 5 and Version 6 of MonitoringArch


Timestamp:
06/06/12 18:51:18 (12 years ago)
Author:
sedwards@bbn.com

  • MonitoringArch

    v5 v6  
    385385both in the name of the data and the definition of said data.   For
    386386example, if interface counters from switches at different campuses are
    387 stored with different names it may be difficult to find and easily compare the traffic rates at various points in the network.  Likewise, if some interfaces report byte counters over an interval and others report current data rates then the comparison between these two items may not be accurate or simple.
     387stored with different names it may be difficult to find and easily compare the traffic rates at various points in the network.  Likewise, if some interfaces report byte counters over an interval and others report current data rates, then the comparison between these two items may not be accurate or simple.
    388388
    389389This means that data that is published to the GMOC database should use a naming system that is consistent across data submitters in two ways:
     
    399399In addition, it is possible for a member of GENI to submit interesting data about another part of GENI and therefore it is important for the name of the resources to be consistent in order for that data to be useful.  For example, meta-operations could run a test to see if each aggregate manager (AM) responds to a basic AM API call to determine that the AM is up and responding to requests.  In this case, meta-operations needs to identify the aggregate it is testing by the same name the aggregate uses to describe itself when submitting their own monitoring data.
    400400
    401 Where they are guaranteed to be unique, urns should be used to
     401Where they are guaranteed to be unique, URNs should be used to
    402402describe items like users and aggregates.  In some
    403 circumstances, urns are not unique and in these cases urns combined
     403circumstances, URNs are not unique, and in these cases URNs combined
    404404with another identifier should be used.  For example, AM API v3 will
    405 ensure that slices can be identified by the combination of a slice urn
     405ensure that slices can be identified by the combination of a slice URN
    406406and a UUID which together are globally unique.  Names defined by equipment manufacturers should be used where appropriate (MAC addresses, interface names, etc.).  Finally, if none of
    407407these identifiers are appropriate, a consistent naming scheme is chosen
     
    421421There are three types of data:
    422422    1. Some data CAN be shared publicly including:
    423        * existence of slice, its name, and its identifiers (urn, UUID), and
     423       * existence of slice, its name, and its identifiers (URN, UUID), and
    424424       * slice is active (has resources)
    425425    2. Some data CAN be shared IF the data is protected by access controls.
     
    430430
    431431
    432 GENI experimenters should be warned that their slice name is public.  Experimenter credentials may contain the user e-mail and this may be shared with aggregates they are using and GMOC, but should not be shared outside GMOC.  In addition, basic information about an experiment (e.g. nodes used, interface stats) will be publicly available as it may be needed to debug GENI-wide problems as well as issues with individual experiments.
     432GENI experimenters should be warned that their slice name is public.  Experimenter credentials may contain the user e-mail and this may be shared with aggregates they are using and with GMOC, but should not be shared outside GMOC.  In addition, basic information about an experiment (e.g. nodes used, interface stats) will be publicly available as it may be needed to debug GENI-wide problems as well as issues with individual experiments.
    433433
    434434Individual organizations should follow their own privacy policies in determining what to share. 
     
    460460==== Troubleshooting & Event Escalation ====
    461461
    462 In order to resolve both troubleshooting and security problems, meta-operations, aggregate operators, campuses, and experimenters must work together to resolve problems
     462In order to resolve both troubleshooting and security problems, meta-operations, aggregate operators, campuses, and experimenters must work together
    463463
    464464To facilitate this, aggregates must advertise their resources accurately.  At a minimum aggregates must advertise their resources on an [AvailableAggregates aggregate wiki page].  It is preferred that they advertise their resources via the [http://groups.geni.net/geni/wiki/GeniApi AM API].
     
    495495==== Security ====
    496496The role of security in monitoring and management is to prevent the compromise of GENI resources and to prevent the use of GENI resources to compromise other entities.
    497 Meanwhile, we should allow interesting research for which experimenters and operations may have to coordinate.
    498 
    499 GENI operators should follow best practices to hinder compromise and also detect and respond to compromise when it is detected.
     497Meanwhile, we should allow interesting research which may require experimenters and operations to coordinate.
     498
     499GENI operators should follow best practices to hinder compromise and also detect and respond to successful compromise.
    500500GENI experimenters should notify meta-operations if they are doing something which could be dangerous or cause an outage so that this can be tracked like other outages.
    501 Monitoring should not introduce security concerns (e.g. by overloading a resource), but may be used to monitor whether best practices which might affect others are being followed (e.g. detect software version are out of date).
     501Monitoring should not introduce security concerns (e.g. by overloading a resource), and may help to detect potential security issues (e.g. outdated software).
    502502
    503503=== Architectural Overview: Conclusion ===
     
    512512Data should be collected in a consistent manner taking care to ensure the privacy of data where appropriate. 
    513513
    514 Monitoring should be done in a manner consistent with GENI policies including maintaining point of contact information used in troubleshooting, emergency stop, and as part of other policies.
     514Monitoring should be done in a manner consistent with GENI policies including maintaining point of contact information used in troubleshooting and emergency stop.
    515515
    516516== Appendix: Actor, Interface and Data Details ==
     
    521521Each of the actors in the monitoring architecture described [#ActorOverview previously] is described below in more detail.
    522522
    523 The descriptions include a definition of the actor, the operator of that aggregate (where appropriate), a list of the interfaces used by this actor, the data items that it both generates and stores, and any services provided to others.
     523The descriptions include a definition of the actor, the operator of that actor (where appropriate), a list of the interfaces used by this actor, the data items that it both generates and stores, and any services provided to others.
    524524
    525525==== Meta-operations ====
     
    528528operations with a focus on inter-aggregate operations.  (Note that aggregates and campuses are responsible for operating their own resources.)
    529529
    530 '''Operator:''' GENI Meta-Operations Center (GMOC)
     530'''Operator:'''  Indiana University GrNOC (currently)
    531531
    532532'''Interfaces:'''
     
    576576
    577577==== I&M ====
    578 '''Definition:''' Instrumentation and measurement provides a place for members of the GENI community (especially experimenters) to archive data and make it available for others to use.
     578'''Definition:''' Instrumentation and measurement provides a set of services and APIs for members of the GENI community (especially experimenters) to archive data and make it available for others to use.  I&M services are defined by the I&M working group.
    579579
    580580'''Interfaces:'''
     
    587587'''Definition:''' Aggregates provide resources (e.g. network and compute
    588588resources) to the broader GENI community.  Examples include:
    589 ProtoGENI, !PlanetLab, and !OpenFlow.
     589ProtoGENI, !PlanetLab, and !OpenFlow.  (Aggregates may be distributed over widely separated physical locations and operated by different organizations, unlike campuses.)
    590590
    591591Note that in particular, each GENI rack (such as ExoGENI and InstaGENI) is an aggregate.
     
    611611
    612612==== Campus ====
    613 '''Definition:''' Equipment on a campus which hosts GENI resources, aggregates, or parts of aggregates.  Some campuses also operate aggregates in addition to hosting them.  Campuses are usually colleges or universities, but they may be a city, business, or other administrative entity that owns resources.
     613'''Definition:''' Organization and equipment on a campus that hosts GENI resources, aggregates, or parts of aggregates.  Some campuses also operate aggregates in addition to hosting them.  Campuses are usually colleges or universities, but they may be a city, business, or other administrative entity that owns resources.
    614614
    615615'''Operator:''' Campus IT
     
    712712     * [#GENI-widedata GENI-wide data]
    713713
    714  * '''Data format:''' Currently this is done by requesting the [http://gmoc-db.grnoc.iu.edu/web-services/gen_api.pl previous 10 minutes of time series data] via HTTP.
    715 
    716714 * '''Frequency of update:''' on demand
    717715
    718  * '''Purpose:''' Allows any entity to pull monitoring data from the meta-operations database.
     716 * '''Purpose:''' Allows any entity to pull monitoring data from the meta-operations [http://gmoc.grnoc.iu.edu/gmoc/index/gmoc-live-db.html database].
    719717
    720718==== Publish data (into the GMOC DB) ====
     
    862860===== Interface: Anyone -> Campus/Aggregates =====
    863861 * '''Description:'''
    864        * Point-of-contact (POC) information provides a way to reach campuses and aggregates via e-mail or telephone.
     862       * Point-of-contact (POC) information provides a way to reach campuses and aggregates via e-mail or telephone.  This information is not public, but is made available to those with appropriate permissions.
    865863
    866864 * '''Data:'''
     
    872870===== Interface: Campus -> Regionals =====
    873871 * '''Description:'''
    874        * Campuses are responsible for communicating with their regional
    875        networks, which aren't part of GENI.
     872       * Campuses are responsible for communicating with regional
     873       networks, if they aren't part of GENI.
    876874
    877875===== Interface: Various operators -> Backbone Networks =====
     
    908906 * geographical location
    909907
     908Could also upload:
     909 * campus equipment status
     910
     911In addition, logs should be kept and made available on request.
     912
     913
    910914==== L2 Network Resources Data ====
    911915For each L2 network resource:
     
    918922Given a slice owned by a GENI experimenter:
    919923 * Find resources on slice.
    920  * Find state of resources on each sliver (active/down)? Plus historical versions of this info.
     924 * Find state of resources on each sliver (active/down). Plus historical versions of this info.
    921925 * What is the utilization of each sliver resource (as appropriate for its type: active processes, disk space used, flowspace rule count, bandwidth). Plus historical versions of this info.
    922926