Changes between Initial Version and Version 1 of MonitoringArch


Timestamp:
06/06/12 17:28:51 (12 years ago)
Author:
sedwards@bbn.com
Comment:

--

  • MonitoringArch

    v1 v1  
     1[[PageOutline]]
     2
     3= Monitoring Architecture =
     4
     5Version 1.2 June 6, 2012
     6
     7== Document Overview ==
     8This monitoring architecture document addresses operational monitoring in the context of GENI.
     9
     10The members of the GENI federation are highly dependent on each other.  Operational monitoring largely consists of cooperatively resolving problems, and this architecture describes the data and interfaces needed for the various participants in GENI to share the information required to solve these problems.
     11
     12The document is broken into four major sections:
     13 * The first section outlines the [#PurposeScope purpose and scope] of this document and defines monitoring.
     14 * In the second section, seven [#UseCases use cases] motivate the subsequent discussion of the monitoring architecture by describing ways it can be used by various members of the GENI community.
     15 * In the third section, the GENI monitoring [#ArchitecturalOverview architecture] describes actors who submit and query monitoring data using standard interfaces.  In addition, a taxonomy of monitoring data and some principles for collecting that data accurately and ensuring privacy are defined.  And finally, there is a discussion of how monitoring relates to policies about troubleshooting and event escalation.
     16 * The fourth section is an [#Appendix:ActorInterfaceandDataDetails appendix] containing detailed descriptions of the actors, interfaces, and data.
     17
     18
     19== Purpose & Scope ==
     20The ''purpose'' of this document is to define a common understanding of GENI monitoring and management and how it relates to other parts of GENI.  It is intended to be understandable to both operations and non-operations members of the GENI community.
     21
     22The ''scope'' of the document is monitoring and management as it applies to GENI as a whole and not the standard monitoring done inside aggregates, campuses, and regional and backbone networks.  This meta-operations approach to monitoring and management is described in the [http://groups.geni.net/geni/attachment/wiki/GENIMetaOps/GENI-Concept-of-Operations-v2.1.doc GMOC Concept of Operations] document.
     23
     24=== Overview of Operational Monitoring Architecture ===
     25
     26The operational monitoring architecture describes operational monitoring use cases which motivate the document and then describes the architecture including various actors, monitoring interfaces, and data collected.
     27
     28There are different ways of collecting data in GENI and the operational monitoring architecture describes one way of doing so (others include the future GENI Clearinghouse and Instrumentation and Measurement (I&M) infrastructure). In general, there is an authoritative party which is responsible for making each piece of data available. However, that data can be shared with or cached by other entities that need it. Likewise, there may be several portals or interfaces for accessing and viewing the data, but each portal should retrieve data from its authoritative source.
     29
     30=== Path Forward ===
     31
     32This document describes the desired end state for operational monitoring to be achieved in Spiral 5. This section explains the current state of these implementations, as of spring 2012, and the path towards that end state.
     33
     34The GENI Clearinghouse is under development in Spiral 4.  Until the GENI Clearinghouse becomes operational, the GPO ProtoGENI Slice Authority (SA) will be monitored and will provide the authoritative user and slice data needed in Spiral 4.  When the GENI Clearinghouse becomes operational, it will take over reporting authoritative user and slice data while starting to report authoritative project data.
     35
     36Slice e-mail is currently supported via standalone scripts that can be run against the various SAs.  When the GENI Clearinghouse comes online, it will provide slice e-mail access.
     37
     38Point of contact and geographical location information are currently collected and maintained by GMOC and stored in the GMOC database.  This supports the expected out-of-band contact among various GENI actors.
     39
     40As per the monitoring architecture, GMOC currently provides a ticketing system and a calendar system for tracking outages.
     41
     42Aggregates and campuses in the GENI meso-scale currently report data to GMOC using current interfaces.  This will continue into the future.  As the monitoring interfaces are refined, those deployments will be updated.  As new aggregates come online, they will implement these interfaces.
     43
     44Currently no experimenters submit data to meta-operations (GMOC), but we will work to do this with appropriate interested parties as the need arises.
     45
     46Instrumentation and Measurement (I&M) projects are under development in Spiral 4.  As they come online they will become consumers of operational monitoring data.
     47
     48The LLR representative will need remote privileged access to both GMOC and GENI Clearinghouse data.  This mechanism will be worked out in Spiral 5.
     49
     50The GMOC data interface is in a variety of stages of implementation.  The time series interface was implemented and used extensively in Spiral 3.  The relational interface is being expanded and deployed in Spiral 4.  There are no plans to implement the event interface earlier than Spiral 5.
     51
     52
     53
     54
     55=== What is monitoring & management? ===
     56''Monitoring'' is the act of collecting data and measuring what is
     57happening.
     58
     59''Management'' is the act of tracking and fixing problems and responding to
     60requests.
     61
     62Monitoring and management involve three tasks:
     63        1. Observe expected events THEN fix what’s wrong
     64        2. Observe unexpected events THEN write procedures for resolving these problems THEN fix what’s wrong (by responding to monitoring)
     65        3. Plan for the future: Monitor long-term trends in resource
     66        usage THEN provision resources to meet forecasted needs
     67
     68The purpose of GENI monitoring and management is to share information about GENI's current operational status and to increase the amount of time that GENI resources are available and usable by legitimate members of the GENI community.
     69
     70''GENI monitoring and management'' differs from monitoring and management of other entities. GENI is a set of federated entities managed by different institutions with different policies.  The people and information needed to troubleshoot and resolve problems are spread across several physical locations.  The users (end users and experimenters), managers and owners of a given piece of equipment may all be different. Finally, interactions between groups are governed by GENI federation agreements (e.g. aggregate provider agreement) and mutual understanding. Since no single party owns all of the parts of the GENI federation, there is a need to coordinate the resolution of problems or outages that may occur between any two parts of the federation.  This coordination is frequently done via out-of-band communication (phone, email, etc.) between the necessary parties.
     71
     72The meta-operations model assumes that the management of the aggregates, campuses, regionals and core networks which compose GENI is within each entity's own purview.  As such this document does not cover any items (like log rotation and following local laws) solely related to the internal monitoring or management of those entities.
     73
     74
     75== Use Cases ==
     76The following are a representative set of monitoring use cases. The use cases are not expected to be exhaustive, but they represent what is expected to be used in the Spiral 3-4 time frame. They are intended to show the range of parties which will rely on monitoring and the interfaces and data they need to execute those use cases.  In short, these use cases represent the minimum that the architecture must satisfy and provide a sense of how the monitoring architecture will be used.
     77
     78Each use case includes a brief description of the use case, a list of data that must be collected for the use case, and a list of parties interested in that data.
     79
     80=== Use Case 1: AM Availability ===
     81Use Case
     82 * Experimenter:
     83    * Creates a slice & slivers at multiple aggregates
     84    * Gets an error using aggregate A
     85    * Sends the error to a GENI experimenter help mailing list
     86 * Experiment support staffers:
     87    * Want to verify the current and recent health of aggregate A
     88    * Is there an obvious problem which explains what the experimenter is seeing?
     89    * Is there a scheduled outage or already existing event that could affect this AM?
     90
     91Data to collect
     92 * For each GENI aggregate:
     93   * Is it speaking the AM API (up) right now?
     94   * For each resource advertised by that aggregate, is it currently: available, in use, down/missing/unknown?
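
As a rough illustration of the data listed above, the following Python sketch tallies the resources advertised by one aggregate by state. The three states mirror the list; the record layout and the names `ResourceState` and `summarize_aggregate` are hypothetical and not part of any GENI interface.

```python
from collections import Counter
from enum import Enum


class ResourceState(Enum):
    AVAILABLE = "available"
    IN_USE = "in use"
    UNKNOWN = "down/missing/unknown"


def summarize_aggregate(am_api_up, resource_states):
    """Return a health summary for one aggregate.

    am_api_up: True if the aggregate answered an AM API call just now.
    resource_states: iterable of ResourceState, one per advertised resource.
    """
    counts = Counter(resource_states)
    return {
        "am_api_up": am_api_up,
        # Report every state, including those with a zero count.
        "resources": {state.value: counts.get(state, 0) for state in ResourceState},
    }
```

A support staffer could compare such a summary with the outage calendar to decide whether an experimenter's error matches a known problem.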
     95
     96Interested parties
     97 * Who needs to query this: Everyone might be interested in seeing the health of a GENI resource.
     98 * Who needs to alert on this: Meta-operators. Site operators. Experimenter help providers.
     99
     100=== Use Case 2: Availability/utilization
     101of an aggregate over time? ===
     102Use Case
     103 * After GEC14, GPO assesses how heavily utilized bare-metal compute resources were between GEC13 and GEC14
     104 * GPO determines if there is generally a shortage of these resources.
     105
     106Data to collect
     107 * For each GENI aggregate that supports bare metal compute resources, given a custom time period:
     108  * When was the aggregate speaking the AM API (up) during the time period?
     109  * During the time period, how many bare metal resources were: available, in use, down/missing/unknown?
     110  * During the time period, how many slivers that included bare metal resources were active/created at some point during the period? 
     111 * (If possible) During the time period:
     112   * How many distinct GENI users had active slivers on the aggregate?
     113   * How many distinct GENI users created slivers on the aggregate?
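
The questions above can be answered from periodic samples and sliver records. The sketch below is one possible way to do so in Python; the tuple layouts and function names are assumptions made for illustration, not a defined GMOC format.

```python
def availability_fraction(samples, start, end):
    """Fraction of samples in [start, end) during which the aggregate was up.

    samples: iterable of (timestamp, is_up) pairs, timestamps in seconds.
    Returns None when no samples fall inside the period.
    """
    in_period = [is_up for ts, is_up in samples if start <= ts < end]
    if not in_period:
        return None
    return sum(in_period) / len(in_period)


def distinct_users(sliver_records, start, end):
    """Users with a sliver active at any point in [start, end).

    sliver_records: iterable of (user, created_ts, deleted_ts) where
    deleted_ts is None for slivers that still exist.
    """
    users = set()
    for user, created, deleted in sliver_records:
        still_active = deleted is None or deleted > start
        if created < end and still_active:
            users.add(user)
    return users
```

Because the availability figure comes from samples, its precision is limited by the polling interval: an outage shorter than the gap between two samples may not appear at all.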
     114
     115Interested parties
     116 * Who needs to query this: GPO, prospective experimenters, site operators, meta-operators.
     117
     118=== Use Case 3: Status of
     119inter-site shared network resources ===
     120Use Case
     121 * A core OpenFlow switch located at NLR in Denver starts misbehaving.
     122 * Its control plane is operating normally, but the data plane appears to drop all traffic on VLAN 3716 (a core shared VLAN).
     123 * This leads to intermittent failures in experiments (traffic through Denver is dropped, traffic along other paths is fine).
     124 * GMOC, GPO, NLR operators and affected site operators gather on IRC to track down the location of the problem, using information about which sites and VLANs have fallen offline.
     125
     126Data to collect
     127 * For each shared GENI L2 network resource, in particular end-to-end paths between GENI resources:
     128   * Is the network resource reachable (from one or more central locations on the network)?
     129   * What is the utilization of the network resource at the site (bandwidth/packets sent/received, breakdown by type e.g. to detect excessive broadcasts)?
     130 * For each pool of L2 network resources reservable between sites:
     131   * How much of the pool is available/in use/not available?
     132   * Is it possible to allocate and use an inter-site network resource right now (end-to-end test)?
     133
     134Interested parties
     135 * Who needs to query this: site operators, meta-operators, GPO. It may be useful to active and prospective experimenters.
     136 * Who needs to alert on this: meta-operators and possibly site operators.
     137
     138=== Use Case 4: Availability and utilization
     139of inter-site network resources over time ===
     140Use Case
     141 * GPO is attempting to learn about per experiment bandwidth utilization over the past year to find out whether GENI experimenters tend to run numerous low-bandwidth experiments, or a smaller number of high-bandwidth ones.
     142
     143Data to collect
     144 * For each item in the previous use case, track the state over time.
     145
     146Interested parties
     147 * Who needs to query this: site IT, aggregate owners, meta-operators, GPO.
     148 * Who needs to alert on this: no one.
     149
     150=== Use Case 5:
     151What is the state of my slice? ===
     152Use Case
     153 * An experimenter has a slice with slivers on many GENI aggregates.
     154 * The experimenter wants to go to one place and get a consistent view of the health and general activity level of his sliver resources.
     155
     156Data to collect
     157 * Given a slice owned by a GENI experimenter:
     158   * What mesoscale slivers are defined on that slice?
     159   * What is the state of each resource on each sliver (active, down), both now and over the course of the experiment?
     160   * What is the utilization of each sliver resource (as appropriate for its type: active processes, disk space used, flowspace rule count, bandwidth), both now and over the course of the experiment?
     161
     162Interested parties
     163 * Who needs to query this: active experimenters.
     164 * Who needs to alert on this: perhaps experimenters running services in their slice
     165
     166=== Use Case 6: State of
     167aggregate utilization at my campus ===
     168Use Case
     169 * An aggregate operator receives a complaint from campus IT about heavy traffic over the campus control network which seems to be originating from the aggregate operator's lab.
     170 * The aggregate operator wants to quickly determine which if any slivers might be responsible for the traffic.
     171 * If their aggregate does not appear to be causing trouble, Campus IT would like to be able to have some evidence to demonstrate that the slivers on the aggregate are currently sending a typical amount of traffic.
     172
     173Data to collect
     174 * Given an aggregate:
     175   * What mesoscale slivers are defined on that aggregate?
     176   * What users have active resources on the aggregate right now?
     177   * What is the state of each resource on the aggregate (active, down, in use (by whom)), both now and over the recent past?
     178   * What is the utilization of each aggregate resource (as appropriate for its type: active processes, disk space used, flowspace rule count, bandwidth), both now and over the recent past?
     179
     180Interested parties
     181 * Who needs to query this: site operators, site IT, meta-operators.  Site operators need to be able to go to one place to see current activity on the aggregate, in order to answer questions about broken or misbehaving slivers.
     182 * Who needs to alert on this: site operators, meta-operators.
     183
     184
     185=== Use Case 7: Security complaint from an ISP ===
     186Use Case
     187 * An ISP contacts GENI about concerns that a GENI resource is exhibiting potentially illegal behavior.
     188 * The GENI LLR representative works with GENI Meta-operations as well as the operators of relevant campuses, aggregates, and networks to determine the resource that is the source of the traffic.
     189 * Meta-operations determines what slice is responsible for the traffic from that resource.
     190 * The LLR contacts the owner of the responsible slice (e.g. using slice e-mail) and determines the nature of the experiment.  LLR learns that the experimenter is running a novel protocol which exhibits properties which appear similar to those of some illegal services.
     191 * The LLR contacts the original ISP explaining the experiment thereby resolving the problem.
     192
     193Data to Collect
     194 * Points of contact for various groups which are members of the GENI federation
     195 * Which resources are allocated to which slice.
     196 * Which IP address maps to which resource.
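
The two mappings above are enough to trace a complaint about an IP address back to a slice. A minimal Python sketch, assuming the mappings are available as plain dictionaries (the function name and data layout are hypothetical):

```python
def slice_for_ip(ip, ip_to_resource, resource_to_slice):
    """Trace an IP address to the slice holding the resource, if any.

    Returns (resource, slice_name); either may be None when unknown.
    """
    resource = ip_to_resource.get(ip)
    if resource is None:
        return None, None
    return resource, resource_to_slice.get(resource)
```

The lookup stops at the slice name on purpose: the slice owner is then reached through slice e-mail, so the person doing the lookup does not need to learn the owner's identity.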
     197
     198Services Needed
     199 * A pseudonymous way of contacting the slice owner (i.e. slice e-mail)
     200
     201Interested Parties
     202 * Who needs to query this: Meta-operations and other operators.
     203 * Who needs to alert on this: None.  Done on demand.
     204
     205=== Use Case: Conclusion ===
     206These seven use cases represent the scope of how monitoring will be used in practice and provide context for the monitoring architecture described in the remainder of this document.
     207
     208== Architectural Overview ==
     209[[Image(MonitoringArchv7.jpg, 70%)]]
     210
     211The GENI federation includes and interacts with a variety of entities (or actors) which use a small number of standard interfaces to collect and share monitoring and management data.  The above picture is a guide to the actors and interfaces that the rest of this document describes.  The picture shows how these various entities relate to each other in the context of monitoring.
     212
     213Meta-operations, the Clearinghouse, and I&M are GENI-wide resources.   Meta-operations collects and makes available operational and GENI-wide monitoring data.  The future GENI Clearinghouse is the authoritative repository for project, user, and slice information.  I&M is the authoritative repository for experiment data.
     214
     215The blue arrows indicate that aggregates (which contain and operate resources), campuses (which host resources), and experimenters (who use resources and may generate monitoring information) can submit data to meta-operations.
     216
     217The orange arrows indicate that anyone can query data from meta-operations.
     218
     219The "out-of-band" communications box shows that experimenters, aggregate operators and campus operators can be contacted out-of-band to help resolve problems.
     220
     221
     222In addition, GENI monitoring relies on other non-monitoring interfaces. The red arrow shows experimenters reserving resources via the AM API. The black arrows show how authoritative data about slices, projects, and users from the future GENI Clearinghouse is accessed (via interfaces defined by the GENI Architecture group). The green arrow shows the I&M interfaces to support storing experimental data. The dotted arrows indicate that problems between Campuses and their Regional Networks can be resolved via out-of-band access.  Likewise, GrNOC manages Internet2 and CENIC manages NLR.
     223
     224The next three sub-sections describe the [#ActorOverview actors], [#InterfaceOverview interfaces] and [#DataOverview data] in more detail.  The three sub-sections after that clarify the [#RelationshipstootherpartsofGENI relationship] with other parts of GENI, describe some principles used to collect and report [#MonitoringData data] used in monitoring, and discuss monitoring and management [#Policy policy].
     225
     226=== Actor Overview ===
     227Actors, shown as boxes in the [#ArchitecturalOverview figure], cooperatively share monitoring data within the GENI federation (and outside the federation where appropriate):
     228 * [#Meta-operations Meta-operations] (a.k.a. GMOC) will in the future coordinate GENI-wide operations especially inter-aggregate operations.  Note that aggregates and campuses are responsible for operating their own resources.
     229 * The [#Clearinghouse clearinghouse] provides a set of services and APIs for the GENI federation and is the authoritative source for information about projects, slices, and users.
     230 * [#IM I&M] provides a set of services and APIs for the storage and sharing of data (especially experimenter data).
     231 * [#Aggregate Aggregates] provide resources (e.g. network and compute resources) to the broader GENI community.  Examples include: ProtoGENI, !PlanetLab, ExoGENI, and InstaGENI.  The software which runs an aggregate is known as the Aggregate Manager (AM).
     232 * [#Campus Campuses] host aggregates (or parts of aggregates).  Some campuses also operate aggregates in addition to hosting them.  Campuses are usually colleges or universities, but they may be a city, business, or other administrative entity that owns resources.
     233 * [#OtherInfrastructure Backbone & Regional Networks] are infrastructure that GENI relies on.  Some networks (e.g. Internet2 and NLR) participate in GENI. Others (e.g. some regionals) carry GENI traffic but do not participate in GENI directly.
     234 * [#Experimenter Experimenters] run GENI experiments.
     235 * [#LegalLawEnforcementandRegulatoryLLRRepresentative Legal, Law Enforcement, and Regulatory (LLR) Representative] responds to and helps resolve legal, law enforcement and regulatory inquiries made to GENI.
     236
     237
     238Note: The Clearinghouse provides a service which aggregates can use to report information
     239about slices, slivers, and resources to the GENI
     240Clearinghouse.  This may not be required for all GENI aggregates, but
     241for simplicity this document assumes this service is being used.
     242If it is not, the same information would need to be obtained from
     243aggregates directly.
     244
     245=== Interface Overview ===
     246
     247Each of the arrows in the [#ArchitecturalOverview figure] represents an interface for sharing data in GENI:
     248 * Query data (out of GMOC DB)
     249   * [#Interface:Anyone-Meta-operations Anyone <-> Meta-operations]
     250     * Anyone can query data ''from'' the meta-operations database on demand
     251 * Publish data (into the GMOC DB)
     252   * [#Interface:AggregateCampusExperimenterAnyone-Meta-operations {Aggregate, Campus, Experimenter, Anyone} <-> Meta-operations]
     253     * Aggregates periodically push aggregate data ''into'' the meta-operations database.
     254     * Alternatively, meta-operations can periodically pull the same data from the aggregates.
     255     * Campuses and experimenters are allowed (but not required) to publish data the same way aggregates do.
     256     * Anyone can collect GENI-wide data and store it in the meta-operations database.
     257     * Aggregates and Campuses provide Point of Contact and geographic location to GMOC
     258 * Publish/Query Clearinghouse
     259   * [#Interface:Clearinghouse--Meta-operations Clearinghouse --> Meta-operations]
     260      * The clearinghouse provides authoritative data about the existence and names of GENI slices, users, and projects including creation and deletion of the same.
     261   * [#Interface:Experimenters-Clearinghouse Experimenters <-> Clearinghouse]
     262     * Experimenters register their user accounts at the GENI clearinghouse.
     263     * The clearinghouse provides the experimenter authoritative information about
     264     their slices.
     265   * [#Interface:AggregateManager-Clearinghouse Aggregate Manager <-> Clearinghouse]
     266     * Aggregate Managers provide authoritative sliver and node information to the
     267     clearinghouse.
     268     * Aggregate Managers can query the clearinghouse for authoritative information about their users and their users' slices.
     269   * [#Interface:Campus-Clearinghouse Campus <-> Clearinghouse]
     270     * Campuses can query the clearinghouse for authoritative information about their users and their users' slices.
     271 * Publish/Query I&M
     272   * [#Interface:Experimenter-IM Experimenter <-> I&M]
     273     * Experimenters (and others) use the I&M framework to publish and share data such as experimental results.
     274 * AM API
     275   * [#Interface:AggregateManager-Experimenter Aggregate Manager <-> Experimenter]
     276     * Experimenters use the Aggregate Manager API (AM API) to reserve resources at the aggregate manager.
     277 * out-of-band
     278    * ''Note: Members of the GENI federation are required to assist the LLR in resolving security incidents.  The LLR uses many of the following out-of-band interfaces when resolving problems.''
     279    * [#Interface:Anyone-Experimenter Anyone -> Experimenter]
     280       * Anyone can talk to the experimenter via the slice e-mail address.
     281    * [#Interface:Anyone-CampusAggregates Anyone -> Campus/Aggregates]
     282       * Point-of-contact (POC) information provides a way to reach
     283       campuses and aggregates
     284    * [#Interface:Campus-Regionals Campus -> Regionals]
     285       * Campuses are responsible for communicating with their
     286       regional networks.
     287    * [#Interface:Variousoperators-BackboneNetworks GMOC -> Backbone Networks]
     288       * GrNOC manages Internet2 and CENIC manages NLR.
     289
     290=== Data Overview ===
     291There are five types of monitoring data collected and stored at meta-operations:
     292   * [#Aggregatedata Aggregate data generated by aggregates]
     293   * [#Campusdata Campus data generated by campuses]
     294   * [#Slicedata Slice data generated by aggregates]
     295   * [#L2NetworkResourcesData L2 network resources data generated by campuses]
     296   * [#GENI-widedata GENI-wide data]
     297
     298In addition, the GENI Clearinghouse collects and stores data:
     299   * includes [#ClearinghouseData slice, user, and project] information
     300
     301And finally, the I&M infrastructure collects and stores [#IMData data].
     302
     303Information about which resources are reserved in which slices may optionally be reported by the aggregate to the clearinghouse.  In the absence of this reporting aggregates should report this information to meta-operations directly.
     304
     305=== Relationships to other parts of GENI ===
     306GENI operational monitoring is distinct from but relates to the other parts of GENI including the GENI Clearinghouse and Instrumentation & Measurement.
     307
     308==== Relationship between Meta-operations and Clearinghouse ====
     309The future GENI Clearinghouse provides a set of services that are key to the
     310running of the GENI federation such as a slice authority, an entity
     311registry, and a logging service.  The future GENI clearinghouse is the authoritative
     312source of information about slices, users and projects, entities which are key to the GENI
     313federation.  The [GeniArchitectTeam architecture group] has a GENI architecture which defines the clearinghouse services which will provide access to this data. In addition, the GENI Clearinghouse may provide access to
     314other information for which it is not the
     315authoritative source (such as events noting the
     316creation/deletion of slivers) and in the meantime aggregates should report this information directly.
     317
     318GENI Meta-operations is responsible for operating the
     319GENI federation as a whole (although it is not responsible for
     320operating the individual aggregates, campuses, regional or backbone
     321networks).  GENI Meta-operations is the authoritative source for
     322GENI-wide monitoring data (e.g. end-to-end
     323connectivity across the GENI backbone). In addition,
     324meta-operations is the authoritative source of monitoring data provided by
     325aggregates, campuses and experimenters (when appropriate).  For
     326example, this includes current and historical statistics about nodes (operational
     327state, memory/disk usage) and networks (operational state, data
     328rate).  Also, GENI Meta-operations correlates information from the
     329Clearinghouse (e.g. slice creation and deletion times) when it
     330helps clarify monitoring data.
     331
     332
     333The monitored data accessible from meta-operations is invariably collected by measuring the
     334state of the world (for example, by polling a node periodically to see
     335if it is operational) whereas the data available via the GENI
     336Clearinghouse is based on transitions (for example, an
     337aggregate informs the GENI Clearinghouse about the creation of a sliver at its aggregate for
     338which it definitively knows the resource allocated, the creator, and the
     339creation time). 
     340
     341==== Relationship between Monitoring and Instrumentation & Measurement ====
     342
     343Instrumentation and Measurement (I&M) collects data of use to
     344experimenters, especially experiment data. I&M can pull operational data of interest from the GMOC database.
     345
     346=== Monitoring Data ===
     347GENI monitoring data can be broken down into a small number of types of data which must be collected in a consistent manner so that the data can be shared and meaningfully used by various parties throughout GENI. Meta-operations must take reasonable steps to protect the privacy of some data (e.g. user data that identifies individuals) in collection, storage, and publishing functions.
     348
     349==== Types of Data ====
     350There are multiple ways of describing the data used in monitoring.
     351
     352The structure of the data falls in one of three categories: time
     353series data, relational data, and event data.
     354
     355''Time series'' data is a particular piece of numeric data measured over time.  An
     356example of time series data is the number of bytes sent on an interface over a series of adjacent 30 second time periods.
     357
     358''Relational'' data conveys entities and the relationship between
     359them. An example of relational data is that "slice S contains resource
     360R".
     361
     362''Event'' data denotes something happening or being noticed at a
     363particular moment in time.  An example of an event is "slice S was created at time T" or "the data rate on interface I exceeded threshold
     364H at time T".
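
To make the three structural categories concrete, the sketch below models each one as a small Python record. The field names are illustrative assumptions and do not reflect the actual GMOC schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeSeriesPoint:
    """One numeric measurement covering a fixed interval."""
    name: str          # e.g. "bytes_sent" (hypothetical metric name)
    subject: str       # the interface or node measured
    start: int         # interval start, seconds since the epoch
    duration: int      # interval length in seconds, e.g. 30
    value: float


@dataclass(frozen=True)
class Relation:
    """A relationship between two entities, e.g. slice S contains resource R."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Event:
    """Something that happened or was noticed at one moment in time."""
    kind: str          # e.g. "slice_created"
    subject: str
    time: int
```

The examples from the text map directly onto these records: `Relation("S", "contains", "R")` for the relational case and `Event("slice_created", "S", T)` for the event case.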
     365
     366The accuracy of the data can be described as either: transition-based or sampled.
     367
     368''Transition-based'' data, like that made available by the GENI Clearinghouse, is based on direct knowledge of entities being created or destroyed.  An example of transition-based data is "slice S was created at time T by user U".
     369
     370''Sampled'' data, like that made available by GENI Meta-operations, is collected by measuring the state of the world periodically.  An example of sampled data is the availability of an interface over time.
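
One consequence of the distinction is that transition-based data can be replayed to recover the exact state at any moment, whereas sampled data is only known at the sampling instants. A minimal Python sketch of such a replay, assuming creation and deletion events arrive as simple tuples (the layout and function name are hypothetical):

```python
def slices_existing_at(events, t):
    """Names of slices that existed at time t, from creation/deletion events.

    events: iterable of (time, kind, slice_name) with kind either
    "created" or "deleted". Events at exactly time t are applied.
    """
    existing = set()
    for when, kind, name in sorted(events):
        if when > t:
            break
        if kind == "created":
            existing.add(name)
        elif kind == "deleted":
            existing.discard(name)
    return existing
```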
     371
     372
     373In addition, data has an authoritative source such as the GENI clearinghouse, meta-operations, or I&M.
     374
     375The ''GENI clearinghouse'' is the authoritative source for information about GENI primitives like slices, projects, and users and event information about the creation and deletion of those same primitives.
     376
     377''Meta-operations'' is the authoritative source for operationally critical data from various aggregates and campuses.  Examples of data stored in meta-operations are polled CPU usage and interface statistics on GENI compute resources.
     378
     379''Instrumentation & Measurement'' is the authoritative source for data collected in the course of individual experiments.
     380
     381
     382==== Consistent Naming and Definitions of Data ====
     383
     384In order for monitoring data to be useful in the shared environment of
     385GENI, data in the GMOC database needs to have a consistent meaning
     386both in the name of the data and the definition of said data.   For
     387example, if interface counters from switches at different campuses are
     388stored with different names it may be difficult to find and easily compare the traffic rates at various points in the network.  Likewise, if some interfaces report byte counters over an interval and others report current data rates then the comparison between these two items may not be accurate or simple.
     389
     390This means that data that is published to the GMOC database should use a naming system that is consistent across data submitters in two ways:
     391  1. the names of the data reported and stored in the database should be consistent regardless of who reports the data, and
     392  2. the definition of the data reported should be consistent regardless of who reports the data.
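
As an illustration of the second point, a byte counter reported over an interval can be converted to a data rate so that it is comparable with sources that report rates directly.  This is a minimal sketch; the function name and the choice of bits per second as the common unit are assumptions for the example, not a definition adopted by GMOC.

```python
def counter_to_bits_per_second(start_bytes: int, end_bytes: int,
                               interval_seconds: float) -> float:
    """Convert two readings of a byte counter into an average data rate."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    if end_bytes < start_bytes:
        # The counter wrapped or was reset; the rate cannot be derived safely.
        raise ValueError("counter decreased during the interval")
    return (end_bytes - start_bytes) * 8 / interval_seconds


# 3,750,000 bytes transferred during a 30 second interval is 1 Mb/s.
rate = counter_to_bits_per_second(1_000_000, 4_750_000, 30)
```

With a conversion like this applied before submission, a submitter that only has counters and a submitter that only has rates would report the same quantity under the same definition.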
     393
     394==== Consistent Naming of GENI Resources and Devices ====
     395
     396Many GENI resources may need to be referenced by multiple parties, therefore the names used to identify them need to be consistent regardless of who is submitting the data.
     397
     398For example, two aggregates might need to refer to the same switch or two endpoints of the same network link. 
     399
     400In addition, a member of GENI may submit interesting data about another part of GENI, so resource names must be consistent for that data to be useful.  For example, meta-operations could run a test to see if each aggregate manager (AM) responds to a basic AM API call to determine that the AM is up and responding to requests.  In this case, meta-operations needs to identify the aggregate it is testing by the same name the aggregate uses to describe itself when submitting its own monitoring data.
     401
     402Where they are guaranteed to be unique, URNs should be used to
     403describe items like users and aggregates.  In some
     404circumstances, URNs are not unique, and in these cases URNs combined
     405with another identifier should be used.  For example, AM API v3 will
     406ensure that slices can be identified by the combination of a slice URN
     407and a UUID which together are globally unique.  Names defined by equipment manufacturers should be used where appropriate (MAC addresses, interface names, etc).  Finally, if none of
     408these identifiers are appropriate, a consistent naming scheme should be chosen,
     409leveraging naming schemes defined elsewhere in GENI (such as RSpecs
     410and I&M data definitions) where possible.
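
The URN plus UUID combination can be illustrated with a small helper that builds a lookup key for a slice.  This is a sketch under stated assumptions: the function name and the example URN and UUID values are hypothetical, and the normalization shown (trimming whitespace, canonical lowercase UUID) is one possible convention rather than a GENI requirement.

```python
import uuid
from typing import Tuple


def slice_key(slice_urn: str, slice_uuid: str) -> Tuple[str, str]:
    """Return a (urn, uuid) pair usable as a globally unique slice key.

    uuid.UUID() validates the string and str() renders it in canonical
    lowercase form, so submitters that differ only in case agree on the key.
    """
    return (slice_urn.strip(), str(uuid.UUID(slice_uuid)))


# Hypothetical values: two submitters reporting on the same slice.
key_a = slice_key("urn:publicid:IDN+example.net+slice+demo",
                  "123E4567-E89B-12D3-A456-426614174000")
key_b = slice_key(" urn:publicid:IDN+example.net+slice+demo ",
                  "123e4567-e89b-12d3-a456-426614174000")
```

Both calls produce the same key, while a slice that reuses the URN with a different UUID would produce a different one.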
     411
     412
     413==== Privacy ====
     414Monitoring data may potentially contain private information which could be used to:
     415  * identify an experimenter
     416  * identify important aspects of experiments (such as experiment topology)
     417
     418On the other hand, because GENI is inherently a shared resource, monitoring information must be shared.  For example, knowing the CPU load on a machine hosting many virtual machines could help ascertain the cause of problems on multiple experiments.
     419
     420Therefore there is an inherent tension between keeping information private and making it public.  We choose to divide data into three categories and clearly inform all parties (meta-operations personnel, aggregate and campus operators, experimenters) about the privacy of particular types of data.   It is the responsibility of all of these parties to work to use and share data appropriately.
     421         
     422There are three types of data:
     423    1. Some data CAN be shared publicly including:
     424       * existence of slice, its name, and its identifiers (URN, UUID), and
     425       * slice is active (has resources)
     426    2. Some data CAN be shared IF the data is protected by access controls.
     427    3. Some data should NOT be shared publicly including:
     428       * opt-in user data
     429       * data that identifies experimenters by username/real name
     430       * data that identifies experimenter contact info.
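
One way to apply the three categories is to tag each field of a record and filter before publishing.  The sketch below is illustrative only: the field names are hypothetical, and treating unlisted fields as access controlled is an assumption made for the example, not a rule stated in this document.

```python
from enum import Enum


class Sharing(Enum):
    PUBLIC = 1             # category 1: can be shared publicly
    ACCESS_CONTROLLED = 2  # category 2: shared only behind access controls
    PRIVATE = 3            # category 3: should not be shared publicly


# Hypothetical field names mapped to the categories listed above.
FIELD_SHARING = {
    "slice_name": Sharing.PUBLIC,
    "slice_urn": Sharing.PUBLIC,
    "slice_uuid": Sharing.PUBLIC,
    "slice_active": Sharing.PUBLIC,
    "opt_in_user_data": Sharing.PRIVATE,
    "experimenter_username": Sharing.PRIVATE,
    "experimenter_real_name": Sharing.PRIVATE,
    "experimenter_contact": Sharing.PRIVATE,
}


def public_view(record: dict) -> dict:
    """Keep only the fields explicitly marked as publicly shareable."""
    return {
        name: value
        for name, value in record.items()
        # Assumption: anything not listed is treated as access controlled.
        if FIELD_SHARING.get(name, Sharing.ACCESS_CONTROLLED) is Sharing.PUBLIC
    }


record = {
    "slice_name": "demo",
    "slice_active": True,
    "experimenter_contact": "user@example.net",
    "cpu_load": 0.7,
}
visible = public_view(record)
```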
     431
     432
     433GENI experimenters should be warned that their slice name is public.  Experimenter credentials may contain the user e-mail and this may be shared with aggregates they are using and GMOC, but should not be shared outside GMOC.  In addition, basic information about an experiment (eg nodes used, interface stats) will be publicly available as it may be needed to debug GENI-wide problems as well as issues with individual experiments.
     434
     435Individual organizations should follow their own privacy policies in determining what to share. 
     436
     437Finally, the LLR's access to data is controlled by the [GpoDoc#LegalLawEnforcementandRegulatoryPlan LLR policy] which supersedes anything written here. 
     438
     439==== Accountability Report ====
     440Meta-operations should make available a summary of statistics about
     441each aggregate and campus to aid local operators in determining
     442whether a problem might be originating at their aggregate/campus or elsewhere.
     443
     444The accountability report should include the data listed in [#UseCase6:Stateofaggregateutilizationatmycampus Use Case 6: State of Aggregate Utilization at My Campus].
     445
     446
     447=== Policy ===
     448
     449GENI policy exists to increase the availability of resources for experimenters. Being able to communicate with various parts of GENI to resolve technical and security problems increases the availability of resources for everyone.  The monitoring infrastructure must be consistent with and support GENI policy.  In particular, slice e-mail and aggregate and campus points of contact make it possible to troubleshoot problems and handle emergency situations.
     450
     451This section goes over various policies and how monitoring relates to them.
     452
     453
     454==== Verification of Data Collection ====
     455
     456Data should be collected with a consistent
     457[#ConsistentNamingandDefinitionsofData name and meaning] and
     458through a consistent mechanism.  In addition, it should be verified regularly that the data continues to be collected successfully, and outages in collection should be promptly debugged. 
     459
     460
     461==== Troubleshooting & Event Escalation ====
     462
     463To resolve both troubleshooting and security problems, meta-operations, aggregate operators, campuses, and experimenters must work together. 
     464
     465To facilitate this, aggregates must advertise their resources accurately.  At a minimum aggregates must advertise their resources on an [AvailableAggregates aggregate wiki page].  It is preferred that they advertise their resources via the [http://groups.geni.net/geni/wiki/GeniApi AM API].
     466
     467Aggregates and campuses should keep their [#Interface:Anyone-CampusAggregates point of contact] information up-to-date.
     468
     469Experimenters should respond to emails sent to their [#Interface:Anyone-Experimenter slice e-mail address].
     470
     471All parties should cooperate with meta-operations and the LLR on the resolution of security events in a manner consistent with policies defined elsewhere.
     472
     473==== Emergency Stop ====
     474On occasion, meta-operations may identify a problem which may require
     475one or more experiments to be stopped. 
     476
     477In the event of an issue, meta-operations may contact:
     478 1. aggregates and campuses via their [#Interface:Anyone-CampusAggregates Point of Contact] information, and
     479 2. experimenters via [#Interface:Anyone-Experimenter slice e-mail].
     480
     481As a last resort, meta-operations may shut down resources on one or
     482more slices via a call to the AM API or disconnect aggregates or campuses which seem to be the source of the problem. 
     483
     484The [GpoDoc#EmergencyStopProcedure Emergency Stop] document describes this procedure and is updated periodically.
     485
     486==== Policy ====
     487In addition to following local policy, members of the GENI community should follow policies that relate to them such as:
     488  * the [GpoDoc#GENIAggregateProvidersAgreement Aggregate Provider Agreement],
     489  * the [GpoDoc#LegalLawEnforcementandRegulatoryPlan LLR],
     490  * all [#Privacy privacy] policies,
     491  * ethical experimentation policies, and
     492  * other policies as they come into effect.
     493
     494Organizations should follow best practices (eg for security, logging, and backups) which if not followed would affect other members of the GENI community.
     495
     496==== Security ====
     497The role of security in monitoring and management is to prevent the compromise of GENI resources and to prevent the use of GENI resources to compromise other entities.
     498Meanwhile, we should allow interesting research for which experimenters and operations may have to coordinate.
     499
     500GENI operators should follow best practices to hinder compromise and to detect and respond to compromise when it occurs.
     501GENI experimenters should notify meta-operations if they are doing something which could be dangerous or cause an outage so that this can be tracked like other outages.
     502Monitoring should not introduce security concerns (eg by overloading a resource), but may be used to monitor whether best practices which might affect others are being followed (eg detecting software versions that are out of date).
     503
     504=== Architectural Overview: Conclusion ===
     505The members of the GENI federation are highly dependent on each other.  Operational monitoring largely consists of cooperatively resolving problems and this architecture describes the data and interfaces needed for the various actors to share the information required to do so.
     506
     507The individual members of the GENI federation (aggregates, campuses, experimenters) shown in the [#ArchitecturalOverview figure] are responsible for providing monitoring data about themselves to meta-operations.  Likewise, the GENI Clearinghouse provides authoritative information about slices, projects, and users to meta-operations. 
     508
     509Meta-operations makes the time series, relational and event monitoring information collected from these sources, combined with intra-GENI monitoring, available to members of the GENI federation.
     510
     511In addition, there are various out-of-band interfaces available for coordinating with entities that are not part of the GENI federation (such as regional networks) as well as for communicating with people who are responsible for slices as well as aggregate and campus operators.
     512
     513Data should be collected in a consistent manner taking care to ensure the privacy of data where appropriate. 
     514
     515Monitoring should be done in a manner consistent with GENI policies including maintaining point of contact information used in troubleshooting, emergency stop, and as part of other policies.
     516
     517== Appendix: Actor, Interface and Data Details ==
     518The following sections provide more detail about the items described
     519in the [#ActorOverview Actor], [#InterfaceOverview Interface], and [#DataOverview Data] sections above.
     520
     521=== Actors ===
     522Each of the actors in the monitoring architecture described [#ActorOverview previously] is described below in more detail.
     523
     524The descriptions include a definition of the actor, the operator of that actor (where appropriate), a list of the interfaces used by this actor, the data items that it both generates and stores, and any services provided to others.
     525
     526==== Meta-operations ====
     527
     528'''Definition:''' Meta-operations (a.k.a. GMOC) coordinates GENI-wide
     529operations with a focus on inter-aggregate operations.  (Note that aggregates and campuses are responsible for operating their own resources.)
     530
     531'''Operator:''' GENI Meta-Operations Center (GMOC)
     532
     533'''Interfaces:'''
     534 * [#Interface:Anyone-Meta-operations Anyone <-> Meta-operations]
     535 * [#Interface:AggregateCampusExperimenterAnyone-Meta-operations {Aggregate, Campus, Experimenter, Anyone} <-> Meta-operations]
     536 * [#Interface:Clearinghouse--Meta-operations Clearinghouse --> Meta-operations]
     537 * Out-of-band interface: [#Interface:Anyone-Experimenter Anyone --> Experimenter]
     538 * Out-of-band interface: [#Interface:Anyone-CampusAggregates Anyone --> {Campus, Aggregate}]
     539
     540''Note:'' Meta-operations supports the [#Interface:Anyone-Experimenter Anyone --> Experimenter] and
     541[#Interface:Anyone-CampusAggregates Anyone --> {Campus, Aggregate}]
     542interfaces by keeping track of the appropriate contact information and making it
     543accessible to people with appropriate permissions.
     544
     545'''Authoritative Source of Data Items:''' GENI-wide monitoring data; monitoring data
     546from each aggregate in GENI
     547
     548'''Services or visualizations provided:'''
     549 * Web interface for operators
     550 * Web interface for experimenters
     551
     552==== Clearinghouse ====
     553'''Definition:''' The Clearinghouse provides a set of services and APIs for the GENI federation and is the authoritative source for information about projects, slices, and users.
     554
     555'''Operator:''' The long term operator of the Clearinghouse has not
     556yet been determined.
     557
     558'''Interfaces:'''
     559 * [#Interface:Clearinghouse--Meta-operations Clearinghouse --> Meta-operations]
     560 * [#Interface:Experimenters-Clearinghouse Experimenter --> Clearinghouse]
     561 * [#Interface:AggregateManager-Clearinghouse Aggregate --> Clearinghouse]
     562 * [#Interface:Campus-Clearinghouse Campus --> Clearinghouse]
     563
     564'''Authoritative Source of Data Items:''' Slice, user, and project information.
     565
     566'''Data items for which it is a trusted repository:''' The Clearinghouse
     567provides a service which aggregates can use to report information
     568about slices, slivers, and the resources on them to the GENI
     569Clearinghouse.  This may not be required for all GENI aggregates, but
     570for simplicity this document assumes this service is being used.
     571If it is not, the same information would need to be obtained from
     572aggregates directly.
     573
     574'''Services or visualizations provided:'''
     575 * Services listed in the ["GeniArchitectTeam/GENI Software Architecture v1.0.pdf" Clearinghouse Architecture]
     576
     577
     578==== I&M ====
     579'''Definition:''' Instrumentation and measurement provides a place for members of the GENI community (especially experimenters) to archive data and make it available for others to use.
     580
     581'''Interfaces:'''
     582 * [#Interface:Experimenter-IM Experimenter <-> I&M] 
     583
     584'''Services or visualizations provided:'''
     585 * A variety of data archival, location, and sharing services.
     586
     587==== Aggregate ====
     588'''Definition:''' Aggregates provide resources (e.g. network and compute
     589resources) to the broader GENI community.  Examples include:
     590ProtoGENI, !PlanetLab, and !OpenFlow.
     591
     592Note that in particular, each GENI rack (such as ExoGENI and InstaGENI) is an aggregate.
     593
     594'''Operator:''' Aggregate operators (like ProtoGENI, !PlanetLab, and
     595Orca) or campuses.
     596
     597'''Interfaces:'''
     598
     599 * [#Interface:AggregateCampusExperimenterAnyone-Meta-operations Aggregate <-> Meta-operations]
     600 * [#Interface:AggregateManager-Clearinghouse Aggregate --> Clearinghouse]
     601 * [#Interface:AggregateManager-Experimenter Aggregate Manager <-> Experimenter]
     602 * Out-of-band interface: [#Interface:Anyone-CampusAggregates Anyone --> {Campus, Aggregate}]
     603
     604
     605'''Data items sourced:''' All data about this aggregate and its resources.
     606
     607'''Data items for which it is a trusted repository:''' None
     608
     609'''Services or visualizations provided:'''
     610 * Varies.  Some aggregates provide extensive monitoring and
     611 visualization while others provide very little.
     612
     613==== Campus ====
     614'''Definition:''' Equipment on a campus which hosts GENI resources, aggregates, or parts of aggregates.  Some campuses also operate aggregates in addition to hosting them.  Campuses are usually colleges or universities, but they may be a city, business, or other administrative entity that owns resources.
     615
     616'''Operator:''' Campus IT
     617
     618'''Interfaces:'''
     619 * [#Interface:AggregateCampusExperimenterAnyone-Meta-operations Campus <-> Meta-operations]
     620 * [#Interface:Campus-Clearinghouse Campus --> Clearinghouse]
     621 * Campuses communicate with regional networks [#Interface:Campus-Regionals ''out-of-band''] as
     622 needed
     623 * Out-of-band interface: [#Interface:Anyone-CampusAggregates Anyone --> {Campus, Aggregate}]
     624
     625'''Data items sourced:''' Data about this campus.
     626
     627'''Data items for which it is a trusted repository:''' None
     628
     629'''Services or visualizations provided:'''
     630 * Varies.
     631
     632==== Experimenter ====
     633'''Definition:''' An experimenter runs GENI experiments.
     634
     635'''Operator:''' Self
     636
     637'''Interfaces:'''
     638 * [#Interface:AggregateCampusExperimenterAnyone-Meta-operations Experimenter <-> Meta-operations]
     639 * [#Interface:Experimenters-Clearinghouse Experimenters --> Clearinghouse]
     640 * [#Interface:AggregateManager-Experimenter Aggregate Manager <-> Experimenter]
     641 * [#Interface:Experimenter-IM Experimenter <-> I&M] 
     642
     643'''Data items sourced:''' Data about this experiment.  Not all
     644experiments should provide data to meta-operations.  Experiments
     645should provide data to meta-operations when that data would
     646be insightful for other experimenters or operators (for example, an
     647experiment which monitored the connectivity of the GENI core would be appropriate).
     648
     649Experimental data itself is stored by I&M.
     650
     651'''Data items for which it is a trusted repository:''' None
     652
     653'''Services or visualizations provided:'''
     654 * Varies.
     655
     656==== Other Infrastructure ====
     657''I2/NLR and Regionals''
     658
     659'''Definition:'''  Infrastructure not listed elsewhere that GENI relies on.  Some networks (e.g. Internet2 and NLR) participate in GENI. Others (e.g. some regionals) carry GENI traffic but do not participate in GENI directly.
     660
     661'''Operator:''' Self
     662
     663'''Interfaces:'''
     664 * Campuses communicate with regional networks
     665 [#Interface:Campus-Regionals ''out-of-band''] as
     666 needed
     667 *  [#Interface:Variousoperators-BackboneNetworks Internet2 is managed by GrNOC and NLR is managed by CENIC].
     668
     669'''Data items sourced:''' Infrastructure which is a participant in
     670GENI might make data available, but infrastructure which is not part
     671of GENI will likely not make data available.
     672
     673'''Data items for which it is a trusted repository:''' None
     674
     675'''Services or visualizations provided:'''
     676 * None required.
     677
     678==== Legal, Law Enforcement, and Regulatory (LLR) Representative ====
     679'''Definition:'''  The person responsible for responding to and
     680resolving legal, law enforcement, and regulatory requests made to GENI.  The LLR Representative is also known as the LLR.
     681
     682'''Operator:''' LLR Representative is a person.
     683
     684'''Interfaces:'''
     685 * Communicates with experiments via slice e-mail:
     686 [#Interface:Anyone-Experimenter Anyone --> Experimenter]
     687 * Communicates with campuses and aggregates via their Point of
     688 Contact information:  [#Interface:Anyone-CampusAggregates Anyone --> {Campus, Aggregate}]
     689
     690'''References:''' [GpoDoc#LegalLawEnforcementandRegulatoryPlan LLR document]
     691
     692=== Interfaces ===
     693
     694The monitoring actors share information using the interfaces identified [#InterfaceOverview previously] and described below in more detail.
     695
     696The interfaces are grouped according to the legend in the [#ArchitecturalOverview Architecture figure] (eg the "Publish Data" section includes interfaces shown in blue in the figure).
     697
     698Each interface includes a description, whether the interface is push/pull, a list of data shared over this interface, the data format, the frequency of update and the purpose of the interface.
     699
     700==== Query data (out of GMOC DB) ====
     701===== Interface: Anyone <-> Meta-operations =====
     702 * '''Description:'''
     703  * Anyone can query data ''from'' the meta-operations database on demand
     704
     705 * '''Push/pull:''' pull
     706
     707 * '''Data:'''
     708   * On demand, query any time series, relational, or event data allowed by the [#Privacy privacy policy]:
     709     * [#Aggregatedata Aggregate data generated by aggregates]
     710     * [#Campusdata Campus data generated by campuses]
     711     * [#Slicedata Slice data generated by aggregates]
     712     * [#L2NetworkResourcesData Data generated by campuses]
     713     * [#GENI-widedata GENI-wide data]
     714
     715 * '''Data format:''' Currently this is done by requesting the [http://gmoc-db.grnoc.iu.edu/web-services/gen_api.pl previous 10 minutes of time series data] via HTTP.
     716
     717 * '''Frequency of update:''' on demand
     718
     719 * '''Purpose:''' Allows any entity to pull monitoring data from the meta-operations database.
     720
     721==== Publish data (into the GMOC DB) ====
     722
     723===== Interface: {Aggregate, Campus, Experimenter, Anyone} <-> Meta-operations =====
     724
     725 * '''Description:'''
     726     * Aggregates periodically push data ''into'' the meta-operations database
     727       * Alternatively, meta-operations can periodically pull the same data from the aggregates
     728     * Campuses and experimenters may use the same interfaces that
     729     aggregates use to publish data to the meta-operations database.
     730     * Anyone can perform GENI-wide monitoring and publish
     731     the results in the meta-operations database
     732
     733
     734 * '''Register the following information at meta-operations:'''
     735    * aggregate/campus point of contact
     736    * aggregate/campus geographical location
     737
     738 * '''Push/pull:''' push to meta-operations (or optionally meta-operations can pull this data periodically or on demand)
     739
     740 * '''Data:''' Periodically push any time series, relational, or event data with some consistent granularity (eg submit data collected at 30 second intervals every 5 minutes).
     741   * Data to be pushed includes:
     742     * [#Aggregatedata Aggregate data generated by aggregates]
     743     * [#Campusdata Campus data generated by campuses]
     744     * [#Slicedata Slice data generated by aggregates]
     745     * [#L2NetworkResourcesData Data generated by campuses]
     746     * [#GENI-widedata GENI-wide data]
     747
     748 * '''Data format:''' Currently, time series data is submitted via XML file defined by GMOC ([http://gmoc-db.grnoc.iu.edu/sources/measurement_api/measurement_sender.pl script]).
     749
     750 * '''Frequency of update:''' Currently updated every 5 minutes with data granularity of 30 seconds.
     751
     752 * '''Purpose:''' Allows aggregate, campus or experimenter to push monitoring data into the meta-operations database.
     753
     754 * '''Notes:''' Alternatively, GMOC can use nearly the same interface to pull the data from the aggregate.
     755
     756 * '''Variations'''
     757    * '''Data source:''' [#Aggregate Aggregate Manager] (AM)  / '''Data sink:''' [#Meta-operations Meta-operations]
     758    * '''Data source:''' [#Campus Campus] / '''Data sink:''' [#Meta-operations Meta-operations]
     759    * '''Data source:''' [#Experimenter Experimenter] / '''Data
     760    sink:''' [#Meta-operations Meta-operations]
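
The push cadence described above (data collected at 30 second intervals, submitted every 5 minutes) amounts to grouping samples into submission windows.  The sketch below shows only that grouping step; the function and constant names are hypothetical, and it does not attempt to reproduce the XML format defined by GMOC.

```python
SAMPLE_INTERVAL_S = 30    # granularity of the collected data
SUBMIT_INTERVAL_S = 300   # one submission every 5 minutes


def batch_samples(samples):
    """Group (timestamp, value) samples into 5 minute submission windows.

    Timestamps are integer seconds.  Each returned batch holds the samples
    whose timestamps fall in the same window, ordered by window start time.
    """
    batches = {}
    for timestamp, value in samples:
        window_start = timestamp - timestamp % SUBMIT_INTERVAL_S
        batches.setdefault(window_start, []).append((timestamp, value))
    return [batches[start] for start in sorted(batches)]


# Ten minutes of samples taken every 30 seconds yields two batches of ten.
samples = [(t, t * 2) for t in range(0, 600, SAMPLE_INTERVAL_S)]
batches = batch_samples(samples)
```

Each batch would then be serialized and pushed in a single submission, which keeps the submission rate low without losing the 30 second granularity.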
     761
     762==== Publish/Query Clearinghouse ====
     763
     764'''Note:''' All interfaces involving the Clearinghouse will be defined
     765by the [wiki:GeniClearinghouse clearinghouse architecture] and related
     766documents.
     767
     768===== Interface: Clearinghouse --> Meta-operations =====
     769 * '''Description:'''
     770      * The clearinghouse provides authoritative data about GENI
     771      entities (slices, users, aggregates) including their existence, their creation and
     772      deletion, and the relationship between the same.
     773
     774 * '''Data:'''
     775  The clearinghouse provides:
     776   * authoritative slice, user and project data (including meta-data and historical data about the same),
     777   * a list of aggregates
     778   * a list of services
     779   * and may provide a copy of resource information for some aggregates and a mapping between slice, user, and resource information.
     780
     781 * '''Purpose:''' Allows meta-operations to get authoritative slice, user, and project info from the clearinghouse.  Also allows access to sliver information from aggregates which provide it to the Clearinghouse.
     782
     783===== Interface: Experimenters <-> Clearinghouse =====
     784 * '''Description:'''
     785     * Experimenters request the generation of user and slice
     786     credentials and information at the clearinghouse. 
     787     * The clearinghouse provides the experimenter information about
     788     their slices (and optionally about their slivers and nodes).
     789       * In particular, the clearinghouse knows about creation and
     790       deletion of user, slice, and project information (and meta-data
     791       about the same).
     792
     793 * '''Push/pull:''' Push
     794
     795 * '''Data:'''
     796   * list of experimenter's slices (including meta-data)
     797   * (optionally) list of slivers in experimenter's slice (including meta-data)
     798
     799 * '''Data format:''' TBD by clearinghouse architecture
     800
     801 * '''Frequency of update:''' on demand
     802
     803 * '''Purpose:''' Experimenter can query the clearinghouse for information about slices which they have access to.
     804
     805===== Interface: Aggregate Manager <-> Clearinghouse =====
     806 * '''Description:'''
     807     * Optionally Aggregate Managers provide sliver and node
     808     information to the clearinghouse.
     809       * In particular, AMs can optionally provide information to the
     810       clearinghouse such as: a sliver was created/deleted (and the time
     811       of that operation); nodes were added/removed from a sliver (and
     812       the time of that operation); a slice at an AM was shutdown (and
     813       the time of that operation).
     814
     815
     816 * '''Data:'''
     817   * (optionally) list of slivers in each slice and time sliver was created/deleted
     818   * (optionally) list of nodes in each sliver and time node was added/removed
     819   * (optionally) list of shutdown slivers and time shutdown was executed
     820
     821
     822 * '''Purpose:''' The clearinghouse stores an optional copy of information about slivers and an optional mapping between slivers and nodes.
     823
     824===== Interface: Campus <-> Clearinghouse =====
     825 * '''Description:'''
     826  * Campuses can access information stored at the Clearinghouse.
     827
     828 * '''Data:'''
     829  * list of users at campus
     830  * list of slices at campus
     831  * (optionally) list of slivers in each slice and time sliver was created/deleted
     832  * (optionally) list of nodes in each sliver and time node was added/removed
     833  * (optionally) list of shutdown slivers and time shutdown was executed
     834
     835 * '''Purpose:'''   Allows campuses to access information stored at the Clearinghouse.
     836
     837==== Publish/Query I&M ====
     838===== Interface: Experimenter <-> I&M =====
     839 * '''Purpose:''' Allow members of the GENI federation to share arbitrary data. 
     840
     841==== AM API ====
     842===== Interface: Aggregate Manager <-> Experimenter =====
     843 * '''Purpose:'''  Experimenters use the [http://groups.geni.net/geni/wiki/GeniApi AM API] to reserve resources at the aggregate manager (i.e. to create a sliver)
     844
     845 * '''Push/pull:''' push
     846
     847 * '''Data:'''
     848  * advertisement RSpec (contains a list of nodes at AM)
     849  * manifest RSpec (contains a list of nodes in a slice)
     850
     851 * '''Data format:''' [http://groups.geni.net/geni/wiki/GeniApi XMLRPC over HTTPS]
     852
     853 * '''Frequency of update:''' On demand.
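
Because this interface is XML-RPC over HTTPS, a basic liveness check of an aggregate manager can be sketched with Python's standard `xmlrpc.client` module.  The helper names and the example URL below are hypothetical.  `GetVersion` is used as the basic AM API call, and the assumption that the caller authenticates with a client certificate should be checked against the AM API documentation for the aggregate in question.

```python
import ssl
import xmlrpc.client


def make_am_proxy(url, cert_file=None, key_file=None):
    """Build an XML-RPC proxy for an aggregate manager (no network I/O yet)."""
    context = ssl.create_default_context()
    if cert_file:
        # Assumption: the AM expects the caller's certificate and key.
        context.load_cert_chain(cert_file, key_file)
    return xmlrpc.client.ServerProxy(url, context=context)


def am_is_responding(get_version) -> bool:
    """Return True if the given GetVersion callable completes without error."""
    try:
        get_version()
    except (OSError, xmlrpc.client.Error):
        # Connection failures, TLS errors, HTTP errors, and XML-RPC faults
        # are all treated as "not responding" in this sketch.
        return False
    return True


# Hypothetical URL; the call is only made when am_is_responding() runs it.
proxy = make_am_proxy("https://am.example.net/")
```

A check would then be written as `am_is_responding(proxy.GetVersion)`.  Passing the callable in, rather than the proxy, keeps the check easy to exercise without a live aggregate.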
     854
     855==== Out-of-band ====
     856===== Interface: Anyone -> Experimenter =====
     857 * '''Description:'''
     858       * A slice e-mail provides a way for anyone to talk to the
     859       experimenter(s) responsible for a slice.
     860
     861 * '''Notes:''' This slice e-mail is recommended by the [GpoDoc#LegalLawEnforcementandRegulatoryPlan LLR].
     862
     863===== Interface: Anyone -> Campus/Aggregates =====
     864 * '''Description:'''
     865       * Point-of-contact (POC) information provides a way to reach campuses and aggregates via e-mail or telephone.
     866
     867 * '''Data:'''
     868  * e-mail of campus/aggregate point of contact
     869  * telephone number of campus/aggregate point of contact
     870
     871 * '''Frequency of update:''' As needed
     872
     873===== Interface: Campus -> Regionals =====
     874 * '''Description:'''
     875       * Campuses are responsible for communicating with their regional
     876       networks, which aren't part of GENI.
     877
     878===== Interface: Various operators -> Backbone Networks =====
     879 * '''Description:'''
     880       * Internet2 is managed by GrNOC and NLR is managed by CENIC.
     881
     882=== Data ===
     883The monitoring actors and interfaces share data identified [#DataOverview previously] and described below in more detail.
     884
     885It is impossible to detail all data which might be collected to use for monitoring in GENI. The following list aims to provide a fairly minimal baseline of data that should be collected in Spirals 3 and 4 and which satisfies the [#UseCases use cases] described previously.  Note that the format of the data may be interface specific.
     886
     887==== GENI-wide data ====
     888The following data primarily describes the health of various GENI resources:
     889 * GENI aggregate is speaking the AM API right now. Plus historical versions of this info.
     890 * GENI aggregate is able to execute major elements of the sliver creation workflow via the AM API right now. Plus historical versions of this info.
     891 * For each L2 network resource, is the network resource reachable (from one or more central locations on the network)?
     892 * For each pool of L2 network resources reservable between sites, is it possible to allocate and use an inter-site network resource right now (end-to-end test)? Plus historical versions of this info.
     893 * Health data about the control plane (e.g. how full is the myplc's disk as opposed to how full is the plnode's disk)
     894
     895==== Aggregate data ====
     896For each aggregate:
     897 * Find active/created slivers on aggregate.  Plus historical versions of this info.
     898 * Find active users on aggregate.
     899 * Find the state of resources (available/in use (by whom)/down/missing/unknown) on aggregate. Plus historical versions of this info.
     900 * Find the utilization of resources on aggregate (as appropriate for its type: active processes, disk space used, flowspace rule count, bandwidth). Plus historical versions of this info.
     901 * Find distinct GENI users who had active slivers (or created slivers) on said aggregate.  Plus historical versions of this info.
     902
     903 * point of contact
     904 * geographical location
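The resource states listed above could be recorded as timestamped samples so that both the current state and its history can be queried. The following is a minimal sketch only: the names `ResourceState`, `StateSample`, and `ResourceHistory` are hypothetical and are not part of any GENI interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ResourceState(Enum):
    """The resource states named in the aggregate data list."""
    AVAILABLE = "available"
    IN_USE = "in use"
    DOWN = "down"
    MISSING = "missing"
    UNKNOWN = "unknown"


@dataclass
class StateSample:
    """One observation of a resource (hypothetical record layout)."""
    timestamp: float                 # seconds since the epoch
    state: ResourceState
    used_by: Optional[str] = None    # "by whom"; only meaningful when IN_USE


@dataclass
class ResourceHistory:
    """Current state plus historical versions for a single resource."""
    resource_id: str
    samples: List[StateSample] = field(default_factory=list)

    def record(self, sample: StateSample) -> None:
        self.samples.append(sample)

    def current(self) -> ResourceState:
        # With no observations at all, the state is reported as unknown.
        if not self.samples:
            return ResourceState.UNKNOWN
        return max(self.samples, key=lambda s: s.timestamp).state
```

Keeping every sample rather than overwriting the latest one is what makes the "historical versions of this info" available later.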
     905
     906==== Campus data ====
     907For each campus:
     908 * point of contact
     909 * geographical location
     910
     911==== L2 Network Resources Data ====
     912For each L2 network resource:
     913 * Utilization of the network resource at the site (bandwidth/packets sent/received, breakdown by type e.g. to detect excessive broadcasts).
     914
     915For each pool of L2 network resources reservable between sites:
     916 * Portion of the pool that is available/in use/not available. Plus historical versions of this info.
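As an illustration of the pool measurement above, the portion of a pool in each state can be computed from a list of per-resource states. This is a sketch under assumptions: the function name `pool_portions` and the representation of a pool as a list of state strings are hypothetical.

```python
from collections import Counter

# The three pool states named in the list above.
POOL_STATES = ("available", "in use", "not available")


def pool_portions(states):
    """Return the fraction of the pool in each state.

    `states` is a list with one state string per reservable resource.
    """
    counts = Counter(states)
    unexpected = set(counts) - set(POOL_STATES)
    if unexpected:
        raise ValueError("unexpected pool state(s): %s" % sorted(unexpected))
    total = len(states)
    if total == 0:
        # An empty pool has no meaningful portions; report zeros.
        return {s: 0.0 for s in POOL_STATES}
    return {s: counts[s] / total for s in POOL_STATES}
```

Storing the returned mapping together with a timestamp at each collection interval would provide the historical view.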
     917
     918==== Slice data ====
     919Given a slice owned by a GENI experimenter:
     920 * Find resources on slice.
     921 * Find the state of resources on each sliver (active/down). Plus historical versions of this info.
     922 * Find the utilization of each sliver resource (as appropriate for its type: active processes, disk space used, flowspace rule count, bandwidth). Plus historical versions of this info.
     923
     924==== Clearinghouse Data ====
     925
     926Data collected at the Clearinghouse:
     927 * slice information (name, creation/deletion/shutdown times)
     928 * users
     929 * projects
     930 * slivers (optional)
     931 * nodes (optional)
     932 * relationships among the above items
     933 * list of services provided by the clearinghouse
     934
     935Information about which resources are reserved in which slices may optionally be reported by the aggregate to the clearinghouse.  In the absence of this reporting, meta-operations can observe this information.
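One way to picture the clearinghouse data and its optional parts is a slice record in which sliver information may be absent. This is a hedged sketch: `SliceRecord` and `slices_missing_sliver_reports` are hypothetical names, and the field layout is an assumption rather than a defined clearinghouse schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SliceRecord:
    """Slice information held at the clearinghouse (hypothetical layout)."""
    name: str
    project: str
    users: List[str]
    created: float                        # creation time, seconds since epoch
    deleted: Optional[float] = None       # deletion time, if any
    shutdown: Optional[float] = None      # shutdown time, if any
    # Slivers are optional: None means no aggregate reported them.
    slivers: Optional[List[str]] = None


def slices_missing_sliver_reports(records):
    """Names of slices whose sliver data must be obtained another way."""
    return [r.name for r in records if r.slivers is None]
```

A slice with `slivers=None` is distinct from one with an empty list: the first has no report, the second reports that no slivers exist.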
     936
     937==== I&M Data ====
     938 * Experiment data