Changes between Version 5 and Version 6 of MonitoringArch
- Timestamp: 06/06/12 18:51:18
MonitoringArch
both in the name of the data and the definition of said data. For
example, if interface counters from switches at different campuses are
Removed: stored with different names it may be difficult to find and easily compare the traffic rates at various points in the network. Likewise, if some interfaces report byte counters over an interval and others report current data rates then the comparison between these two items may not be accurate or simple.
Added: stored with different names it may be difficult to find and easily compare the traffic rates at various points in the network. Likewise, if some interfaces report byte counters over an interval and others report current data rates, then the comparison between these two items may not be accurate or simple.

This means that data that is published to the GMOC database should use a naming system that is consistent across data submitters in two ways:
…
In addition, it is possible for a member of GENI to submit interesting data about another part of GENI and therefore it is important for the name of the resources to be consistent in order for that data to be useful. For example, meta-operations could run a test to see if each aggregate manager (AM) responds to a basic AM API call to determine that the AM is up and responding to requests. In this case, meta-operations needs to identify the aggregate it is testing by the same name the aggregate uses to describe itself when submitting their own monitoring data.

Removed: Where they are guaranteed to be unique, urns should be used to
Added: Where they are guaranteed to be unique, URNs should be used to
describe items like users and aggregates. In some
Removed: circumstances, urns are not unique and in these cases urns combined
Added: circumstances, URNs are not unique, and in these cases URNs combined
with another identifier should be used. For example, AM API v3 will
Removed: ensure that slices can be identified by the combination of a slice urn
Added: ensure that slices can be identified by the combination of a slice URN
and a UUID which together are globally unique. Names defined by equipment manufacturers should be used where appropriate (MAC addresses, interface names, etc). Finally if none of
these identifiers are appropriate, a consistent naming scheme is chosen
…
There are three types of data:
1. Some data CAN be shared publicly including:
Removed: * existence of slice, its name, and its identifiers (urn, UUID), and
Added: * existence of slice, its name, and its identifiers (URN, UUID), and
* slice is active (has resources)
2. Some data CAN be shared IF the data is protected by access controls.
…

Removed: GENI experimenters should be warned that their slice name is public. Experimenter credentials may contain the user e-mail and this may be shared with aggregates they are using and GMOC, but should not be shared outside GMOC. In addition, basic information about an experiment (eg nodes used, interface stats) will be publicly available as it may be needed to debug GENI-wide problems as well as issues with individual experiments.
Added: GENI experimenters should be warned that their slice name is public. Experimenter credentials may contain the user e-mail and this may be shared with aggregates they are using and with GMOC, but should not be shared outside GMOC. In addition, basic information about an experiment (eg nodes used, interface stats) will be publicly available as it may be needed to debug GENI-wide problems as well as issues with individual experiments.

Individual organizations should follow their own privacy policies in determining what to share.
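The naming discussion above points out that byte counters reported over an interval and current data rates cannot be compared directly. As an illustration (not taken from the MonitoringArch page), here is a minimal Python sketch of normalizing two cumulative counter readings into an average rate before comparing interfaces. The `CounterSample` record, the `rate_bps` helper, and the default of 64-bit counters are assumptions, not part of the GMOC data format.

```python
from dataclasses import dataclass

# Assumption: 64-bit interface counters, which wrap at 2**64.
# 32-bit counters wrap at 2**32 and need modulus=2**32.
COUNTER64_MODULUS = 2**64


@dataclass(frozen=True)
class CounterSample:
    """One reading of a cumulative byte counter (hypothetical record layout)."""

    timestamp: float  # seconds since the epoch
    octets: int       # cumulative bytes counted by the interface


def rate_bps(prev: CounterSample, curr: CounterSample,
             modulus: int = COUNTER64_MODULUS) -> float:
    """Average rate in bits per second between two counter samples."""
    interval = curr.timestamp - prev.timestamp
    if interval <= 0:
        raise ValueError("samples must be in strictly increasing time order")
    # Modular subtraction absorbs at most one counter wrap between samples.
    delta_octets = (curr.octets - prev.octets) % modulus
    return delta_octets * 8 / interval
```

For example, `rate_bps(CounterSample(0.0, 1000), CounterSample(10.0, 2250))` gives 1000.0 bits/s, which can then be compared with an interface that reports a rate directly. In this sketch a counter reset after a device reboot is indistinguishable from a wrap, so a real collector would need extra information (such as device uptime) to discard such intervals.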
…
==== Troubleshooting & Event Escalation ====

Removed: In order to resolve both troubleshooting and security problems, meta-operations, aggregate operators, campuses, and experimenters must work together to resolve problems.
Added: In order to resolve both troubleshooting and security problems, meta-operations, aggregate operators, campuses, and experimenters must work together.

To facilitate this, aggregates must advertise their resources accurately. At a minimum aggregates must advertise their resources on an [AvailableAggregates aggregate wiki page]. It is preferred that they advertise their resources via the [http://groups.geni.net/geni/wiki/GeniApi AM API].
…
==== Security ====
The role of security in monitoring and management is to prevent the compromise of GENI resources and to prevent the use of GENI resources to compromise other entities.
Removed: Meanwhile, we should allow interesting research for which experimenters and operations may have to coordinate.
Added: Meanwhile, we should allow interesting research which may require experimenters and operations to coordinate.

Removed: GENI operators should follow best practices to hinder compromise and also detect and respond to compromise when it is detected.
Added: GENI operators should follow best practices to hinder compromise and also detect and respond to successful compromise.
GENI experimenters should notify meta-operations if they are doing something which could be dangerous or cause an outage so that this can be tracked like other outages.
Removed: Monitoring should not introduce security concerns (eg by overloading a resource), but may be used to monitor whether best practices which might affect others are being followed (eg detect software version are out of date).
Added: Monitoring should not introduce security concerns (eg by overloading a resource), and may help to detect potential security issues (eg outdated software).
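The security paragraph above notes that monitoring may help to detect potential security issues such as outdated software. One way such a check could look is sketched below in Python. It assumes each host reports purely numeric dotted version strings (for example "4.2.6"). The helper names, the package names, and the minimum versions used in the example are hypothetical.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '2.4.10' into (2, 4, 10)."""
    return tuple(int(part) for part in text.split("."))


def is_older(version: str, minimum: str) -> bool:
    """True if `version` sorts before `minimum`, treating missing parts as 0."""
    a, b = parse_version(version), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.4' and '2.4.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def outdated(installed: dict[str, str], minimums: dict[str, str]) -> list[str]:
    """Sorted names of packages whose installed version is below the policy minimum.

    Packages absent from `installed` are not reported; a real check might flag them separately.
    """
    return sorted(
        name
        for name, minimum in minimums.items()
        if name in installed and is_older(installed[name], minimum)
    )
```

For example, `outdated({"sshd": "5.3", "ntpd": "4.2.6"}, {"sshd": "5.9", "ntpd": "4.2.6"})` returns `["sshd"]`. Version strings with non-numeric parts (such as "5.3p1") would need a more careful parser than this sketch provides.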

=== Architectural Overview: Conclusion ===
…
Data should be collected in a consistent manner taking care to ensure the privacy of data where appropriate.

Removed: Monitoring should be done in a manner consistent with GENI policies including maintaining point of contact information used in troubleshooting, emergency stop, and as part of other policies.
Added: Monitoring should be done in a manner consistent with GENI policies including maintaining point of contact information used in troubleshooting and emergency stop.

== Appendix: Actor, Interface and Data Details ==
…
Each of the actors in the monitoring architecture described [#ActorOverview previously] are described below in more detail.

Removed: The descriptions include a definition of the actor, the operator of that aggregate (where appropriate), a list of the interfaces used by this actor, the data items that it both generates and stores, and any services provided to others.
Added: The descriptions include a definition of the actor, the operator of that actor (where appropriate), a list of the interfaces used by this actor, the data items that it both generates and stores, and any services provided to others.

==== Meta-operations ====
…
operations with a focus on inter-aggregate operations. (Note that aggregates and campuses are responsible for operating their own resources.)

Removed: '''Operator:''' GENI Meta-Operations Center (GMOC)
Added: '''Operator:''' Indiana University GrNOC (currently)

'''Interfaces:'''
…

==== I&M ====
Removed: '''Definition:''' Instrumentation and measurement provides a place for members of the GENI community (especially experimenters) to archive data and make it available for others to use.
Added: '''Definition:''' Instrumentation and measurement provides a set of services and APIs for members of the GENI community (especially experimenters) to archive data and make it available for others to use. I&M services are defined by the I&M working group.

'''Interfaces:'''
…
'''Definition:''' Aggregates provide resources (e.g. network and compute
resources) to the broader GENI community. Examples include:
Removed: ProtoGENI, !PlanetLab, and !OpenFlow.
Added: ProtoGENI, !PlanetLab, and !OpenFlow. (Aggregates may be distributed over widely separated physical locations and operated by different organizations, unlike campuses.)

Note that in particular, each GENI rack (such as ExoGENI and InstaGENI) is an aggregate.
…

==== Campus ====
Removed: '''Definition:''' Equipment on a campus which hosts GENI resources, aggregates, or parts of aggregates. Some campuses also operate aggregates in addition to hosting them. Campuses are usually colleges or univeresities, but they may be a city, business, or other administrative entity that owns resources.
Added: '''Definition:''' Organization and equipment on a campus that hosts GENI resources, aggregates, or parts of aggregates. Some campuses also operate aggregates in addition to hosting them. Campuses are usually colleges or univeresities, but they may be a city, business, or other administrative entity that owns resources.

'''Operator:''' Campus IT
…
* [#GENI-widedata GENI-wide data]

Removed: * '''Data format:''' Currently this is done by requesting the [http://gmoc-db.grnoc.iu.edu/web-services/gen_api.pl previous 10 minutes of time series data] via HTTP.

* '''Frequency of update:''' on demand

Removed: * '''Purpose:''' Allows any entity to pull monitoring data from the meta-operations database.
Added: * '''Purpose:''' Allows any entity to pull monitoring data from the meta-operations [http://gmoc.grnoc.iu.edu/gmoc/index/gmoc-live-db.html database].

==== Publish data (into the GMOC DB) ====
…
===== Interface: Anyone -> Campus/Aggregates =====
* '''Description:'''
Removed: * Point-of-contact (POC) information provides a way to reach campuses and aggregates via e-mail or telephone.
Added: * Point-of-contact (POC) information provides a way to reach campuses and aggregates via e-mail or telephone. This information is not public, but is made available to those with appropriate permissions.

* '''Data:'''
…
===== Interface: Campus -> Regionals =====
* '''Description:'''
Removed: * Campuses are responsible for communicating with their regional
Removed: networks, which aren't part of GENI.
Added: * Campuses are responsible for communicating with regional
Added: networks, if they aren't part of GENI.

===== Interface: Various operators -> Backbone Networks =====
…
* geographical location

Added: Could also upload:
Added: * campus equipment status

Added: In addition, logs should be kept and made available on request.

==== L2 Network Resources Data ====
For each L2 network resource:
…
Given a slice owned by a GENI experimenter:
* Find resources on slice.
Removed: * Find state of resources on each sliver (active/down)? Plus historical versions of this info.
Added: * Find state of resources on each sliver (active/down). Plus historical versions of this info.
* What is the utilization of each sliver resource (as appropriate for its type: active processes, disk space used, flowspace rule count, bandwidth). Plus historical versions of this info.
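The slice use case above asks for the current state of each sliver (active/down) plus historical versions of that information. As an illustration only, here is a minimal in-memory Python sketch that answers both questions. It assumes each observation arrives as a (sliver identifier, timestamp, state) triple. The `SliverStateHistory` class and its methods are hypothetical and are not part of any GENI or GMOC API.

```python
from __future__ import annotations

import bisect
from collections import defaultdict


class SliverStateHistory:
    """Hypothetical in-memory store of timestamped sliver states (e.g. 'active', 'down')."""

    def __init__(self) -> None:
        # Parallel lists per sliver, kept sorted by timestamp.
        self._times: dict[str, list[float]] = defaultdict(list)
        self._states: dict[str, list[str]] = defaultdict(list)

    def record(self, sliver: str, timestamp: float, state: str) -> None:
        """Store an observation, keeping each sliver's history in time order."""
        times = self._times[sliver]
        i = bisect.bisect_right(times, timestamp)
        times.insert(i, timestamp)
        self._states[sliver].insert(i, state)

    def state_at(self, sliver: str, timestamp: float) -> str | None:
        """Most recent state observed at or before `timestamp`, or None if unknown."""
        times = self._times.get(sliver, [])
        i = bisect.bisect_right(times, timestamp)
        return self._states[sliver][i - 1] if i else None

    def current(self, slivers: list[str]) -> dict[str, str | None]:
        """Latest known state of every sliver in a slice."""
        return {s: self.state_at(s, float("inf")) for s in slivers}
```

Binary search over the sorted timestamps keeps historical lookups logarithmic in the length of each sliver's history. A persistent time-series store would replace the in-memory lists in practice.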
921 925 * What is the utilization of each sliver resource (as appropriate for its type: active processes, disk space used, flowspace rule count, bandwidth). Plus historical versions of this info. 922 926