
This is a list of Spiral 4 GENI rack requirements for the meso-scale environments. We expect rack teams to use this list as a reference for the December rack design reviews referenced in each team's Statement of Work. The GPO will use this list to evaluate systems and software for the meso-scale build-out.

GENI Rack Specification

GENI rack projects will deliver racks and develop software and systems over three years. These requirements apply to all Spiral 4 deliveries and rack functions through GEC15. Requirements will be reviewed and may be revised each year.

A. Software Requirements Summary

  1. Deliver an Aggregate Manager for reserving compute and network resources
  2. Comply with the GENI Aggregate Manager API (an illustrative call sketch follows this list)
  3. Support GENI standard RSpecs
    1. Support the OpenFlow schema extensions for configuring OpenFlow resources
    2. Publish any aggregate-required extensions for aggregate specific functionality
  4. Federate with pgeni.gpolab.bbn.com as a slice authority
  5. Provide IP and static layer 2 VLAN connections between resources on the rack and the GENI backbone (Dynamic VLAN connections are also encouraged, but do not replace the requirement for static VLAN connections.)
  6. Allow OpenFlow flowspace to be reserved through the Aggregate Manager API
    1. Initial delivery may use a separate instance of the AM API if necessary
    2. Automatically configure intra-rack VLANs among reserved nodes
    3. Automatically provision and configure IP or VLAN connectivity to other aggregates in the slice as requested
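
The GENI AM API (item 2 above) is an XML-RPC interface secured with SSL client certificates, so a rack's aggregate manager can be probed with a short script. The sketch below is illustrative only, not a compliance test: the AM URL, port, and certificate paths are placeholders, and the GPO-supplied tests referenced in Section B remain the authoritative check.

{{{#!python
import ssl
import xmlrpc.client

# Placeholder endpoint and credential paths; substitute the rack AM's real
# URL and a GENI user certificate/key issued by pgeni.gpolab.bbn.com.
AM_URL = "https://am.example-rack.net:12346"
CERT = "/home/experimenter/.ssl/geni_cert.pem"
KEY = "/home/experimenter/.ssl/geni_key.pem"

# The AM API is XML-RPC over SSL with client-certificate authentication.
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
context.load_cert_chain(certfile=CERT, keyfile=KEY)
# Many rack AMs use self-signed server certificates, so a quick probe like
# this may need to relax hostname/CA verification.
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE

am = xmlrpc.client.ServerProxy(AM_URL, context=context)

# GetVersion takes no arguments in AM API v1 and reports the API and RSpec
# versions the aggregate supports.
print(am.GetVersion())
}}}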

B. Software Requirements Details

  1. Provide reservable, sliceable compute and network resources using a single Aggregate Manager
    1. Initial delivery may use a second Aggregate Manager instance to reserve OpenFlow network flowspace resources
  2. Comply with the GENI Aggregate Manager API (as defined at ggw:GeniApi)
    1. Compliance will be measured with a set of GPO-supplied compliance tests
    2. Support GENI AM API v1 plus Change Set A from ggw:GAPI_AM_API_DRAFT (or a more recent version) for deliveries before May 2012
      1. Support GENI standard RSpecs per below
    3. Deploy and support updates to implement adopted AM API revisions on GENI racks no more than 6 months after GENI adoption (GENI AM API status is tracked on http://groups.geni.net/geni/wiki/GAPI_AM_API )
      1. Draft changes under discussion are available at ggw:GAPI_AM_API_DRAFT
      2. Supply a mechanism for applying updates remotely and reproducibly
  3. Federate with specific Slice Authorities
    1. Accept user certificates and slice credentials in GENI standard formats (as defined at ggw:GeniApi)
    2. Accept certificates and credentials from pgeni.gpolab.bbn.com in initial delivery
    3. Desired: Federate also with PlanetLab Central (www.planet-lab.org) and ProtoGENI Utah (boss.emulab.net)
  4. Support GENI format RSpecs
    1. Accept and produce GENI v3 RSpecs for advertisements, requests, and manifests
    2. Deploy and support updates to implement new versions of the GENI RSpec schemas no more than 6 months after adoption
    3. Publish any RSpec extensions required for aggregate specific functionality
    4. Support the OpenFlow schema extensions for configuring OpenFlow resources
  5. Support at least two different operating systems for compute resources
    1. Provide images for experimenters
    2. Advertise image availability in the advertisement RSpec
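
As a concrete illustration of items 4 and 5 above, the sketch below parses a GENI v3 advertisement RSpec and lists each advertised node together with any disk images it offers. It is a minimal sketch, not a validator: the namespace, element, and attribute names reflect our reading of the GENI v3 schema and should be confirmed against the published schema documents, and ad.xml is a placeholder for an advertisement previously returned by ListResources.

{{{#!python
import xml.etree.ElementTree as ET

# GENI v3 RSpec namespace as we understand it; confirm against the schema.
NS = {"r": "http://www.geni.net/resources/rspec/3"}

def summarize_advertisement(path):
    """Print each advertised node and the disk images it offers."""
    tree = ET.parse(path)
    for node in tree.getroot().findall("r:node", NS):
        component = node.get("component_id", "(unnamed node)")
        images = [img.get("name")
                  for img in node.findall(".//r:disk_image", NS)]
        print(component, images or "(no images advertised)")

# "ad.xml" stands in for an advertisement fetched earlier from the AM.
summarize_advertisement("ad.xml")
}}}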

C. Integration Requirements

These are rack requirements that are needed to integrate, evaluate, and support rack prototypes in the GENI meso-scale environment. We expect that GENI racks will interoperate with other GENI production aggregates, particularly with other GENI OpenFlow switches and compute resources.

  1. Compute resource requirements:
    1. Support operations for at least 100 simultaneously used virtual compute resources. Implement functions to verify isolation of simultaneously used resources.
    2. Support configuration options and operations for at least one bare metal compute resource in each rack.
    3. Support configuration options for operating virtual and bare metal compute resources simultaneously in a single rack.
    4. Support the ability to run a Microsoft operating system for a user application on bare metal nodes. (Microsoft operating systems are requested but not required on VMs.)
    5. Identify any restrictions on the types of operating systems that users can run on VMs or bare metal nodes.
  2. Network resource and experimental connectivity requirements:
    1. Support at least 100 simultaneous active (i.e. actually passing data) layer 2 Ethernet VLAN connections to the rack. For this purpose, VLAN paths must terminate on separate rack VMs, not on the rack switch.
    2. Be able to connect a single VLAN from a network external to the rack to multiple VMs in the rack. Do this for multiple external VLANs simultaneously. (Measurement and experimental monitoring VLANs are examples of VLANs that may need to connect to all active VMs simultaneously.) A sketch of this configuration follows this list.
    3. Be able to connect a single VLAN from a network external to the rack to multiple VMs and a bare metal compute resource in the rack simultaneously.
    4. Support individual addressing for each VM (e.g. IP and MAC addresses per VM should appear unique for experiments that require full dataplane virtualization).
    5. Support AM API options for both static (pre-defined VLAN number) and dynamic (negotiated VLAN number, e.g. an interface to DYNES or enhanced SHERPA) configuration of GENI layer 2 connections
    6. Support the ability to run multiple OpenFlow controllers in the rack, and allow the site administrators (or their rack team representatives during early integration) to determine which controllers can affect the rack's OpenFlow switch. (Note: site administrators may choose to approve ALL controllers by default, but we do not expect that option to cover local site policy for all racks.)
  3. Rack resource management requirements:
    1. Provide Admin, Operator, and User accounts to manage access to rack resources. Admin privileges should be similar to superuser; Operator should provide access to common operator functions (such as debug tools and Emergency Stop) that are not available to users. Access to all accounts must be more secure than username/password (e.g. require an SSH key).
    2. Support remote access to Admin and Operator accounts for local site admins and GENI operations staff with better than username/password control (e.g. remote terminal access with an SSH key).
    3. Implement functions needed for remotely controlled power cycling of the rack components, and ensure that rack components reboot automatically after a power failure or intentional power cycle.
    4. Implement functions needed to execute the current Emergency Stop procedure ( see http://groups.geni.net/geni/wiki/GpoDoc ). Rack teams are expected to implement changes to the GENI-adopted Emergency Stop procedure and deploy them in racks no more than 3 months after they are approved at a GEC.
    5. Provide remote ability to determine and change active IP addresses on all addressable rack resources (including VMs).
    6. Provide remote ability to determine MAC addresses for all rack resources (including VMs).
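
Items 2.2 and 2.3 above amount to terminating one externally provisioned VLAN on a rack host and bridging it to several VMs (and a bare metal node) at once. In a GENI rack this wiring is expected to be done automatically by the aggregate's control software; the sketch below only shows the kind of Linux (iproute2) plumbing involved, and the uplink name, VLAN ID, and tap device names are placeholders.

{{{#!python
import subprocess

def sh(*args):
    """Run one command, raising on failure (illustration only)."""
    subprocess.run(args, check=True)

# Placeholder names: eth1 is the rack host's dataplane uplink, 1750 is the
# externally provisioned VLAN ID, and the tap devices are VM interfaces
# created by the rack's virtualization layer.
UPLINK, VLAN = "eth1", "1750"
VM_TAPS = ["tap101", "tap102", "tap103"]

vlan_if = f"{UPLINK}.{VLAN}"
bridge = f"br{VLAN}"

# Terminate the external VLAN on a subinterface of the uplink...
sh("ip", "link", "add", "link", UPLINK, "name", vlan_if, "type", "vlan", "id", VLAN)
# ...and bridge it to every VM interface mapped onto this VLAN.
sh("ip", "link", "add", bridge, "type", "bridge")
sh("ip", "link", "set", vlan_if, "master", bridge)
for tap in VM_TAPS:
    sh("ip", "link", "set", tap, "master", bridge)
for dev in [vlan_if, bridge] + VM_TAPS:
    sh("ip", "link", "set", dev, "up")
}}}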

D. Monitoring Requirements

Each rack must report status and health data to the GENI Meta-Operations Center (GMOC) using the currently adopted GENI GMOC interfaces (see details below). Rack teams are expected to implement changes to the GENI-adopted GMOC interfaces and deploy them in racks no more than 3 months after they are approved at a GEC.

  1. Current monitoring practices for GMOC reporting are documented at http://groups.geni.net/geni/wiki/PlasticSlices/MonitoringRecommendations#Sitemonitoringarchitecture. The measurement sender is documented at http://gmoc-db.grnoc.iu.edu/sources/measurement_api/measurement_sender.pl and an example configuration file is available at http://gmoc-db.grnoc.iu.edu/sources/measurement_api/
  2. Data may be submitted as time series via the GMOC data submission API, or GMOC may poll for the data, at the preference of the rack vendors and GMOC.
  3. Data must be submitted at least once every 10 minutes per rack, and may be submitted as often as desired. Simple data should be collected every minute; complex checks may be done less frequently.
  4. Timestamps on submitted data must be accurate to within 1 second
  5. The following types of data are of interest to GENI users, and must be collected and reported to GMOC:
    1. Health of aggregates: whether the AM is up and reachable via the AM API, what resources of what types the AM has and what their state is (in use, available, down/unknown), overall sliver count and resource utilization level on the aggregate, status and utilization of each sliver active on the aggregate (minimum: sliver uptime, sliver resource utilization, performance data as available)
    2. Health of network devices: liveness, interface traffic counters (including types of traffic e.g. broadcast/multicast), VLANs defined on interfaces, MAC address tables on data plane VLANs
    3. Health of hosts which serve experimental VMs: liveness, CPU/disk/memory utilization, interface counters on dataplane interfaces, VM count and capacity (a collection sketch follows this list)
    4. Run (and report the results of) health checks that create and use VMs and OpenFlow connections within the rack, at least hourly.
  6. The following types of data are operationally relevant and may be of interest to GENI users. Racks should collect these for their own use, and are encouraged to submit them to GMOC for aggregation and problem debugging:
    1. Health of network devices: CPU and memory utilization, OpenFlow configuration and status
    2. Health of all hosts: liveness, CPU/disk/memory utilization, interface traffic counters, uptime, process counts, active user counts
    3. Rack infrastructure: power utilization, control network reachability of all infrastructure devices (KVMs, PDUs, etc), reachability of commodity internet, reachability of GENI data plane if testable
  7. Log and report total number of active users on a rack and identifiers for those users, where the total is measured over the time period that has elapsed since the last report. Report at least twice daily. It must be possible to identify an email address for each individual user identifier, but it is acceptable to require additional information that is not public or available on the rack to identify users by email, as long as that information is available to site support staff and GENI operations staff.
  8. Each rack must provide a static (always-available) interface on each GENI mesoscale VLAN, which can be used for reachability testing of the liveness of the mesoscale connection to the rack.
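
As a sketch of the once-a-minute "simple data" collection described in items 3, 5, and 6 above, the script below gathers basic host health metrics and emits them as timestamped JSON records. It assumes the psutil library is available on the rack host and that dataplane NICs are named eth1*; the record layout is illustrative only, and actual submission to GMOC must use the measurement API and sender referenced in item 1.

{{{#!python
import json
import time

import psutil  # assumed to be installed on the rack host; any equivalent works

def collect_host_sample():
    """Gather the per-minute host metrics of interest (items 5.3 / 6.2 above)."""
    net = psutil.net_io_counters(pernic=True)
    return {
        "timestamp": int(time.time()),       # must be accurate to 1 second
        "cpu_percent": psutil.cpu_percent(interval=1),
        "mem_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
        "dataplane_bytes": {                 # eth1* is a placeholder NIC name
            nic: [c.bytes_recv, c.bytes_sent]
            for nic, c in net.items() if nic.startswith("eth1")
        },
    }

while True:
    print(json.dumps(collect_host_sample()))  # hand off to the GMOC sender
    time.sleep(60)
}}}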

E. Production Aggregate Requirements

Because GENI racks are meant to be used by experimenters after a very short integration period, GENI racks must meet current GENI production aggregate requirements, beginning at the time of the rack's deployment. The rack production team may choose to take initial responsibility for the local site aggregate to meet these requirements during initial installation and integration, but must hand off responsibility to the local site within 3 months. See details in the Local Aggregate Owner Requirements section.

F. Local Aggregate Owner Requirements

Local site GENI rack owners must meet some GENI requirements for hosting and supporting a GENI rack. The rack development teams must provide functions needed to support rack owners in meeting these requirements.

  1. Sites provide space, power (preferably with backup), air conditioning, network connections for both layer 2 (Ethernet VLAN) and layer 3 data plane connections to GENI, and publicly routable IP addresses for the rack (separate control and data plane ranges). Addresses should be on an unrestricted network segment outside any site firewalls. All addresses dedicated to the nodes should have registered DNS names and support both forward and reverse lookup. Racks should not be subject to port filtering. Rack teams must document their minimum requirements for all site-provided services (e.g. number of required IP addresses) and provide complete system installation documentation on a public website (e.g. the GENI wiki).
  2. Sites must operate racks according to the most recent version of the GENI Aggregate Provider's Agreement and the GENI Recommended Use Policy (see http://groups.geni.net/geni/wiki/GpoDoc ). Rack teams must implement functions that allow site and GENI operations staff to monitor all rack nodes for intrusion attempts and abnormal behavior to support execution of the GENI Recommended Use Policy.
  3. Sites must provide at least two support contacts (preferably via a mailing list), who can respond to issues reported to GENI or proactively detected by monitoring. These contacts will join the GENI response-team@geni.net mailing list. Rack teams must implement functions that the site contacts can use to assist with incident response (e.g. site administrator accounts and tools). In particular, rack teams must support functions needed to respond to Legal, Law Enforcement and Regulatory issues with the GENI LLR Representative (e.g. ability to identify rack users by email address).
  4. Sites must inform GENI operations of actions they take on any open GENI issue report via the GMOC ticketing system. Rack teams should not establish parallel reporting requirements, and should not change deployed systems without opening and maintaining GMOC tracking tickets.
  5. Site support staff (and GENI operations) must be able to identify all software versions and view all configurations running on all GENI rack components once they are deployed. The rack users' experimental software running on the rack is exempt from this requirement.
  6. Site support staff (and GENI operations) must be able to view source code for any software covered by the GENI Intellectual Property Agreement that runs on the rack. Rack teams should document the location of such source code in their public site documentation (e.g. on the GENI wiki).
  7. Sites may set policy for use of the rack resources, and must be able to manage the OpenFlow and compute resources in the rack to implement that policy. The rack teams must identify each site's written GENI policy before installation, and implement functions that allow site support contacts to manage resources according to the site policies. Site policies must be documented on the site aggregate information page on the GENI wiki.
  8. Rack teams must implement functions that verify proper installation and operation of the rack resources with periodic network and compute resource health checks. These functions must succeed for at least 24 hours before a site takes over responsibility for a GENI rack aggregate.
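
A minimal form of the periodic health check required by item 8 might look like the sketch below: it verifies that the aggregate manager's API port accepts connections and that the rack's static dataplane addresses (Section D, item 8) answer pings, then logs a PASS/FAIL line each cycle. The host name, port, addresses, and interval are placeholders, and a deployable check would also exercise VM creation and OpenFlow paths as described in Section D, item 5.4.

{{{#!python
import socket
import subprocess
import time

# Placeholder targets: the rack AM's control address/port and the static
# dataplane addresses maintained on each mesoscale VLAN.
AM_HOST, AM_PORT = "am.example-rack.net", 12346
DATAPLANE_ADDRS = ["10.42.101.50", "10.42.102.50"]

def am_reachable():
    """Return True if the AM's API port accepts a TCP connection."""
    try:
        with socket.create_connection((AM_HOST, AM_PORT), timeout=10):
            return True
    except OSError:
        return False

def pingable(addr):
    """Return True if the address answers ICMP echo (3 probes)."""
    return subprocess.run(["ping", "-c", "3", "-W", "2", addr],
                          stdout=subprocess.DEVNULL).returncode == 0

while True:
    results = {"am": am_reachable()}
    results.update({addr: pingable(addr) for addr in DATAPLANE_ADDRS})
    status = "PASS" if all(results.values()) else "FAIL"
    print(time.strftime("%Y-%m-%dT%H:%M:%S"), status, results)
    time.sleep(300)  # e.g. every 5 minutes; a real check would also log results
}}}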

G. Experimenter Requirements

  1. Experimenters must have root/admin capability on VMs.
  2. Experimenters must be able to provision multiple (physical or virtual) data plane interfaces per experiment.
  3. Experimenters must have direct layer 2 network access (receive and transmit), via the Linux SOCK_RAW socket type or a similar capability (a raw-socket sketch follows this list).
  4. Racks must provide a mechanism to virtualize data plane network interfaces so that multiple experiments running on multiple VMs may simultaneously and independently use a single physical interface (e.g. by providing separate IP and MAC addresses for each VM). [Example use case: An experimenter wishes to test her non-IP protocol across thirty network nodes (implemented as five VMs in each of six GENI racks at six different physical sites). The network connections among the sites are virtualized into five separate networks (called VLANs here, but some other virtualization approach is OK), VLAN1, …, VLAN5. Some VMs represent end hosts in the network. They will open connections to only one virtualized interface on a particular network, e.g. eth0.VLAN2. Other VMs represent routers and open multiple virtualized interfaces on multiple networks, e.g. eth0.VLAN1, eth1.VLAN3, eth2.VLAN5. Packets transmitted on a particular VLANx are visible on all other virtual interfaces on VLANx, but they are not visible on virtual interfaces on different VLANs, nor are they visible to interfaces in any other experiment.]
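
From the experimenter's side, items 1 and 3 above mean a non-IP protocol can be driven directly through a Linux AF_PACKET/SOCK_RAW socket on one of the virtualized dataplane interfaces described in item 4. The sketch below is illustrative: the interface name and EtherType are placeholders, and the actual virtual interface naming is up to each rack team.

{{{#!python
import socket

# Placeholder names: eth1.1001 stands in for one virtualized dataplane
# interface (e.g. the VLAN2 of the example above), and 0x88B5 is an IEEE
# "local experimental" EtherType suitable for a non-IP protocol.
IFACE = "eth1.1001"
ETH_P_ALL = 0x0003
MY_ETHERTYPE = 0x88B5

# AF_PACKET/SOCK_RAW needs root, which item 1 above guarantees on VMs.
sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
sock.bind((IFACE, 0))

# Build and send one raw (non-IP) Ethernet frame on this virtual network.
dst = bytes.fromhex("ffffffffffff")         # broadcast, visible on this VLAN only
src = sock.getsockname()[4]                 # this interface's MAC address
frame = dst + src + MY_ETHERTYPE.to_bytes(2, "big") + b"hello, neighbors"
sock.send(frame)

# Frames on this virtual network arrive here; traffic on other VLANs or in
# other experiments must not be visible (item 4's isolation requirement).
data, addr = sock.recvfrom(2048)
print(len(data), "bytes received on", addr[0])
}}}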