Version 20 (modified by lnevers@bbn.com, 6 years ago)

InstaGENI Acceptance Test Plan

This page captures the GENI Racks Acceptance Test Plan to be executed for the InstaGENI project. Tests in this plan are based on the InstaGENI Use Cases, and each test provides a mapping to the GENI Racks Requirements that it validates. This plan defines tests that cover the following types of requirements: Integration (C), Monitoring (D), Experimenter (G) and Local Aggregate (F) requirements. The GENI AM API Acceptance test suite covers the Software (A and B) requirements, which are not covered in this plan. This plan covers high-priority functional tests; InstaGENI racks support more functions than those highlighted here.

The GPO infrastructure team will perform the tests described in this page and the GPO software team will run the software acceptance tests in cooperation with the InstaGENI rack team. The GPO will execute all of this test plan for the first two InstaGENI racks delivered on the GENI network (InstaGENI Utah and BBN). The InstaGENI acceptance test effort will capture findings and unresolved issues, which will be published in an Acceptance Test Report. GENI operations (beginning with the InstaGENI at Utah) will run a shorter version of the acceptance tests for each subsequent site installed. Issues from site installation tests will be tracked in operations tickets. See the InstaGENI Acceptance Test Status page for details about the current status.

Assumptions and Dependencies

The following assumptions are made for all tests described in this plan:

  • GPO ProtoGENI credentials from https://pgeni.gpolab.bbn.com are used for all tests.
  • GPO ProtoGENI is the Slice Authority for all tests.
  • Resources for each test will be requested from the InstaGENI Aggregate Manager.
  • Compute resources are VMs unless otherwise stated.
  • All Aggregate Manager requests MUST be made via the Omni command line tool which uses the GENI AM API.
  • In all scenarios, one experiment is always equal to one slice.
  • InstaGENI will be used as the interface to the rack FOAM for OpenFlow resources in the OpenFlow test cases.
  • FOAM will be used as the OpenFlow aggregate manager for Meso-scale resources in the OpenFlow test cases.
  • Experimenter test cases will use Internet2.

It is expected that the InstaGENI Aggregate Manager will provide an interface into the FOAM aggregate. If the InstaGENI interface to FOAM is not available, tests will be executed by submitting requests directly to FOAM. When the interface to FOAM becomes available, the tests will be re-executed.

The InstaGENI solution does not provide support for experimenters uploading custom VM images to the rack. Therefore, any test case using custom images will be modified to use images available on the rack. The ability to upload a custom VM image to the InstaGENI rack will be tested when it becomes available.

Test Traffic Profile:

  • Experiment traffic includes UDP and TCP data streams at low rates to ensure end-to-end delivery.
  • Traffic exchange is used to verify that the appropriate data paths are used and that traffic is delivered successfully for each test described.
  • Performance measurement is not a goal of these acceptance tests.
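As an illustration of the traffic profile above, the following is a minimal sketch of a low-rate UDP delivery check. It runs both endpoints on the loopback interface so that it is self-contained; in an actual test the sender and receiver would be on different compute resources. The function name, payload, and datagram count are assumptions made for this example, not part of the test plan.

```python
import socket

def udp_delivery_check(count=20, payload=b"geni-test", host="127.0.0.1"):
    """Send `count` datagrams to a local receiver and return how many arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind((host, 0))            # port 0: let the OS pick a free port
    receiver.settimeout(1.0)
    port = receiver.getsockname()[1]

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    received = 0
    try:
        for seq in range(count):
            # One datagram at a time keeps the rate low, as in the profile.
            sender.sendto(payload + str(seq).encode(), (host, port))
            try:
                data, _ = receiver.recvfrom(2048)
            except socket.timeout:
                continue                # count the datagram as lost
            if data.startswith(payload):
                received += 1
    finally:
        sender.close()
        receiver.close()
    return received

if __name__ == "__main__":
    print("datagrams delivered:", udp_delivery_check())
```

The check counts delivered datagrams only; it makes no attempt to measure throughput, consistent with performance measurement not being a goal of these tests.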

Acceptance Tests Descriptions

This section describes each acceptance test by defining its goals, topology, and outline test procedure. Test cases are listed by priority in sections below. The cases that verify the largest number of requirement criteria are typically listed at a higher priority. The prerequisite tests will be executed first to verify that baseline monitoring and administrative functions are available. This will allow the execution of the experimenter test cases. Additional monitoring and administrative tests described in later sections will also run before the completion of the acceptance test effort.

Administration Prerequisite Tests

Administrative acceptance tests verify support for administrative management tasks and focus on priority functions for each of the rack components. The set of administrative features described in this section will be verified first. Additional administrative tests, described in a later section, will be executed before the acceptance test effort completes.

IG-ADM-1: Rack Receipt and Inventory Test

This "test" uses BBN as an example site to verify that we can do everything needed to integrate the rack into our standard local procedures for systems we host.

Procedure

Outline:

  • InstaGENI and GPO power and wire the BBN rack
  • GPO configures the instageni.gpolab.bbn.com DNS namespace and 192.1.242.128/25 IP space, and enters all public IP addresses used by the rack into DNS.
  • GPO requests and receives administrator accounts on the rack and GPO sysadmins receive read access to all InstaGENI monitoring of the rack.
  • GPO inventories the physical rack contents, network connections and VLAN configuration, and power connectivity, using our standard operational inventories.
  • GPO, InstaGENI, and GMOC exchange contact information and change control procedures, and InstaGENI operators subscribe to GENI operations mailing lists and submit their contact information to GMOC.
  • GPO reviews the documented parts list, power requirements, physical and logical network connectivity requirements, and site administrator community requirements, verifying that these documents are sufficient for a new site to use when setting up a rack.

See the full detailed test plan for IG-ADM-1 for more info.
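The DNS and address-space step in the outline above can be sanity-checked with a short script. This is a sketch only: the domain and the 192.1.242.128/25 block come from the outline, while the function name and the example hostname are hypothetical.

```python
import ipaddress

RACK_NET = ipaddress.ip_network("192.1.242.128/25")
RACK_DOMAIN = "instageni.gpolab.bbn.com"

def check_dns_entries(entries):
    """Return a list of problems for (hostname, address) pairs.

    An entry is flagged if its name is outside the rack DNS namespace
    or its address is outside the rack IP space.
    """
    problems = []
    for hostname, addr in entries:
        if not hostname.endswith("." + RACK_DOMAIN):
            problems.append(f"{hostname}: outside {RACK_DOMAIN}")
        if ipaddress.ip_address(addr) not in RACK_NET:
            problems.append(f"{hostname}: {addr} outside {RACK_NET}")
    return problems

if __name__ == "__main__":
    # Hypothetical entry, used only to exercise the check.
    sample = [("boss.instageni.gpolab.bbn.com", "192.1.242.132")]
    print(RACK_NET.num_addresses, "addresses from", RACK_NET[0], "to", RACK_NET[-1])
    print(check_dns_entries(sample) or "no problems found")
```

A /25 contains 128 addresses, so the block spans 192.1.242.128 through 192.1.242.255.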

IG-ADM-2: Rack Administrator Access Test

This test verifies local and remote administrative access to rack devices.

Procedure

Outline:

  1. For each type of rack infrastructure node, including the server host itself, the boss VM, the ops VM, the FOAM VM and an experimental host configured for OpenVZ, use a site administrator account to test:
    • Login to the node using public-key SSH.
    • Verify that you cannot login to the node using password-based SSH, nor via any unencrypted login protocol.
    • When logged in, run a command via sudo to verify root privileges.
  2. For each rack infrastructure device (switches, remote PDUs if any), use a site administrator account to test:
    • Login via SSH.
    • Login via a serial console (if the device has one).
    • Verify that you cannot login to the device via an unencrypted login protocol.
    • Use the "enable" command or equivalent to verify privileged access.
  3. Test that iLO (the InstaGENI remote console solution for rack hosts) can be used to access the consoles of the server host and an experimental host:
    • Login via SSH or other encrypted protocol.
    • Verify that you cannot login via an unencrypted login protocol.

See the full detailed test plan for IG-ADM-2 for more info.
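One way to support the "no unencrypted login protocol" checks above is to probe the well-known TCP ports for such services. The sketch below assumes that telnet (23), rlogin (513), and rsh (514) are the protocols of interest; the plan itself does not list them. An open port does not prove that login succeeds, so a hit here is a prompt for a manual login attempt rather than a test result.

```python
import socket

# Assumed set of unencrypted login services and their standard TCP ports.
UNENCRYPTED_PORTS = {"telnet": 23, "rlogin": 513, "rsh": 514}

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def unencrypted_services(host, timeout=2.0):
    """List the assumed unencrypted login services that accept connections."""
    return [name for name, port in UNENCRYPTED_PORTS.items()
            if port_is_open(host, port, timeout)]
```

Usage would be a call such as `unencrypted_services(node)` for each infrastructure node or device under test, where `node` is the hostname or address supplied by the tester.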

Monitoring/Rack Inspection Prerequisite Tests

These tests verify that GPO can locate and access information which may be needed to determine rack state and debug problems during experimental testing, and also verify by observation that rack components are ready to be tested. Additional monitoring tests are defined in a later section to complete the validation in this section.

IG-MON-1: Control Network Software and VLAN Inspection Test

This test inspects the state of the rack control network, infrastructure nodes, and system software.

Procedure

  • On each of the server host, the boss VM, the ops VM, the FOAM VM, and an experimental node configured for OpenVZ, a site administrator enumerates the processes that listen for network connections from other nodes, identifies which version of which software package is in use for each, and verifies that we know the source of each piece of software and could get access to its source code.
  • A site administrator reviews the configuration of the rack control plane switch and verifies that each experimental node's control and iLO interfaces are on the expected VLANs.
  • A site administrator reviews the MAC address table on the control plane switch, and verifies that all entries are identifiable and expected.
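The MAC address table review can be partly automated by comparing the addresses seen on the switch with an inventory of expected addresses. The following is a sketch under the assumption that both lists have already been collected by hand; the function names are hypothetical, and how the table is retrieved from the switch is not covered.

```python
HEX_DIGITS = set("0123456789abcdef")

def normalize_mac(mac):
    """Convert common MAC notations (aa:bb..., aa-bb..., aabb.ccdd...) to aa:bb:cc:dd:ee:ff."""
    digits = "".join(c for c in mac.lower() if c in HEX_DIGITS)
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

def unexpected_macs(observed, inventory):
    """Return the observed addresses that do not appear in the inventory."""
    known = {normalize_mac(m) for m in inventory}
    seen = {normalize_mac(m) for m in observed}
    return sorted(seen - known)

if __name__ == "__main__":
    # Made-up addresses for illustration only.
    inventory = ["00:11:22:33:44:55", "00-11-22-33-44-66"]
    observed = ["0011.2233.4455", "00:11:22:33:44:77"]
    print(unexpected_macs(observed, inventory))
```

Any address returned by `unexpected_macs` would then be investigated manually, since the test requires every entry to be identifiable and expected.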

IG-MON-2: GENI Software Configuration Inspection Test

This test inspects the state of the GENI AM software in use on the rack.

Procedure

  • A site administrator uses available system data sources (process listings, monitoring output, system logs, etc.) and/or AM administrative interfaces to determine the configuration of InstaGENI resources:
    • How many experimental nodes are available for bare metal use, how many are configured as OpenVZ containers, and how many are configured as PlanetLab containers.
    • What operating system each OpenVZ container makes available for experimental VMs.
    • How many unbound VLANs are in the rack's available pool.
    • Whether the ProtoGENI and FOAM AMs trust the pgeni.gpolab.bbn.com slice authority, which will be used for testing.
  • A site administrator uses available system data sources to determine the configuration of OpenFlow resources according to FOAM, InstaGENI, and FlowVisor.

IG-MON-3: GENI Active Experiment Inspection Test

This test inspects the state of the rack data plane and control networks when experiments are running, and verifies that a site administrator can find information about running experiments.

Procedure

  • An experimenter from the GPO starts up experiments to ensure there is data to look at:
    • An experimenter runs an experiment containing at least one rack OpenVZ VM, and terminates it.
    • An experimenter runs an experiment containing at least one rack OpenVZ VM, and leaves it running.
  • A site administrator uses available system and experiment data sources to determine current experimental state, including:
    • How many VMs are running and which experimenters own them
    • How many physical hosts are in use by experiments, and which experimenters own them
    • How many VMs were terminated within the past day, and which experimenters owned them
    • What OpenFlow controllers the data plane switch, the rack FlowVisor, and the rack FOAM are communicating with
  • A site administrator examines the switches and other rack data sources, and determines:
    • What MAC addresses are currently visible on the data plane switch and what experiments do they belong to?
    • For some experiment which was terminated within the past day, what data plane and control MAC and IP addresses did the experiment use?
    • For some experimental data path which is actively sending traffic on the data plane switch, do changes in interface counters show approximately the expected amount of traffic into and out of the switch?
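The interface counter check in the last item above amounts to comparing a counter delta with the volume the traffic generator should have sent. A minimal sketch follows; the 10% tolerance and the 64-bit counter width are assumptions chosen for the example, and the counter values themselves would be read from the switch by the administrator.

```python
def expected_bytes(rate_bits_per_s, seconds):
    """Bytes a constant-rate stream should carry in the given interval."""
    return rate_bits_per_s * seconds / 8

def counters_match(before, after, expected, tolerance=0.10, counter_bits=64):
    """Compare a byte-counter delta with the expected volume.

    Returns (ok, delta). The modulo handles a counter that wrapped
    around between the two readings.
    """
    delta = (after - before) % (2 ** counter_bits)
    ok = abs(delta - expected) <= tolerance * expected
    return ok, delta

if __name__ == "__main__":
    want = expected_bytes(1_000_000, 60)   # 1 Mbit/s for 60 s = 7,500,000 bytes
    print(counters_match(1_000, 1_000 + 7_600_000, want))
```

Other traffic on the same interface inflates the delta, so the comparison is only meaningful as an approximate check, as the test wording suggests.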

Experimenter Acceptance Tests

IG-EXP-1: Bare Metal Support Acceptance Test

Bare metal nodes are exclusive dedicated physical nodes that are used throughout the experimenter test cases. This section outlines features to be verified which are not explicitly validated in other scenarios:

  1. Determine which nodes can be used as exclusive nodes.
  2. Obtain 2 licensed recent Microsoft OS images for physical nodes from the site (BBN).
  3. Reserve and boot 2 physical nodes using Microsoft image.
  4. Obtain a recent Linux OS image for physical nodes from the InstaGENI list.
  5. Reserve and boot a physical node using this Linux OS image.
  6. Release physical node resource.
  7. Modify Aggregate resource allocation for the rack to add 1 additional physical node.

IG-EXP-2: InstaGENI Single Site Acceptance Test

This one-site test, run on the Utah InstaGENI rack, includes two experiments. Each experiment requests local compute resources, which generate bidirectional traffic over a Layer 2 data plane network connection. The goals of this test are to verify basic operations of VMs and data flows within one rack; verify the ability to request a publicly routable IP address and public TCP/UDP port mapping for a control interface on a compute resource; and verify the ability to add a customized image for the rack.

Test Topology

This test uses this topology:

Note: The diagram shows the logical end-points for each experiment traffic exchange. The VMs may or may not be on different experiment nodes.

Prerequisites

This test has these prerequisites:

  • InstaGENI makes available at least two Linux distributions and a FreeBSD image, as stated in the design document.
  • Two GPO customized Ubuntu image snapshots are available and have been manually uploaded by the rack administrator using available InstaGENI documentation. One Ubuntu image is for the VM and one Ubuntu image is for the physical node in this test.
  • Traffic generation tools may be part of image or may be installed at experiment runtime.
  • Administrative accounts have been created for GPO staff on the Utah InstaGENI rack.
  • GENI Experimenter1 and Experimenter2 accounts exist.
  • Baseline Monitoring is in place for the entire Utah site, to ensure that any problems are quickly identified.

Procedure

Do the following:

  1. As Experimenter1, request ListResources from Utah InstaGENI.
  2. Review advertisement RSpec for a list of OS images which can be loaded, and identify available resources.
  3. Verify that the GPO Ubuntu customized image is available in the advertisement RSpec.
  4. Define a request RSpec for two VMs, each with a GPO Ubuntu image. Request a publicly routable IP address and public TCP/UDP port mapping for the control interface on each node.
  5. Create the first slice.
  6. Create a sliver in the first slice, using the RSpec defined in step 4.
  7. Log in to each of the systems, and send traffic to the other system sharing a VLAN.
  8. Using root privileges on one of the VMs, load a kernel module. This is not expected to work on shared OpenVZ nodes; testing will proceed past this step.
  9. Run a netcat listener and bind to port XYZ on each of the VMs in the Utah rack.
  10. Send traffic to port XYZ on each of the VMs in the Utah rack over the control network from any commodity Internet host.
  11. As Experimenter2, request ListResources from Utah InstaGENI.
  12. Define a request RSpec for two physical nodes, both using the uploaded GPO Ubuntu images.
  13. Create the second slice.
  14. Create a sliver in the second slice, using the RSpec defined in step 12.
  15. Log in to each of the systems, and send traffic to the other system.
  16. Verify that Experimenter1 and Experimenter2 cannot use the control plane to access each other's resources (e.g. via unauthenticated SSH or a shared writable filesystem mount).
  17. Review system statistics and VM isolation and network isolation on data plane.
  18. Verify that each VM has a distinct MAC address for that interface.
  19. Verify that VMs' MAC addresses are learned on the data plane switch.
  20. Stop traffic and delete slivers.
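The slice and sliver steps in the procedure above map onto a short sequence of Omni invocations. The sketch below only builds the command lines; it does not run them. The Omni subcommand names are those of the GENI AM API workflow (listresources, createslice, createsliver, sliverstatus, deletesliver), while the aggregate nickname, slice name, and RSpec filename are placeholders invented for this example.

```python
def omni_workflow(aggregate, slice_name, rspec_file):
    """Return the Omni command lines for one experiment, in execution order."""
    base = ["omni.py"]
    return [
        base + ["-a", aggregate, "listresources"],
        base + ["createslice", slice_name],
        base + ["-a", aggregate, "createsliver", slice_name, rspec_file],
        base + ["-a", aggregate, "sliverstatus", slice_name],
        base + ["-a", aggregate, "deletesliver", slice_name],
    ]

if __name__ == "__main__":
    # Placeholder names; substitute the real aggregate, slice, and RSpec.
    for cmd in omni_workflow("utah-instageni", "IG-EXP-2-exp1", "two-vms.rspec"):
        print(" ".join(cmd))
```

The same sequence would be repeated as Experimenter2 with a second slice name and the physical-node RSpec. Traffic generation and the isolation checks happen between the sliverstatus and deletesliver steps and are not represented here.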

IG-EXP-3: InstaGENI Single Site 100 VM Test

This one-site test runs on the Utah InstaGENI rack and includes various scenarios to validate compute resource requirements for VMs. The goal of this test is not to validate the InstaGENI limits, but simply to verify that the InstaGENI rack can provide 100 VMs with its experiment nodes under various scenarios, including:

  • Scenario 1: 1 Slice with 100 VMs
  • Scenario 2: 2 Slices with 50 VMs each
  • Scenario 3: 4 Slices with 25 VMs each
  • Scenario 4: 50 Slices with 2 VMs each
  • Scenario 5: 100 Slices with 1 VM each

If Scenario 1 does not work, tests will be run to determine the maximum number of VMs allowed in one slice. If the maximum number of VMs allowed in one slice is less than 50, then Scenario 2 will not be executed. This test will evenly distribute the VM requests across the available experiment nodes for each scenario.
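The scenario arithmetic and the even distribution of VMs can be written out as follows. This is a sketch: the number of experiment nodes is not stated in this plan, so it is a parameter here, and the value used in the example run is an assumption.

```python
# (slices, VMs per slice) for Scenarios 1 through 5, as listed above.
SCENARIOS = {1: (1, 100), 2: (2, 50), 3: (4, 25), 4: (50, 2), 5: (100, 1)}

def distribute(total_vms, node_count):
    """Spread VMs across nodes so that per-node counts differ by at most one."""
    base, extra = divmod(total_vms, node_count)
    return [base + (1 if i < extra else 0) for i in range(node_count)]

if __name__ == "__main__":
    node_count = 3  # assumed number of experiment nodes, for illustration
    for number, (slices, vms_per_slice) in SCENARIOS.items():
        total = slices * vms_per_slice
        print(f"Scenario {number}: {total} VMs -> {distribute(total, node_count)}")
```

Every scenario totals 100 VMs, so the per-node load is the same in each case; only the number of slices that own those VMs changes.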

Test Topology

This test uses this topology:

Prerequisites

This test has these prerequisites:

  • Traffic generation tools may be part of image or installed at experiment runtime.
  • Administrative accounts exist for GPO staff on the Utah InstaGENI rack.
  • GENI Experimenter1 account exists.
  • Baseline Monitoring is in place for the entire Utah site, to ensure that any problems are quickly identified.

Procedure

Do the following:

  1. As Experimenter1, request ListResources from Utah InstaGENI.
  2. Review ListResources output, and identify available resources.
  3. Write the Scenario 1 RSpec that requests 100 VMs evenly distributed across the experiment nodes using the default image.
  4. Create a slice.
  5. Create a sliver in the slice, using the RSpec defined in step 3.
  6. Log into several of the VMs, and send traffic to several other systems.
  7. Step up traffic rates to verify VMs continue to operate with realistic traffic loads.
  8. Review system statistics and VM isolation (does not include network isolation).
  9. Verify that several VMs running on the same experiment node have a distinct MAC address for their interface.
  10. Verify for several VMs running on the same experiment node, that their MAC addresses are learned on the data plane switch.
  11. Review monitoring statistics and check for resource status for CPU, disk, memory utilization, interface counters, uptime, process counts, and active user counts.
  12. Stop traffic and delete sliver.
  13. Re-execute the procedure described in steps 1-12 with changes required for Scenario 2 (2 Slices with 50 VMs each).
  14. Re-execute the procedure described in steps 1-12 with changes required for Scenario 3 (4 Slices with 25 VMs each).
  15. Re-execute the procedure described in steps 1-12 with changes required for Scenario 4 (50 Slices with 2 VMs each).
  16. Re-execute the procedure described in steps 1-12 with changes required for Scenario 5 (100 Slices with 1 VM each).
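Writing a request RSpec with 100 node entries by hand is error-prone, so step 3 is a natural candidate for generation. The sketch below emits a GENI RSpec v3 request containing only unbound VM nodes. The sliver type name `emulab-openvz` and the `VM-n` client IDs are assumptions for this example; the sketch does not bind VMs to specific experiment nodes and does not define links or interfaces, all of which the real Scenario 1 RSpec would need.

```python
import xml.etree.ElementTree as ET

RSPEC_NS = "http://www.geni.net/resources/rspec/3"

def build_request_rspec(vm_count, sliver_type="emulab-openvz"):
    """Return a request RSpec (as a string) asking for vm_count shared VMs."""
    ET.register_namespace("", RSPEC_NS)
    rspec = ET.Element(f"{{{RSPEC_NS}}}rspec", {"type": "request"})
    for i in range(1, vm_count + 1):
        node = ET.SubElement(
            rspec,
            f"{{{RSPEC_NS}}}node",
            {"client_id": f"VM-{i}", "exclusive": "false"},
        )
        # Assumed sliver type for an OpenVZ container.
        ET.SubElement(node, f"{{{RSPEC_NS}}}sliver_type", {"name": sliver_type})
    return ET.tostring(rspec, encoding="unicode")

if __name__ == "__main__":
    print(build_request_rspec(2))
```

For Scenarios 2 through 5 the same function would be called with 50, 25, 2, or 1 and the output submitted once per slice.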

IG-EXP-4: InstaGENI Multi-site Acceptance Test

This test includes two sites and two experiments, using resources in the BBN and Utah InstaGENI racks. Each of the compute resources will exchange traffic over the GENI core network. In addition, the BBN VM and bare metal resources in Experiment2 will use multiple data interfaces. All site-to-site experiments will take place over a wide-area Layer 2 data plane network connection via Internet2 using VLANs allocated by the AM. The goal of this test is to verify basic operations of VMs and data flows between the two racks.

Test Topology

This test uses this topology:

Note: Each of the Experiment2 traffic dashed lines equals a VLAN. The VMs shown may or may not be on different experiment nodes.

Prerequisites

This test has these prerequisites:

  • BBN InstaGENI connectivity statistics will be monitored at the GPO InstaGENI Monitoring site.
  • This test will be scheduled at a time when site contacts are available to address any problems.
  • Administrative accounts have been created for GPO staff at each rack.
  • The VLANs used will be allocated by the rack AM.
  • Baseline Monitoring is in place at each site, to ensure that any problems are quickly identified.
  • InstaGENI manages private address allocation for the endpoints in this test.
  • The ION AM is available; if it is not, a static VLAN will be used.

Procedure

Do the following:

  1. As Experimenter1, request ListResources from BBN InstaGENI.
  2. Request ListResources from Utah InstaGENI.
  3. Review ListResources output from both AMs.
  4. Define a request RSpec for a VM at BBN InstaGENI.
  5. Define a request RSpec for a VM at Utah InstaGENI and an unbound exclusive non-OpenFlow VLAN to connect the 2 endpoints.
  6. Create the first slice.
  7. Create a sliver at each InstaGENI aggregate using the RSpecs defined above.
  8. Log in to each of the systems, and send traffic to the other system; leave traffic running.
  9. As Experimenter2, request ListResources from BBN and Utah InstaGENI.
  10. Define a request RSpec for one VM and one bare metal node in the BBN InstaGENI rack. Each resource should have two logical interfaces and a 3rd VLAN for the local connection.
  11. Define a request RSpec to add two VMs at Utah and two VLANs to connect the BBN InstaGENI to the Utah InstaGENI.
  12. Create a second slice.
  13. In the second slice, create a sliver at each InstaGENI aggregate using the RSpecs defined above.
  14. Log in to each of the end-point systems, and send traffic to the other end-point system which shares the same VLAN.
  15. Verify traffic handling per experiment, VM isolation, and MAC address assignment.
  16. Construct and send a non-IP Ethernet packet over the data plane interface.
  17. Review baseline monitoring statistics.
  18. Run test for at least 4 hours.
  19. Review baseline monitoring statistics.
  20. Stop traffic and delete slivers.

Note: After a successful test run, this test will be revisited and the procedure will be re-executed as a longevity test for a minimum of 24 hours rather than 4 hours.
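For the non-IP Ethernet packet in step 16, the frame can be assembled byte by byte. The sketch below builds the frame only. It uses EtherType 0x88B5, one of the values IEEE reserves for local experimental use, which is a choice made for this example; the MAC addresses in the example run are made up. Actually transmitting the frame requires a raw socket (for example `socket.AF_PACKET` on Linux, with root privileges) bound to the data plane interface, which is left out here.

```python
import struct

MIN_PAYLOAD = 46  # minimum Ethernet payload size, excluding the frame check sequence

def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return raw

def build_frame(dst_mac, src_mac, payload, ethertype=0x88B5):
    """Return an Ethernet II frame (without FCS) carrying a non-IP payload."""
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)
    return header + payload.ljust(MIN_PAYLOAD, b"\x00")  # pad short payloads

if __name__ == "__main__":
    frame = build_frame("02:00:00:00:00:02", "02:00:00:00:00:01", b"IG-EXP-4 non-IP test")
    print(len(frame), frame[:14].hex())
```

On the receiving end-point, a packet capture filtered on the chosen EtherType would confirm that the frame crossed the data plane unchanged.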

IG-EXP-5: InstaGENI Network Resources Acceptance Test

This is a three-site experiment in which the only InstaGENI resources used are OpenFlow network resources. All compute resources are outside the InstaGENI rack. The experiment will use the InstaGENI Aggregate Manager to request the rack data plane resources. The InstaGENI AM communicates with FOAM to configure the InstaGENI site OpenFlow switch. The goal of this test is to verify OpenFlow operations and integration with meso-scale compute resources and other compute resources external to the InstaGENI rack.

Test Topology

Note: The NLR and Internet2 OpenFlow VLANs are the GENI Network Core static VLANs.

Note: The InstaGENI AM is not shown in the diagram, but the experiment request is initially made through the InstaGENI AM, which communicates with FOAM.

Prerequisites

  • A GPO site network is connected to the InstaGENI OpenFlow switch
  • InstaGENI FOAM server is running and can manage the InstaGENI OpenFlow switch
  • An OpenFlow controller is run by the experimenter and is accessible by FlowVisor via DNS hostname (or IP address) and TCP port.
  • Two meso-scale remote sites make compute resources and OpenFlow meso-scale resources available for this test
  • GMOC data collection for the meso-scale and InstaGENI rack resources is functioning for the OpenFlow and traffic measurements required in this test.

Procedure

The following operations are to be executed:

  1. As Experimenter1, determine BBN compute resources and define RSpec.
  2. Determine remote meso-scale compute resources and define RSpec.
  3. Define a request RSpec for OpenFlow network resources at the BBN InstaGENI AM.
  4. Define a request RSpec for OpenFlow network resources at the remote I2 Meso-scale site.
  5. Define a request RSpec for the OpenFlow Core resources.
  6. Create the first slice.
  7. Create a sliver for the BBN compute resources.
  8. Create a sliver at the I2 meso-scale site using FOAM at site.
  9. Create a sliver at the BBN InstaGENI AM.
  10. Create a sliver for the OpenFlow resources in the core.
  11. Create a sliver for the meso-scale compute resources.
  12. Log in to each of the compute resources and send traffic to the other end-point.
  13. Verify that traffic is delivered to target.
  14. Review baseline, GMOC, and meso-scale monitoring statistics.
  15. As Experimenter2, determine BBN compute resources and define RSpec.
  16. Determine remote meso-scale compute resources and define RSpec.
  17. Define a request RSpec for OpenFlow network resources at the BBN InstaGENI AM.
  18. Define a request RSpec for OpenFlow network resources at the remote NLR Meso-scale site.
  19. Define a request RSpec for the OpenFlow Core resources.
  20. Create the second slice.
  21. Create a sliver for the BBN compute resources.
  22. Create a sliver at the meso-scale site using FOAM at site.
  23. Create a sliver at the BBN InstaGENI AM.
  24. Create a sliver for the OpenFlow resources in the core.
  25. Create a sliver for the meso-scale compute resources.
  26. Log in to each of the compute resources and send traffic to the other endpoint.
  27. As Experimenter2, insert flowmods and send packet-outs only for traffic assigned to the slivers.
  28. Verify that traffic is delivered to target according to the flowmods settings.
  29. Review baseline, GMOC, and monitoring statistics.
  30. Stop traffic and delete slivers.

IG-EXP-6: InstaGENI and Meso-scale Multi-site OpenFlow Acceptance Test

This test includes three sites and three experiments, using resources in the BBN and Utah InstaGENI racks as well as meso-scale resources, where the network resources are the core OpenFlow-controlled VLANs. Each of the compute resources will exchange traffic with the others in its slice, over a wide-area Layer 2 data plane network connection, using Internet2 and NLR VLANs. In particular, the following slices will be set up for this test:

  • Slice 1: One InstaGENI VM at each of BBN and Utah.
  • Slice 2: Two InstaGENI VMs at Utah and one VM and one bare metal node at BBN.
  • Slice 3: An InstaGENI VM at BBN, a PG node at BBN, and a meso-scale Wide-Area ProtoGENI (WAPG) node.

Test Topology

This test uses this topology:

Note: The two Utah VMs in Experiment2 must be on the same experiment node. This is not the case for other experiments.

Note: The NLR and Internet2 OpenFlow VLANs are the GENI Network Core static VLANs.

Prerequisites

This test has these prerequisites:

  • Meso-scale sites are available for testing
  • BBN InstaGENI connectivity statistics are monitored at the GPO InstaGENI Monitoring site.
  • GENI Experimenter1, Experimenter2 and Experimenter3 accounts exist.
  • This test will be scheduled at a time when site contacts are available to address any problems.
  • Both InstaGENI aggregates can link to static VLANs.
  • Site's OpenFlow VLAN is implemented and is known for this test. (Current example is meso-scale VLAN 1750)
  • Baseline Monitoring is in place at each site, to ensure that any problems are quickly identified.
  • GMOC data collection for the meso-scale and InstaGENI rack resources is functioning for the OpenFlow and traffic measurements required in this test.
  • An OpenFlow controller is run by the experimenter and is accessible by FlowVisor via DNS hostname (or IP address) and TCP port.

Note: If Utah PG access to OpenFlow is available when this test is executed, a PG node will be added to the third slice. This node is not shown in the diagram above.

Procedure

Do the following:

  1. As Experimenter1, request ListResources from BBN InstaGENI, Utah InstaGENI, and from FOAM at I2 and NLR Site.
  2. Review ListResources output from all AMs.
  3. Define a request RSpec for a VM at the BBN InstaGENI.
  4. Define a request RSpec for a VM at the Utah InstaGENI.
  5. Define request RSpecs for OpenFlow resources from BBN FOAM to access GENI OpenFlow core resources.
  6. Define request RSpecs for OpenFlow core resources at I2 FOAM.
  7. Define request RSpecs for OpenFlow core resources at NLR FOAM.
  8. Create the first slice.
  9. Create a sliver in the first slice at each AM, using the RSpecs defined above.
  10. Log in to each of the systems, verify IP address assignment. Send traffic to the other system, leave traffic running.
  11. As Experimenter2, define a request RSpec for one VM and one physical node at BBN InstaGENI.
  12. Define a request RSpec for two VMs on the same experiment node at Utah InstaGENI.
  13. Define request RSpecs for OpenFlow resources from BBN FOAM to access GENI OpenFlow core resources.
  14. Define request RSpecs for OpenFlow core resources at I2 FOAM.
  15. Define request RSpecs for OpenFlow core resources at NLR FOAM.
  16. Create a second slice.
  17. Create a sliver in the second slice at each AM, using the RSpecs defined above.
  18. Log in to each of the systems in the slice, and send traffic to each of the other systems; leave traffic running.
  19. As Experimenter3, request ListResources from BBN InstaGENI, BBN meso-scale FOAM, and FOAM at Meso-scale Site (Internet2 Site BBN and NLR site is TBD).
  20. Review ListResources output from all AMs.
  21. Define a request RSpec for a VM at the BBN InstaGENI.
  22. Define a request RSpec for a compute resource at the BBN meso-scale site.
  23. Define a request RSpec for a compute resource at a meso-scale site.
  24. Define request RSpecs for OpenFlow resources to allow connection from OpenFlow BBN InstaGENI to Meso-scale OpenFlow sites (BBN and second site TBD) (I2 and NLR).
  25. If PG access to OpenFlow is available, define a request RSpec for the PG OpenFlow resource.
  26. Create a third slice.
  27. Create slivers that connect the TBD Internet2 Meso-scale OpenFlow site to the BBN InstaGENI site and the BBN Meso-scale site and, if available, to the PG node.
  28. Log in to each of the compute resources in the slice, configure data plane network interfaces on any non-InstaGENI resources as necessary, and send traffic to each of the other systems; leave traffic running.
  29. Verify that all three experiments continue to run without impacting each other's traffic, and that data is exchanged over the path along which data is supposed to flow.
  30. Review baseline monitoring statistics and checks.
  31. As site administrator, identify all controllers that the BBN InstaGENI OpenFlow switch is connected to.
  32. As Experimenter3, verify that traffic only flows on the network resources assigned to slivers as specified by the controller.
  33. Verify that no default controller, switch fail-open behavior, or other resource other than experimenters' controllers, can control how traffic flows on network resources assigned to experimenters' slivers.
  34. Set the hard and soft timeouts of flowtable entries.
  35. Get switch statistics and flowtable entries for slivers from the OpenFlow switch.
  36. Get layer 2 topology information about slivers in each slice.
  37. Install flows that match only on layer 2 fields, and confirm whether the matching is done in hardware.
  38. If supported, install flows that match only on layer 3 fields, and confirm whether the matching is done in hardware.
  39. Run test for at least 4 hours.
  40. Review monitoring statistics and checks as above.
  41. Delete slivers.
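For steps 37 and 38, it helps to confirm before installing a flow that its match really is restricted to the intended layer. The sketch below classifies a match by the fields it sets, using OpenFlow 1.0 field names. It is independent of any particular controller framework; the function name and the dictionary representation of a match are assumptions made for this example.

```python
# OpenFlow 1.0 match fields, grouped by layer.
L2_FIELDS = frozenset({"dl_src", "dl_dst", "dl_type", "dl_vlan", "dl_vlan_pcp"})
L3_FIELDS = frozenset({"nw_src", "nw_dst", "nw_proto", "nw_tos"})

def layers_matched(match):
    """Return the set of layers ('L2', 'L3') that a match dictionary touches."""
    layers = set()
    for field in match:
        if field in L2_FIELDS:
            layers.add("L2")
        elif field in L3_FIELDS:
            layers.add("L3")
        else:
            raise ValueError(f"field not classified in this sketch: {field}")
    return layers

if __name__ == "__main__":
    print(layers_matched({"dl_dst": "02:00:00:00:00:02"}))
    print(layers_matched({"nw_dst": "10.0.0.2"}))
```

Whether a given match is then handled in hardware still has to be confirmed on the switch itself, which is what the two steps and the documentation check below call for.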

Documentation:

  1. Verify access to documentation about which OpenFlow actions can be performed in hardware.

IG-EXP-7: Click Router Experiment Acceptance Test

This test case uses a Click modular router experiment with InstaGENI VM nodes. The scenario uses 2 VMs as hosts and 4 VMs as Click Routers and is based on the following Click example experiment, although unlike the example, this test case uses VMs and it runs the Click router module in user space.

Test Topology

This test uses this topology:

Note: Two VMs will be requested on the same physical worker node at each rack site for the user-level Click Router.

Prerequisites

This test has these prerequisites:

  • TBD

Procedure

Do the following:

  1. As Experimenter1, request ListResources from BBN InstaGENI, and Utah InstaGENI.
  2. Review ListResources output from both rack AMs.
  3. Define a request RSpec for three VMs at the BBN InstaGENI.
  4. Define a request RSpec for three VMs at the Utah InstaGENI.
  5. Create the first slice.
  6. Create a sliver in the first slice at each AM, using the RSpecs defined above.
  7. Download the Click Router software on the 4 nodes that will be routers in the experiment.
  8. Build the user-level Click program in the 'userlevel' directory on each of the 4 router nodes.
  9. Determine the interface to MAC address mapping and the various settings required for the 4 router nodes, and modify the Click configuration.
  10. Run the user-level Click router by providing the configuration file name as an argument on each node.
  11. Log in to Host1 and send traffic to Host2; leave traffic running.
  12. Review Click logs on each Click router.
  13. Stop traffic on Host2.
  14. Log in to Host1 and send traffic to Host2; leave traffic running.
  15. Review Click logs on each Click router.
  16. Stop traffic on Host2.
  17. Delete sliver.
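The interface to MAC address mapping needed in step 9 can be collected on each router node with a few lines of Python. This sketch assumes the nodes run Linux and expose interfaces under /sys/class/net; the function name is hypothetical, and the directory is a parameter so the code can be exercised against a test directory.

```python
import os

def interface_macs(sys_net="/sys/class/net"):
    """Return {interface name: MAC address} read from a Linux sysfs tree."""
    mapping = {}
    for name in sorted(os.listdir(sys_net)):
        path = os.path.join(sys_net, name, "address")
        try:
            with open(path) as handle:
                mapping[name] = handle.read().strip()
        except OSError:
            continue  # entry without a readable address file
    return mapping

if __name__ == "__main__":
    for name, mac in interface_macs().items():
        print(f"{name}\t{mac}")
```

The resulting addresses would then be copied into the Click configuration for that node by hand.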

Additional Administration Acceptance Tests

These tests will be performed as needed after the administration baseline tests complete successfully. For example, the Software Update Test will be performed at least once when the rack team provides new software for testing. We expect these tests to be interspersed with other tests in this plan at times that are agreeable to the GPO and the participants, not just run in a block at the end of testing. The goal of these tests is to verify that sites have adequate documentation, procedures, and tools to satisfy all GENI site requirements.

IG-ADM-3: Full Rack Reboot Test

In this test, a full rack reboot is performed as a drill of a procedure which a site administrator may need to perform for site maintenance.

Note: this test must be run using the BBN rack because it requires physical access.

Procedure

  1. Review relevant rack documentation about shutdown options and make a plan for the order in which to shut down each component.
  2. Cleanly shut down and/or hard-power-off all devices in the rack, and verify that everything in the rack is powered down.
  3. Power on all devices, bring all logical components back online, and use monitoring and comprehensive health tests to verify that the rack is healthy again.

IG-ADM-4: Emergency Stop Test

In this test, an Emergency Stop drill is performed on a sliver in the rack.

Prerequisites

  • GMOC's updated Emergency Stop procedure is approved and published on a public wiki.
  • InstaGENI's procedure for performing a shutdown operation on any type of sliver in an InstaGENI rack is published on a public wiki or on a protected wiki that all InstaGENI site administrators (including GPO) can access.
  • An Emergency Stop test is scheduled at a convenient time for all participants and documented in GMOC ticket(s).
  • A test experiment is running that involves a slice with connections to at least one InstaGENI rack compute resource.

Procedure

  • A site administrator reviews the Emergency Stop and sliver shutdown procedures, and verifies that these two documents combined fully document the campus side of the Emergency Stop procedure.
  • A second administrator (or the GPO) submits an Emergency Stop request to GMOC, referencing activity from a public IP address assigned to a compute sliver in the rack that is part of the test experiment.
  • GMOC and the first site administrator perform an Emergency Stop drill in which the site administrator successfully shuts down the sliver in coordination with GMOC.
  • GMOC completes the Emergency Stop workflow, including updating/closing GMOC tickets.

IG-ADM-5: Software Update Test

In this test, we update software on the rack as a test of the software update procedure.

Prerequisites

Minor updates of system packages for all infrastructure OSes, InstaGENI local AM software, and FOAM are available to be installed on the rack. This test may need to be scheduled to take advantage of a time when these updates are available.

Procedure

  • A BBN site administrator reviews the procedure for performing software updates of GENI and non-GENI software on the rack. If there is a procedure for updating any version tracking documentation (e.g. a wiki page) or checking any version tracking tools, the administrator reviews that as well.
  • Following that procedure, the administrator performs minor software updates on rack components, including as many as possible of the following (depending on availability of updates):
    • At least one update of a standard (non-GENI) FreeBSD package on each of the boss, the ops VM, and the OpenVZ image running on an experimental node. (GPO will look for a package which has a security vulnerability listed in the portaudit database.)
    • At least one update of a standard (non-GENI) system package on the FOAM VM.
    • At least one update of a standard (non-GENI) system package on the server host OS.
    • An update of InstaGENI local AM software on boss and ops.
    • An update of FOAM software.
  • The admin confirms that the software updates completed successfully.
  • The admin updates any appropriate version tracking documentation or runs appropriate tool checks indicated by the version tracking procedure.

IG-ADM-6: Control Network Disconnection Test

In this test, we disconnect parts of the rack control network or its dependencies to test partial rack functionality in an outage situation.

Note: this test must be performed on the BBN rack because GPO will modify configuration on the control plane router and switch upstream from the rack in order to perform the test.

Procedure

  • Simulate an outage of boss.emulab.net by inserting a firewall rule on the BBN router blocking the rack from reaching it. Verify that an administrator can still access the rack, that rack monitoring to GMOC continues through the outage, and that some experimenter operations still succeed.
  • Simulate an outage of each of the rack server host and control plane switch by disabling their respective interfaces on BBN's control network switch. Verify that GPO, InstaGENI, and GMOC monitoring all see the outage.

IG-ADM-7: Documentation Review Test

Although this is not a single test per se, this section lists required documents that the rack teams will write. Draft documents should be delivered prior to testing of the functional areas to which they apply. Final documents must be delivered before Spiral 4 site installations at non-developer sites. Final documents will be public, unless there is some specific reason a particular document cannot be public (e.g. a security concern from a GENI rack site).

Procedure

Review each required document listed below, and verify that:

  • The document has been provided in a public location (e.g. the GENI wiki, or any other public website)
  • The document contains the required information.
  • The documented information appears to be accurate.

Note: this tests only the documentation, not the rack behavior which is documented. Rack behavior related to any or all of these documents may be tested elsewhere in this plan.

Documents to review:

  • Pre-installation document that lists specific minimum requirements for all site-provided services for potential rack sites (e.g. space, number and type of power plugs, number and type of power circuits, cooling load, public addresses, NLR or Internet2 layer2 connections, etc.). This document should also list all standard expected rack interfaces (e.g. 10GBE links to at least one research network).
  • Summary GENI rack parts list, including vendor part numbers for "standard" equipment intended for all sites (e.g. a VM server) and per-site equipment options (e.g. transceivers, PDUs etc.), if any. This document should also indicate approximately how much headroom, if any, remains in the standard rack PDUs' power budget to support other equipment that sites may add to the rack.
  • Procedure for identifying the software versions and system file configurations running on a rack, and how to get information about recent changes to the rack software and configuration.
  • Explanation of how and when software and OS updates can be performed on a rack, including plans for notification and update if important security vulnerabilities in rack software are discovered.
  • Description of the GENI software running on a standard rack, and explanation of how to get access to the source code of each piece of standard GENI software.
  • Description of all the GENI experimental resources within the rack, and what policy options exist for each, including: how to configure rack nodes as bare metal vs. VM server, what options exist for configuring automated approval of compute and network resource requests and how to set them, how to configure rack aggregates to trust additional GENI slice authorities, and whether it is possible to trust local users within the rack.
  • Description of the expected state of all the GENI experimental resources in the rack, including how to determine the state of an experimental resource and what state is expected for an unallocated bare metal node.
  • Procedure for creating new site administrator and operator accounts.
  • Procedure for changing IP addresses for all rack components.
  • Procedure for cleanly shutting down an entire rack in case of a scheduled site outage.
  • Procedure for performing a shutdown operation on any type of sliver on a rack, in support of an Emergency Stop request.
  • Procedure for performing comprehensive health checks for a rack (or, if those health checks are being run automatically, how to view the current/recent results).
  • Technical plan for handing off primary rack operations to site operators at all sites.
  • Per-site documentation. This documentation should be prepared before sites are installed and kept updated after installation to reflect any changes or upgrades after delivery. Text, network diagrams, wiring diagrams and labeled photos are all acceptable for site documents. Per-site documentation should include the following items for each site:
    1. Part numbers and quantities of PDUs, with NEMA input power connector types, and an inventory of which equipment connects to which PDU.
    2. Physical network interfaces for each control and data plane port that connects to the site's existing network(s), including type, part numbers, maximum speed, etc. (e.g. 10-GB-SR fiber).
    3. Public IP addresses allocated to the rack, including: number of distinct IP ranges and size of each range, hostname to IP mappings which should be placed in site DNS, whether the last-hop routers for the public IP ranges sit within the rack or elsewhere on the site, and what firewall configuration is desired for the control network.
    4. Data plane network connectivity and procedures for each rack, including core backbone connectivity and documentation, switch configuration options to set for compatibility with the L2 core, and the site and rack procedures for connecting non-rack-controlled VLANs and resources to the rack data plane. A network diagram is highly recommended. (See existing OpenFlow meso-scale network diagrams on the GENI wiki for examples.)

Additional Monitoring Acceptance Tests

These tests will be performed as needed after the monitoring baseline tests complete successfully. For example, the GMOC data collection test will be performed during the InstaGENI Network Resources Acceptance test, where we already use the GMOC for meso-scale OpenFlow monitoring. We expect these tests to be interspersed with other tests in this plan at times that are agreeable to the GPO and the participants, not just run in a block at the end of testing. The goal of these tests is to verify that sites have adequate tools to view and share GENI rack data that satisfies all GENI monitoring requirements.

IG-MON-4: Infrastructure Device Performance Test

This test verifies that the rack head node performs well enough to run all the services it needs to run.

Procedure

While experiments involving FOAM-controlled OpenFlow slivers and compute slivers are running:

  • View OpenFlow control monitoring at GMOC and verify that no monitoring data is missing
  • View VLAN 1750 data plane monitoring, which pings the rack's interface on VLAN 1750, and verify that packets are not being dropped
  • Verify that the CPU idle percentages on the server host and on the FOAM VM are both nonzero.
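
One way to obtain a CPU idle percentage for the last check is to compare two samples of the aggregate `cpu` line from `/proc/stat`. The sketch below assumes the system being checked is Linux and exposes `/proc/stat`; this plan does not say which tool the testers will use, so treat it as an illustration only.

```python
# Sketch only: CPU idle percentage between two samples of the aggregate
# "cpu" line from /proc/stat. Assumes a Linux system; not a tool named
# by this plan.
def idle_percent(before, after):
    """Return the idle share (0-100) of CPU time between two 'cpu' lines."""
    def counters(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Use user, nice, system, idle, iowait, irq, softirq, steal only.
        # Guest time is already included in user time, so it is left out.
        return [int(value) for value in parts[1:9]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return 100.0 * deltas[3] / total  # the fourth counter is idle time
```

A result greater than zero for both the server host and the FOAM VM satisfies the check as written. A value that stays near zero across several samples would suggest that the host is saturated.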

IG-MON-5: GMOC Data Collection Test

This test verifies the rack's submission of monitoring data to GMOC.

Note: This test relies on a GMOC API and data definitions which are under development. Availability of the appropriate API and definitions is a prerequisite for submitting each type of data. We expect InstaGENI will be able to submit a small number of operational data items by GEC14.

Procedure

View the dataset collected at GMOC for the BBN and Utah InstaGENI racks. For each piece of required data, attempt to verify that:

  • The data is being collected and accepted by GMOC and can be viewed at gmoc-db.grnoc.iu.edu
  • The data's "site" tag indicates that it is being reported for the rack located at the gpolab or Utah site (as appropriate for that rack).
  • The data has been reported within the past 10 minutes.
  • For each piece of data, either verify that it is being collected at least once a minute, or verify that it requires more complicated processing than a simple file read to collect, and thus can be collected less often.
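
The two timing criteria above (reported within the past 10 minutes, collected at least once a minute) can be checked mechanically once the report times for a data item are known. The following is a minimal sketch that assumes the report times have already been extracted as `datetime` values. How they are retrieved from GMOC is outside its scope, and the function name is hypothetical.

```python
# Sketch only: checks freshness and collection interval for one data item,
# given its report times. Retrieval of the times from GMOC is not shown.
from datetime import datetime, timedelta

MAX_AGE = timedelta(minutes=10)      # "reported within the past 10 minutes"
MAX_INTERVAL = timedelta(minutes=1)  # "collected at least once a minute"


def check_series(timestamps, now):
    """Return (is_fresh, is_frequent) for a list of report times."""
    if not timestamps:
        return (False, False)
    ordered = sorted(timestamps)
    is_fresh = now - ordered[-1] <= MAX_AGE
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # A single sample says nothing about the interval, so report False.
    is_frequent = bool(gaps) and max(gaps) <= MAX_INTERVAL
    return (is_fresh, is_frequent)
```

A `False` value for `is_frequent` is not automatically a failure: the plan allows less frequent collection for data that requires more processing than a simple file read, and that judgment still has to be made by the tester.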

Verify that the following pieces of data are being reported:

  • Is each of the rack InstaGENI and FOAM AMs reachable via the GENI AM API right now?
  • Is each compute or unbound VLAN resource at each rack AM online? Is it available or in use?
  • Sliver count and percentage of rack compute and unbound VLAN resources in use.
  • Identities of current slivers on each rack AM, including creation time for each.
  • Per-sliver interface counters for compute and VLAN resources (where these values can be easily collected).
  • Is the rack data plane switch online?
  • Interface counters and VLAN memberships for each rack data plane switch interface
  • MAC address table contents for shared VLANs which appear on rack data plane switches
  • Is each rack experimental node online?
  • For each rack experimental node configured as an OpenVZ VM server, overall CPU, disk, and memory utilization for the host, current VM count and total VM capacity of the host.
  • For each rack experimental node configured as an OpenVZ VM server, interface counters for each data plane interface.
  • Results of at least one end-to-end health check which simulates an experimenter reserving and using at least one resource in the rack.

Verify that per-rack or per-aggregate summaries are collected of the count of distinct users who have been active on the rack, either by providing raw sliver data containing sliver users to GMOC, or by collecting data locally and producing trending summaries on demand.

Test Methodology and Reporting

Test Case Execution

  1. All test procedure steps will be executed until there is a blocking issue.
  2. If a blocking issue is found for a test case, testing will be stopped for that test case.
  3. Testing focus will shift to another test case while waiting for a solution to a blocking issue.
  4. If a non-blocking issue is found, testing will continue toward completion of the procedure.
  5. When a software resolution or workaround is available for a blocking issue, the test impacted by the issue is re-executed until it can be completed successfully.
  6. Supporting documentation will be used whenever available.
  7. Questions that were not answered by existing documentation are to be gathered during the acceptance testing and published, as we did for the rack design.

Issue Tracking

  1. All issues discovered in acceptance testing regardless of priority are to be tracked in a bug tracking system.
  2. The bug tracking system to be used is [insert_link_here]
  3. All types of issues encountered (documentation error, software bug, missing features, missing documentation, etc.) are to be tracked.
  4. All unresolved issues will be reviewed and published at the end of the acceptance test as part of the acceptance test report.

Status Updates and Reporting

  1. A periodic status update will be generated as the acceptance test plan is being executed.
  2. A periodic (once per day) status update will be posted to the rack team mail list (InstaGENI-design@geni.net).
  3. Upon acceptance test completion, all findings and unresolved issues will be captured in an acceptance test report.
  4. Supporting configuration and RSpecs used in testing will be part of the acceptance test report.

Test Case Naming

The test cases in this plan follow the naming convention IG-XXX-Y, where IG stands for InstaGENI and XXX is one of the following: ADM for Administrative, EXP for Experimenter, or MON for Monitoring. The final component, Y, is the test case number.
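
As an illustration of the naming convention, the sketch below validates a test case name and splits it into its area and number. The function name and the choice to reject a number with a leading zero are assumptions made for this example, not rules stated in this plan.

```python
# Sketch only: validates names of the form IG-XXX-Y described above.
import re

AREAS = {"ADM": "Administrative", "EXP": "Experimenter", "MON": "Monitoring"}
TEST_ID = re.compile(r"IG-(ADM|EXP|MON)-([1-9][0-9]*)")


def parse_test_id(test_id):
    """Return (area name, test case number) or raise ValueError."""
    match = TEST_ID.fullmatch(test_id)
    if match is None:
        raise ValueError(f"not a valid test case name: {test_id!r}")
    area, number = match.groups()
    return AREAS[area], int(number)
```

For example, `parse_test_id("IG-EXP-7")` identifies the Click Router Experiment test as Experimenter test number 7.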

Requirements Validation

This acceptance test plan verifies Integration (C), Monitoring (D), Experimenter (G) and Local Aggregate (F) requirements. As part of the test planning process, the GPO Infrastructure group mapped each of the GENI Racks Requirements to a set of validation criteria. For a detailed look at the validation criteria, see the GENI Racks Acceptance Criteria page.

This plan does not validate any Software (A and B) requirements, as they are validated by the GPO Software team's GENI AM API Acceptance tests suite.

Some requirements are not verified in this test plan:

  • C.2.a "Support at least 100 simultaneous active (e.g. actually passing data) layer 2 Ethernet VLAN connections to the rack. For this purpose, VLAN paths must terminate on separate rack VMs, not on the rack switch."
  • Production Aggregate Requirements (E)

Glossary

The following is a glossary of terminology used in this plan. For additional definitions, see the GENI Glossary page.

  • People:
    • Experimenter: A person accessing the rack using a GENI credential and the GENI AM API.
    • Administrator: A person who has fully-privileged access to, and responsibility for, the rack infrastructure (servers, network devices, etc) at a given location.
    • Operator: A person who has unprivileged/partially-privileged access to the rack infrastructure at a given location, and has responsibility for one or a few particular functions.
  • Baseline Monitoring: Set of monitoring functions which show aggregate health for VMs and switches and their interface status, traffic counts for interfaces and VLANs. Includes resource availability and utilization.
  • Experimental compute resources:
    • VM: An experimental compute resource which is a virtual machine located on a physical machine in the rack.
    • Bare metal Node: An experimental exclusive compute resource which is a physical machine usable by experimenters without virtualization.
    • Compute Resource: Either a VM or a bare metal node.
  • Experimental compute resource components:
    • Logical interface: A network interface seen by a compute resource (e.g. a distinct listing in ifconfig output). May be provided by a physical interface, or by virtualization of an interface.
  • Experimental network resources:
    • VLAN: A data plane VLAN, which may or may not be OpenFlow-controlled.
    • Bound VLAN: A VLAN which an experimenter requests by specifying the desired VLAN ID. (If the aggregate is unable to provide access to that numbered VLAN or to another VLAN which is bridged to the numbered VLAN, the experimenter's request will fail.)
    • Unbound VLAN: A VLAN which an experimenter requests without specifying a VLAN ID. (The aggregate may provide any available VLAN to the experimenter.)
    • Exclusive VLAN: A VLAN which is provided for the exclusive use of one experimenter.
    • Shared VLAN: A VLAN which is shared among multiple experimenters.

We make the following assumptions about experimental network resources:

  • Unbound VLANs are always exclusive.
  • Bound VLANs may be either exclusive or shared, and this is determined on a per-VLAN basis and configured by operators.
  • Shared VLANs are always OpenFlow-controlled, with OpenFlow providing the slicing between experimenters who have access to the VLAN.
  • If a VLAN provides an end-to-end path between multiple aggregates or organizations, it is considered "shared" if it is shared anywhere along its L2 path, even if only one experimenter can access the VLAN at some particular aggregate or organization (for whatever reason).
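
These assumptions can be read as consistency rules on VLAN descriptions. The sketch below encodes them for illustration only. The `VlanSegment` type and its field names are hypothetical and do not correspond to any GENI data model.

```python
# Sketch only: encodes the VLAN assumptions above as simple checks.
# VlanSegment and its fields are hypothetical, for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class VlanSegment:
    """One aggregate's or organization's part of a VLAN's L2 path."""
    bound: bool     # requested by VLAN ID
    shared: bool    # shared among multiple experimenters at this segment
    openflow: bool  # OpenFlow-controlled


def segment_violations(segment):
    """Return the assumptions that a single segment contradicts."""
    problems = []
    if segment.shared and not segment.bound:
        problems.append("unbound VLANs are always exclusive")
    if segment.shared and not segment.openflow:
        problems.append("shared VLANs are always OpenFlow-controlled")
    return problems


def path_is_shared(segments):
    """A VLAN is called shared if it is shared anywhere along its path."""
    return any(segment.shared for segment in segments)
```

For example, a path made of one exclusive segment and one shared, OpenFlow-controlled, bound segment is reported as shared by `path_is_shared`, matching the last assumption in the list.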

Email help@geni.net for GENI support or email me with feedback on this page!
