This section describes each acceptance test by defining its goals, topology, and test procedure. Test cases are listed by priority in the sections below; the cases that verify the largest number of requirement criteria are typically listed at a higher priority. The prerequisite tests will be executed first to verify that baseline monitoring and administrative functions are available; this allows execution of the experimenter test cases. Additional monitoring and administrative tests described in later sections will also be run before the completion of the acceptance test effort.

== Administration Prerequisite Tests ==

Administrative acceptance tests will verify support of administrative management tasks, focusing on priority functions for each of the rack components. The set of administrative features described in this section will be verified initially. Additional administrative tests are described in a later section and will be executed before acceptance test completion.

=== EG-ADM-1: Rack Receipt and Inventory Test ===

This "test" uses BBN as an example site, verifying that we can do all the things we need to do to integrate the rack into our standard local procedures for systems we host.

==== Procedure ====

* ExoGENI and GPO power and wire the BBN rack.
* GPO configures the exogeni.gpolab.bbn.com DNS namespace and 192.1.242.0/25 IP space, and enters all public IP addresses for the BBN rack into DNS.
* GPO requests and receives administrator accounts on the rack and read access to ExoGENI Nagios for GPO sysadmins.
* GPO inventories the physical rack contents, network connections and VLAN configuration, and power connectivity, using our standard operational inventories.
* GPO, ExoGENI, and GMOC exchange contact information and change control procedures, and ExoGENI operators subscribe to GENI operations mailing lists and submit their contact information to GMOC.
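
As a quick cross-check of the address plan above, the 192.1.242.0/25 allocation can be enumerated with Python's standard `ipaddress` module before entries are added to DNS. This is a sketch; only the network itself comes from this plan, and the real address assignments live in DNS:

```python
import ipaddress

# The public IP allocation for the BBN rack, from the plan above.
rack_net = ipaddress.ip_network("192.1.242.0/25")

# Usable host addresses exclude the network and broadcast addresses.
hosts = list(rack_net.hosts())

print(rack_net.netmask)      # 255.255.255.128
print(len(hosts))            # 126 usable addresses
print(hosts[0], hosts[-1])   # 192.1.242.1 192.1.242.126
```

This gives a quick sanity bound: the rack's public services, head node, worker nodes, and any experimenter-visible addresses must all fit within those 126 addresses.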

=== EG-ADM-2: Rack Administrator Access Test ===

This test verifies local and remote administrative access to rack devices.

==== Procedure ====

1. For each type of rack infrastructure node, including the head node and a worker node configured for !OpenStack, use a site administrator account to test:
 * Log in to the node using public-key SSH.
 * Verify that you cannot log in to the node using password-based SSH, nor via any unencrypted login protocol.
 * When logged in, run a command via sudo to verify root privileges.
2. For each rack infrastructure device (switches, remote PDUs if any), use a site administrator account to test:
 * Log in via SSH.
 * Log in via a serial console (if the device has one).
 * Verify that you cannot log in to the device via an unencrypted login protocol.
 * Use the "enable" command or equivalent to verify privileged access.
3. Test that IMM (the ExoGENI remote console solution for rack hosts) can be used to access the consoles of the head node and a worker node:
 * Log in via SSH or other encrypted protocol.
 * Verify that you cannot log in via an unencrypted login protocol.
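
A useful companion to the password-login checks above is a configuration audit: confirm that each node's `sshd_config` actually disables password authentication. The snippet below is a local sketch under that assumed policy (the required values are this plan's expectations, not OpenSSH defaults); it does not replace the live login attempts:

```python
# Audit an OpenSSH sshd_config for the settings this test expects.
# Assumed policy: public-key only, no password authentication.
EXPECTED = {
    "passwordauthentication": "no",
    "pubkeyauthentication": "yes",
}

def audit_sshd_config(text):
    """Return a list of (option, found, expected) mismatches."""
    seen = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            seen[parts[0].lower()] = parts[1].strip().lower()
    return [(opt, seen.get(opt), want)
            for opt, want in EXPECTED.items()
            if seen.get(opt) != want]

sample = """
PasswordAuthentication yes
PubkeyAuthentication yes
"""
print(audit_sshd_config(sample))
# -> [('passwordauthentication', 'yes', 'no')]
```

An empty result means the config matches the expected policy; each tuple in a non-empty result names the offending option, the value found (or `None` if unset), and the value this plan requires.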

== Monitoring/Rack Inspection Prerequisite Tests ==

These tests verify that GPO can locate and access information which may be needed to determine rack state and debug problems during experimental testing, and also verify by observation that rack components are ready to be tested. Additional monitoring tests are defined in a later section to complete the validation in this section.

=== EG-MON-1: Control Network Software and VLAN Inspection Test ===

This test inspects the state of the rack control network, infrastructure nodes, and system software.

==== Procedure ====

* A site administrator enumerates the processes on the head node and on an !OpenStack worker node which listen for network connections from other nodes, identifies which version of which software package provides each one, and verifies that we know the source of each piece of software and could get access to its source code.
* A site administrator reviews the configuration of the rack management switch and verifies that each worker node's control interfaces are on the expected VLANs for that worker node's function (!OpenStack or bare metal).
* A site administrator reviews the MAC address table on the management switch, and verifies that all entries are identifiable and expected.
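
The MAC address table review in the last step can be partly automated by comparing the switch's table against an inventory of known control-plane MACs. A sketch, where both the inventory and the table rows are hypothetical examples, not real rack data:

```python
# Flag MAC table entries that are not in the site inventory.
# Inventory and table contents below are hypothetical examples.
KNOWN_MACS = {
    "00:1a:64:00:00:01": "head node eth0",
    "00:1a:64:00:00:02": "worker1 eth0",
}

def unexpected_entries(mac_table):
    """mac_table: iterable of (mac, vlan, port) rows from the switch."""
    return [(mac, vlan, port) for mac, vlan, port in mac_table
            if mac.lower() not in KNOWN_MACS]

table = [
    ("00:1A:64:00:00:01", 10, "Gi0/1"),
    ("DE:AD:BE:EF:00:99", 10, "Gi0/7"),   # not in inventory
]
print(unexpected_entries(table))
# -> [('DE:AD:BE:EF:00:99', 10, 'Gi0/7')]
```

Anything the check flags still needs a human decision: it may be a newly added component that belongs in the inventory, or a genuinely unexpected device on the management network.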

=== EG-MON-2: GENI Software Configuration Inspection Test ===

This test inspects the state of the GENI AM software in use on the rack.

==== Procedure ====

* A site administrator uses available system data sources (process listings, monitoring output, system logs, etc.) and/or AM administrative interfaces to determine the configuration of ExoGENI resources:
 * How many VMs are assigned to each of the BBN rack SM and the global ExoSM.
 * How many bare metal nodes are configured on the rack, and whether they are controlled by the BBN rack SM or by ExoSM.
 * How many unbound VLANs are in the rack's available pool, and whether they are controlled by the BBN rack SM or by ExoSM.
 * Whether the BBN ExoGENI AM, the RENCI ExoGENI AM, and ExoSM trust the pgeni.gpolab.bbn.com slice authority, which will be used for testing.
* A site administrator uses available system data sources to determine the configuration of !OpenFlow resources according to FOAM, ExoGENI, and !FlowVisor.
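
If the data sources above can be exported in machine-readable form, the per-SM tallies reduce to a small aggregation. A sketch over a hypothetical export (the field names and SM labels are illustrative, not an actual ExoGENI interface):

```python
from collections import Counter

# Hypothetical export of resource assignments; not a real ExoGENI API.
resources = [
    {"type": "vm", "controlled_by": "bbn-sm"},
    {"type": "vm", "controlled_by": "exo-sm"},
    {"type": "vm", "controlled_by": "bbn-sm"},
    {"type": "baremetal", "controlled_by": "exo-sm"},
    {"type": "vlan", "controlled_by": "bbn-sm"},
]

def tally(resources):
    """Count resources per (type, controlling SM) pair."""
    return Counter((r["type"], r["controlled_by"]) for r in resources)

counts = tally(resources)
print(counts[("vm", "bbn-sm")])   # -> 2
```

The resulting counts answer the first three bullets directly (VMs per SM, bare metal nodes per SM, unbound VLANs per SM); the slice-authority trust check remains a manual configuration inspection.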

=== EG-MON-3: GENI Active Experiment Inspection Test ===

This test inspects the state of the rack data plane and control networks when experiments are running, and verifies that a site administrator can find information about running experiments.

==== Procedure ====

* An experimenter from the GPO starts up experiments to ensure there is data to look at:
 * An experimenter runs an experiment containing at least one rack VM, and terminates it.
 * An experimenter runs an experiment containing at least one rack VM, and leaves it running.
* A site administrator uses available system and experiment data sources to determine current experimental state, including:
 * How many VMs are running and which experimenters own them.
 * How many VMs were terminated within the past day, and which experimenters owned them.
 * What !OpenFlow controllers the data plane switch, the rack !FlowVisor, and the rack FOAM are communicating with.
* A site administrator examines the switches and other rack data sources, and determines:
 * What MAC addresses are currently visible on the data plane switch, and what experiments do they belong to?
 * For some experiment which was terminated within the past day, what data plane and control MAC and IP addresses did the experiment use?
 * For some experimental data path which is actively sending traffic on the data plane switch, do changes in interface counters show approximately the expected amount of traffic into and out of the switch?
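
The interface counter check in the last item amounts to sampling the counters twice, differencing, and allowing a tolerance for background traffic. A sketch with made-up sample values (the counter names and tolerance are assumptions, not a rack interface):

```python
def counters_consistent(before, after, expected_bytes, tolerance=0.2):
    """Check that the octet counter delta is within tolerance of the
    traffic the experiment claims to have sent.
    before/after: dicts like {"in_octets": n, "out_octets": n}."""
    delta = after["in_octets"] - before["in_octets"]
    low = expected_bytes * (1 - tolerance)
    high = expected_bytes * (1 + tolerance)
    return low <= delta <= high

# Hypothetical samples taken while a test flow sends ~100 MB.
before = {"in_octets": 1_000_000_000, "out_octets": 900_000_000}
after = {"in_octets": 1_105_000_000, "out_octets": 901_000_000}
print(counters_consistent(before, after, expected_bytes=100_000_000))
# -> True
```

The tolerance absorbs framing overhead and unrelated broadcast/multicast traffic on the same interface; a large mismatch suggests the traffic is not taking the expected path through the switch.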

== Experimenter Acceptance Tests ==

=== EG-EXP-1: Bare Metal Support Acceptance Test ===

Bare metal nodes are exclusive nodes that are used throughout the experimenter test cases. This section outlines features to be verified which are not explicitly validated in other scenarios:

1. Determine which nodes can be used as bare metal (exclusive) nodes.
2. Obtain a list of OS images which can be loaded on bare metal nodes from the ExoGENI team. (The list should be based on successful bare metal loads by the ExoGENI team or others in the GENI community, and should be available on a public web page.)
3. Obtain two licensed recent Microsoft OS images for bare metal nodes from the site (BBN).
4. Reserve and boot a bare metal node using the Microsoft image.
5. Obtain a recent Linux OS image for bare metal nodes from the ExoGENI team's successful test list.
6. Reserve and boot a bare metal node using this Linux OS image.
7. Release the bare metal resource.
8. Modify the aggregate resource allocation for the rack to add one additional bare metal node (two total) for use in the experimenter test cases.

=== EG-EXP-2: ExoGENI Single Site Acceptance Test ===

This one-site test, run on the BBN ExoGENI rack, includes two experiments. Each experiment requests local compute resources, which generate bidirectional traffic over a Layer 2 data plane network connection. The goal of this test is to verify basic operations of VMs and data flows within one rack.

==== Test Topology ====
19. Stop traffic and delete slivers.

Note: After a successful test run, this test will be revisited and the procedure will be re-executed as a longevity test for a minimum of 24 hours rather than 4 hours.

=== EG-EXP-5: ExoGENI !OpenFlow Network Resources Acceptance Test ===

This is a two-site experiment that uses ExoGENI !OpenFlow network resources (no ExoGENI compute resources) and two non-ExoGENI compute resources. The experiment will use the ExoGENI FOAM to configure the ExoGENI site !OpenFlow switch. The goal of this test is to verify !OpenFlow operations and integration with meso-scale compute resources and other compute resources external to the ExoGENI rack.

==== Test Topology ====

[[Image(ExoGENIOFNetworkResourceAcceptanceTest.jpg)]]

==== Prerequisites ====

- A GPO site network is connected to the ExoGENI !OpenFlow switch.
- The ExoGENI FOAM server is running and can manage the ExoGENI !OpenFlow switch.
- Two meso-scale remote sites make compute resources and !OpenFlow meso-scale resources available for this test.
- This test is scheduled at a time when site contacts are available to address any problems.
- All sites' !OpenFlow VLANs are implemented and recorded on the GENI meso-scale network pages for this test. (The current example is meso-scale VLAN 1750.)
- GMOC data collection for the meso-scale and ExoGENI rack resources is functioning for the !OpenFlow and traffic measurements required in this test. (This requires only a subset of the Additional Monitoring GMOC Data Collection tests to be successful.)

==== Procedure ====
The following operations are to be executed:
1. As Experimenter1, determine GPO compute resources and define an RSpec.
2. Determine remote meso-scale compute resources and define an RSpec.
3. Define a request RSpec for OF network resources at the BBN ExoGENI FOAM.
4. Define a request RSpec for OF network resources at the remote I2 meso-scale site.
5. Define a request RSpec for the !OpenFlow core resources.
6. Create the first slice.
7. Create a sliver for the GPO compute resources.
8. Create a sliver at the I2 meso-scale site using the FOAM at that site.
9. Create a sliver at the BBN ExoGENI FOAM aggregate.
10. Create a sliver for the !OpenFlow resources in the core.
11. Create a sliver for the meso-scale compute resources.
12. Log in to each of the compute resources and send traffic to the other endpoint.
13. Verify that traffic is delivered to the target.
14. Review baseline, GMOC, and meso-scale monitoring statistics.
15. As Experimenter2, determine GPO compute resources and define an RSpec.
16. Determine remote meso-scale compute resources and define an RSpec.
17. Define a request RSpec for OF network resources at the BBN ExoGENI FOAM.
18. Define a request RSpec for OF network resources at the remote NLR meso-scale site.
19. Define a request RSpec for the !OpenFlow core resources.
20. Create the second slice.
21. Create a sliver for the GPO compute resources.
22. Create a sliver at the meso-scale site using the FOAM at that site.
23. Create a sliver at the BBN ExoGENI FOAM aggregate.
24. Create a sliver for the !OpenFlow resources in the core.
25. Create a sliver for the meso-scale compute resources.
26. Log in to each of the compute resources and send traffic to the other endpoint.
27. As Experimenter2, insert flowmods and send packet-outs only for traffic assigned to the slivers.
28. Verify that traffic is delivered to the target according to the flowmod settings.
29. Review baseline, GMOC, and monitoring statistics.
30. Stop traffic and delete slivers.
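
The flowmod step above requires that an experimenter's flowmods and packet-outs only touch traffic assigned to their slivers. A toy model of that check, using OpenFlow 1.0-style match fields and the example meso-scale VLAN 1750 from the prerequisites (this is a simplified illustration, not FlowVisor's actual slicing logic):

```python
# Toy OpenFlow 1.0-style matching: a flowmod stays inside the sliver
# only if its match pins the sliver's dataplane VLAN.
SLIVER_VLAN = 1750   # example meso-scale VLAN from this plan

def flowmod_allowed(match):
    """Reject flowmods that wildcard the sliver VLAN (too broad)."""
    return match.get("dl_vlan") == SLIVER_VLAN

def packet_matches(match, pkt):
    """Wildcard semantics: every field present in match must equal pkt's."""
    return all(pkt.get(k) == v for k, v in match.items())

good = {"dl_vlan": 1750, "nw_dst": "10.42.0.2"}
bad = {"nw_dst": "10.42.0.2"}          # wildcards the VLAN: too broad
print(flowmod_allowed(good), flowmod_allowed(bad))   # -> True False
```

The point of the illustration: a match that wildcards the VLAN would capture other slivers' traffic, which is exactly what the verification step must show cannot happen.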

=== EG-EXP-6: ExoGENI and Meso-scale Multi-site !OpenFlow Acceptance Test ===

This test includes three sites and three experiments, using resources in the GPO and RENCI ExoGENI racks as well as meso-scale resources, where the network resources are the core !OpenFlow-controlled VLANs. Each of the compute resources will exchange traffic with the others in its slice, over a wide-area Layer 2 data plane network connection, using NLR and Internet2 VLANs. In particular, the following slices will be set up for this test:
* Slice 1: One ExoGENI VM at each of BBN and RENCI.
* Slice 2: One bare metal ExoGENI node and one ExoGENI VM at BBN; two ExoGENI VMs at RENCI.
* Slice 3: An ExoGENI VM at BBN, a PG node at BBN, and a meso-scale Wide-Area ProtoGENI (WAPG) node.

The goal of this test is to verify ExoGENI rack interoperability with other meso-scale GENI sites.

==== Test Topology ====
5. Define request RSpecs for !OpenFlow resources from BBN FOAM to access GENI !OpenFlow core resources.
6. Define request RSpecs for !OpenFlow core resources at NLR FOAM.
7. Create the first slice.
8. Create a sliver in the first slice at each AM, using the RSpecs defined above.
9. Log in to each of the systems and verify IP address assignment. Send traffic to the other system; leave traffic running.
10. As Experimenter2, define a request RSpec for one VM and one bare metal node at BBN ExoGENI.
11. Define a request RSpec for two VMs on the same worker node at RENCI ExoGENI.
12. Define request RSpecs for !OpenFlow resources from GPO FOAM to access GENI !OpenFlow core resources.
13. Define request RSpecs for !OpenFlow core resources at NLR FOAM.
14. Create a second slice.
15. Create a sliver in the second slice at each AM, using the RSpecs defined above.
16. Log in to each of the systems in the slice and send traffic to the other systems; leave traffic running.
17. As Experimenter3, request !ListResources from BBN ExoGENI, GPO FOAM, and the FOAM at each meso-scale site (NLR site GPO, and Internet2 site TBD).
18. Review !ListResources output from all AMs.
19. Define a request RSpec for a VM at the BBN ExoGENI.
20. Define a request RSpec for a compute resource at the GPO meso-scale site.
21. Define a request RSpec for a compute resource at a meso-scale site.
22. Define request RSpecs for !OpenFlow resources to allow connection from the BBN ExoGENI OF switch to the meso-scale OF sites (GPO and a second site TBD, over NLR and I2).
23. Create a third slice.
24. Create a sliver that connects the Internet2 meso-scale !OpenFlow site to the BBN ExoGENI site and the GPO meso-scale site.
25. Log in to each of the compute resources in the slice, configure data plane network interfaces on any non-ExoGENI resources as necessary, and send traffic to the other systems; leave traffic running.
26. Verify that all three experiments continue to run without impacting each other's traffic, and that data is exchanged over the path along which data is supposed to flow.
27. Review baseline, GMOC, and monitoring statistics.
28. As site administrator, identify all controllers that the BBN ExoGENI !OpenFlow switch is connected to.
29. As Experimenter3, verify that traffic only flows on the network resources assigned to slivers as specified by the controller.
30. Verify that no default controller, switch fail-open behavior, or resource other than the experimenters' controllers can control how traffic flows on network resources assigned to experimenters' slivers.
31. Set the hard and soft timeouts of flowtable entries.
32. Get switch statistics and flowtable entries for slivers from the !OpenFlow switch.
33. Get layer 2 topology information about slivers in each slice.
34. Install flows that match on layer 2 fields and/or layer 3 fields.
35. Run the test for at least 4 hours.
36. Review monitoring statistics and checks as above.
37. Stop traffic and delete slivers.
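
The hard and soft timeouts set above have standard OpenFlow semantics: a hard timeout evicts a flowtable entry a fixed time after installation, while a soft (idle) timeout evicts it after a period with no matching packets. A minimal model of the eviction rule (times and values are illustrative):

```python
def flow_expired(now, installed, last_match, idle_timeout, hard_timeout):
    """OpenFlow-style expiry check; a timeout of 0 means 'disabled'."""
    if hard_timeout and now - installed >= hard_timeout:
        return True                     # fixed lifetime exceeded
    if idle_timeout and now - last_match >= idle_timeout:
        return True                     # no matching packets recently
    return False

# Entry installed at t=0, last matched at t=50, idle=30 s, hard=300 s.
print(flow_expired(now=100, installed=0, last_match=50,
                   idle_timeout=30, hard_timeout=300))   # -> True (idle)
print(flow_expired(now=60, installed=0, last_match=50,
                   idle_timeout=30, hard_timeout=300))   # -> False
```

During the 4-hour run, entries with short timeouts should therefore disappear from the flowtable dumps gathered in the statistics step unless the experiment's traffic keeps refreshing them.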

Note: If Utah PG access to !OpenFlow is available when this test is executed, a PG node will be added to the third slice. This node is not shown in the diagram above. (This is an optional part of the test.)

Documentation check:
1. Verify public access to documentation about which !OpenFlow match actions can be performed in hardware for the ExoGENI switches.

== Additional Administration Acceptance Tests ==

These tests will be performed as needed after the administration baseline tests complete successfully. For example, the Software Update Test will be performed at least once when the rack team provides new software for testing. We expect these tests to be interspersed with other tests in this plan at times that are agreeable to the GPO and the participants, not just run in a block at the end of testing. The goal of these tests is to verify that sites have adequate documentation, procedures, and tools to satisfy all GENI site requirements.

=== EG-ADM-3: Full Rack Reboot Test ===

In this test, a full rack reboot is performed as a drill of a procedure which a site administrator may need to perform for site maintenance.

==== Procedure ====

1. Review relevant rack documentation about shutdown options and make a plan for the order in which to shut down each component.
2. Cleanly shut down and/or hard-power-off all devices in the rack, and verify that everything in the rack is powered down.
3. Power on all devices, bring all logical components back online, and use monitoring and comprehensive health tests to verify that the rack is healthy again.
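
The shutdown plan in the first step is essentially a dependency ordering: each component must go down before anything it needs. A sketch using Python's standard `graphlib` over a hypothetical dependency graph (the components and dependencies are illustrative, not the rack's actual wiring):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: component -> things it needs running.
needs = {
    "worker-vms": {"worker-node"},
    "worker-node": {"head-node", "mgmt-switch"},
    "head-node": {"mgmt-switch"},
    "mgmt-switch": set(),
}

def shutdown_order(needs):
    """Shut each component down before anything it depends on.
    static_order() yields dependencies first (a valid boot order),
    so the shutdown order is simply its reverse."""
    boot_order = list(TopologicalSorter(needs).static_order())
    return list(reversed(boot_order))

print(shutdown_order(needs))
# -> ['worker-vms', 'worker-node', 'head-node', 'mgmt-switch']
```

The same ordering, read forwards instead of reversed, is the power-on plan for step 3.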

=== EG-ADM-4: Emergency Stop Test ===

In this test, an Emergency Stop drill is performed on a sliver in the rack.

==== Prerequisites ====

* GMOC's updated Emergency Stop procedure is approved and published on a public wiki.
* ExoGENI's procedure for performing a shutdown operation on any type of sliver in an ExoGENI rack is published on a public wiki, or on a protected wiki that all ExoGENI site administrators (including GPO) can access.
* An Emergency Stop test is scheduled at a convenient time for all participants and documented in GMOC ticket(s).
* A test experiment is running that involves a slice with connections to at least one ExoGENI rack compute resource.

==== Procedure ====

* A site administrator reviews the Emergency Stop and sliver shutdown procedures, and verifies that these two documents combined fully document the campus side of the Emergency Stop procedure.
* A second administrator (or the GPO) submits an Emergency Stop request to GMOC, referencing activity from a public IP address assigned to a compute sliver in the rack that is part of the test experiment.
* GMOC and the first site administrator perform an Emergency Stop drill in which the site administrator successfully shuts down the sliver in coordination with GMOC.
* GMOC completes the Emergency Stop workflow, including updating/closing GMOC tickets.

=== EG-ADM-5: Software Update Test ===

In this test, we update software on the rack as a test of the software update procedure.

==== Prerequisites ====

Minor updates of system RPM packages, ExoGENI local AM software, and FOAM are available to be installed on the rack. This test may need to be scheduled to take advantage of a time when these updates are available.

==== Procedure ====

* A BBN site administrator reviews the procedure for performing software updates of GENI and non-GENI software on the rack. If there is a procedure for updating any version tracking documentation (e.g. a wiki page) or checking any version tracking tools, the administrator reviews that as well.
* Following that procedure, the administrator performs minor software updates on rack components, including as many as possible of the following (depending on availability of updates):
 * At least one update of a standard (non-GENI) RPM package on each of the head node and a worker node.
 * An update of the ExoGENI local AM software.
 * An update of the FOAM software.
* The administrator confirms that the software updates completed successfully.
* The administrator updates any appropriate version tracking documentation or runs the appropriate tool checks indicated by the version tracking procedure.

=== EG-ADM-6: Control Network Disconnection Test ===

In this test, we disconnect parts of the rack control network or its dependencies to test partial rack functionality in an outage situation.

==== Procedure ====

* Simulate an outage of geni.renci.org by inserting a firewall rule on the GPO router blocking the rack from reaching it. Verify that an administrator can still access the rack, that rack monitoring to GMOC continues through the outage, and that some experimenter operations still succeed.
* Simulate an outage of each of the rack head node and management switch by disabling their respective interfaces on the GPO's control network switch. Verify that GPO, ExoGENI, and GMOC monitoring all see the outage.
| 459 | === EG-ADM-7: Documentation Review Test === |
| 460 | |
| 461 | Although this is not a single test ''per se'', this section lists required documents that the rack teams will write. Draft documents should be delivered prior to testing of the functional areas to which they apply. Final documents must be delivered before Spiral 4 site installations at non-developer sites. Final documents will be public, unless there is some specific reason a particular document cannot be public (e.g. a security concern from a GENI rack site).
| 462 | |
| 463 | ==== Procedure ==== |
| 464 | |
| 465 | Review each required document listed below, and verify that: |
| 466 | * The document has been provided in a public location (e.g. the GENI wiki, or any other public website) |
| 467 | * The document contains the required information. |
| 468 | * The documented information appears to be accurate. |
| 469 | |
| 470 | ''Note: this tests only the documentation, not the rack behavior which is documented. Rack behavior related to any or all of these documents may be tested elsewhere in this plan.'' |
| 471 | |
| 472 | Documents to review: |
| 473 | * Pre-installation document that lists specific minimum requirements for all site-provided services for potential rack sites (e.g. space, number and type of power plugs, number and type of power circuits, cooling load, public addresses, NLR or Internet2 layer2 connections, etc.). This document should also list all standard expected rack interfaces (e.g. 10GBE links to at least one research network). |
| 474 | * Summary GENI Rack parts list, including vendor part numbers for "standard" equipment intended for all sites (e.g. a VM server) and per-site equipment options (e.g. transceivers, PDUs etc.), if any. This document should also indicate approximately how much headroom, if any, remains in the standard rack PDUs' power budget to support other equipment that sites may add to the rack. |
| 475 | * Procedure for identifying the software versions and system file configurations running on a rack, and how to get information about recent changes to the rack software and configuration. |
| 476 | * Explanation of how and when software and OS updates can be performed on a rack, including plans for notification and update if important security vulnerabilities in rack software are discovered. |
| 477 | * Description of the GENI software running on a standard rack, and explanation of how to get access to the source code of each piece of standard GENI software. |
| 478 | * Description of all the GENI experimental resources within the rack, and what policy options exist for each, including: how to configure rack nodes as bare metal vs. VM server, what options exist for configuring automated approval of compute and network resource requests and how to set them, how to configure rack aggregates to trust additional GENI slice authorities, and whether it is possible to trust local users within the rack. |
| 479 | * Description of the expected state of all the GENI experimental resources in the rack, including how to determine the state of an experimental resource and what state is expected for an unallocated bare metal node. |
| 480 | * Procedure for creating new site administrator and operator accounts. |
| 481 | * Procedure for changing IP addresses for all rack components. |
| 482 | * Procedure for cleanly shutting down an entire rack in case of a scheduled site outage. |
| 483 | * Procedure for performing a shutdown operation on any type of sliver on a rack, in support of an Emergency Stop request. |
| 484 | * Procedure for performing comprehensive health checks for a rack (or, if those health checks are being run automatically, how to view the current/recent results). |
| 485 | * Technical plan for handing off primary rack operations to site operators at all sites. |
| 486 | * Per-site documentation. This documentation should be prepared before sites are installed and kept updated after installation to reflect any changes or upgrades after delivery. Text, network diagrams, wiring diagrams and labeled photos are all acceptable for site documents. Per-site documentation should include the following items for each site: |
| 487 | 1. Part numbers and quantities of PDUs, with NEMA input power connector types, and an inventory of which equipment connects to which PDU. |
| 488 | 2. Physical network interfaces for each control and data plane port that connects to the site's existing network(s), including type, part numbers, maximum speed etc. (e.g. 10-GB-SR fiber)
| 489 | 3. Public IP addresses allocated to the rack, including: number of distinct IP ranges and size of each range, hostname to IP mappings which should be placed in site DNS, whether the last-hop routers for public IP subnets sit within the rack or elsewhere on the site, and what firewall configuration is desired for the control network.
| 490 | 4. Data plane network connectivity and procedures for each rack, including core backbone connectivity and documentation, switch configuration options to set for compatibility with the L2 core, and the site and rack procedures for connecting non-rack-controlled VLANs and resources to the rack data plane. A network diagram is highly recommended (See existing !OpenFlow meso-scale network diagrams on the GENI wiki for examples.) |
| 491 | |
| 492 | == Additional Monitoring Acceptance Tests == |
| 493 | |
| 494 | These tests will be performed as needed after the monitoring baseline tests complete successfully. For example, the GMOC data collection test will be performed during the ExoGENI Network Resources Acceptance test, where we already use the GMOC for meso-scale !OpenFlow monitoring. We expect these tests to be interspersed with other tests in this plan at times that are agreeable to the GPO and the participants, not just run in a block at the end of testing. The goal of these tests is to verify that sites have adequate tools to view and share GENI rack data that satisfies all GENI monitoring requirements. |
| 495 | |
| 496 | === EG-MON-4: Infrastructure Device Performance Test === |
| 497 | |
| 498 | This test verifies that the rack head node performs well enough to run all the services it needs to run. |
| 499 | |
| 500 | ==== Procedure ==== |
| 501 | |
| 502 | While experiments involving FOAM-controlled !OpenFlow slivers and compute slivers are running: |
| 503 | * View !OpenFlow control monitoring at GMOC and verify that no monitoring data is missing |
| 504 | * View VLAN 1750 data plane monitoring, which pings the rack's interface on VLAN 1750, and verify that packets are not being dropped |
| 505 | * Verify that the CPU idle percentage on the head node is nonzero. |
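The CPU-idle check above can be made concrete by sampling the head node's aggregate `/proc/stat` "cpu" line twice and computing the idle fraction of the interval. This is an illustrative sketch with made-up jiffy counters; the only assumption beyond standard `/proc/stat` layout is that field index 3 holds the idle counter.

```python
# Sketch: compute CPU idle percentage from two samples of the jiffy
# counters on the aggregate "cpu" line of /proc/stat
# (user nice system idle iowait irq softirq ...). Sample values are
# illustrative, not real head-node measurements.

def idle_percentage(sample1, sample2):
    """Idle time as a percentage of the interval between two samples."""
    deltas = [b - a for a, b in zip(sample1, sample2)]
    total = sum(deltas)
    return 100.0 * deltas[3] / total if total else 0.0

s1 = [4000, 10, 900, 95000, 300, 0, 50]
s2 = [4800, 12, 1100, 96500, 340, 0, 60]
print(f"idle: {idle_percentage(s1, s2):.1f}%")
```

A result pinned at (or near) zero while experiments run would indicate the head node is CPU-saturated and fails this test.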
| 506 | |
| 507 | === EG-MON-5: GMOC Data Collection Test === |
| 508 | |
| 509 | This test verifies the rack's submission of monitoring data to GMOC. |
| 510 | |
| 511 | ==== Procedure ==== |
| 512 | |
| 513 | View the dataset collected at GMOC for the BBN and RENCI ExoGENI racks. For each piece of required data, attempt to verify that: |
| 514 | * The data is being collected and accepted by GMOC and can be viewed at gmoc-db.grnoc.iu.edu |
| 515 | * The data's "site" tag indicates that it is being reported for the rack located at the `gpolab` or RENCI site (as appropriate for that rack). |
| 516 | * The data has been reported within the past 10 minutes. |
| 517 | * For each piece of data, either verify that it is being collected at least once a minute, or verify that it requires more complicated processing than a simple file read to collect, and thus can be collected less often. |
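The "reported within the past 10 minutes" criterion lends itself to a simple freshness check over the last-report timestamp of each measurement. The sketch below is illustrative only; the measurement names and timestamps are assumptions, not actual GMOC data.

```python
# Sketch: flag GMOC measurements whose last report is older than the
# 10-minute freshness criterion. Names and timestamps are illustrative.
from datetime import datetime, timedelta

def stale_measurements(last_reports, now, max_age=timedelta(minutes=10)):
    """Return names of measurements whose last report is older than max_age."""
    return sorted(name for name, ts in last_reports.items()
                  if now - ts > max_age)

now = datetime(2012, 5, 1, 12, 0, 0)
reports = {
    "am_reachable": datetime(2012, 5, 1, 11, 55),  # 5 minutes old: fresh
    "sliver_count": datetime(2012, 5, 1, 11, 42),  # 18 minutes old: stale
}
print(stale_measurements(reports, now))  # → ['sliver_count']
```

The same structure, with `max_age=timedelta(minutes=1)`, covers the once-a-minute collection criterion for simple file-read counters.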
| 518 | |
| 519 | Verify that the following pieces of data are being reported: |
| 520 | * Is each of the rack ExoGENI and FOAM AMs reachable via the GENI AM API right now? |
| 521 | * Is each compute or unbound VLAN resource at each rack AM online? Is it available or in use? |
| 522 | * Sliver count and percentage of compute and unbound VLAN resources in use for the rack SM. |
| 523 | * Identities of current slivers on each rack AM, including creation time for each. |
| 524 | * Per-sliver interface counters for compute and VLAN resources (where these values can be easily collected). |
| 525 | * Is the rack data plane switch online? |
| 526 | * Interface counters and VLAN memberships for each rack data plane switch interface |
| 527 | * MAC address table contents for shared VLANs which appear on rack data plane switches |
| 528 | * Is each rack worker node online? |
| 529 | * For each rack worker node configured as an !OpenStack VM server, overall CPU, disk, and memory utilization for the host, current VM count and total VM capacity of the host. |
| 530 | * For each rack worker node configured as an !OpenStack VM server, interface counters for each data plane interface. |
| 531 | * Results of at least one end-to-end health check which simulates an experimenter reserving and using at least one resource in the rack. |
| 532 | |
| 533 | Verify that per-rack or per-aggregate summaries are collected of the count of distinct users who have been active on the rack, either by providing raw sliver data containing sliver users to GMOC, or by collecting data locally and producing trending summaries on demand. |
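A trending summary of the kind described above could be produced locally from raw sliver records along these lines. This is an illustrative sketch; the record shape (rack name paired with a user URN) and the sample values are assumptions for illustration.

```python
# Sketch: count distinct active users per rack from raw sliver records,
# the kind of on-demand trending summary a rack could produce instead
# of shipping raw sliver/user data to GMOC. Data is illustrative.
from collections import defaultdict

def distinct_users_per_rack(sliver_records):
    """sliver_records: iterable of (rack, user_urn) pairs."""
    users = defaultdict(set)
    for rack, user in sliver_records:
        users[rack].add(user)
    return {rack: len(u) for rack, u in users.items()}

records = [("gpolab", "urn:alice"), ("gpolab", "urn:bob"),
           ("gpolab", "urn:alice"), ("renci", "urn:carol")]
print(distinct_users_per_rack(records))  # → {'gpolab': 2, 'renci': 1}
```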
| 534 | |
| 535 | = Test Methodology and Reporting = |
| 536 | |
| 537 | == Test Case Execution == |
| 538 | 1. All test procedure steps will be executed until there is a blocking issue. |
| 539 | 2. If a blocking issue is found for a test case, testing will be stopped for that test case. |
| 540 | 3. Testing focus will shift to another test case while waiting for a solution to a blocking issue. |
| 541 | 4. If a non-blocking issue is found, testing will continue toward completion of the procedure. |
| 542 | 5. When a software resolution or workaround is available for a blocking issue, the test impacted by the issue is re-executed until it can be completed successfully.
| 543 | 6. Supporting documentation will be used whenever available.
| 544 | 7. Questions that are not answered by existing documentation will be gathered during the acceptance testing and published on the GENI wiki for followup.
| 545 | |
| 546 | == Issue Tracking == |
| 547 | 1. All issues discovered in acceptance testing, regardless of priority, are to be tracked in a bug tracking system.
| 548 | 2. The ExoGENI rack team should propose a bug tracking system that the team, the GPO, and the GMOC can access to enter or respond to bugs. Viewing of GENI bugs should be public. (We will use the groups.geni.net ticket system if there is no better proposal from the ExoGENI team.)
| 549 | 3. All types of issues encountered (documentation error, software bug, missing features, missing documentation, etc.) will be tracked. |
| 550 | 4. All unresolved issues will be reviewed and published at the end of the acceptance test as part of the acceptance test report. |
| 551 | 5. An initial response to a logged bug should happen within 24 hours (excepting weekend and holiday hours). An initial response does not require resolution, but should at minimum include a comment/question and indicate who is working on the bug. |
| 552 | |
| 553 | == Status Updates and Reporting == |
| 554 | 1. A periodic status update will be generated as the acceptance test plan is being executed.
| 555 | 2. The periodic (once per business day) status update will be posted to the rack team mail list (exogeni-design@geni.net).
| 556 | 3. Upon acceptance test completion, all findings and unresolved issues will be captured in an acceptance test report.
| 557 | 4. Supporting configuration and RSpecs used in testing will be part of the public acceptance test report, except where site security or privacy concerns require holding back that information. |
| 558 | |
| 559 | == Test Case Naming == |
| 560 | The test cases in this plan follow a naming convention of the form ''EG-XXX-Y'', where ''EG'' is ExoGENI and ''XXX'' is one of the following: ''ADM'' for Administrative, ''EXP'' for Experimenter, or ''MON'' for Monitoring. The final component, ''Y'', is the test case number.
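The naming convention can be stated precisely as a pattern; the short sketch below is illustrative only, checking candidate names against the ''EG-XXX-Y'' form described above.

```python
# Sketch: validate test case names against the EG-XXX-Y convention
# (EG, then ADM/EXP/MON, then a numeric test case number).
import re

TEST_NAME = re.compile(r"^EG-(ADM|EXP|MON)-(\d+)$")

def is_valid_test_name(name):
    return TEST_NAME.match(name) is not None

for name in ["EG-ADM-6", "EG-MON-4", "EG-OPS-1"]:
    print(name, is_valid_test_name(name))
```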
| 561 | |
| 562 | = Requirements Validation = |
| 563 | |
| 564 | This acceptance test plan verifies Integration (C), Monitoring (D), Experimenter (G) and Local Aggregate (F) requirements. As part of the test planning process, the GPO Infrastructure group mapped each of the [http://groups.geni.net/geni/wiki/GeniRacks GENI Rack Requirements] to a set of validation criteria. For a detailed look at the validation criteria see the GENI Racks [wiki:GENIRacksProjects/AcceptanceTestsCriteria Acceptance Criteria] page.
| 565 | |
| 566 | This plan does not validate any Software (B) requirements, as they are validated by the GPO Software team's [http://groups.geni.net/syseng/wiki/SwAcceptanceAMAPIv1 GENI API Acceptance tests] suite. |
| 567 | |
| 568 | Some requirements are not verified in this test plan: |
| 569 | * C.2.a "Support at least 100 simultaneous active (e.g. actually passing data) layer 2 Ethernet VLAN connections to the rack. For this purpose, VLAN paths must terminate on separate rack VMs, not on the rack switch." |
| 570 | * Production Aggregate Requirements (E) |
| 571 | |
| 572 | = Glossary = |
| 573 | |
| 574 | Following is a glossary of terminology used in this plan; for additional terminology definitions, see the [http://groups.geni.net/geni/wiki/GeniGlossary GENI Glossary] page.
261 | | == ExoGENI and Meso-scale !OpenFlow Interoperability Acceptance Test - Use Case 4 ==
262 | | |
263 | | A two-site experiment including !OpenFlow resources and Compute Resources at the GPO ExoGENI rack and a Meso-scale site (TBD). The experiment will request a compute resource (VM) at each site to exchange bidirectional traffic. Requests will be made for network resources at each FOAM Aggregate to allow traffic exchange over !OpenFlow VLANs. This is a low-priority test that will be executed if time permits.
264 | | |
265 | | |
266 | | === Test Topology === |
267 | | |
268 | | [[Image(ExoGENIMesoscaleOpenFlowAcceptanceTest.jpg)]] |
269 | | |
270 | | === Prerequisites === |
271 | | - Baseline Monitoring is in place at each of the sites to ensure that any potential problems are quickly identified.
272 | | - GPO ExoGENI connectivity statistics will be monitored at the [http://monitor.gpolab.bbn.com/connectivity/exogeni.html GPO ExoGENI Monitoring] site. |
273 | | - GMOC is receiving monitoring data for ExoGENI Racks. |
274 | | - This test will be scheduled to ensure that resources are available and a site contact is available in case of potential problems.
275 | | - VLANs to be used are from the pre-allocated pool of ExoGENI VLANs. |
276 | | |
277 | | === Procedure === |
278 | | The following operations are to be executed: |
279 | | 1. Request !ListResources at the GPO ExoGENI AM. |
280 | | 2. Request !ListResources at the GPO FOAM AM. |
281 | | 3. Request !ListResources at the Meso-scale AM. |
282 | | 4. Request !ListResources at the Meso-scale FOAM AM. |
283 | | 5. Review !ListResources output from all aggregates. |
284 | | 6. Define a request RSpec for one Compute Resource at the GPO ExoGENI AM. |
285 | | 7. Define a request RSpec for one Compute Resource at the Meso-scale AM. |
286 | | 8. Define a request RSpec for Network Resource at the GPO FOAM AM. |
287 | | 9. Define a request RSpec for Network Resource at the Meso-scale FOAM AM. |
288 | | 10. Create a slice |
289 | | 11. Create a sliver at the ExoGENI and Meso-scale AMs using the RSpecs defined above.
290 | | 12. Create a sliver at each FOAM AM using the RSpecs defined above. |
291 | | 13. Log in to each of the Compute Resources and send traffic to the other Host. |
292 | | 14. Verify data is exchanged over !OpenFlow channel. |
293 | | 15. Delete the slivers at the ExoGENI and FOAM AMs.
294 | | 16. Delete slice (if supported). |
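The AM API steps above are executed via the Omni tool. As an illustrative sketch only, they could be scripted as Omni command lines; the aggregate URLs, slice name, and RSpec file name below are placeholders, while the subcommand names (`listresources`, `createslice`, `createsliver`, `deletesliver`, `deleteslice`) follow the Omni CLI.

```python
# Sketch: build the Omni command lines for the Use Case 4 procedure.
# Aggregate URLs, slice name, and RSpec file are hypothetical.

SLICE = "uc4test"  # hypothetical slice name
AMS = [
    "https://gpo-eg.example.net:11443/orca/xmlrpc",   # placeholder ExoGENI AM
    "https://gpo-foam.example.net:3626/foam/gapi/1",  # placeholder FOAM AM
]

def omni_cmd(subcommand, *args, am=None):
    """Assemble an omni.py invocation as an argument list."""
    cmd = ["omni.py"]
    if am:
        cmd += ["-a", am]
    cmd.append(subcommand)
    cmd += list(args)
    return cmd

steps = [omni_cmd("listresources", am=url) for url in AMS]
steps.append(omni_cmd("createslice", SLICE))
steps += [omni_cmd("createsliver", SLICE, "uc4.rspec", am=url) for url in AMS]
steps += [omni_cmd("deletesliver", SLICE, am=url) for url in AMS]
steps.append(omni_cmd("deleteslice", SLICE))
for s in steps:
    print(" ".join(s))
```

Each list would be passed to a process runner (e.g. `subprocess.run`) in an actual test harness.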
295 | | |
296 | | |
297 | | == Additional Administration Acceptance Tests == |
298 | | |
299 | | The following operations are to be executed: |
300 | | 1. Review tools or procedures for changing the IP address of each of the rack component types.
301 | | 2. Change IP addresses for each switch, VM, Bare Metal node, etc. |
302 | | 3. View source code for any software covered by the GENI Intellectual Property Agreement that runs on the rack. The location of such source code should be determinable from the Rack Development Team's public site documentation.
303 | | |
304 | | == Additional Monitoring Acceptance Tests == |
305 | | |
306 | | Upon completion of the acceptance phase, the GPO will verify that all information available in the Baseline Monitoring Acceptance tests is available via the Operational monitoring portal at gmoc-db.grnoc.iu.edu. In addition to the baseline monitoring tests, the following features will be verified: |
307 | | |
308 | | 1. Operational monitoring data for the rack is available at gmoc-db.grnoc.iu.edu. |
309 | | 2. The rack data's "site" tag in the GMOC database indicates the physical location (e.g. host campus) of the rack. |
310 | | 3. Whenever the rack is operational, GMOC's database contains site data which is at most 10 minutes old. |
311 | | 4. Any site variable which can be collected by reading a counter (i.e. which does not require system or network processing beyond a file read) is collected at least once a minute. |
312 | | 5. All hosts which submit data to gmoc-db have system clocks which agree with gmoc-db's clock to within 45 seconds. (GMOC is responsible for ensuring that gmoc-db's own clock is synchronized to an accurate time source.) |
313 | | 6. The GMOC database contains data about whether each site AM has recently been reachable via the GENI AM API. |
314 | | 7. The GMOC database contains data about the recent uptime and availability of each compute or pool VLAN resource at each rack AM. |
315 | | 8. The GMOC database contains the sliver count and percentage of resources in use at each rack AM. |
316 | | 9. The GMOC database contains the creation time of each sliver on each rack AM. |
317 | | 10. If possible, the GMOC database contains per-sliver interface counters for each rack AM. |
318 | | 11. The GMOC database contains data about whether each rack dataplane switch has recently been online. |
319 | | 12. The GMOC database contains recent traffic counters and VLAN memberships for each rack dataplane switch interface. |
320 | | 13. The GMOC database contains recent MAC address table contents for static VLANs which appear on rack dataplane switches.
321 | | 14. The GMOC database contains data about whether each experimental VM server has recently been online. |
322 | | 15. The GMOC database contains overall CPU, disk, and memory utilization, and VM count and capacity, for each experimental VM server.
323 | | 16. The GMOC database contains overall interface counters for experimental VM server dataplane interfaces. |
324 | | 17. The GMOC database contains recent results of at least one end-to-end health check which simulates an experimenter reserving and using at least one resource in the rack. |
325 | | 18. A site administrator can locate current and recent CPU and memory utilization for each rack network device, and can find recent changes or errors in a log. |
326 | | 19. A site administrator can locate current configuration of flowvisor, FOAM, and any other !OpenFlow services, and find logs of recent activity and changes. |
327 | | 20. For each infrastructure and experimental host, a site administrator can locate current and recent uptime, CPU, disk, and memory utilization, interface traffic counters, process counts, and active user counts. |
328 | | 21. A site administrator can locate recent syslogs for all infrastructure and experimental hosts. |
329 | | 22. A site administrator can locate information about the network reachability of all rack infrastructure which should live on the control network, and can get alerts when any rack infrastructure control IP becomes unavailable from the rack server host, or when the rack server host cannot reach the commodity internet. |
330 | | 23. A site administrator can get information about the power utilization of rack PDUs. |
331 | | 24. Given a public IP address and port, a pool VLAN, or a sliver name, a site administrator or GMOC staffer can identify the email address of the experimenter who controlled that resource at a particular time. |
332 | | 25. For trending purposes, per-rack or per-aggregate summaries are collected of the count of distinct users who have been active on a given rack. Racks may provide raw sliver/user data to GMOC, or may produce their own trending summaries on demand. |
333 | | 26. Meso-scale reachability testing can report on the recent liveness of the rack static VLANs by pinging a per-rack IP in each Meso-scale monitoring subnet. |
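The clock-agreement criterion in item 5 (all submitting hosts within 45 seconds of gmoc-db's clock) can be sketched as a simple tolerance check. The host names and epoch timestamps below are illustrative assumptions.

```python
# Sketch: flag hosts whose reported clock differs from gmoc-db's clock
# by more than the 45-second tolerance in item 5. Data is illustrative.

MAX_SKEW = 45.0  # seconds, per the monitoring criterion

def skewed_hosts(host_clocks, gmocdb_clock):
    """host_clocks: {host: epoch seconds}; return hosts out of tolerance."""
    return sorted(h for h, t in host_clocks.items()
                  if abs(t - gmocdb_clock) > MAX_SKEW)

clocks = {"rack-head": 1336000010.0, "worker-1": 1336000070.0}
print(skewed_hosts(clocks, 1336000000.0))  # → ['worker-1']
```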
334 | | |
335 | | == Emergency Stop Acceptance test == |
336 | | |
337 | | === Prerequisite === |
| 577 | * Local Broker - An ORCA broker that provides the coordinating function needed to create slices. Each rack's ORCA AM delegates a portion of the local resources to a local broker (for coordinating intra-rack resource allocations of compute resources and VLANs) and to the global broker.
| 578 | |
| 579 | |
| 580 | * ORCA Actors - ExoGENI Site Authorities and Brokers which can communicate with each other. An actor requires ExoGENI Operations staff approval in order to start communications with other actors. |
| 581 | |
| 582 | * ORCA Actor Registry - A secure service that allows distributed ExoGENI ORCA Actors to recognize each other and create security associations in order for them to communicate. The [https://geni.renci.org:12443/registry/actors.jsp Actor Registry] web page lists all active ORCA Actors.
| 583 | |
| 584 | |
| 585 | * ORCA Aggregate Manager (AM) - An ORCA resource provider that handles requests for resources via the ORCA SM and coordinates brokers to delegate resources. The ORCA Aggregate Manager is not the same as the GENI Aggregate Manager. |
| 586 | |
| 587 | * Site or Rack Service Manager (SM) - Exposes the ExoGENI Rack GENI AM API interface and the native XMLRPC interface to handle experimenter resource requests. The site SM receives requests from brokers (tickets) and redeems tickets with the ORCA AM. All Acceptance tests defined in this plan interact with a Service Manager (Site SM or the global ExoSM) using the GENI AM API interface via the Omni tool.
| 588 | |
| 589 | * ExoSM - A global ExoGENI Service Manager that provides access to resources from multiple ExoGENI racks and intermediate network providers. The ExoSM supports GENI AM API interactions. |
| 590 | |
| 591 | * ORCA RSpec/NDL conversion service - A service running at RENCI which is used by all ORCA SMs to convert RSpec requests to NDL, and NDL manifests to RSpec.
| 592 | |
| 593 | * People: |
| 594 | * Experimenter: A person accessing the rack using a GENI credential and the GENI AM API. |
| 595 | * Administrator: A person who has fully-privileged access to, and responsibility for, the rack infrastructure (servers, network devices, etc) at a given location. |
| 596 | * Operator: A person who has unprivileged/partially-privileged access to the rack infrastructure at a given location, and has responsibility for one or a few particular functions. |
| 597 | |
| 598 | * Baseline Monitoring: Set of monitoring functions which show aggregate health for VMs and switches and their interface status, and traffic counts for interfaces and VLANs. Baseline monitoring includes resource availability and utilization. |
| 599 | |
| 600 | * Experimental compute resources: |
| 601 | |
| 602 | * VM: An experimental compute resource which is a virtual machine located on a physical machine in the rack. |
| 603 | * Bare metal Node: An experimental exclusive compute resource which is a physical machine usable by experimenters without virtualization. |
| 604 | * Compute Resource: Either a VM or a bare metal node. |
| 605 | * Experimental compute resource components: |
| 606 | * logical interface: A network interface seen by a compute resource (e.g. a distinct listing in `ifconfig` output). May be provided by a physical interface, or by virtualization of an interface. |
| 607 | |
| 608 | * Experimental network resources: |
| 609 | * VLAN: A data plane VLAN, which may or may not be !OpenFlow-controlled. |
| 610 | * Bound VLAN: A VLAN which an experimenter requests by specifying the desired VLAN ID. (If the aggregate is unable to provide access to that numbered VLAN or to another VLAN which is bridged to the numbered VLAN, the experimenter's request will fail.) |
| 611 | * Unbound VLAN: A VLAN which an experimenter requests without specifying a VLAN ID. (The aggregate may provide any available VLAN to the experimenter.) |
| 612 | * Exclusive VLAN: A VLAN which is provided for the exclusive use of one experimenter. |
| 613 | * Shared VLAN: A VLAN which is shared among multiple experimenters. |
| 614 | |
| 615 | We make the following assumptions about experimental network resources for these tests: |
| 616 | * Unbound VLANs are always exclusive. |
| 617 | * Bound VLANs may be either exclusive or shared, and this is determined on a per-VLAN basis and configured by operators, including in some cases operators outside GENI. |
| 618 | * Shared VLANs are always !OpenFlow-controlled, with !OpenFlow providing the slicing between experimenters who have access to the VLAN. |
| 619 | * If a VLAN provides an end-to-end path between multiple aggregates or organizations, it is considered "shared" if it is shared anywhere along its length --- even if only one experimenter can access the VLAN at some particular aggregate or organization (for whatever reason), a VLAN which is shared anywhere along its L2 path is called "shared". |
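The "shared anywhere along its length" rule above can be stated compactly: a VLAN is classified as shared if any segment of its L2 path is shared. The sketch below is illustrative only; the path representation and segment names are assumptions.

```python
# Sketch: classify a VLAN as "shared" if it is shared on any segment
# of its end-to-end L2 path, per the rule above. Segment names are
# illustrative.

def vlan_is_shared(segments):
    """segments: {segment_name: bool}, True meaning the VLAN is shared
    with other experimenters on that segment."""
    return any(segments.values())

path = {"rack-a": False, "backbone": True, "rack-b": False}
print(vlan_is_shared(path))  # → True
```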
339 | | - GMOC delivers updated Emergency Stop procedure document. [add_link_when_available] |
340 | | - GPO writes a generic Emergency Stop procedure, based on the updated GMOC procedure. [add_link_when_available] |
341 | | - Emergency stop coordination has taken place to make sure that all sites are aware of their roles and that all expected steps are documented.
342 | | |
343 | | === Procedure === |
344 | | |
345 | | GPO executes the generic Emergency Stop procedure. |
346 | | |
347 | | |
348 | | = Requirements Validation = |
349 | | This section maps test cases to individual acceptance test criteria. It also documents the requirements that are not validated. |
350 | | |
351 | | == Validated Requirements Mappings == |
352 | | |
353 | | === Administration Acceptance Test Requirements Mapping === |
354 | | |
355 | | - (C.1.d) Ability to support Microsoft Windows on a bare-metal node.
356 | | - (C.3.a) Ability of Administrator account type to add/delete/modify 2 or more accounts. |
357 | | - (C.3.a) Ability of Operator account type to add/delete/modify 2 or more accounts.
358 | | - (C.3.a) Ability of User account type to be added/deleted/modified for 2 or more accounts. |
359 | | - (C.3.a) Ability of Administrator accounts to have super-user privileges on all rack node types (network, compute, misc.)
360 | | - (C.3.a) Ability of Operator accounts to have privileges on all rack node types (network, compute, misc.) to access common operator functions such as: debug tools, emergency stop, <TBD>.
361 | | - (C.3.a) User account types do not have access to Administrative functions.
362 | | - (C.3.a) User accounts do not have access to Operator functions such as: debug tools, emergency stop, <TBD> |
363 | | - (C.3.a) Ability of Administrator, Operator, and User account types to use secure ssh Keys rather than password. |
364 | | - (C.3.b) Account access via username/password is not allowed.
365 | | - (C.3.b) Ability to support account access via ssh keys. |
366 | | - (C.3.c) Ability to remote power cycle rack components. |
367 | | - (C.3.e) Verify procedures and/or tools provided for changing IP addresses for all rack components types. |
368 | | - (C.3.f) Verify ability to remotely determine MAC addresses for all rack resources (including VMs).
369 | | - (F.5) Ability of site support staff (and GENI operations) to identify all software versions and view all configurations running on all GENI rack components once they are deployed. The rack users' experimental software running on the rack is exempt from this requirement.
370 | | - (F.6) Ability of site support staff (and GENI operations) to view source code for any software covered by the GENI Intellectual Property Agreement that runs on the rack. Rack teams should document the location of such source code in their public site documentation (e.g. on the GENI wiki).
372 | | |
373 | | === Monitoring Acceptance Test Requirements Mapping === |
374 | | - (D.5.a) Ability to provide aggregate health via AM API. |
375 | | - (D.5.a) Ability to get resource types, counts, states and utilization. |
376 | | - (D.5.a) Ability to get the following for active slivers: uptime, resource utilization, performance data. |
377 | | - (D.5.b) Ability to get network device status, traffic counters, traffic type. |
378 | | - (D.5.b) Ability to get network device VLANs by interface, MAC address tables for VLANs. |
379 | | - (D.8) Each rack has an always-available interface on the Meso-scale VLAN.
380 | | |
381 | | |
382 | | === ExoGENI Single Site Acceptance Test - Use Case 1 === |
383 | | |
384 | | * (C.1.a) Ability to operate the advertised minimum number of hosts in a rack simultaneously in multiple experiments.
385 | | * (C.1.b) Ability to support at least one bare-metal node using a supported Linux OS. |
386 | | * (C.1.c) Ability to support multiple VMs simultaneously in a single rack. |
387 | | * (C.1.c) Ability to support multiple VMs and bare-metal nodes simultaneously in a single rack. |
388 | | * (C.1.c) Ability to support multiple bare-metal nodes simultaneously in a single rack. |
389 | | * (C.3.b) Ability to support account access via SSH keys. |
390 | | * (D.5.c) Ability to monitor VM status for CPU, disk, memory utilization. |
391 | | * (D.5.c) Ability to monitor VM interface counters. |
392 | | * (D.6.b) Ability to monitor VMs: CPU, disk, and memory utilization, interface traffic counters, uptime, process counts, and active user counts.
393 | | * (D.6.b) Ability to monitor bare-metal nodes: CPU, disk, and memory utilization, interface traffic counters, uptime, process counts, and active user counts.
394 | | * (D.7) Ability of logging and reporting to capture active user counts per rack. |
395 | | * (G.1) Ability to get VMs with root/admin capabilities. |
396 | | * (G.1) Ability to get bare-metal nodes with root/admin capabilities. |
397 | | * (B.5) Support at least two different operating systems for compute resources |
398 | | * (B.5.a) Provide images for experimenters |
399 | | * (B.5.b) Advertise image availability in the advertisement RSpec |
400 | | |
401 | | === ExoGENI Multi-site Acceptance Test - Use Case 2 === |
402 | | |
403 | | * (C.1.a) Ability to isolate simultaneously used resources in an experiment. |
404 | | * (C.2.b) Ability to connect a single external VLAN to multiple VMs in the rack |
405 | | * (C.2.b) Ability to connect multiple external VLANs to multiple VMs in the rack |
406 | | * (C.2.c) Ability to simultaneously connect a single external VLAN to multiple VMs and bare-metal node. |
407 | | * (C.2.d) Ability to have unique addressing when multiple experiments are running. |
408 | | * (G.2) Ability to provision VM compute resources with multiple dataplane interfaces |
409 | | * (G.3) Ability to get layer two network access to send/receive traffic. |
410 | | * (G.4) Ability to run single experiments using single VMs accessing a single interface to access remote racks. |
411 | | * (G.4) Ability to run single experiments using multiple VMs accessing a single interface (one, several, and all VMs at once) to access multiple remote racks.
412 | | * (G.4) Ability to run multiple experiments using multiple VMs accessing a single interface (one, several, and all VMs at once) to access multiple remote racks.
413 | | * (G.4) Ability to handle traffic as expected across VLANs for each scenario. |
414 | | * (D.6.a) Ability to monitor network devices CPU and memory utilization. |
415 | | * (D.6.c) Ability to view status of power utilization. |
416 | | * (D.6.c) Ability to view control network reachability for all infrastructure devices (KVMs, PDUs, etc) |
417 | | * (D.6.c) Ability to view control network reachability for commodity internet |
418 | | * (D.6.c) Ability to view control network reachability for GENI data plane |
419 | | |
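The D.6.c reachability criteria above amount to sweeping the control-plane inventory and flagging anything that does not answer. A minimal sketch follows; the device names and addresses are hypothetical, and the probe function is injected so that the sweep logic stays independent of how a given site actually tests reachability (ICMP ping, TCP connect, SNMP poll, etc.):

```python
# Sketch of a control-network reachability sweep of the kind D.6.c
# calls for. DEVICES is a hypothetical inventory for illustration only.

import subprocess

def ping_probe(host, timeout_s=2):
    """One ICMP echo via the system 'ping' (Linux flags); True if it answers."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def reachability_report(devices, probe):
    """Map each device name to True/False reachability using the given probe."""
    return {name: probe(addr) for name, addr in devices.items()}

# Hypothetical control-plane inventory; real racks would load this
# from site documentation or the monitoring system's configuration.
DEVICES = {
    "kvm-switch": "10.0.0.10",
    "pdu-1": "10.0.0.11",
    "management-switch": "10.0.0.12",
}

if __name__ == "__main__":
    # A stub probe keeps the example self-contained; swap in ping_probe
    # when running against a real rack.
    stub = lambda addr: addr != "10.0.0.11"
    for name, up in sorted(reachability_report(DEVICES, stub).items()):
        print("%-18s %s" % (name, "reachable" if up else "UNREACHABLE"))
```

Injecting the probe also makes the sweep easy to exercise during test-plan dry runs, before live rack addresses are available.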
=== ExoGENI Multi-site !OpenFlow Acceptance Test - Use Case 5 ===

 * (D.6.a) Ability to monitor !OpenFlow configurations.
 * (D.6.a) Ability to monitor !OpenFlow status.
 * (D.5.d) Ability to monitor !OpenFlow health checks that run at least hourly.
 * (C.2.f) Ability to run multiple !OpenFlow controllers to control multiple network resources in one rack.
 * (C.2.f) Ability to manage !OpenFlow rack resources in a multi-site scenario.
 * (C.2.f) Ability of administrative functions to show which controllers are associated with the !OpenFlow switch.


=== Emergency Stop Acceptance Test Requirements Mapping ===

 - (C.3.d) Verify that Emergency Stop participants and stakeholders are identified.
 - (C.3.d) Verify that Emergency Stop triggers have been identified and implemented.
 - (C.3.d) Verify the existence of GENI Operational Contact Lists (these may differ by rack type).
 - (C.3.d) Verify that security implications for the Emergency Stop are addressed.
 - (C.3.d) Verify that a correlation exists between requests and aggregates.
 - (C.3.d) Verify that response time expectations are defined for severe and urgent cases.
 - (C.3.d) Verify that an escalation path is defined for initial escalation and quarantine.
 - (C.3.d) Verify that response time expectations are defined for the issue reporter.


== Requirements not verified ==

The following requirement is not verified in this plan, and no plan currently exists for validating it. When such a plan is created, it will be executed in the GPO Lab.

 - (C.2.a) "Support at least 100 simultaneous active (e.g. actually passing data) layer 2 Ethernet VLAN connections to the rack. For this purpose, VLAN paths must terminate on separate rack VMs, not on the rack switch."

= Glossary =
Following is a glossary of terminology used in this plan; for additional terminology definitions, see the [http://groups.geni.net/geni/wiki/GeniGlossary GENI Glossary] page.
 * Account types:
   * Experimenter: a person accessing the rack using a GENI credential and the GENI AM API.
   * Administrator: a person who has fully-privileged access to, and responsibility for, the rack infrastructure (servers, network devices, etc.) at a given location.
   * Operator: a person who has unprivileged or partially-privileged access to the rack infrastructure at a given location, and has responsibility for one or a few particular functions.

 * Baseline Monitoring: the set of monitoring functions that shows aggregate health for VMs and switches, their interface status, and traffic counts for interfaces and VLANs; includes resource availability and utilization.

 * Experimental compute resources:
   * VM: an experimental compute resource which is a virtual machine located on a physical machine in the rack.
   * Bare-metal node: an experimental compute resource which is a physical machine usable by experimenters without virtualization.
   * Compute resource: either a VM or a bare-metal node.

 * Experimental network resources:
   * Static VLAN: the VLAN is provisioned entirely out of band. Admins set up the VLAN manually; experimenters must know the VLAN ID and request connection to it from the rack AM(s).
   * Pool VLAN: the VLAN is provisioned dynamically from a pool of manually pre-allocated VLANs. Admins set up the pool and configure the VLAN IDs into the rack AM(s); experimenters do not specify a VLAN ID in their requests.
   * Dynamic VLAN: the VLAN is provisioned dynamically everywhere that it exists. Admins do not do any out-of-band setup work; experimenters do not specify a VLAN ID in their requests.
{{{
#!html
Email <a href="mailto:help@geni.net"> help@geni.net </a> for GENI support or email <a href="mailto:luisa.nevers@bbn.com">me</a> with feedback on this page!
}}}