Version 18 (modified by tupty@bbn.com, 11 years ago)

Detailed test plan for EG-ADM-4: Emergency Stop Test

This page is GPO's working page for performing EG-ADM-4. It is public for informational purposes, but it is not an official status report. See GENIRacksHome/ExogeniRacks/AcceptanceTestStatus for the current status of ExoGENI acceptance tests.

Last substantive edit of this page: 2013-02-26

Page format

  • The status chart summarizes the state of this test
  • The high-level description from test plan contains text copied exactly from the public test plan and acceptance criteria pages.
  • The steps contain things I will actually do/verify:
    • Steps may be composed of related substeps where I find this useful for clarity
    • Each step is identified as either "(prep)" or "(verify)":
      • Prep steps are just things we have to do. They're not tests of the rack, but are prerequisites for subsequent verification steps
      • Verify steps are steps in which we will actually look at rack output and make sure it is as expected. They contain a Using: block, which lists the steps to run the verification, and an Expect: block which lists what outcome is expected for the test to pass.

Status of test

See GENIRacksHome/ExogeniRacks/AcceptanceTestStatus for the meanings of test states.

Step | State | Date completed | Open Tickets | Closed Tickets/Comments
1 | Pass | 2013-03-01 | | tupty has reviewed GMOC doc and EG doc, GPO doc has been posted
2 | Pass | 2013-02-28 | | test has been scheduled, roles have been defined
3 | Pass | 2013-03-06 | |
4 | Pass | 2013-03-06 | |
5A | Pass | 2013-03-06 | |
5B | Pass: most criteria | | | Document needs to address issues with verification that slice was deleted, and how to view which slivers are in which slice
5C | Pass | 2013-03-06 | |
6 | Pass | 2013-03-06 | |

High-level description from test plan

In this test, an ES (Emergency Stop) drill is performed on a sliver in the rack.

Procedure

  • A site administrator reviews the local site ES procedure, GMOC ES procedure, and sliver shut down procedure, and verifies that these documents combined fully document the campus side of the ES procedure.
  • A second administrator (or the GPO) submits an ES request to GMOC, referencing activity from a public IP address assigned to a compute sliver in the rack that is part of the test experiment.
  • GMOC and the first site administrator perform an ES drill in which the site administrator successfully shuts down the sliver in coordination with GMOC.
  • GMOC completes the ES workflow, including updating/closing GMOC tickets.

Criteria to verify as part of this test

  • VI.07. A public document explains the requirements that site administrators have to the GENI community, including how to join required mailing lists, how to keep their support contact information up-to-date, how and under what circumstances to work with Legal, Law Enforcement and Regulatory(LLR) Plan, how to best contact the rack vendor with operational problems, what information needs to be provided to GMOC to support emergency stop, and how to interact with GMOC when an Emergency Stop request is received. (F.3, C.3.d)
  • VI.17. A procedure is documented for performing a shut down operation on any type of sliver on the rack, in support of an Emergency Stop request. (C.3.d)
  • VII.18. Given a public IP address and port, an exclusive VLAN, a sliver name, or a piece of user-identifying information such as e-mail address or username, a site administrator or GMOC operator can identify the email address, username, and affiliation of the experimenter who controlled that resource at a particular time. (D.7)
  • VII.19. GMOC and a site administrator can perform a successful Emergency Stop drill in which slivers containing compute and OpenFlow-controlled network resources are shut down. (C.3.d)

Step 1 (prep): Site administrator reviews local site ES procedure, GMOC ES procedure, and ExoGENI sliver shut down procedure

The site administrator should review the local site ES procedure, the ES procedure provided by the GMOC, and the ExoGENI sliver shut down procedure. All of these procedures should make sense together, and the site administrator should follow the local site ES procedure for the test. The site administrator should identify the parts of the local procedure where they need to take action on the aggregate, and they should reference the ExoGENI sliver shut down procedure for those parts of the test. They should also identify where the local site procedure requires interfacing with the GMOC. The parts identified by the site administrator should be verified with the GMOC and with the ExoGENI team.

Results of testing step 1: 2013-03-01

The documents have been collected, and they are at the following locations:

Step 2 (prep): GPO, GMOC, and ExoGENI team coordinate a time to run an ES test

The GPO will coordinate with parties at the GMOC and on the ExoGENI team to identify when an ES test can be run. This test will focus primarily on the interactions with the site administrator(s) and on performing the procedures documented by the rack team. The following roles will need to be defined for this test:

  • GMOC Coordinator: person from the GMOC who coordinates the ES activity on the GMOC's side
  • ExoGENI Contact: person from the ExoGENI team who can be around if there are questions about the document or sliver shut down procedure
  • ES Initiator: GPO person who initiates an Emergency Stop request
  • Experimenter: GPO person who has created a sliver
  • Site Administrator: GPO person who is acting as the site administrator of the GPO ExoGENI rack

Results of testing step 2: 2013-02-28

Date of test: 2013-03-06

Role | Person
GMOC Coordinator | Eldar
ExoGENI Contact | Chris, Jonothan, Victor
ES Initiator | Chaos
Experimenter | Josh
Site Administrator | Tim

Step 3 (prep): Experimenter sets up a slice

The experimenter will set up a slice that includes a sliver on the GPO ExoGENI rack. The sliver should be a VM that is attached to the shared mesoscale VLAN, and it should be sending traffic that is visible through monitoring.

Results of testing step 3: 2013-03-06

Josh is running UDP iperfs in his jbs15 and jbs16 slices from the BBN ExoGENI rack to the BBN InstaGENI rack. This traffic should be visible on the graph of poblano gi0/15's last hour of RX bytes/sec. We will want to look at the amount of traffic before and after we shut down the sliver. Once the shutdown is complete, I will capture today's graph with a timestamp.

Step 4 (prep): ES initiated

  • The ES Initiator contacts the GMOC Coordinator to initiate an ES request describing the slice URN.
  • The GMOC quickly walks through their procedure, skipping more formal steps as needed, in order to contact the aggregate operator primary contact.
    • The GMOC does not need to verify the identity of the ES Initiator for the purposes of this test, and therefore should not contact the ES Initiator.
    • The GMOC does not need to contact the Experimenter for the purposes of this test, and therefore the GMOC should not contact the Experimenter.

Results of testing step 4: 2013-03-06

Chaos sent the following at 2:30 PM EST:

GMOC:

I am a GENI experimenter, and am trying to use the BBN site.

I noticed that the slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15
has a sliver on the ORCA aggregate at the BBN ExoGENI rack
(bbn-hn.exogeni.net:11443) which is causing trouble.

Please perform an emergency stop on that sliver.

Chaos

The GMOC followed up at 2:54 PM EST with:

Greetings Aggregate Operator or GENI Rack site Operator,

We received an emergency stop request from a GENI Experimenter (Chaos
Golubinski) for the following GENI resource:

Slice URN: urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15 of which
RENIC is listed as Aggregate Operator and BBN/GPO is the Rack site
contact. Please acknowledge within an hour that you are looking into
this Emergency Stop request and are working to shut-down this
resource. If no response is received in one hour, we will proceed to
contact your Escalation contact and/or perform an isolation of your
slice/resource from the GENI Core network.

NOTE: Please feel free to reference the following document for
details of the GMOC Emergency Stop process:
http://gmoc.grnoc.iu.edu/gmoc/documents/geni-standards/gmoc-support-emergency-stop-procedure-and-workflow.html

Step 5: Site Administrator receives ES request

Step 5A (verify): Data passed from GMOC to Site Administrator is in expected format

Using:

  • Local site ES procedure
  • Documented ExoGENI sliver shut down procedure
  • GMOC monitoring tools

Verify:

  • The GMOC sends a request with slice-specific or sliver-specific data in a format that can be fed into the shut down procedure
  • There is a step in the local site ES procedure for the Site Administrator to acknowledge that the GMOC's request is being processed
  • The Site Administrator can identify the experimenter's email address, username, and affiliation with the information provided by the GMOC and GMOC monitoring tools

Results of testing step 5A: 2013-03-06

  • The GMOC's email included the slice URN, which was as expected for today's test. The ExoGENI procedure can be followed with a slice URN.
  • The example procedure includes a step to ack the GMOC's ES request, and I acked the request at 2:56 PM EST.
  • I can see the experimenter's email address, user URN, and operator organization when I look at the GMOC web UI.
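
The slice URN in the GMOC's email is the piece of data that gets fed into the shut down procedure. As an illustration only, here is a minimal Python sketch that splits such a URN into its parts. It assumes the common `urn:publicid:IDN+<authority>+<type>+<name>` layout seen in this test; the `GeniUrn` type and `parse_geni_urn` function are hypothetical names, not part of any GENI or ExoGENI tool.

```python
from typing import NamedTuple


class GeniUrn(NamedTuple):
    """Hypothetical container for the three '+'-separated URN fields."""
    authority: str
    resource_type: str
    name: str


def parse_geni_urn(urn: str) -> GeniUrn:
    """Split a URN of the assumed form urn:publicid:IDN+<authority>+<type>+<name>."""
    prefix = "urn:publicid:IDN+"
    if not urn.startswith(prefix):
        raise ValueError(f"not a GENI-style URN: {urn!r}")
    parts = urn[len(prefix):].split("+")
    # Exactly three non-empty fields are expected under this assumption.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority+type+name in {urn!r}")
    return GeniUrn(*parts)


# The slice URN from the GMOC's email in this test.
parsed = parse_geni_urn("urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15")
print(parsed.authority, parsed.resource_type, parsed.name)
```

A check like this could confirm that a request names a slice (rather than a sliver or user) before the administrator starts the slice-based procedure.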

Step 5B (verify): Shut down procedure can be followed to successfully shut down a sliver

Using:

  • Documented ExoGENI sliver shut down procedure
  • Administrative tools to shut down a sliver
  • GMOC monitoring tools

Verify:

  • The shut down procedure includes the complete set of steps to shut down a sliver in the rack
  • Following the shut down procedure results in a sliver being deactivated on a rack
  • Experimental traffic from the sliver is no longer being sent

Results of testing step 5B: 2013-03-06

  • I followed only the steps in the documented procedure, and the sliver was successfully shut down. However, the documented procedure alone was not enough to verify the shutdown: we did verify that the sliver had been shut down, but we used steps that were not in the documented procedure. We will need to follow up with the ExoGENI team:
    pequod>show sms
    bbn-sm  sm      6f72d5ac-5190-46c8-bc10-cc4af5dcab6e
            http://localhost:14080/orca/spring-services
            [Service Manager at BBN ExoGENI Site for topology embedding]
    
    pequod>show slices for bbn-sm filter "jbs15"
    urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15
    8780bcef-6b72-4897-b750-f47a713e176c
    urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15
    4993ad93-7db4-4070-9e2b-756f9af653b6
    urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15
    c8bb5fbb-c680-437c-be18-c02a8e433542
    
    Total: 3 slices
    pequod>
    Logging out of containers
    Shutting down commands help file set show aux manage
    Resetting terminal and exiting. Goodbye.
    
    [tupty@bbn-hn ~]$ pequod
    Pequod ORCA Shell v.4.0-SNAPSHOT.build-5214 built on 02/28/2013 12:19
    (c) 2012-2013 RENCI/UNC Chapel Hill
    
      help: Returns help for individual commands
      file: File-related commands
      set: Modify internal set variables
      show: Show the state of things
      aux: Auxiliary commands
      manage: Manage actor state
      history: show command history (!<command index> invokes the command)
      exit: Exit from the shell (Ctrl-D or Ctrl-C also works)
    Type the entire command, or enter the first word of the command to
    enter subcommand with intelligent auto-completion (Using TAB).
    
    "It is not down on any map; true places never are."
                    -Herman Melville, Moby Dick
    
    pequod>show slices for bbn-sm filter "jbs15"
    urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15
    8780bcef-6b72-4897-b750-f47a713e176c
    urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15
    4993ad93-7db4-4070-9e2b-756f9af653b6
    urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15
    c8bb5fbb-c680-437c-be18-c02a8e433542
    
    Total: 3 slices
    pequod>
    Logging out of containers
    Shutting down commands help file set show aux manage
    Resetting terminal and exiting. Goodbye.
    [tupty@bbn-hn ~]$ pequod
    Pequod ORCA Shell v.4.0-SNAPSHOT.build-5214 built on 02/28/2013 12:19
    (c) 2012-2013 RENCI/UNC Chapel Hill
    
      help: Returns help for individual commands
      file: File-related commands
      set: Modify internal set variables
      show: Show the state of things
      aux: Auxiliary commands
      manage: Manage actor state
      history: show command history (!<command index> invokes the command)
      exit: Exit from the shell (Ctrl-D or Ctrl-C also works)
    Type the entire command, or enter the first word of the command to
    enter subcommand with intelligent auto-completion (Using TAB).
    
    "It is not down on any map; true places never are."
                    -Herman Melville, Moby Dick
    
    pequod>show reservations for c8bb5fbb-c680-437c-be18-c02a8e433542
    actor bbn-sm state active
    42b5b5f6-6d39-4359-b37f-5cb9f435f066    bbn-sm
            1       bbnvmsite.vm    [ active, activeticketed]
            Notices: Reservation 42b5b5f6-6d39-4359-b37f-5cb9f435f066
    (Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15) is in state
    [Active,ExtendingTicket]
            Start: Mon Feb 25 20:25:26 UTC 2013     End:Mon Mar 11 19:25:27 UTC 2013
    
    Total: 1 reservations
    pequod>manage close slice c8bb5fbb-c680-437c-be18-c02a8e433542 actor bbn-sm
    Closed slice c8bb5fbb-c680-437c-be18-c02a8e433542 on bbn-sm with result true
    pequod>show reservations for current actor bbn-sm
    ERROR: Current slice not set
    Total: 0 reservations
    

The final command above did not work because the documented procedure is missing a step to set the variable "current". This should be addressed in the documentation.
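
In the transcript above, `show slices ... filter "jbs15"` returned three entries with the same URN but different GUIDs, and each GUID had to be checked separately for reservations. As a rough sketch of how that listing could be post-processed, the following Python pairs each URN line with the GUID line that follows it. It works on pasted text only; `parse_slices` is a hypothetical helper, and the assumed layout (URN line followed by GUID line) is taken from this one transcript, not from pequod documentation.

```python
import re
from typing import List, Tuple

# A GUID as it appears in the transcript, e.g. 8780bcef-6b72-4897-b750-f47a713e176c
GUID = re.compile(r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def parse_slices(output: str) -> List[Tuple[str, str]]:
    """Return (urn, guid) pairs from pasted 'show slices' output."""
    pairs = []
    pending_urn = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("urn:publicid:IDN+"):
            pending_urn = line
        elif pending_urn is not None and GUID.match(line):
            pairs.append((pending_urn, line))
            pending_urn = None
    return pairs


# Listing copied from the transcript above.
listing = """\
urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15
8780bcef-6b72-4897-b750-f47a713e176c
urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15
4993ad93-7db4-4070-9e2b-756f9af653b6
urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+jbs15
c8bb5fbb-c680-437c-be18-c02a8e433542

Total: 3 slices
"""

slices = parse_slices(listing)
for urn, guid in slices:
    print(guid, urn)
```

Each GUID in the result would still need its own `show reservations` query to find the slice that holds the active sliver.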

  • To verify that the reservations were gone, we ran:
    pequod>show reservations for c8bb5fbb-c680-437c-be18-c02a8e433542 actor bbn-sm
    42b5b5f6-6d39-4359-b37f-5cb9f435f066    bbn-sm
            1       bbnvmsite.vm    [ closed, nascent]
            Notices: Reservation 42b5b5f6-6d39-4359-b37f-5cb9f435f066 (Slice urn:pub
            Start: Mon Feb 25 20:25:26 UTC 2013     End:Mon Mar 11 19:25:27 UTC 2013
    
    Total: 1 reservations
    

This lists the reservation as being in the "closed" state as described in the document. The document says "The next to last command should show a list of active slivers within a slice. The last command verifies that all reservations in the slice are either closed or failed". That should probably say "The third-to-last command...".
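
The "closed or failed" check described in the document can also be applied mechanically to pasted `show reservations` output. The Python below is a sketch under assumptions: it treats any line of the form `<units> <resource type> [ <state>, <pending state>]` as a reservation line, which matches the two transcripts on this page but is not a documented pequod output format. The function names are hypothetical.

```python
import re
from typing import List

# Matches lines such as: "        1       bbnvmsite.vm    [ closed, nascent]"
RESERVATION_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+\[\s*(\w+)\s*,\s*(\w+)\s*\]\s*$"
)


def reservation_states(output: str) -> List[str]:
    """Return the first (current) state of every reservation line, lowercased."""
    states = []
    for line in output.splitlines():
        match = RESERVATION_LINE.match(line)
        if match:
            states.append(match.group(3).lower())
    return states


def all_shut_down(output: str) -> bool:
    """True only if at least one reservation is listed and all are closed or failed."""
    states = reservation_states(output)
    return bool(states) and all(s in {"closed", "failed"} for s in states)


# Excerpts copied from the transcripts on this page.
before = """\
42b5b5f6-6d39-4359-b37f-5cb9f435f066    bbn-sm
        1       bbnvmsite.vm    [ active, activeticketed]
[Active,ExtendingTicket]
"""
after = """\
42b5b5f6-6d39-4359-b37f-5cb9f435f066    bbn-sm
        1       bbnvmsite.vm    [ closed, nascent]
"""

print(all_shut_down(before), all_shut_down(after))
```

Returning False for empty output is a deliberate choice: an error such as "Current slice not set" lists zero reservations, and that should not be read as a successful shutdown.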

Step 5C (verify): Documented procedure includes a step to follow up with GMOC

Using:

  • Local site ES procedure

Verify:

  • There is a step for the site administrator to follow up with the GMOC that a sliver has been shut down

Results of testing step 5C: 2013-03-06

  • The GMOC ack-ed my ack at 3:01 PM EST, although I don't think that is a requirement.
  • There is a step to follow up with the GMOC, and I did so at 3:11 PM EST.
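
The GMOC's email asks for an acknowledgement within one hour. The timestamps recorded in steps 4 through 5C make it easy to check that arithmetic; the short Python sketch below does so. The event labels are my own wording, and all times are taken as EST on 2013-03-06 as reported on this page.

```python
from datetime import datetime, timedelta

# Timestamps reported in the results for steps 4, 5A, and 5C (all EST).
events = {
    "ES request sent to GMOC": datetime(2013, 3, 6, 14, 30),
    "GMOC notified site": datetime(2013, 3, 6, 14, 54),
    "Site acknowledged": datetime(2013, 3, 6, 14, 56),
    "GMOC acknowledged the ack": datetime(2013, 3, 6, 15, 1),
    "Site followed up after shutdown": datetime(2013, 3, 6, 15, 11),
}

# Delay between the GMOC's notification and the site's acknowledgement.
ack_delay = events["Site acknowledged"] - events["GMOC notified site"]
# Time from the site's notification until the site's follow-up.
site_handling = events["Site followed up after shutdown"] - events["GMOC notified site"]
# Time from the original request until the site's follow-up.
end_to_end = events["Site followed up after shutdown"] - events["ES request sent to GMOC"]

within_deadline = ack_delay <= timedelta(hours=1)

print("ack delay:", ack_delay, "within one hour:", within_deadline)
print("site handling time:", site_handling)
print("end to end:", end_to_end)
```

The 3:11 PM entry is when the follow-up was sent, so these figures bound the handling time rather than giving the exact moment the sliver was closed.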

Step 6 (verify): Sliver shut down procedure includes a clean-up step (if necessary)

Using:

  • Documented ExoGENI sliver shut down procedure

Verify:

  • Ensure the ExoGENI sliver shut down procedure contains a recovery step describing what to do if the shut down affects other experimenters.

Results of testing step 6: 2013-03-06

  • Other slices (including tuptymon and jbs16) were not affected by us shutting down jbs15.
  • There were no clean-up steps necessary for the pieces involving the ExoGENI doc in today's test.