Opened 12 years ago

Last modified 11 years ago

#60 new

Nodes that are part of expired slices can be accessed when sliver is expired.

Reported by: tupty@bbn.com
Owned by: somebody
Priority: major
Milestone:
Component: AM
Version: SPIRAL4
Keywords:
Cc:
Dependencies:

Description

After a sliver has expired in ORCA, a normal manifest is returned for the sliver by ListResources:

<rspec type="manifest" xmlns="http://www.geni.net/resources/rspec/3" xmlns:ns2="http://hpn.east.isi.edu/rspec/ext/stitch/0.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.geni.net/resources/rspec/3 http://www.geni.net/resources/rspec/3/manifest.xsd http://hpn.east.isi.edu/rspec/ext/stitch/0.1/ http://hpn.east.isi.edu/rspec/ext/stitch/0.1/stitch-schema.xsd">  
      <node client_id="0ba1a49d-ef80-4669-91a4-62ae3424747f#geni0" component_id="urn:publicid:IDN+exogeni.net:bbnvmsite+authority+cm" exclusive="true" sliver_id="urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+0ba1a49d-ef80-4669-91a4-62ae3424747f#geni0">    
            <sliver_type name="m1.small">      
                  <disk_image name="http://geni-images.renci.org/images/standard/debian/debian-squeeze-amd64-neuca-2g.zfilesystem.sparse.v0.2.xml" version="397c431cb9249e1f361484b08674bc3381455bb9"/>      
            </sliver_type>    
            <services>      
                  <login authentication="ssh-keys" hostname="192.1.242.6" port="22" username="root"/>      
                  <execute command="#!/bin/bash # Automatically generated boot script execString=&amp;quot;/bin/sh -c \&amp;quot;hostname tuptymon-bbn\&amp;quot;&amp;quot; eval $execString  "/>      
            </services>    
            <interface client_id="0ba1a49d-ef80-4669-91a4-62ae3424747f#geni0:if1"/>    
      </node>  
      <link client_id="0ba1a49d-ef80-4669-91a4-62ae3424747f#center" sliver_id="urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+0ba1a49d-ef80-4669-91a4-62ae3424747f#center" vlantag="1750.0">    
            <interface_ref client_id="0ba1a49d-ef80-4669-91a4-62ae3424747f#geni0:if1"/>    
      </link>  
</rspec>

However, SliverStatus reports that the sliver doesn't exist anymore:

Failed to get SliverStatus on tuptymon at AM https://bbn-hn.exogeni.net:11443/orca/xmlrpc: ERROR: There are no reservations in the slice with sliceId = urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+tuptymon

Basically, some part of ORCA is treating the sliver like it still exists, while another part of ORCA knows that the sliver has expired. The actual compute resources for the sliver at the aggregate had been deleted. A new sliver under the same GENI slice cannot be created until the old sliver is explicitly deleted.

Once the sliver expires, it should not need to be explicitly deleted in order to create the sliver again.

Attachments (1)

omni_output.txt (22.5 KB) - added by tupty@bbn.com 12 years ago.
Sequence of omni commands and output detailing the issue


Change History (12)

comment:1 Changed 12 years ago by tupty@bbn.com

After the sliver expired, I ran through the following sequence:

  • ListResources
  • SliverStatus
  • CreateSliver
  • ListResources
  • SliverStatus
  • DeleteSliver
  • ListResources
  • SliverStatus
  • CreateSliver
  • ListResources
  • SliverStatus

This details the problem in this ticket. I will attach the output of that sequence of omni commands in the file omni_output.txt
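
For anyone wanting to repeat the sequence above, here is a minimal sketch that only builds the omni command lines in the same order; it does not run them. The omni path and the `exobbn` nickname are copied from other comments in this ticket and are assumptions for this particular sliver; `request.rspec` is a hypothetical request RSpec file name, and `build_commands` is an illustrative helper, not part of omni.

```python
# Build (but do not execute) the omni command lines for the sequence above.
OMNI = "./src/omni.py"   # assumption: path as it appears elsewhere in this ticket
AGGREGATE = "exobbn"     # assumption: AM nickname as it appears elsewhere in this ticket

# Same order as the bullet list in this comment.
STEPS = [
    "listresources", "sliverstatus", "createsliver",
    "listresources", "sliverstatus", "deletesliver",
    "listresources", "sliverstatus", "createsliver",
    "listresources", "sliverstatus",
]


def build_commands(slice_name, rspec_path):
    """Return one argv list per step; createsliver also gets the RSpec file."""
    commands = []
    for step in STEPS:
        argv = [OMNI, step, "-a", AGGREGATE, slice_name]
        if step == "createsliver":
            argv.append(rspec_path)  # hypothetical request RSpec file
        commands.append(argv)
    return commands


if __name__ == "__main__":
    for argv in build_commands("tuptymon", "request.rspec"):
        print(" ".join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try; each line can be pasted into a shell once the path, nickname, and RSpec file are adjusted.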

Changed 12 years ago by tupty@bbn.com

Attachment: omni_output.txt added

Sequence of omni commands and output detailing the issue

comment:2 Changed 12 years ago by tupty@bbn.com

For what it's worth, I am pretty sure that my sliver expired, but I don't really have any proof of that. This was the mesoscale connectivity test VM, and the connectivity service checks started failing earlier this weekend, which led me to start checking on the status of my ORCA sliver on bbn-hn.

comment:3 Changed 12 years ago by tupty@bbn.com

FWIW, I am still seeing this. My VM is gone, and SliverStatus knows that the reservation is done, but I still get a manifest RSpec when I call ListResources with my slice name.

I am going to leave that sliver alone for now and see if things get back in sync tomorrow. If ListResources still returns a manifest tomorrow, then I will assume that I need to take action and delete the sliver to get things back to a known state. If fixing this requires experimenter action, then that is bad in general, but it also has implications for relational data and monitoring, which will make use of the manifests.

comment:4 Changed 12 years ago by ibaldin@renci.org

This won't be easy to fix. We use lazy garbage collection: a manifest may persist until the next time someone tries to create a slice, at which point the system notices that all reservations from this one are gone and removes it.

comment:5 Changed 12 years ago by ibaldin@renci.org

We can play some tricks and do the GC when you query for it too, I suppose.
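
To make the two options concrete, here is a toy sketch contrasting collection only at create time with collection on query as well. It is not ORCA code: `ManifestStore` and all of its methods are hypothetical names, and representing a slice as a list of reservation expiry times is an assumption made purely for illustration.

```python
class ManifestStore:
    """Toy model (not ORCA): slice URN -> list of reservation expiry times."""

    def __init__(self):
        self._slices = {}

    def create(self, urn, expiries, now):
        # Lazy GC: fully expired slices are only noticed and removed here.
        for known in list(self._slices):
            self._collect(known, now)
        if urn in self._slices:
            raise ValueError("slice already exists: " + urn)
        self._slices[urn] = list(expiries)

    def query_lazy(self, urn):
        # No GC on read, so a stale manifest can still be returned.
        return self._slices.get(urn)

    def query_with_gc(self, urn, now):
        # GC on read: drop the slice first if every reservation has expired.
        self._collect(urn, now)
        return self._slices.get(urn)

    def _collect(self, urn, now):
        expiries = self._slices.get(urn)
        if expiries is not None and all(t <= now for t in expiries):
            del self._slices[urn]


if __name__ == "__main__":
    store = ManifestStore()
    store.create("urn:example", [100], now=0)
    print(store.query_lazy("urn:example"))              # stale entry still visible
    print(store.query_with_gc("urn:example", now=200))  # collected on query
```

In this model the per-query check touches only the slice being asked about, which is one way to keep the cost of collecting on query small.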

comment:6 Changed 12 years ago by tupty@bbn.com

That sounds like a good approach to me, assuming that the frequency of queries coupled with the duration of GC doesn't bog down the system. I personally have no clue on the expected frequency of queries or the duration of GC, so I'll leave that up to you all to decide.

comment:7 Changed 12 years ago by lnevers@bbn.com

There are various inconsistencies with ExoGENI BBN sliver EG-EXP-6-exp3:

I renewed it this morning without problem:

INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3
expires on 2012-08-22 00:00:00 UTC
INFO:omni:Renewing Sliver urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-
EXP-6-exp3 until 2012-08-22 00:00:00+00:00 (UTC)
INFO:omni:Substituting AM nickname exobbn with URL https://bbn-
hn.exogeni.net:11443/orca/xmlrpc, URN unspecified_AM_URN
INFO:omni:Renewed sliver urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-
EXP-6-exp3 at unspecified_AM_URN (https://bbn-
hn.exogeni.net:11443/orca/xmlrpc) until 2012-08-22T00:00:00+00:00 (UTC)
INFO:omni: ------------------------------------------------------------
INFO:omni: Completed renewsliver:

  Options as run:
                aggregate: exobbn
                framework: pg
                native: True

  Args: renewsliver EG-EXP-6-exp3 2012-08-22

  Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-
EXP-6-exp3 expires on 2012-08-22 00:00:00 UTC
Renewed sliver urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3
at unspecified_AM_URN (https://bbn-hn.exogeni.net:11443/orca/xmlrpc) until
2012-08-22T00:00:00+00:00 (UTC)

But according to sliver status it is gone:

$ ./src/omni.py sliverstatus -a exobbn EG-EXP-6-exp3
INFO:omni:Loading config file /home/lnevers2/.gcf/omni_config
INFO:omni:Using control framework pg
INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3
expires on 2012-08-22 00:00:00 UTC
INFO:omni:Substituting AM nickname exobbn with URL https://bbn-
hn.exogeni.net:11443/orca/xmlrpc, URN unspecified_AM_URN
INFO:omni:Status of Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-
EXP-6-exp3:
INFO:omni: ------------------------------------------------------------
INFO:omni: Completed sliverstatus:

  Options as run:
                aggregate: exobbn
                framework: pg
                native: True

  Args: sliverstatus EG-EXP-6-exp3

  Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-
EXP-6-exp3 expires on 2012-08-22 00:00:00 UTC

Failed to get SliverStatus on EG-EXP-6-exp3 at AM https://bbn-
hn.exogeni.net:11443/orca/xmlrpc: ERROR: There are no reservations in the
slice with sliceId = urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-
EXP-6-exp3
Returned status of slivers on 0 of 1 possible aggregates.
INFO:omni: ============================================================

But listresources still shows that it is active:

$ ./src/omni.py listresources -a exobbn EG-EXP-6-exp3
INFO:omni:Loading config file /home/lnevers2/.gcf/omni_config
INFO:omni:Using control framework pg
INFO:omni:Gathering resources reserved for slice EG-EXP-6-exp3.
INFO:omni:Substituting AM nickname exobbn with URL https://bbn-
hn.exogeni.net:11443/orca/xmlrpc, URN unspecified_AM_URN
INFO:omni:Listed resources on 1 out of 1 possible aggregates.
INFO:omni:<?xml version="1.0" ?>
INFO:omni:<!-- Resources for:
        Slice: EG-EXP-6-exp3
        at AM:
        URN: unspecified_AM_URN
        URL: https://bbn-hn.exogeni.net:11443/orca/xmlrpc
 -->
INFO:omni:<rspec type="manifest"
xmlns="http://www.geni.net/resources/rspec/3"
xmlns:ns2="http://hpn.east.isi.edu/rspec/ext/stitch/0.1/"
xmlns:ns3="http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions
/slice-info/1"
xmlns:ns4="http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions
/sliver-info/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.geni.net/resources/rspec/3
http://www.geni.net/resources/rspec/3/manifest.xsd
http://hpn.east.isi.edu/rspec/ext/stitch/0.1/
http://hpn.east.isi.edu/rspec/ext/stitch/0.1/stitch-schema.xsd
http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions/slice-
info/1 http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions
/slice-info/1/slice_info.xsd?format=raw
http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions/sliver-
info/1 http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions
/sliver-info/1/sliver_info.xsd?format=raw">
      <node client_id="VM"
component_id="urn:publicid:IDN+exogeni.net:bbnvmsite+node+orca-vm-cloud"
component_manager_id="urn:publicid:IDN+exogeni.net:bbnvmsite+authority+am"
component_name="orca-vm-cloud" exclusive="true"
sliver_id="urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+a185b84a-
884d-4746-8fa8-c1c32e5f3674#VM">
            <sliver_type name="m1.small">
                  <disk_image name="http://geni-
images.renci.org/images/standard/debian/debian-squeeze-amd64-neuca-
2g.zfilesystem.sparse.v0.2.xml"
version="397c431cb9249e1f361484b08674bc3381455bb9"/>
            </sliver_type>
            <services>
                  <login authentication="ssh-keys" hostname="192.1.242.9"
port="22" username="root"/>
            </services>
            <interface client_id="VM:if0">
                  <ip address="10.42.11.198" netmask="255.255.0.0"
type="ipv4"/>
            </interface>
            <ns4:geni_sliver_info
creation_time="2012-08-14T18:22:26.060-04:00"
creator_urn="urn:publicid:IDN+pgeni.gpolab.bbn.com+user+lnevers2"
expiration_time="2012-08-15T18:22:26.060-04:00"
start_time="2012-08-14T18:22:26.060-04:00" state="Active"/>
      </node>
      <link client_id="lan0"
sliver_id="urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+a185b84a-
884d-4746-8fa8-c1c32e5f3674#lan0" vlantag="1750">
            <interface_ref client_id="VM:if0"/>
            <ns4:geni_sliver_info
creation_time="2012-08-14T18:22:26.060-04:00"
creator_urn="urn:publicid:IDN+pgeni.gpolab.bbn.com+user+lnevers2"
expiration_time="2012-08-15T18:22:26.060-04:00"
start_time="2012-08-14T18:22:26.060-04:00"/>
      </link>
      <ns3:geni_slice_info state="unknown"
urn="urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3" uuid
="690de9ca-c66d-4474-9819-5930300820e9"/>
</rspec>
INFO:omni: ------------------------------------------------------------
INFO:omni: Completed listresources:

  Options as run:
                aggregate: exobbn
                framework: pg
                native: True

  Args: listresources EG-EXP-6-exp3

  Result Summary: Retrieved resources for slice EG-EXP-6-exp3 from 1
aggregates.
Wrote rspecs from 1 aggregates.

Attempts to log in to the assigned host fail:

lnevers2@arendia:~/gcf-1.6.2$ ssh root@192.1.242.9
ssh: connect to host 192.1.242.9 port 22: No route to host
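
The manifest above still carries an expiration_time of 2012-08-15 on elements whose node is reported as state="Active". A minimal sketch of how a monitoring script might flag that kind of stale entry follows. The namespace URI is the ns4 value from the manifest with its line wrap joined, which is an assumption about the original unwrapped text; `stale_slivers` is a hypothetical helper, not part of omni or ORCA.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: the ns4 URI from the manifest above, with the line wrap joined.
SLIVER_INFO_NS = (
    "http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions/sliver-info/1"
)


def stale_slivers(manifest_xml, now):
    """Return (sliver_id, state) for elements whose expiration_time has passed."""
    root = ET.fromstring(manifest_xml)
    stale = []
    for parent in root.iter():
        # Look only at direct geni_sliver_info children of each element.
        info = parent.find("{%s}geni_sliver_info" % SLIVER_INFO_NS)
        if info is None or info.get("expiration_time") is None:
            continue
        expires = datetime.fromisoformat(info.get("expiration_time"))
        if expires <= now:
            stale.append((parent.get("sliver_id"), info.get("state")))
    return stale


if __name__ == "__main__":
    sample = (
        '<rspec xmlns="http://www.geni.net/resources/rspec/3" '
        'xmlns:ns4="%s">'
        '<node sliver_id="example-sliver">'
        '<ns4:geni_sliver_info '
        'expiration_time="2012-08-15T18:22:26.060-04:00" state="Active"/>'
        "</node></rspec>" % SLIVER_INFO_NS
    )
    print(stale_slivers(sample, datetime.now(timezone.utc)))
```

The check compares timezone-aware timestamps, so `now` must also be timezone-aware; the sample manifest in the `__main__` block is a trimmed, made-up stand-in for the real output.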

comment:8 Changed 11 years ago by vjo@duke.edu

Are we still seeing this?

comment:9 Changed 11 years ago by ibaldin@renci.org

I don't think it is fixed, but with the addition of a slice state machine in the latest code it should be easier to do.

comment:10 Changed 11 years ago by lnevers@bbn.com

Checked AM API commands on an expired slice (3sites-OF) and found the following commands failed as expected for an expired sliver:

  • Listresources
  • Renewsliver
  • Sliverstatus
  • Createsliver

The above were run several times with the expected results.

However, I was still able to log in to the node that had been assigned to the expired slice 3sites-OF, which should not be possible:

$ ssh -i /home/lnevers/.ssh/id_rsa root@152.54.14.17
Linux debian 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Mar 21 15:26:24 2013 from arendia.gpolab.bbn.com
root@renci-eg:~# 

comment:11 Changed 11 years ago by lnevers@bbn.com

Summary changed from "Mismatch in ListResources and SliverStatus output after sliver expiration in ORCA" to "Nodes that are part of expired slices can be accessed when sliver is expired."

Updating summary to capture the last issue in this ticket.
