Opened 11 years ago
Last modified 11 years ago
#60 new
Nodes that are part of expired slices can be accessed when sliver is expired.
Reported by: | tupty@bbn.com | Owned by: | somebody |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | AM | Version: | SPIRAL4 |
Keywords: | | Cc: | |
Dependencies: | | | |
Description
After a sliver has expired in ORCA, a normal manifest is returned for the sliver by ListResources:
    <rspec type="manifest" xmlns="http://www.geni.net/resources/rspec/3" xmlns:ns2="http://hpn.east.isi.edu/rspec/ext/stitch/0.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.geni.net/resources/rspec/3 http://www.geni.net/resources/rspec/3/manifest.xsd http://hpn.east.isi.edu/rspec/ext/stitch/0.1/ http://hpn.east.isi.edu/rspec/ext/stitch/0.1/stitch-schema.xsd">
      <node client_id="0ba1a49d-ef80-4669-91a4-62ae3424747f#geni0" component_id="urn:publicid:IDN+exogeni.net:bbnvmsite+authority+cm" exclusive="true" sliver_id="urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+0ba1a49d-ef80-4669-91a4-62ae3424747f#geni0">
        <sliver_type name="m1.small">
          <disk_image name="http://geni-images.renci.org/images/standard/debian/debian-squeeze-amd64-neuca-2g.zfilesystem.sparse.v0.2.xml" version="397c431cb9249e1f361484b08674bc3381455bb9"/>
        </sliver_type>
        <services>
          <login authentication="ssh-keys" hostname="192.1.242.6" port="22" username="root"/>
          <execute command="#!/bin/bash # Automatically generated boot script execString=&quot;/bin/sh -c \&quot;hostname tuptymon-bbn\&quot;&quot; eval $execString "/>
        </services>
        <interface client_id="0ba1a49d-ef80-4669-91a4-62ae3424747f#geni0:if1"/>
      </node>
      <link client_id="0ba1a49d-ef80-4669-91a4-62ae3424747f#center" sliver_id="urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+0ba1a49d-ef80-4669-91a4-62ae3424747f#center" vlantag="1750.0">
        <interface_ref client_id="0ba1a49d-ef80-4669-91a4-62ae3424747f#geni0:if1"/>
      </link>
    </rspec>
However, SliverStatus reports that the sliver doesn't exist anymore:
Failed to get SliverStatus on tuptymon at AM https://bbn-hn.exogeni.net:11443/orca/xmlrpc: ERROR: There are no reservations in the slice with sliceId = urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+tuptymon
In short, one part of ORCA is treating the sliver as if it still exists, while another part knows that the sliver has expired. The actual compute resources for the sliver at the aggregate have already been deleted, yet a new sliver cannot be created under the same GENI slice until the old sliver is explicitly deleted.
Once a sliver expires, it should not need to be explicitly deleted before a new sliver can be created in the same slice.
Attachments (1)
Change History (12)
comment:1 Changed 11 years ago by
Changed 11 years ago by
Attachment: | omni_output.txt added |
---|
Sequence of omni commands and output detailing the issue
comment:2 Changed 11 years ago by
For what it's worth, I am pretty sure that my sliver expired, but I don't have any proof of that. This was the mesoscale connectivity test VM, and the connectivity service checks started failing earlier this weekend, which led me to check the status of my ORCA sliver on bbn-hn.
comment:3 Changed 11 years ago by
FWIW, I am still seeing this. My VM is gone, and SliverStatus knows that the reservation is done, but I still get a manifest RSpec when I call ListResources with my slice name.
I am going to leave that sliver alone for now and see whether things get back in sync tomorrow. If ListResources still returns a manifest tomorrow, I will assume that I need to take action and delete the sliver to get things back to a known state. If fixing this requires experimenter action, that is bad in general, and it also has implications for relational data and monitoring, both of which make use of the manifests.
comment:4 Changed 11 years ago by
This won't be easy to fix. We use lazy garbage collection: a manifest may persist until the next time someone tries to create a slice, at which point the system notices that all of this slice's reservations are gone and removes it.
comment:5 Changed 11 years ago by
I suppose we could also play some tricks and run the GC when you query for the slice.
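Comments 4 and 5 contrast lazy garbage collection with collecting on query. The toy model below is a sketch under stated assumptions, not ORCA's actual code: `Reservation`, `SliceRecord`, `SliceRegistry`, and the `gc_on_query` flag are all hypothetical. It illustrates how a status call computed from live reservations and a ListResources call served from a cached manifest can disagree until a sweep runs, and how running the same sweep on query removes the stale manifest.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Reservation:
    """Hypothetical stand-in for one reservation backing a sliver."""
    rid: str
    expires: datetime

    def is_active(self, now: datetime) -> bool:
        return now < self.expires


@dataclass
class SliceRecord:
    """Hypothetical per-slice state: its reservations plus a cached manifest."""
    reservations: list
    manifest: str


class SliceRegistry:
    """Toy model of lazy GC (comment:4) vs. GC on query (comment:5)."""

    def __init__(self, gc_on_query: bool = False):
        self.slices = {}  # slice URN -> SliceRecord
        self.gc_on_query = gc_on_query

    def _sweep(self, now: datetime) -> None:
        # Drop every slice whose reservations have all expired.
        stale = [urn for urn, rec in self.slices.items()
                 if not any(r.is_active(now) for r in rec.reservations)]
        for urn in stale:
            del self.slices[urn]

    def create_slice(self, urn, reservations, manifest, now):
        # Lazy GC: stale slices are only noticed when a slice is created.
        self._sweep(now)
        self.slices[urn] = SliceRecord(list(reservations), manifest)

    def sliver_status(self, urn, now):
        # Computed from live reservations, so it sees the expiry immediately.
        rec = self.slices.get(urn)
        return [r.rid for r in rec.reservations if r.is_active(now)] if rec else []

    def list_resources(self, urn, now):
        if self.gc_on_query:
            # Proposed change: run the same sweep on every query.
            self._sweep(now)
        rec = self.slices.get(urn)
        # Returns the cached manifest, which can outlive the reservations.
        return rec.manifest if rec else None


t0 = datetime(2012, 8, 14, tzinfo=timezone.utc)
later = t0 + timedelta(days=2)  # after the reservation below has expired
for gc in (False, True):
    reg = SliceRegistry(gc_on_query=gc)
    reg.create_slice("tuptymon", [Reservation("geni0", t0 + timedelta(days=1))],
                     '<rspec type="manifest"/>', now=t0)
    print(gc, reg.sliver_status("tuptymon", later), reg.list_resources("tuptymon", later))
# prints:
# False [] <rspec type="manifest"/>
# True [] None
```

In this sketch the cost of collecting on query is one sweep per ListResources call, which is the trade-off raised in comment:6.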
comment:6 Changed 11 years ago by
That sounds like a good approach to me, assuming that the frequency of queries combined with the duration of GC doesn't bog down the system. I have no idea what the expected query frequency or GC duration is, so I'll leave that up to you all to decide.
comment:7 Changed 11 years ago by
There are various inconsistencies with ExoGENI BBN sliver EG-EXP-6-exp3:
I renewed it this morning without a problem:
    INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3 expires on 2012-08-22 00:00:00 UTC
    INFO:omni:Renewing Sliver urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3 until 2012-08-22 00:00:00+00:00 (UTC)
    INFO:omni:Substituting AM nickname exobbn with URL https://bbn-hn.exogeni.net:11443/orca/xmlrpc, URN unspecified_AM_URN
    INFO:omni:Renewed sliver urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3 at unspecified_AM_URN (https://bbn-hn.exogeni.net:11443/orca/xmlrpc) until 2012-08-22T00:00:00+00:00 (UTC)
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed renewsliver:
      Options as run:
        aggregate: exobbn
        framework: pg
        native: True
      Args: renewsliver EG-EXP-6-exp3 2012-08-22
      Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3 expires on 2012-08-22 00:00:00 UTC
    Renewed sliver urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3 at unspecified_AM_URN (https://bbn-hn.exogeni.net:11443/orca/xmlrpc) until 2012-08-22T00:00:00+00:00 (UTC)
But according to SliverStatus, it is gone:
    $ ./src/omni.py sliverstatus -a exobbn EG-EXP-6-exp3
    INFO:omni:Loading config file /home/lnevers2/.gcf/omni_config
    INFO:omni:Using control framework pg
    INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3 expires on 2012-08-22 00:00:00 UTC
    INFO:omni:Substituting AM nickname exobbn with URL https://bbn-hn.exogeni.net:11443/orca/xmlrpc, URN unspecified_AM_URN
    INFO:omni:Status of Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3:
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed sliverstatus:
      Options as run:
        aggregate: exobbn
        framework: pg
        native: True
      Args: sliverstatus EG-EXP-6-exp3
      Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3 expires on 2012-08-22 00:00:00 UTC
    Failed to get SliverStatus on EG-EXP-6-exp3 at AM https://bbn-hn.exogeni.net:11443/orca/xmlrpc: ERROR: There are no reservations in the slice with sliceId = urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3
    Returned status of slivers on 0 of 1 possible aggregates.
    INFO:omni: ============================================================
But ListResources still shows that it is active:
    $ ./src/omni.py listresources -a exobbn EG-EXP-6-exp3
    INFO:omni:Loading config file /home/lnevers2/.gcf/omni_config
    INFO:omni:Using control framework pg
    INFO:omni:Gathering resources reserved for slice EG-EXP-6-exp3.
    INFO:omni:Substituting AM nickname exobbn with URL https://bbn-hn.exogeni.net:11443/orca/xmlrpc, URN unspecified_AM_URN
    INFO:omni:Listed resources on 1 out of 1 possible aggregates.
    INFO:omni:<?xml version="1.0" ?>
    INFO:omni:<!-- Resources for: Slice: EG-EXP-6-exp3 at AM: URN: unspecified_AM_URN URL: https://bbn-hn.exogeni.net:11443/orca/xmlrpc -->
    INFO:omni:<rspec type="manifest" xmlns="http://www.geni.net/resources/rspec/3" xmlns:ns2="http://hpn.east.isi.edu/rspec/ext/stitch/0.1/" xmlns:ns3="http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions/slice-info/1" xmlns:ns4="http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions/sliver-info/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.geni.net/resources/rspec/3 http://www.geni.net/resources/rspec/3/manifest.xsd http://hpn.east.isi.edu/rspec/ext/stitch/0.1/ http://hpn.east.isi.edu/rspec/ext/stitch/0.1/stitch-schema.xsd http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions/slice-info/1 http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions/slice-info/1/slice_info.xsd?format=raw http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions/sliver-info/1 http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions/sliver-info/1/sliver_info.xsd?format=raw">
      <node client_id="VM" component_id="urn:publicid:IDN+exogeni.net:bbnvmsite+node+orca-vm-cloud" component_manager_id="urn:publicid:IDN+exogeni.net:bbnvmsite+authority+am" component_name="orca-vm-cloud" exclusive="true" sliver_id="urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+a185b84a-884d-4746-8fa8-c1c32e5f3674#VM">
        <sliver_type name="m1.small">
          <disk_image name="http://geni-images.renci.org/images/standard/debian/debian-squeeze-amd64-neuca-2g.zfilesystem.sparse.v0.2.xml" version="397c431cb9249e1f361484b08674bc3381455bb9"/>
        </sliver_type>
        <services>
          <login authentication="ssh-keys" hostname="192.1.242.9" port="22" username="root"/>
        </services>
        <interface client_id="VM:if0">
          <ip address="10.42.11.198" netmask="255.255.0.0" type="ipv4"/>
        </interface>
        <ns4:geni_sliver_info creation_time="2012-08-14T18:22:26.060-04:00" creator_urn="urn:publicid:IDN+pgeni.gpolab.bbn.com+user+lnevers2" expiration_time="2012-08-15T18:22:26.060-04:00" start_time="2012-08-14T18:22:26.060-04:00" state="Active"/>
      </node>
      <link client_id="lan0" sliver_id="urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+a185b84a-884d-4746-8fa8-c1c32e5f3674#lan0" vlantag="1750">
        <interface_ref client_id="VM:if0"/>
        <ns4:geni_sliver_info creation_time="2012-08-14T18:22:26.060-04:00" creator_urn="urn:publicid:IDN+pgeni.gpolab.bbn.com+user+lnevers2" expiration_time="2012-08-15T18:22:26.060-04:00" start_time="2012-08-14T18:22:26.060-04:00"/>
      </link>
      <ns3:geni_slice_info state="unknown" urn="urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+EG-EXP-6-exp3" uuid="690de9ca-c66d-4474-9819-5930300820e9"/>
    </rspec>
    INFO:omni: ------------------------------------------------------------
    INFO:omni: Completed listresources:
      Options as run:
        aggregate: exobbn
        framework: pg
        native: True
      Args: listresources EG-EXP-6-exp3
      Result Summary: Retrieved resources for slice EG-EXP-6-exp3 from 1 aggregates.
    Wrote rspecs from 1 aggregates.
Attempts to log in to the assigned host fail:
    lnevers2@arendia:~/gcf-1.6.2$ ssh root@192.1.242.9
    ssh: connect to host 192.1.242.9 port 22: No route to host
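The ListResources manifest in this comment carries `geni_sliver_info` elements with `expiration_time` and `state` attributes, so a monitoring tool could flag a stale manifest on its own. The following is a minimal client-side sketch using Python's standard `xml.etree.ElementTree`; it is not part of omni or ORCA, the function name `expired_slivers` is hypothetical, the sliver-info namespace URI is assumed from the manifest's `xmlns:ns4` declaration, and the reference time in the usage example is arbitrary.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace URIs assumed from the ExoGENI manifest in comment:7.
RSPEC_NS = "http://www.geni.net/resources/rspec/3"
SLIVER_INFO_NS = "http://groups.geni.net/exogeni/attachment/wiki/RspecExtensions/sliver-info/1"


def expired_slivers(manifest_xml: str, now: datetime) -> list:
    """List (tag, client_id, state) for top-level manifest elements whose
    geni_sliver_info expiration_time is earlier than `now` (timezone-aware)."""
    root = ET.fromstring(manifest_xml)
    stale = []
    for elem in root:  # direct children: <node>, <link>, slice info, ...
        info = elem.find(f"{{{SLIVER_INFO_NS}}}geni_sliver_info")
        if info is None or info.get("expiration_time") is None:
            continue
        expires = datetime.fromisoformat(info.get("expiration_time"))
        if expires < now:
            local_tag = elem.tag.split("}", 1)[-1]  # strip "{namespace}"
            stale.append((local_tag, elem.get("client_id"), info.get("state")))
    return stale


# Trimmed-down version of the manifest returned by ListResources.
SAMPLE = f"""<rspec type="manifest" xmlns="{RSPEC_NS}" xmlns:ns4="{SLIVER_INFO_NS}">
  <node client_id="VM">
    <ns4:geni_sliver_info expiration_time="2012-08-15T18:22:26.060-04:00" state="Active"/>
  </node>
  <link client_id="lan0">
    <ns4:geni_sliver_info expiration_time="2012-08-15T18:22:26.060-04:00"/>
  </link>
</rspec>"""

print(expired_slivers(SAMPLE, datetime(2012, 8, 20, tzinfo=timezone.utc)))
# prints: [('node', 'VM', 'Active'), ('link', 'lan0', None)]
```

An entry that is past its `expiration_time` but still reports `state="Active"` is exactly the kind of mismatch described in this ticket.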
comment:9 Changed 11 years ago by
I don't think this is fixed yet, but with the slice state machine added in the latest code, it should be easier to fix.
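As a rough illustration of why a slice state machine helps, here is a sketch in which reservation events drive the slice state directly, so ListResources consults the same state that decides whether the slice still exists. The states `NASCENT`, `ACTIVE`, and `CLOSED`, the `Slice` class, and its methods are all hypothetical; ORCA's actual state machine is not described in this ticket and may differ.

```python
from enum import Enum


class SliceState(Enum):
    """Hypothetical slice states; ORCA's real state machine may differ."""
    NASCENT = "nascent"
    ACTIVE = "active"
    CLOSED = "closed"  # every reservation has expired or been closed


# Allowed transitions in this toy model.
TRANSITIONS = {
    SliceState.NASCENT: {SliceState.ACTIVE, SliceState.CLOSED},
    SliceState.ACTIVE: {SliceState.CLOSED},
    SliceState.CLOSED: set(),
}


class Slice:
    def __init__(self, urn: str, manifest: str):
        self.urn = urn
        self.manifest = manifest
        self.state = SliceState.NASCENT

    def transition(self, new_state: SliceState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state

    def on_reservations_changed(self, active_count: int) -> None:
        # Update the state as reservations change instead of waiting for a GC pass.
        if active_count > 0 and self.state is SliceState.NASCENT:
            self.transition(SliceState.ACTIVE)
        elif active_count == 0 and self.state is not SliceState.CLOSED:
            self.transition(SliceState.CLOSED)

    def list_resources(self):
        # Status and listing can both consult self.state, so they agree.
        return None if self.state is SliceState.CLOSED else self.manifest

    def can_be_replaced(self) -> bool:
        # A CLOSED slice no longer blocks creating a new sliver under the same URN.
        return self.state is SliceState.CLOSED


s = Slice("urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+3sites-OF", "<rspec/>")
s.on_reservations_changed(active_count=2)
s.on_reservations_changed(active_count=0)  # last reservation expired
print(s.state.name, s.list_resources(), s.can_be_replaced())
# prints: CLOSED None True
```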
comment:10 Changed 11 years ago by
I checked AM API commands on an expired slice (3sites-OF) and found that the following commands fail as expected for an expired sliver:
- ListResources
- RenewSliver
- SliverStatus
- CreateSliver
The above were run several times with the expected results.
However, I was able to log in to the node that had been assigned to the expired slice 3sites-OF, even though I should not be able to:
    $ ssh -i /home/lnevers/.ssh/id_rsa root@152.54.14.17
    Linux debian 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64
    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.
    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.
    Last login: Thu Mar 21 15:26:24 2013 from arendia.gpolab.bbn.com
    root@renci-eg:~#
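Comment:10 shows the opposite failure from comment:7: the AM API correctly rejects the expired slice, but the node still accepts SSH logins. A monitoring check for this could probe the login endpoint after expiry. The sketch below uses only Python's standard `socket` module; `port_reachable` and `check_expired_node` are hypothetical names, the host and port are assumed to come from the `<login>` element of a manifest captured before expiry, and a successful TCP connect only shows that something is listening on the port, not that a login would succeed.

```python
import socket


def port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", connection refused, and timeouts.
        return False


def check_expired_node(host: str, port: int = 22) -> str:
    """Hypothetical monitoring check for a node whose sliver has expired."""
    if port_reachable(host, port):
        return f"FAIL: {host}:{port} still accepts connections after expiry"
    return f"OK: {host}:{port} is unreachable"
```

For example, `check_expired_node("152.54.14.17")` run after the 3sites-OF sliver expired would be expected to report a failure given the successful login above.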
comment:11 Changed 11 years ago by
Summary: | Mismatch in ListResources and SliverStatus output after sliver expiration in ORCA → Nodes that are part of expired slices can be accessed when sliver is expired. |
---|
Updating summary to capture the last issue in this ticket.
After the sliver expired, I ran through the following sequence:
This sequence details the problem in this ticket. I will attach the output of that sequence of omni commands in the file omni_output.txt.