Opened 11 years ago
Closed 11 years ago
#157 closed (fixed)
Creating a 2 VMs sliver with GIMI image fails with "resources failed to join"
Reported by: | lnevers@bbn.com | Owned by: | somebody |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | AM | Version: | SPIRAL5 |
Keywords: | sliver creation | Cc: | |
Dependencies: |
Description
I used the GIMI image listed at https://wiki.exogeni.net/doku.php?id=public:experimenters:images and requested a 2-VM setup with one LAN.
One of the nodes is ready, but the second node failed to come up, with the following sliverstatus failure:
$ omni.py sliverstatus -a eg-gpo lngimi
INFO:omni:Loading config file /home/lnevers/.gcf/omni_config
INFO:omni:Using control framework pg
INFO:omni:Substituting AM nickname eg-gpo with URL https://bbn-hn.exogeni.net:11443/orca/xmlrpc, URN unspecified_AM_URN
INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi expires on 2013-03-13 14:22:58 UTC
INFO:omni:Substituting AM nickname eg-gpo with URL https://bbn-hn.exogeni.net:11443/orca/xmlrpc, URN unspecified_AM_URN
INFO:omni:Status of Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi:
INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi at AM https://bbn-hn.exogeni.net:11443/orca/xmlrpc has overall SliverStatus: failed
INFO:omni:Sliver status for Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi at AM URL https://bbn-hn.exogeni.net:11443/orca/xmlrpc
INFO:omni:{
  "geni_status": "failed",
  "geni_urn": "urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi",
  "geni_resources": [
    {
      "orca_expires": "Tue Mar 12 16:07:42 UTC 2013",
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7bac418b-983c-4a1c-b776-e4c483cfa5ac#geni1",
      "geni_error": "Reservation 54e04386-d3c3-4fa1-b75b-c8a9ab3c0445 (Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi) is in state [Failed,None], err=resources failed to join: (no details)\n",
      "geni_status": "Failed"
    },
    {
      "orca_expires": "Tue Mar 12 16:07:42 UTC 2013",
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7bac418b-983c-4a1c-b776-e4c483cfa5ac#geni2",
      "geni_error": "",
      "geni_status": "Active"
    },
    {
      "orca_expires": "Tue Mar 12 16:07:42 UTC 2013",
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7bac418b-983c-4a1c-b776-e4c483cfa5ac#center",
      "geni_error": "",
      "geni_status": "Active"
    }
  ]
}
INFO:omni: ------------------------------------------------------------
INFO:omni: Completed sliverstatus:
  Options as run:
    aggregate: ['eg-gpo']
    framework: pg
  Args: sliverstatus lngimi
  Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi expires on 2013-03-13 14:22:58 UTC
Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi at AM https://bbn-hn.exogeni.net:11443/orca/xmlrpc has overall SliverStatus: failed.
Returned status of slivers on 1 of 1 possible aggregates.
INFO:omni: ============================================================
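For anyone triaging similar output, the following is a minimal sketch of pulling the failed resources out of the status structure shown above. It is not part of omni; the helper name `failed_slivers` is hypothetical, and it assumes the suffix after `#` in each sliver URN matches the RSpec `client_id`, which holds for this output but is not confirmed as a general rule.

```python
def failed_slivers(status):
    """Return (client_id, error) pairs for resources reported as failed.

    `status` is the dictionary printed by sliverstatus (already parsed).
    """
    failures = []
    for res in status.get("geni_resources", []):
        # Per-resource status is capitalized ("Failed") in this output,
        # so compare case-insensitively.
        if res.get("geni_status", "").lower() == "failed":
            # Assumption: the text after '#' is the RSpec client_id.
            client_id = res.get("geni_urn", "").rpartition("#")[2]
            failures.append((client_id, res.get("geni_error", "").strip()))
    return failures


# Values copied from the sliverstatus output in this ticket.
sample = {
    "geni_status": "failed",
    "geni_urn": "urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi",
    "geni_resources": [
        {
            "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+"
                        "7bac418b-983c-4a1c-b776-e4c483cfa5ac#geni1",
            "geni_error": "Reservation 54e04386-d3c3-4fa1-b75b-c8a9ab3c0445 "
                          "(Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi) "
                          "is in state [Failed,None], "
                          "err=resources failed to join: (no details)\n",
            "geni_status": "Failed",
        },
        {
            "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+"
                        "7bac418b-983c-4a1c-b776-e4c483cfa5ac#geni2",
            "geni_error": "",
            "geni_status": "Active",
        },
    ],
}

for client_id, error in failed_slivers(sample):
    print(f"{client_id}: {error}")
```

Here only `geni1` is reported, which makes it easy to look up that node's settings in the request RSpec.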
The RSpec used to request the failed sliver:
<?xml version="1.0" encoding="UTF-8"?>
<rspec type="request"
       expires="2013-03-12T16:07:42Z"
       xsi:schemaLocation="http://www.geni.net/resources/rspec/3 http://www.geni.net/resources/rspec/3/request.xsd http://www.protogeni.net/resources/rspec/ext/shared-vlan/1 http://www.protogeni.net/resources/rspec/ext/shared-vlan/1/request.xsd"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:s="http://www.protogeni.net/resources/rspec/ext/shared-vlan/1"
       xmlns="http://www.geni.net/resources/rspec/3">
  <node client_id="geni1" component_manager_id="urn:publicid:IDN+bbnvmsite+authority+cm">
    <sliver_type name="m1.small">
      <disk_image name="http://emmy9.casa.umass.edu/Disk_Images/ExoGENI/exogeni-umass-1.2.xml" version="49f0c193cc91d7b2fc1a6f038427935f4c296a8a" />
    </sliver_type>
    <interface client_id="geni1:0">
      <ip address="172.16.1.1" netmask="255.255.255.0" />
    </interface>
  </node>
  <node client_id="geni2" component_manager_id="urn:publicid:IDN+bbnvmsite+authority+cm">
    <sliver_type name="m1.large">
      <disk_image name="http://emmy9.casa.umass.edu/Disk_Images/ExoGENI/exogeni-umass-1.2.xml" version="49f0c193cc91d7b2fc1a6f038427935f4c296a8a" />
    </sliver_type>
    <interface client_id="geni2:0">
      <ip address="172.16.1.2" netmask="255.255.255.0" />
    </interface>
  </node>
  <link client_id="center">
    <interface_ref client_id="geni1:0" />
    <interface_ref client_id="geni2:0" />
  </link>
</rspec>
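When one node of a request fails and another succeeds, it can help to compare what each node actually asks for. The sketch below lists the `sliver_type` requested per node using only the Python standard library; the function name `sliver_types` is hypothetical, and the sample is a trimmed copy of the RSpec above (XML declaration, disk images, and interfaces omitted).

```python
import xml.etree.ElementTree as ET

# Default namespace used by the request RSpec in this ticket.
RSPEC_NS = {"r": "http://www.geni.net/resources/rspec/3"}


def sliver_types(rspec_xml):
    """Map each node's client_id to the sliver_type name it requests."""
    root = ET.fromstring(rspec_xml)
    result = {}
    for node in root.findall("r:node", RSPEC_NS):
        sliver = node.find("r:sliver_type", RSPEC_NS)
        result[node.get("client_id")] = sliver.get("name") if sliver is not None else None
    return result


# Trimmed copy of the request RSpec from this ticket.
rspec = """<rspec type="request" xmlns="http://www.geni.net/resources/rspec/3">
  <node client_id="geni1"><sliver_type name="m1.small" /></node>
  <node client_id="geni2"><sliver_type name="m1.large" /></node>
</rspec>"""

print(sliver_types(rspec))
```

For this request the two nodes use the same disk image but different sliver types.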
Leaving the sliver "lngimi" running.
Change History (10)
comment:1 Changed 11 years ago by
comment:2 Changed 11 years ago by
I had a second instance of the failure after I wrote the bug. I created a slice named lngimi2 and used the same RSpec and it also failed in the same way:
{ "orca_expires": "Tue Mar 12 16:07:42 UTC 2013", "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+3b13d0f9-44a8-489e-8630-d592972bd9f6#geni1", "geni_error": "Reservation 1ae2064c-3153-4199-a79f-4a70d3bd42f3 (Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi2) is in state [Failed,None], err=resources failed to join: (no details)\n", "geni_status": "Failed"
I am now creating sliver 'lngimi3' with the same RSpec, and will update the ticket with the results.
comment:3 Changed 11 years ago by
We can take a look, BUT (and this is important) - we cannot be responsible for problems with images we did not create. You should ask Jeanne Ohren - I think she uses this image frequently, to see if she had any problems. Ultimately it is for whoever created the image to resolve the problems with it. This is why we list contact information for the image.
comment:4 Changed 11 years ago by
The third attempt (slice=lngimi3) also failed in the same way:
"orca_expires": "Tue Mar 12 16:07:42 UTC 2013", "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7814f003-b59d-49bf-a32d-cb97ddbf7fff#geni1", "geni_error": "Reservation bb3ff0a9-edfd-44a6-b1eb-a4733c1c68c1 (Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi3) is in state [Failed,None], err=resources failed to join: (no details)\n", "geni_status": "Failed"
The "lngimi1" and "lngimi3" slivers are still running.
comment:5 Changed 11 years ago by
I will talk to Jeanne Ohren and make sure that this is a valid image, but in each of the sliver creation cases one VM comes up and one fails. That should indicate that the image is "usable"?
comment:6 Changed 11 years ago by
If you try it with an image we created and it works, then the problem is with the image. We don't have an established mechanism for addressing these problems, but there is an implicit understanding that the responsibility for the image goes back to the image creator, not the operator of the rack or testbed.
comment:7 Changed 11 years ago by
According to Jeanne Ohren this is a valid image. Jeanne just created two slices with 2 VMs and 1 LAN using flukes in the GPO Rack via the GPO SM, just as I did with Omni. Both of her slices were successful, and the two VMs requested via flukes came up in each. She is forwarding this issue to Mike Zink.
comment:8 Changed 11 years ago by
The failing VM used "<sliver_type name="m1.small">" and modifying this setting to "<sliver_type name="m1.large">" resolves the problem.
SUGGESTION:
Update the page https://wiki.exogeni.net/doku.php?id=public:experimenters:images to include size restrictions, if any exist. I will find out from the GIMI folks what the requirements for their image are and update this ticket to capture the size requirements.
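The workaround in this comment was a manual edit of the RSpec. As an illustration only, the same change could be scripted roughly as follows; `set_sliver_type` is a hypothetical helper, not an omni or ExoGENI tool, and the sample RSpec is a trimmed copy of the one in the description.

```python
import xml.etree.ElementTree as ET

RSPEC_URI = "http://www.geni.net/resources/rspec/3"
# Keep the RSpec namespace as the default namespace when serializing.
ET.register_namespace("", RSPEC_URI)


def set_sliver_type(rspec_xml, client_id, new_type):
    """Return the RSpec with the named node's sliver_type changed to new_type."""
    ns = {"r": RSPEC_URI}
    root = ET.fromstring(rspec_xml)
    for node in root.findall("r:node", ns):
        if node.get("client_id") == client_id:
            sliver = node.find("r:sliver_type", ns)
            if sliver is not None:
                sliver.set("name", new_type)
    return ET.tostring(root, encoding="unicode")


# Trimmed copy of the request RSpec from this ticket.
rspec = """<rspec type="request" xmlns="http://www.geni.net/resources/rspec/3">
  <node client_id="geni1"><sliver_type name="m1.small" /></node>
  <node client_id="geni2"><sliver_type name="m1.large" /></node>
</rspec>"""

updated = set_sliver_type(rspec, "geni1", "m1.large")
print(updated)
```

Whether `m1.large` is the minimum size the GIMI image needs is not established here; it is only the size that worked in this test.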
comment:10 Changed 11 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Thank you, closing ticket.
Luisa, we generally don't investigate single VM failures. A repeated failure would warrant a closer look. There is nothing in the logs to suggest a systemic problem here.