Opened 9 years ago

Closed 9 years ago

#157 closed (fixed)

Creating a 2 VMs sliver with GIMI image fails with "resources failed to join"

Reported by: lnevers@bbn.com
Owned by: somebody
Priority: major
Milestone:
Component: AM
Version: SPIRAL5
Keywords: sliver creation
Cc:
Dependencies:

Description

I used the GIMI image listed at https://wiki.exogeni.net/doku.php?id=public:experimenters:images to request a setup with 2 VMs and one LAN.

One of the nodes is ready, but the second node failed to come up, and sliverstatus reports the following failure:

$ omni.py sliverstatus -a eg-gpo lngimi 
INFO:omni:Loading config file /home/lnevers/.gcf/omni_config
INFO:omni:Using control framework pg
INFO:omni:Substituting AM nickname eg-gpo with URL https://bbn-hn.exogeni.net:11443/orca/xmlrpc, URN unspecified_AM_URN
INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi expires on 2013-03-13 14:22:58 UTC
INFO:omni:Substituting AM nickname eg-gpo with URL https://bbn-hn.exogeni.net:11443/orca/xmlrpc, URN unspecified_AM_URN
INFO:omni:Status of Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi:
INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi at AM https://bbn-hn.exogeni.net:11443/orca/xmlrpc 
has overall SliverStatus: failed
INFO:omni:Sliver status for Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi at AM URL https://bbn-hn.exogeni.net:11443/orca/xmlrpc
INFO:omni:{
  "geni_status": "failed", 
  "geni_urn": "urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi", 
  "geni_resources": [
    {
      "orca_expires": "Tue Mar 12 16:07:42 UTC 2013", 
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7bac418b-983c-4a1c-b776-e4c483cfa5ac#geni1", 
      "geni_error": "Reservation 54e04386-d3c3-4fa1-b75b-c8a9ab3c0445 (Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi) 
       is in state [Failed,None], err=resources failed to join: (no details)\n", 
      "geni_status": "Failed"
    }, 
    {
      "orca_expires": "Tue Mar 12 16:07:42 UTC 2013", 
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7bac418b-983c-4a1c-b776-e4c483cfa5ac#geni2", 
      "geni_error": "", 
      "geni_status": "Active"
    }, 
    {
      "orca_expires": "Tue Mar 12 16:07:42 UTC 2013", 
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7bac418b-983c-4a1c-b776-e4c483cfa5ac#center", 
      "geni_error": "", 
      "geni_status": "Active"
    }
  ]
}
INFO:omni: ------------------------------------------------------------
INFO:omni: Completed sliverstatus:
  Options as run:
		aggregate: ['eg-gpo']
		framework: pg
  Args: sliverstatus lngimi
  Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi expires on 2013-03-13 14:22:58 UTC
Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi at AM https://bbn-hn.exogeni.net:11443/orca/xmlrpc has 
overall SliverStatus: failed.
 Returned status of slivers on 1 of 1 possible aggregates. 
INFO:omni: ============================================================
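
The failed resource can also be picked out of the sliverstatus result programmatically. This is only a minimal sketch: it assumes the JSON body printed after "INFO:omni:" has been saved into a string, the helper name failed_resources is hypothetical, and the sample below is a trimmed copy of the output above with the geni_error text shortened. The field names (geni_resources, geni_status, geni_error, geni_urn) are the ones shown in the output.

```python
import json

# Trimmed copy of the sliverstatus result above (assumption: saved as a string,
# with the long geni_error message shortened to its final part).
STATUS_JSON = """
{
  "geni_status": "failed",
  "geni_urn": "urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi",
  "geni_resources": [
    {
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7bac418b-983c-4a1c-b776-e4c483cfa5ac#geni1",
      "geni_error": "resources failed to join: (no details)",
      "geni_status": "Failed"
    },
    {
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7bac418b-983c-4a1c-b776-e4c483cfa5ac#geni2",
      "geni_error": "",
      "geni_status": "Active"
    },
    {
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7bac418b-983c-4a1c-b776-e4c483cfa5ac#center",
      "geni_error": "",
      "geni_status": "Active"
    }
  ]
}
"""


def failed_resources(status):
    """Return (client_id, error) pairs for every resource whose status is failed.

    The client_id is taken from the part of geni_urn after the '#'.
    The status comparison is case-insensitive because the output above mixes
    "failed" (overall) and "Failed" (per resource).
    """
    failures = []
    for resource in status.get("geni_resources", []):
        if resource.get("geni_status", "").lower() == "failed":
            client_id = resource["geni_urn"].rsplit("#", 1)[-1]
            failures.append((client_id, resource.get("geni_error", "").strip()))
    return failures


for client_id, error in failed_resources(json.loads(STATUS_JSON)):
    print(f"{client_id}: {error}")
```

With the sample above, only the geni1 resource is reported; geni2 and the "center" link are Active and are skipped.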

The RSpec used to request the failed sliver:

<?xml version="1.0" encoding="UTF-8"?>
<rspec type="request" expires="2013-03-12T16:07:42Z"
	xsi:schemaLocation="http://www.geni.net/resources/rspec/3 
	          	    http://www.geni.net/resources/rspec/3/request.xsd
                            http://www.protogeni.net/resources/rspec/ext/shared-vlan/1
                            http://www.protogeni.net/resources/rspec/ext/shared-vlan/1/request.xsd"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
        xmlns:s="http://www.protogeni.net/resources/rspec/ext/shared-vlan/1"
	xmlns="http://www.geni.net/resources/rspec/3">
<node client_id="geni1" component_manager_id="urn:publicid:IDN+bbnvmsite+authority+cm">
 <sliver_type name="m1.small">
   <disk_image name="http://emmy9.casa.umass.edu/Disk_Images/ExoGENI/exogeni-umass-1.2.xml" 
    version="49f0c193cc91d7b2fc1a6f038427935f4c296a8a" />
 </sliver_type>
 <interface client_id="geni1:0">
   <ip address="172.16.1.1" netmask="255.255.255.0" />
 </interface>
</node>
<node client_id="geni2" component_manager_id="urn:publicid:IDN+bbnvmsite+authority+cm">
 <sliver_type name="m1.large">
   <disk_image name="http://emmy9.casa.umass.edu/Disk_Images/ExoGENI/exogeni-umass-1.2.xml" 
    version="49f0c193cc91d7b2fc1a6f038427935f4c296a8a" />
 </sliver_type>
 <interface client_id="geni2:0" >
   <ip address="172.16.1.2" netmask="255.255.255.0" />
 </interface>
</node>
<link client_id="center">
  <interface_ref client_id="geni1:0" />
  <interface_ref client_id="geni2:0" />
</link>
</rspec>

Leaving the sliver "lngimi" running.

Change History (10)

comment:1 Changed 9 years ago by ibaldin@renci.org

Luisa, we generally don't investigate single VM failures. A repeated failure would warrant a closer look. There is nothing in the logs to suggest a systemic problem here.

comment:2 Changed 9 years ago by lnevers@bbn.com

I hit a second instance of the failure after filing this ticket. I created a slice named lngimi2 with the same RSpec, and it failed in the same way:

    {
      "orca_expires": "Tue Mar 12 16:07:42 UTC 2013", 
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+3b13d0f9-44a8-489e-8630-d592972bd9f6#geni1", 
      "geni_error": "Reservation 1ae2064c-3153-4199-a79f-4a70d3bd42f3 (Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi2) is in state [Failed,None],
 err=resources failed to join: (no details)\n", 
      "geni_status": "Failed"

I am now creating sliver 'lngimi3' with the same RSpec, and will update the ticket with the results.

comment:3 Changed 9 years ago by ibaldin@renci.org

We can take a look, BUT (and this is important) we cannot be responsible for problems with images we did not create. You should ask Jeanne Ohren (I think she uses this image frequently) to see if she had any problems. Ultimately it is for whoever created the image to resolve the problems with it. This is why we list contact information for the image.

comment:4 Changed 9 years ago by lnevers@bbn.com

The third attempt (slice=lngimi3) also failed in the same way:

      "orca_expires": "Tue Mar 12 16:07:42 UTC 2013", 
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+7814f003-b59d-49bf-a32d-cb97ddbf7fff#geni1", 
      "geni_error": "Reservation bb3ff0a9-edfd-44a6-b1eb-a4733c1c68c1 (Slice 
urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lngimi3) is in state [Failed,None],
 err=resources failed to join: (no details)\n", 
      "geni_status": "Failed"

The "lngimi1" and "lngimi3" slivers are still running.

comment:5 Changed 9 years ago by lnevers@bbn.com

I will talk to Jeanne Ohren and make sure that this is a valid image, but in each of the sliver creation cases one VM comes up and one fails. Shouldn't that indicate that the image is usable?

comment:6 Changed 9 years ago by ibaldin@renci.org

If you try it with an image we created and it works, then the problem is with the image. We don't have an established mechanism for addressing these problems, but there is an implicit understanding that the responsibility for the image goes back to the image creator, not the operator of the rack or testbed.

comment:7 Changed 9 years ago by lnevers@bbn.com

According to Jeanne Ohren, this is a valid image. Jeanne just created two slices with 2 VMs and 1 LAN using Flukes in the GPO Rack via the GPO SM, just as I did with Omni. Both of her slices were successful and the two VMs requested via Flukes came up. She is forwarding this issue to Mike Zink.

comment:8 Changed 9 years ago by lnevers@bbn.com

The failing VM used <sliver_type name="m1.small">; changing this setting to <sliver_type name="m1.large"> resolves the problem.
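
The change described above can be applied to a request RSpec with a small script. This is a minimal sketch rather than part of Omni or ORCA: the helper name set_sliver_type is hypothetical, and the RSpec below is an abbreviated copy of the one in the description (disk_image, interface, and link elements omitted). It relies only on Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Default namespace of the request RSpec in the description.
RSPEC_NS = "http://www.geni.net/resources/rspec/3"

# Abbreviated copy of the request RSpec (assumption: trimmed for brevity).
RSPEC = """<rspec type="request" xmlns="http://www.geni.net/resources/rspec/3">
<node client_id="geni1" component_manager_id="urn:publicid:IDN+bbnvmsite+authority+cm">
 <sliver_type name="m1.small"/>
</node>
<node client_id="geni2" component_manager_id="urn:publicid:IDN+bbnvmsite+authority+cm">
 <sliver_type name="m1.large"/>
</node>
</rspec>"""


def set_sliver_type(rspec_xml, old, new):
    """Replace sliver_type name `old` with `new` on every node that uses it.

    Returns the rewritten RSpec as a string and the client_ids that changed.
    """
    # Keep the RSpec namespace as the default namespace when serializing.
    ET.register_namespace("", RSPEC_NS)
    root = ET.fromstring(rspec_xml)
    changed = []
    for node in root.iter("{%s}node" % RSPEC_NS):
        for sliver_type in node.findall("{%s}sliver_type" % RSPEC_NS):
            if sliver_type.get("name") == old:
                sliver_type.set("name", new)
                changed.append(node.get("client_id"))
    return ET.tostring(root, encoding="unicode"), changed


patched, changed = set_sliver_type(RSPEC, "m1.small", "m1.large")
print("changed nodes:", changed)
print(patched)
```

Only geni1 is rewritten here, since geni2 already requests m1.large.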

SUGGESTION:

Update the page https://wiki.exogeni.net/doku.php?id=public:experimenters:images to include size restrictions, if any exist. I will find out from the GIMI folks what the requirements for their image are and update this ticket to capture the size requirements.

comment:9 Changed 9 years ago by ibaldin@renci.org

Added a note on the image.

comment:10 Changed 9 years ago by lnevers@bbn.com

Resolution: fixed
Status: new → closed

Thank you, closing ticket.
