Opened 11 years ago

Last modified 11 years ago

#129 new

Experiment fails during provisioning for a missing or obsolete image rather than being rejected at create

Reported by: lnevers@bbn.com
Owned by: vjo@cs.duke.edu
Priority: minor
Milestone:
Component: Experiment
Version: SPIRAL5
Keywords:
Cc:
Dependencies:

Description

When a request specifies an obsolete image, or the RSpec does not specify an image at all, no check is performed at create time. The sliver is ticketed and then fails during configuration with an "Error during join for unit". Should such requests instead be rejected as invalid when the RSpec requests a node without specifying a disk_image, or when it uses an obsolete image?
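The check asked for above could run before any resources are ticketed. A minimal sketch, assuming the request is RSpec XML in which a disk_image element appears somewhere inside each node element; `nodes_missing_image` and `local_name` are hypothetical helpers for illustration, not part of ORCA or omni, and the sample request is made up:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}node'."""
    return tag.rsplit("}", 1)[-1]


def nodes_missing_image(rspec_xml):
    """Return the client_id of every <node> that has no <disk_image> descendant."""
    root = ET.fromstring(rspec_xml.strip())
    missing = []
    for elem in root.iter():
        if local_name(elem.tag) != "node":
            continue
        has_image = any(local_name(d.tag) == "disk_image" for d in elem.iter())
        if not has_image:
            missing.append(elem.get("client_id"))
    return missing


# Made-up request: geni1 has no image, geni2 names one (hypothetical URL and version).
request = """
<rspec>
  <node client_id="geni1">
    <sliver_type name="m1.small"/>
  </node>
  <node client_id="geni2">
    <sliver_type name="m1.small">
      <disk_image name="http://example.org/image.xml" version="abc123"/>
    </sliver_type>
  </node>
</rspec>
"""

print(nodes_missing_image(request))
```

A create call could reject the request when this list is non-empty. Detecting an obsolete image would additionally need a list of currently supported images, which this sketch does not attempt.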

-> Example: Sliver lnexo is a 1-VM sliver request whose RSpec does not include an image.

$ omni.py sliverstatus -a eg-sm lnexo 
INFO:omni:Loading config file /home/lnevers/.gcf/omni_config
INFO:omni:Using control framework pg
INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo expires on 2012-12-11 00:00:00 UTC
INFO:omni:Substituting AM nickname eg-sm with URL https://geni.renci.org:11443/orca/xmlrpc, 
URN unspecified_AM_URN
INFO:omni:Status of Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo:
INFO:omni:Sliver status for Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo 
at AM URL https://geni.renci.org:11443/orca/xmlrpc
INFO:omni:{
  "geni_status": "failed", 
  "geni_urn": "urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo", 
  "geni_resources": [
    {
      "orca_expires": "Tue Dec 18 14:12:48 EST 2012", 
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+432e72ce-df8c-4514-bdc0-1d2a43600c46#geni1", 
      "geni_error": "Reservation 9eaa8d30-c7d2-4e47-ae3b-696fd7535e1b (Slice 
urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo) is in state [Failed,None], 
err=resources failed to join: Error during join for unit: 704EC8FB [1]: unable 
to create instance: exit code 1, \n", 
      "geni_status": "Failed"
    }
  ]
}
INFO:omni: ------------------------------------------------------------
INFO:omni: Completed sliverstatus:

  Options as run:
		aggregate: ['eg-sm']
		framework: pg

  Args: sliverstatus lnexo

  Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo expires on 2012-12-11 00:00:00 UTC
Returned status of slivers on 1 of 1 possible aggregates. 
INFO:omni: ============================================================

-> Sliver lnexo2 is a 1-VM sliver request whose RSpec includes an image that was supported prior to the upgrade.

$ omni.py sliverstatus -a eg-sm lnexo2
INFO:omni:Loading config file /home/lnevers/.gcf/omni_config
INFO:omni:Using control framework pg
INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo2 expires on 2012-12-05 21:10:47 UTC
INFO:omni:Substituting AM nickname eg-sm with URL https://geni.renci.org:11443/orca/xmlrpc, URN 
unspecified_AM_URN
INFO:omni:Status of Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo2:
INFO:omni:Sliver status for Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo2 at AM URL https://geni.renci.org:11443/orca/xmlrpc
INFO:omni:{
  "geni_status": "failed", 
  "geni_urn": "urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo2", 
  "geni_resources": [
    {
      "orca_expires": "Tue Dec 18 14:16:23 EST 2012", 
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+1f4cd917-db17-492a-af92-1322335c0625#geni1", 
      "geni_error": "Reservation 86558ca4-c593-4835-af1f-24c7c2e6d5c9 (Slice 
urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo2) is in state [Failed,None], 
err=resources failed to join: Error during join for unit: 71EA3B64 [1]: unable 
to create instance: exit code 1, \n", 
      "geni_status": "Failed"
    }
  ]
}
INFO:omni: ------------------------------------------------------------
INFO:omni: Completed sliverstatus:

  Options as run:
		aggregate: ['eg-sm']
		framework: pg

  Args: sliverstatus lnexo2

  Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo2 expires on 2012-12-05 21:10:47 UTC
Returned status of slivers on 1 of 1 possible aggregates. 
INFO:omni: ============================================================
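Both outputs above bury the actual cause inside geni_error. A minimal sketch of pulling the failed resources out of a sliverstatus result, assuming the JSON has already been parsed into a dict; `failed_resources` is a hypothetical helper, and the geni_error text in the sample is abridged from the lnexo output:

```python
def failed_resources(status):
    """Return (sliver URN, error text) pairs for every resource marked failed."""
    failures = []
    for res in status.get("geni_resources", []):
        # The top-level status is "failed" but the per-resource status is
        # "Failed" in the output above, so compare case-insensitively.
        if res.get("geni_status", "").lower() == "failed":
            failures.append((res.get("geni_urn"), res.get("geni_error", "").strip()))
    return failures


# Sample shaped like the lnexo output; geni_error is abridged.
status = {
    "geni_status": "failed",
    "geni_urn": "urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+lnexo",
    "geni_resources": [
        {
            "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+432e72ce-df8c-4514-bdc0-1d2a43600c46#geni1",
            "geni_error": "err=resources failed to join: Error during join for unit: 704EC8FB [1]: unable to create instance: exit code 1, \n",
            "geni_status": "Failed",
        }
    ],
}

for urn, error in failed_resources(status):
    print(urn)
    print("  " + error)
```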

Change History (5)

comment:1 Changed 11 years ago by lnevers@bbn.com

Checked status for the two error scenarios in this ticket and found the following:

Scenario 1: Sliver request for VM does not define a disk_image.

The request now fails at create sliver time and provides a helpful error:

Error from Aggregate: code 2: ERROR: Exception encountered: orca.ndl.NdlException: 
Node or NodeGroup 9399e54b-fa52-484d-aad0-be58f70c9d8d#geni1 does not specify an 
image. 

An error is returned, but a later attempt (about 5 minutes afterwards) to create a sliver with the same name reported a duplicate sliver URN. No cleanup had occurred.
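The duplicate URN suggests the sliver name was recorded before validation and never released when validation failed. A minimal sketch of releasing it on failure, using a hypothetical in-memory `SliverRegistry`; this is an illustration of the idea only and does not reflect how ORCA actually stores slivers:

```python
class SliverRegistry:
    """Hypothetical registry that tracks which sliver URNs are in use."""

    def __init__(self):
        self._urns = set()

    def create(self, urn, validate):
        """Reserve urn, run validate(), and release urn again if validation fails."""
        if urn in self._urns:
            raise ValueError(f"duplicate sliver URN: {urn}")
        self._urns.add(urn)
        try:
            validate()
        except Exception:
            # Clean up so the same name can be reused after a rejected request.
            self._urns.discard(urn)
            raise
        return urn


def reject_missing_image():
    # Stand-in for the NdlException shown above.
    raise RuntimeError("node does not specify an image")


registry = SliverRegistry()
try:
    registry.create("urn:example:sliver+demo", reject_missing_image)
except RuntimeError as exc:
    print("first attempt rejected:", exc)

# The retry succeeds because the failed attempt released the URN.
print(registry.create("urn:example:sliver+demo", lambda: None))
```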

Scenario 2: Sliver request for VM includes a disk_image that does not exist at the specified URL.

The request to create a sliver is accepted and the sliver becomes ready, but I am not able to log in to the assigned node.

comment:2 Changed 11 years ago by ahelsing@bbn.com

Now that there is a default image, scenario 1 is not reachable.

comment:3 Changed 11 years ago by lnevers@bbn.com

Verified the two scenarios in this ticket:

Scenario 1: Sliver request for VM does not define a disk_image. RSpec contained:

<node client_id="geni1" component_manager_id="urn:publicid:IDN+bbnvmsite+authority+cm">
 <sliver_type name="m1.small">
 </sliver_type>
</node>

Result:

VM is created with image Linux debian 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64 GNU/Linux

Scenario 2: Sliver request for VM that includes a disk_image that does not exist at the specified URL. Tested by using the previous RSpec and adding one line that requests the following image:

<disk_image name="http://geni-images.renci.org/images/standard/debian/does-not-exist.xml" version="42f53b64cfe44dd1607867f04b7b533bb67ade1e" />

Result: The request fails with:

  "geni_resources": [
    {
      "orca_expires": "Thu Apr 11 17:10:33 UTC 2013", 
      "geni_urn": "urn:publicid:IDN+exogeni.net:bbnvmsite+sliver+32852d34-feb4-442b-9085-1210b0586fae#geni1", 
      "geni_error": "Reservation debd626a-9bb6-415c-934c-ba8c69b7526b (Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+none3) is in state [Failed,None], err=resources failed to join: (no details)\n", 
      "geni_status": "Failed"
    }

Additional Scenario

In one of the attempts, "name" was a non-existent URL and "version" was taken from a valid image. Result: The image identified by the "version" field was loaded.

comment:4 Changed 11 years ago by ahelsing@bbn.com

Priority: changed from major to minor

comment:5 Changed 11 years ago by ibaldin@renci.org

Owner: changed from somebody to vjo@cs.duke.edu

Behavior #3 (additional scenario) should be fixed. Victor will do it in the fullness of time. ImageProxy looks images up by hash, so if the hash is one it has already seen, the URL is ignored. It shouldn't be ignored.
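The difference between the reported behavior and one possible fix can be sketched with two toy caches. Both classes, the URLs, and the hash value are hypothetical and are not ImageProxy code; keying on the (URL, hash) pair is only one reading of "it shouldn't be ignored":

```python
class HashOnlyCache:
    """Mimics the reported behavior: lookups use the hash and ignore the URL."""

    def __init__(self):
        self._images = {}

    def add(self, url, image_hash, image):
        self._images[image_hash] = image

    def lookup(self, url, image_hash):
        return self._images.get(image_hash)


class UrlAndHashCache:
    """One possible fix: a cached image only matches if the URL matches too."""

    def __init__(self):
        self._images = {}

    def add(self, url, image_hash, image):
        self._images[(url, image_hash)] = image

    def lookup(self, url, image_hash):
        return self._images.get((url, image_hash))


good_url = "http://example.org/images/valid.xml"          # hypothetical
bad_url = "http://example.org/images/does-not-exist.xml"  # hypothetical
known_hash = "abc123"                                     # hypothetical

for cache in (HashOnlyCache(), UrlAndHashCache()):
    cache.add(good_url, known_hash, "debian-image")
    print(type(cache).__name__, cache.lookup(bad_url, known_hash))
```

With the hash-only cache, the request naming the bad URL still resolves to the cached image, which matches the additional scenario in comment:3. With the pair as key the lookup misses, so the caller would have to fetch from the URL and could then report that it does not exist.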
