Opened 7 years ago

Closed 6 years ago

Last modified 6 years ago

#1031 closed (wontfix)

"No such site" error reported when new project is added to Stitching Compute Service (SCS)

Reported by: Aaron Helsinger
Owned by: xyang@maxgigapop.net
Priority: major
Milestone:
Component: I2AM
Version: SPIRAL5
Keywords:
Cc: lnevers@bbn.com, tlehman@maxgigapop.net, xyang@maxgigapop.net
Dependencies:

Description

The ION aggregate seems to die if the project name (sub-authority) contains a hyphen. It also dies if there is a hyphen in the client_ids.

Sample errors:

ERROR:omni: {'output': "Internal API error: <Fault 102: 'person_id 2: 
AddPersonToSite: Invalid argument: No such site'>", 'geni_api': 2, 
'code': {'am_type': 'sfa', 'geni_code': 5, 'am_code': 5}, 'value': ''}
ERROR:omni: {'output': "Internal API error: <Fault 102: 'person_id 2: 
AddSite: Invalid argument: Login base must consist only of lowercase 
ASCII letters or numbers'>", 'geni_api': 2, 'code': {'am_type': 'sfa', 
'geni_code': 5, 'am_code': 5}, 'value': ''}
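
The `output` field in the errors above embeds an XML-RPC fault as text. A minimal sketch of pulling the fault code, method name, and message out of such a string, assuming the format seen in these two samples (the `parse_fault` helper is hypothetical and not part of omni or SFA):

```python
import re

# Pattern inferred only from the sample errors quoted in this ticket.
FAULT_RE = re.compile(
    r"<Fault (?P<code>\d+): '(?:person_id \d+: )?(?P<method>\w+): (?P<message>.*)'>"
)

def parse_fault(output):
    """Return (code, method, message) from an SFA 'output' string, or None."""
    match = FAULT_RE.search(output)
    if match is None:
        return None
    return int(match.group("code")), match.group("method"), match.group("message")

sample = ("Internal API error: <Fault 102: 'person_id 2: "
          "AddPersonToSite: Invalid argument: No such site'>")
print(parse_fault(sample))
# (102, 'AddPersonToSite', 'Invalid argument: No such site')
```

Both sample errors share fault code 102 and differ only in the failing method (`AddSite` vs. `AddPersonToSite`), so the method name is the useful field to branch on.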

Attachments (2)

stitch-pg-utah-pg-uky.rspec (1.4 KB) - added by lnevers@bbn.com 7 years ago.
RSpec used when problem was found.
sfa-2.0-9-patch-11.diff (658 bytes) - added by xyang@maxgigapop.net 7 years ago.

Change History (14)

Changed 7 years ago by lnevers@bbn.com

Attachment: stitch-pg-utah-pg-uky.rspec added

RSpec used when problem was found.

comment:1 Changed 7 years ago by xyang@maxgigapop.net

Status: new → assigned

Verified that a hyphen in the root authority portion of the URN caused the problem. Example below:

urn:publicid:IDN+panther:MAX-GENI+slice+stitch-test1

which is translated into the hrn 'panther.MAX-GENI.stitch-test1'. MAX-GENI is then treated as a login base, which causes the exception below.

Code exception trace:

2013-06-04 15:19:39,676 - ERROR - XmlrpcApi.handle has caught Exception BEG TRACEBACK
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/sfa/server/xmlrpcapi.py", line 153, in handle
    result = self.call(source, method, *args)
  File "/usr/lib/python2.6/site-packages/sfa/server/xmlrpcapi.py", line 125, in call
    return function(*args)
  File "/usr/lib/python2.6/site-packages/sfa/util/method.py", line 79, in __call__
    result = self.call(*args, **kwds)
  File "/usr/lib/python2.6/site-packages/sfa/methods/CreateSliver.py", line 58, in call
    result = self.api.manager.CreateSliver(self.api, slice_xrn, creds, rspec, users, options)
  File "/usr/lib/python2.6/site-packages/sfa/managers/aggregate_manager_max.py", line 378, in CreateSliver
    ret = self.create_slice(api, slice_xrn, creds, rspec_string, users)
  File "/usr/lib/python2.6/site-packages/sfa/managers/aggregate_manager_max.py", line 245, in create_slice
    self.prepare_slice(api, xrn, cred, users)
  File "/usr/lib/python2.6/site-packages/sfa/managers/aggregate_manager_max.py", line 173, in prepare_slice
    site = slices.verify_site(hrn, slice_record, peer, sfa_peer)
  File "/usr/lib/python2.6/site-packages/sfa/plc/plslices.py", line 300, in verify_site
    site['site_id'] = self.driver.shell.AddSite(site)
  File "/usr/lib/python2.6/site-packages/sfa/plc/plshell.py", line 83, in func
    result=getattr(self.proxy, actual_name)(self.plauth, *args, **kwds)
  File "/usr/share/plc_api/PLC/Shell.py", line 49, in __call__
    return self.func(*args, **kwds)
  File "/usr/share/plc_api/PLC/Method.py", line 122, in __call__
    raise fault
PLCInvalidArgument: <Fault 102: 'person_id 2: AddSite: Invalid argument: Login base must consist only of lowercase ASCII letters or numbers'>

It is hard to change MyPLC. Will try a workaround by translating the hrn.
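
One way such a translation could look, as a sketch only: derive the hrn from the URN, then reduce the site component to the characters MyPLC accepts as a login base. The helper names and the strip-invalid-characters rule are assumptions for illustration; the attached sfa-2.0-9-patch-11.diff may do something different.

```python
import re

def urn_to_hrn(urn):
    """Convert a GENI URN to a dotted hrn (simplified; ignores escaping rules)."""
    prefix = "urn:publicid:IDN+"
    if not urn.startswith(prefix):
        raise ValueError("not a GENI URN: %r" % urn)
    authority, _type, name = urn[len(prefix):].split("+", 2)
    return ".".join(authority.split(":") + [name])

def to_login_base(site_name):
    """Keep only lowercase ASCII letters and digits (assumed workaround rule)."""
    return re.sub(r"[^a-z0-9]", "", site_name.lower())

hrn = urn_to_hrn("urn:publicid:IDN+panther:MAX-GENI+slice+stitch-test1")
print(hrn)                                # panther.MAX-GENI.stitch-test1
print(to_login_base(hrn.split(".")[1]))   # maxgeni
```

A rule like this is lossy: two project names that differ only in punctuation or case (for example `ln-prj` and `lnprj`) would map to the same login base, so a real fix would need to decide how to handle such collisions.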

comment:2 Changed 7 years ago by xyang@maxgigapop.net

Cc: xyang@maxgigapop.net added; ckotil@grnoc.iu.edu removed
Owner: changed from xyang@maxgigapop.net to ckotil@grnoc.iu.edu
Status: assigned → new

Fixed with an SFA patch. Reassigned to Chad to apply the patch.

Changed 7 years ago by xyang@maxgigapop.net

Attachment: sfa-2.0-9-patch-11.diff added

comment:3 Changed 7 years ago by ckotil@grnoc.iu.edu

Resolution: fixed
Status: new → closed

The geni-am.net.internet2.edu instance of SFA already appears to have this patch applied. I worked with Xi outside of this ticket, and the patch must have already been applied when I loaded a new /usr/lib/python2.6/site-packages/sfa/managers/aggregate_manager_max.py.

--Chad

comment:4 Changed 7 years ago by lnevers@bbn.com

Resolution: fixed
Status: closed → reopened

The problem is still happening. After the last ticket update I created a sliver in the project ln-prj and still see the failure:

INFO:omni:Creating sliver(s) from rspec file /tmp/lnstitch1-createsliver-request-
11-ion-internet2-edu.xml for slice urn:publicid:IDN+ch.geni.net:ln-prj+slice+lnstitch1
ERROR:omni: {'output': "Internal API error: <Fault 102: 'person_id 2: AddSite: Invalid 
argument: Login base must consist only of lowercase ASCII letters or numbers'>", 
'geni_api': 2, 'code': {'am_type': 'sfa', 'geni_code': 5, 'am_code': 5}, 'value': ''}
INFO:stitch.Aggregate:Got AMAPIError doing createsliver lnstitch1 at <Aggregate 
urn:publicid:IDN+ion.internet2.edu+authority+cm>: AMAPIError: Error from Aggregate: 
code 5. sfa AM code: 5: Internal API error: <Fault 102: 'person_id 2: AddSite: Invalid 
argument: Login base must consist only of lowercase ASCII letters or numbers'>.

comment:5 Changed 7 years ago by lnevers@bbn.com

I re-ran the test with the project using a hyphen in its name, but ran into this new issue:

INFO:omni:Slice urn:publicid:IDN+ch.geni.net:ln-prj+slice+lnstitch expires 
within 1 day on 2013-06-06 21:03:56 UTC
INFO:omni:Creating sliver(s) from rspec file /tmp/lnstitch-createsliver-request-11-ion-internet2-edu.xml 
for slice urn:publicid:IDN+ch.geni.net:ln-prj+slice+lnstitch

ERROR:omni: {'output': "Internal API error: <Fault 102: 'person_id 2: 
AddPersonToSite: Invalid argument: No such site'>", 'geni_api': 2, 'code': 
{'am_type': 'sfa', 'geni_code': 5, 'am_code': 5}, 'value': ''}

INFO:stitch.Aggregate:Got AMAPIError doing createsliver lnstitch at 
<Aggregate urn:publicid:IDN+ion.internet2.edu+authority+cm>: AMAPIError: 
Error from Aggregate: code 5. sfa AM code: 5: Internal API error: <Fault 102: 
'person_id 2: AddPersonToSite: Invalid argument: No such site'>.

WARNING:stitcher:Stitching failed but will retry: Circuit reservation failed at 
<Aggregate urn:publicid:IDN+ion.internet2.edu+authority+cm> (AMAPIError: Error from
 Aggregate: code 5. sfa AM code: 5: Internal API error: <Fault 102: 'person_id 2: 
AddPersonToSite: Invalid argument: No such site'>.). Try again from the SCS

comment:6 Changed 7 years ago by lnevers@bbn.com

Please note that the <Fault 102: 'person_id 2: AddPersonToSite: Invalid argument: No such site'> does not prevent the sliver from being set up.

comment:7 Changed 7 years ago by xyang@maxgigapop.net

This only shows up the first time a new site is added, which in this case means a new project from the GENI portal. Repeated operations for the same slice and for new slices under the same project had no problem.

It might be due to a delay inside MyPLC when adding a site from the GENI portal. We were right in the middle of the GENI portal change. The sites / projects we tested were already in MyPLC through the previous authority 'panther', which might also be a reason.

I would suggest living with this for now.

comment:8 Changed 6 years ago by lnevers@bbn.com

Summary: ION AM dies when project or client_id has a hyphen → "No such site" error reported when new project is added to Stitching Compute Service (SCS)

Updating description to capture the remaining issue for this ticket, which is not planned to be fixed.

The error is reported one time only for the initial introduction of a project and does not happen on later attempts.

comment:9 Changed 6 years ago by ckotil@grnoc.iu.edu

Owner: changed from ckotil@grnoc.iu.edu to xyang@maxgigapop.net
Status: reopened → new

comment:10 Changed 6 years ago by xyang@maxgigapop.net

Resolution: wontfix
Status: new → closed

We decided to live with it and will see what we can do with the next SFA upgrade.

comment:11 Changed 6 years ago by lnevers@bbn.com

I know this ticket is closed as "wontfix", but this bug means each new user runs into a ten-minute delay on their first stitching attempt, which is a lousy way to be introduced to Stitching!

Today's example from Bill Owens:

11:44:21 INFO     stitch.Aggregate:
        Stitcher doing createsliver at http://geni-am.net.internet2.edu:12346
11:44:22 ERROR    omni:  {'output': "Internal API error: <Fault 102: 'person_id 2: 
AddPersonToSite: Invalid argument: No such site'>", 'geni_api': 2, 'code': {'am_type': 
'sfa', 'geni_code': 5, 'am_code': 5}, 'value': ''}
11:44:22 INFO     stitch.Aggregate: Got AMAPIError doing createsliver stitchtest at 
<Aggregate urn:publicid:IDN+ion.internet2.edu+authority+am>: AMAPIError: Error from 
Aggregate: code 5. sfa AM code: 5: Internal API error: <Fault 102: 'person_id 2: 
AddPersonToSite: Invalid argument: No such site'>.
11:44:22 WARNING  stitcher: Stitching failed but will retry: Circuit reservation 
failed at <Aggregate urn:publicid:IDN+ion.internet2.edu+authority+am> (AMAPIError: 
Error from Aggregate: code 5. sfa AM code: 5: Internal API error: <Fault 102: 
'person_id 2: AddPersonToSite: Invalid argument: No such site'>.). Try again from the SCS
11:44:22 WARNING  stitcher: Had reservation at https://www.instageni.nysernet.org:12369/protogeni/xmlrpc/am
11:44:22 INFO     stitch.Aggregate: Doing deletesliver at https://www.instageni.nysernet.org:12369/protogeni/xmlrpc/am
11:45:15 WARNING  stitcher: Deleted reservation at https://www.instageni.nysernet.org:12369/protogeni/xmlrpc/am
11:45:15 WARNING  stitcher: Had reservation at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am
11:45:15 INFO     stitch.Aggregate: Doing deletesliver at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am
11:46:16 WARNING  stitcher: Deleted reservation at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am
11:46:16 INFO     stitcher: Calling SCS for the 2rd time...
11:46:16 INFO     stitcher: Pausing for 600 seconds for Aggregates to free up resources...

Also, can anything be done with the next SFA upgrade?

comment:12 Changed 6 years ago by Aaron Helsinger

I am adding some logic to the next version of Omni so that when stitcher sees this, it retries for you more quickly. Fixing this via a new SFA would be better, of course.
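
The kind of logic described here could be sketched as follows: treat the first "No such site" fault as transient and retry after a short wait instead of the full 600-second pause. This is not Omni's actual code; `create_sliver`, the `AMAPIError` stand-in, and the delay values are placeholders chosen for illustration.

```python
import time

class AMAPIError(Exception):
    """Stand-in for the stitcher's aggregate error type (hypothetical)."""

TRANSIENT_MARKER = "AddPersonToSite: Invalid argument: No such site"

def create_with_quick_retry(create_sliver, quick_delay=5, retries=1, sleep=time.sleep):
    """Call create_sliver(); on the known transient fault, retry after a short wait."""
    for attempt in range(retries + 1):
        try:
            return create_sliver()
        except AMAPIError as err:
            # Re-raise anything that is not the known fault, or when out of retries.
            if TRANSIENT_MARKER not in str(err) or attempt == retries:
                raise
            sleep(quick_delay)

# Usage with a fake aggregate that fails once, as reported for new projects.
calls = []
def fake_create_sliver():
    calls.append(1)
    if len(calls) == 1:
        raise AMAPIError("Fault 102: 'person_id 2: " + TRANSIENT_MARKER + "'")
    return "sliver created"

print(create_with_quick_retry(fake_create_sliver, sleep=lambda s: None))
```

Matching on the specific fault text keeps other aggregate errors on the normal, slower retry path.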
