Opened 11 years ago
Closed 10 years ago
#1049 closed (fixed)
Some errors should cause stitcher.py to exit
Reported by: | lnevers@bbn.com | Owned by: | Aaron Helsinger |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | STITCHING | Version: | SPIRAL5 |
Keywords: | Network Stitching | Cc: | |
Dependencies: |
Description
Some classes of aggregate failures should make stitcher.py exit.
For example, I tested a scenario in which I requested bandwidth beyond the maximum. My sliver fails (of course), but stitcher keeps trying.
$ stitcher.py createsliver ln1999999 ./stitch-capacity-1999999.rspec 12:31:11 INFO stitcher: Loading config file /home/lnevers/.gcf/omni_config 12:31:11 INFO stitcher: Using control framework pg 12:31:14 INFO stitcher: <Aggregate urn:publicid:IDN+emulab.net+authority+cm> speaks AM API v3, but sticking with v2 12:31:14 INFO stitcher: <Aggregate urn:publicid:IDN+utah.geniracks.net+authority+cm> speaks AM API v3, but sticking with v2 12:31:14 INFO stitch.Aggregate: Writing to '/tmp/ln1999999-createsliver-request-11-emulab-net.xml' 12:31:14 INFO stitch.Aggregate: Stitcher doing createsliver at https://www.emulab.net:12369/protogeni/xmlrpc/am 12:31:14 INFO omni: Loading config file /home/lnevers/.gcf/omni_config 12:31:14 INFO omni: Using control framework pg 12:31:15 INFO omni: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ln1999999 expires on 2013-06-11 17:30:45 UTC 12:31:15 INFO omni: Creating sliver(s) from rspec file /tmp/ln1999999-createsliver-request-11-emulab-net.xml for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ln1999999 12:32:14 ERROR omni: {'output': '*** ERROR: mapper: Reached run limit. 
Giving up.\nseed = 1370903756\nPhysical Graph: 276\nCalculating shortest paths on switch fabric.\nVirtual Graph: 2\nGenerating physical equivalence classes:204\nType precheck:\nType precheck passed.\nNode mapping precheck:\nNode mapping precheck succeeded\nPolicy precheck:\nPolicy precheck succeeded\nAnnealing.\nAdjusting dificulty estimate for fixed nodes, 1 remain.\nDoing melting run\nReverting: forced\nReverting to best solution\nDone\n BEST SCORE: 4.6 in 16840 iters and 0.493181 seconds\nWith 1 violations\nIters to find best score: 12\nViolations: 1\n unassigned: 0\n pnode_load: 0\n no_connect: 1\n link_users: 0\n bandwidth: 0\n desires: 0\n vclass: 0\n delay: 0\n trivial mix: 0\n subnodes: 0\n max_types: 0\n endpoints: 0\nNodes:\nig-utah interconnect-instageni\npg-utah pc411\nEnd Nodes\nEdges:\nlinksimple/link/pg-utah:0,ig-utah:0 Mapping Failed\nEnd Edges\nEnd solution\nSummary:\ninterconnect-instageni 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=interconnect-vm\n ?+ram: used=128 total=0\n ?+cpupercent: used=0 total=92\n ?+rampercent: used=0 total=80\npc411 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=pcvm\n ?+cpu: used=0 total=2400\n ?+ram: used=128 total=11008\n ?+cpupercent: used=0 total=92\n ?+rampercent: used=0 total=80\nTotal physical nodes used: 2\nEnd summary\nASSIGN FAILED:\nType precheck passed.\nNode mapping precheck succeeded\nPolicy precheck succeeded\nAnnealing.\nAdjusting dificulty estimate for fixed nodes, 1 remain.\nDoing melting run\nReverting: forced\nReverting to best solution\nDone\n BEST SCORE: 4.6 in 16840 iters and 0.493181 seconds\n unassigned: 0\n pnode_load: 0\n no_connect: 1\n link_users: 0\n bandwidth: 0\n desires: 0\n vclass: 0\n delay: 0\n trivial mix: 0\n subnodes: 0\n max_types: 0\n endpoints: 0\n', 'code': {'protogeni_error_log': 'urn:publicid:IDN+emulab.net+log+8d0e01debe2f270931c7ba281ccf80c8', 'am_type': 'protogeni', 'geni_code': 2, 'am_code': 2, 'protogeni_error_url': 
'https://www.emulab.net/spewlogfile.php3?logfile=8d0e01debe2f270931c7ba281ccf80c8'}, 'value': 'Could not map to resources'} 12:32:14 INFO stitch.Aggregate: Got AMAPIError doing createsliver ln1999999 at <Aggregate urn:publicid:IDN+emulab.net+authority+cm>: AMAPIError: Error from Aggregate: code 2. protogeni AM code: 2: *** ERROR: mapper: Reached run limit. Giving up. seed = 1370903756 Physical Graph: 276 Calculating shortest paths on switch fabric. Virtual Graph: 2 Generating physical equivalence classes:204 Type precheck: Type precheck passed. Node mapping precheck: Node mapping precheck succeeded Policy precheck: Policy precheck succeeded Annealing. Adjusting dificulty estimate for fixed nodes, 1 remain. Doing melting run Reverting: forced Reverting to best solution Done BEST SCORE: 4.6 in 16840 iters and 0.493181 seconds With 1 violations Iters to find best score: 12 Violations: 1 unassigned: 0 pnode_load: 0 no_connect: 1 link_users: 0 bandwidth: 0 desires: 0 vclass: 0 delay: 0 trivial mix: 0 subnodes: 0 max_types: 0 endpoints: 0 Nodes: ig-utah interconnect-instageni pg-utah pc411 End Nodes Edges: linksimple/link/pg-utah:0,ig-utah:0 Mapping Failed End Edges End solution Summary: interconnect-instageni 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=interconnect-vm ?+ram: used=128 total=0 ?+cpupercent: used=0 total=92 ?+rampercent: used=0 total=80 pc411 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=pcvm ?+cpu: used=0 total=2400 ?+ram: used=128 total=11008 ?+cpupercent: used=0 total=92 ?+rampercent: used=0 total=80 Total physical nodes used: 2 End summary ASSIGN FAILED: Type precheck passed. Node mapping precheck succeeded Policy precheck succeeded Annealing. Adjusting dificulty estimate for fixed nodes, 1 remain. 
Doing melting run Reverting: forced Reverting to best solution Done BEST SCORE: 4.6 in 16840 iters and 0.493181 seconds unassigned: 0 pnode_load: 0 no_connect: 1 link_users: 0 bandwidth: 0 desires: 0 vclass: 0 delay: 0 trivial mix: 0 subnodes: 0 max_types: 0 endpoints: 0 . 12:32:14 WARNING stitcher: Stitching failed but will retry: Circuit reservation failed at <Aggregate urn:publicid:IDN+emulab.net+authority+cm> (AMAPIError: Error from Aggregate: code 2. protogeni AM code: 2: *** ERROR: mapper: Reached run limit. Giving up. seed = 1370903756 Physical Graph: 276 Calculating shortest paths on switch fabric. Virtual Graph: 2 Generating physical equivalence classes:204 Type precheck: Type precheck passed. Node mapping precheck: Node mapping precheck succeeded Policy precheck: Policy precheck succeeded Annealing. Adjusting dificulty estimate for fixed nodes, 1 remain. Doing melting run Reverting: forced Reverting to best solution Done BEST SCORE: 4.6 in 16840 iters and 0.493181 seconds With 1 violations Iters to find best score: 12 Violations: 1 unassigned: 0 pnode_load: 0 no_connect: 1 link_users: 0 bandwidth: 0 desires: 0 vclass: 0 delay: 0 trivial mix: 0 subnodes: 0 max_types: 0 endpoints: 0 Nodes: ig-utah interconnect-instageni pg-utah pc411 End Nodes Edges: linksimple/link/pg-utah:0,ig-utah:0 Mapping Failed End Edges End solution Summary: interconnect-instageni 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=interconnect-vm ?+ram: used=128 total=0 ?+cpupercent: used=0 total=92 ?+rampercent: used=0 total=80 pc411 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=pcvm ?+cpu: used=0 total=2400 ?+ram: used=128 total=11008 ?+cpupercent: used=0 total=92 ?+rampercent: used=0 total=80 Total physical nodes used: 2 End summary ASSIGN FAILED: Type precheck passed. Node mapping precheck succeeded Policy precheck succeeded Annealing. Adjusting dificulty estimate for fixed nodes, 1 remain. 
Doing melting run Reverting: forced Reverting to best solution Done BEST SCORE: 4.6 in 16840 iters and 0.493181 seconds unassigned: 0 pnode_load: 0 no_connect: 1 link_users: 0 bandwidth: 0 desires: 0 vclass: 0 delay: 0 trivial mix: 0 subnodes: 0 max_types: 0 endpoints: 0 .). Try again from the SCS 12:32:14 INFO stitcher: Pausing for 30 seconds for Aggregates to free up resources... 12:32:44 INFO stitcher: Calling SCS for the 2th time... 12:32:44 INFO stitch.Aggregate: Writing to '/tmp/ln1999999-createsliver-request-21-emulab-net.xml' 12:32:44 INFO stitch.Aggregate: Stitcher doing createsliver at https://www.emulab.net:12369/protogeni/xmlrpc/am 12:32:44 INFO omni: Loading config file /home/lnevers/.gcf/omni_config 12:32:44 INFO omni: Using control framework pg 12:32:46 INFO omni: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ln1999999 expires on 2013-06-11 17:30:45 UTC 12:32:46 INFO omni: Creating sliver(s) from rspec file /tmp/ln1999999-createsliver-request-21-emulab-net.xml for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ln1999999 .......
There is no reason for stitcher to try again (and again) when an "ERROR: mapper" is received.
Will add more scenarios to this ticket as they occur.
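To make the requested behavior concrete, here is a minimal Python 3 sketch of a retry loop that stops immediately on a non-retryable failure. It is not stitcher's actual code: `FATAL_MARKERS`, `FatalAggregateError`, `is_fatal`, and `reserve_with_retries` are hypothetical names, and `attempt` stands in for the real reservation call. The only detail taken from this ticket is the "ERROR: mapper" text, and the pause and retry count are assumed defaults.

```python
import time

# Hypothetical list of substrings marking failures that retrying cannot fix.
# "ERROR: mapper" comes from the log above; the list itself is an assumption.
FATAL_MARKERS = ("ERROR: mapper",)


class FatalAggregateError(Exception):
    """Raised when an aggregate failure should stop the whole run."""


def is_fatal(message):
    """Return True if the error text matches a known non-retryable failure."""
    return any(marker in message for marker in FATAL_MARKERS)


def reserve_with_retries(attempt, max_tries=5, pause=30, sleep=time.sleep):
    """Call attempt() until it succeeds, hits a fatal error, or runs out of tries.

    attempt is any zero-argument callable that raises RuntimeError on failure
    (a stand-in for the real reservation call, which is not shown here).
    """
    for try_number in range(1, max_tries + 1):
        try:
            return attempt()
        except RuntimeError as err:
            if is_fatal(str(err)):
                # Waiting will not help, so give up without pausing or retrying.
                raise FatalAggregateError(str(err)) from err
            if try_number == max_tries:
                raise
            sleep(pause)
```

With this shape, a mapper failure ends the run after one attempt, while other failures still get the pause-and-retry treatment.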
Attachments (1)
Change History (7)
comment:1 Changed 11 years ago by
Status: | new → assigned |
---|
Changed 11 years ago by
Attachment: | stitch-capacity-1999999.rspec added |
---|
comment:2 Changed 11 years ago by
This particular error is now detected and we bail out, as of commit 8b6191a.
comment:3 Changed 11 years ago by
Using gcf-2.4-rc2:
This error does not cause stitcher to exit, but it should:
12:18:48 INFO stitch.Aggregate: Got AMAPIError doing createsliver ig-gpo-ig-utah-9 at <Aggregate urn:publicid:IDN+emulab.net+authority+cm>: AMAPIError: Error from Aggregate: code 2. protogeni AM code: 2: *** ERROR: mapper: Reached run limit. Giving up. seed = 1378962054 Physical Graph: 440 Calculating shortest paths on switch fabric. Virtual Graph: 2 Generating physical equivalence classes:368 Type precheck: Type precheck passed. Node mapping precheck: Node mapping precheck succeeded Policy precheck: Policy precheck succeeded Annealing. Adjusting dificulty estimate for fixed nodes, 0 remain. All nodes are fixed. No annealing. Done BEST SCORE: 5.1 in 0 iters and 0.110285 seconds With 1 violations Iters to find best score: 0 Violations: 1 unassigned: 0 pnode_load: 0 no_connect: 1 link_users: 0 bandwidth: 0 desires: 0 vclass: 0 delay: 0 trivial mix: 0 subnodes: 0 max_types: 0 endpoints: 0 Nodes: client-utah interconnect-instageni server-gpo ion End Nodes Edges: linksimple/link/server-gpo:0,client-utah:0 Mapping Failed End Edges End solution Summary: interconnect-instageni 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=interconnect-vm ?+ram: used=128 total=0 ?+cpupercent: used=0 total=92 ?+rampercent: used=0 total=80 ion 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=bbgenivm ?+ram: used=128 total=0 ?+cpupercent: used=0 total=92 ?+rampercent: used=0 total=80 Total physical nodes used: 2 End summary ASSIGN FAILED: Type precheck passed. Node mapping precheck succeeded Policy precheck succeeded Annealing. Adjusting dificulty estimate for fixed nodes, 0 remain. All nodes are fixed. No annealing. Done BEST SCORE: 5.1 in 0 iters and 0.110285 seconds unassigned: 0 pnode_load: 0 no_connect: 1 link_users: 0 bandwidth: 0 desires: 0 vclass: 0 delay: 0 trivial mix: 0 subnodes: 0 max_types: 0 endpoints: 0 . 
12:18:48 WARNING stitcher: Stitching failed but will retry: Circuit reservation failed at <Aggregate urn:publicid:IDN+emulab.net+authority+cm> (AMAPIError: Error from Aggregate: code 2. protogeni AM code: 2: *** ERROR: mapper: Reached run limit. Giving up. seed = 1378962054 Physical Graph: 440 Calculating shortest paths on switch fabric. Virtual Graph: 2 Generating physical equivalence classes:368 Type precheck: Type precheck passed. Node mapping precheck: Node mapping precheck succeeded Policy precheck: Policy precheck succeeded Annealing. Adjusting dificulty estimate for fixed nodes, 0 remain. All nodes are fixed. No annealing. Done BEST SCORE: 5.1 in 0 iters and 0.110285 seconds With 1 violations Iters to find best score: 0 Violations: 1 unassigned: 0 pnode_load: 0 no_connect: 1 link_users: 0 bandwidth: 0 desires: 0 vclass: 0 delay: 0 trivial mix: 0 subnodes: 0 max_types: 0 endpoints: 0 Nodes: client-utah interconnect-instageni server-gpo ion End Nodes Edges: linksimple/link/server-gpo:0,client-utah:0 Mapping Failed End Edges End solution Summary: interconnect-instageni 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=interconnect-vm ?+ram: used=128 total=0 ?+cpupercent: used=0 total=92 ?+rampercent: used=0 total=80 ion 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=bbgenivm ?+ram: used=128 total=0 ?+cpupercent: used=0 total=92 ?+rampercent: used=0 total=80 Total physical nodes used: 2 End summary ASSIGN FAILED: Type precheck passed. Node mapping precheck succeeded Policy precheck succeeded Annealing. Adjusting dificulty estimate for fixed nodes, 0 remain. All nodes are fixed. No annealing. Done BEST SCORE: 5.1 in 0 iters and 0.110285 seconds unassigned: 0 pnode_load: 0 no_connect: 1 link_users: 0 bandwidth: 0 desires: 0 vclass: 0 delay: 0 trivial mix: 0 subnodes: 0 max_types: 0 endpoints: 0 .). 
Try again from the SCS 12:18:48 WARNING stitcher: Had reservation at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am 12:18:48 INFO stitch.Aggregate: Doing deletesliver at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am 12:19:44 WARNING stitcher: Deleted reservation at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am 12:19:44 INFO stitcher: Calling SCS for the 5th and last time... 12:19:44 INFO stitcher: Pausing for 120 seconds for Aggregates to free up resources...
comment:5 Changed 10 years ago by
Stitcher should exit on this InstaGENI error; waiting does not resolve the topology error:
16:23:40 ERROR omni: {'output': 'Could not verify topo', 'code': {'protogeni_error_log': 'urn:publicid:IDN+emulab.net+log+aa2fdf79351e00a44bdc0fbe276708af', 'am_type': 'protogeni', 'geni_code': 2, 'am_code': 2, 'protogeni_error_url': 'https://www.emulab.net/spewlogfile.php3?logfile=aa2fdf79351e00a44bdc0fbe276708af'}, 'value': 0} 16:23:40 INFO stitch.Aggregate: Got AMAPIError doing createsliver lnstitch at <Aggregate urn:publicid:IDN+emulab.net+authority+cm>: AMAPIError: Error from Aggregate: code 2. protogeni AM code: 2: Could not verify topo.
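The return struct in the log above carries both a numeric code and a text message, so a give-up rule could match on the struct instead of on the formatted error string. The sketch below is only an illustration under assumptions: `FATAL_RULES` and `should_give_up` are hypothetical names, and the single rule is built from the fields shown in this comment. The mapper failure in the description also reports `am_code` 2, and the ticket does not say whether every code 2 failure is permanent, so the rule matches on the message text as well.

```python
# Hypothetical table of (am_type, am_code, output substring) rules treated as fatal.
# The one entry is taken from the error reported in this comment.
FATAL_RULES = (
    ("protogeni", 2, "Could not verify topo"),
)


def should_give_up(result):
    """Return True if an aggregate return struct matches a fatal rule.

    result is a dict shaped like the one in the log: it has an 'output'
    string and a 'code' dict with 'am_type' and 'am_code' entries.
    """
    code = result.get("code") or {}
    output = str(result.get("output", ""))
    for am_type, am_code, text in FATAL_RULES:
        if (code.get("am_type") == am_type
                and code.get("am_code") == am_code
                and text in output):
            return True
    return False
```

Applied to the struct logged above, `should_give_up` returns True; a struct with a different `am_type` or message falls through to the normal retry path.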
comment:6 Changed 10 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
This is no longer an issue. Closing ticket.
I emailed Jon D to check for valid rules for when to give up. Options: