Opened 6 years ago

Closed 5 years ago

#1049 closed (fixed)

Some errors should cause stitcher.py to exit

Reported by: lnevers@bbn.com
Owned by: Aaron Helsinger
Priority: major
Milestone:
Component: STITCHING
Version: SPIRAL5
Keywords: Network Stitching
Cc:
Dependencies:

Description

Some classes of aggregate failures should make stitcher.py exit.

For example, I was testing a scenario where I requested bandwidth beyond the maximum. My sliver fails (of course), but stitcher keeps trying.

$ stitcher.py createsliver ln1999999 ./stitch-capacity-1999999.rspec
12:31:11 INFO     stitcher: Loading config file /home/lnevers/.gcf/omni_config
12:31:11 INFO     stitcher: Using control framework pg
12:31:14 INFO     stitcher: <Aggregate urn:publicid:IDN+emulab.net+authority+cm> speaks AM API v3, but sticking with v2
12:31:14 INFO     stitcher: <Aggregate urn:publicid:IDN+utah.geniracks.net+authority+cm> speaks AM API v3, but sticking with v2
12:31:14 INFO     stitch.Aggregate: Writing to '/tmp/ln1999999-createsliver-request-11-emulab-net.xml'
12:31:14 INFO     stitch.Aggregate:
    Stitcher doing createsliver at https://www.emulab.net:12369/protogeni/xmlrpc/am
12:31:14 INFO     omni: Loading config file /home/lnevers/.gcf/omni_config
12:31:14 INFO     omni: Using control framework pg
12:31:15 INFO     omni: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ln1999999 expires on 2013-06-11 17:30:45 UTC
12:31:15 INFO     omni: Creating sliver(s) from rspec file /tmp/ln1999999-createsliver-request-11-emulab-net.xml for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ln1999999
12:32:14 ERROR    omni:  {'output': '*** ERROR: mapper: Reached run limit. Giving up.\nseed = 1370903756\nPhysical Graph: 276\nCalculating shortest paths on switch fabric.\nVirtual Graph: 2\nGenerating physical equivalence classes:204\nType precheck:\nType precheck passed.\nNode mapping precheck:\nNode mapping precheck succeeded\nPolicy precheck:\nPolicy precheck succeeded\nAnnealing.\nAdjusting dificulty estimate for fixed nodes, 1 remain.\nDoing melting run\nReverting: forced\nReverting to best solution\nDone\n   BEST SCORE:  4.6 in 16840 iters and 0.493181 seconds\nWith 1 violations\nIters to find best score: 12\nViolations: 1\n  unassigned:  0\n  pnode_load:  0\n no_connect:  1\n  link_users:  0\n  bandwidth:   0\n  desires: 0\n  vclass:      0\n  delay:       0\n  trivial mix: 0\n subnodes:    0\n  max_types:   0\n  endpoints:   0\nNodes:\nig-utah interconnect-instageni\npg-utah pc411\nEnd Nodes\nEdges:\nlinksimple/link/pg-utah:0,ig-utah:0 Mapping Failed\nEnd Edges\nEnd solution\nSummary:\ninterconnect-instageni 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=interconnect-vm\n ?+ram: used=128 total=0\n    ?+cpupercent: used=0 total=92\n ?+rampercent: used=0 total=80\npc411 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=pcvm\n    ?+cpu: used=0 total=2400\n    ?+ram: used=128 total=11008\n    ?+cpupercent: used=0 total=92\n ?+rampercent: used=0 total=80\nTotal physical nodes used: 2\nEnd summary\nASSIGN FAILED:\nType precheck passed.\nNode mapping precheck succeeded\nPolicy precheck succeeded\nAnnealing.\nAdjusting dificulty estimate for fixed nodes, 1 remain.\nDoing melting run\nReverting: forced\nReverting to best solution\nDone\n   BEST SCORE:  4.6 in 16840 iters and 0.493181 seconds\n  unassigned:  0\n pnode_load:  0\n  no_connect:  1\n  link_users:  0\n  bandwidth: 0\n  desires:     0\n  vclass:      0\n  delay:       0\n  trivial mix: 0\n  subnodes:    0\n  max_types:   0\n  endpoints:   0\n', 'code': {'protogeni_error_log': 
'urn:publicid:IDN+emulab.net+log+8d0e01debe2f270931c7ba281ccf80c8', 'am_type': 'protogeni', 'geni_code': 2, 'am_code': 2, 'protogeni_error_url': 'https://www.emulab.net/spewlogfile.php3?logfile=8d0e01debe2f270931c7ba281ccf80c8'}, 'value': 'Could not map to resources'}
12:32:14 INFO     stitch.Aggregate: Got AMAPIError doing createsliver ln1999999 at <Aggregate urn:publicid:IDN+emulab.net+authority+cm>: AMAPIError: Error from Aggregate: code 2. protogeni AM code: 2: *** ERROR: mapper: Reached run limit. Giving up.
seed = 1370903756
Physical Graph: 276
Calculating shortest paths on switch fabric.
Virtual Graph: 2
Generating physical equivalence classes:204
Type precheck:
Type precheck passed.
Node mapping precheck:
Node mapping precheck succeeded
Policy precheck:
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 1 remain.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  4.6 in 16840 iters and 0.493181 seconds
With 1 violations
Iters to find best score:  12
Violations: 1
  unassigned:  0
  pnode_load:  0
  no_connect:  1
  link_users:  0
  bandwidth:   0
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
Nodes:
ig-utah interconnect-instageni
pg-utah pc411
End Nodes
Edges:
linksimple/link/pg-utah:0,ig-utah:0 Mapping Failed
End Edges
End solution
Summary:
interconnect-instageni 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=interconnect-vm
    ?+ram: used=128 total=0
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
pc411 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=pcvm
    ?+cpu: used=0 total=2400
    ?+ram: used=128 total=11008
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
Total physical nodes used: 2
End summary
ASSIGN FAILED:
Type precheck passed.
Node mapping precheck succeeded
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 1 remain.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  4.6 in 16840 iters and 0.493181 seconds
  unassigned:  0
  pnode_load:  0
  no_connect:  1
  link_users:  0
  bandwidth:   0
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
.
12:32:14 WARNING  stitcher: Stitching failed but will retry: Circuit reservation failed at <Aggregate urn:publicid:IDN+emulab.net+authority+cm> (AMAPIError: Error from Aggregate: code 2. protogeni AM code: 2: *** ERROR: mapper: Reached run limit. Giving up.
seed = 1370903756
Physical Graph: 276
Calculating shortest paths on switch fabric.
Virtual Graph: 2
Generating physical equivalence classes:204
Type precheck:
Type precheck passed.
Node mapping precheck:
Node mapping precheck succeeded
Policy precheck:
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 1 remain.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  4.6 in 16840 iters and 0.493181 seconds
With 1 violations
Iters to find best score:  12
Violations: 1
  unassigned:  0
  pnode_load:  0
  no_connect:  1
  link_users:  0
  bandwidth:   0
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
Nodes:
ig-utah interconnect-instageni
pg-utah pc411
End Nodes
Edges:
linksimple/link/pg-utah:0,ig-utah:0 Mapping Failed
End Edges
End solution
Summary:
interconnect-instageni 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=interconnect-vm
    ?+ram: used=128 total=0
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
pc411 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=pcvm
    ?+cpu: used=0 total=2400
    ?+ram: used=128 total=11008
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
Total physical nodes used: 2
End summary
ASSIGN FAILED:
Type precheck passed.
Node mapping precheck succeeded
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 1 remain.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  4.6 in 16840 iters and 0.493181 seconds
  unassigned:  0
  pnode_load:  0
  no_connect:  1
  link_users:  0
  bandwidth:   0
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
.). Try again from the SCS
12:32:14 INFO     stitcher: Pausing for 30 seconds for Aggregates to free up resources...


12:32:44 INFO     stitcher: Calling SCS for the 2th time...
12:32:44 INFO     stitch.Aggregate: Writing to '/tmp/ln1999999-createsliver-request-21-emulab-net.xml'
12:32:44 INFO     stitch.Aggregate:
    Stitcher doing createsliver at https://www.emulab.net:12369/protogeni/xmlrpc/am
12:32:44 INFO     omni: Loading config file /home/lnevers/.gcf/omni_config
12:32:44 INFO     omni: Using control framework pg
12:32:46 INFO     omni: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ln1999999 expires on 2013-06-11 17:30:45 UTC
12:32:46 INFO     omni: Creating sliver(s) from rspec file /tmp/ln1999999-createsliver-request-21-emulab-net.xml for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ln1999999
.......

There is no reason for stitcher to try again (and again) when an "ERROR: mapper" is received.

Will add more scenarios to this ticket as they occur.
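
The requested behavior could be sketched roughly as below. This is only an illustration, not stitcher's actual code: `RetryableAggregateError`, `FatalAggregateError`, `reserve_with_retries`, `max_tries`, and `pause` are all hypothetical names chosen for the example. The idea is that a failure classified as fatal ends the loop at once, while other failures are retried after a pause.

```python
class RetryableAggregateError(Exception):
    """Hypothetical: a failure that may clear up if we wait and try again."""


class FatalAggregateError(Exception):
    """Hypothetical: a failure that retrying cannot fix (e.g. a mapper error)."""


def reserve_with_retries(attempt, max_tries=5, pause=lambda seconds: None):
    """Call attempt() until it succeeds, fails fatally, or tries run out.

    attempt: zero-argument callable that makes one reservation attempt.
    pause:   callable used to wait between tries. It is a no-op here so the
             sketch runs instantly; real code might pass time.sleep.
    Returns a (result, tries_used) tuple on success.
    """
    last_error = None
    for tries in range(1, max_tries + 1):
        try:
            return attempt(), tries
        except FatalAggregateError:
            # Retrying cannot help, so give up immediately.
            raise
        except RetryableAggregateError as err:
            last_error = err
            if tries < max_tries:
                pause(30)
    # Every try failed with a retryable error; report the last one.
    raise last_error
```

With this shape, a request for more bandwidth than exists would raise the fatal error on the first attempt instead of cycling through every retry.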

Attachments (1)

stitch-capacity-1999999.rspec (1.3 KB) - added by lnevers@bbn.com 6 years ago.

Change History (7)

comment:1 Changed 6 years ago by Aaron Helsinger

Status: new → assigned

I emailed Jon D to check for valid rules for when to give up. Options:

  • If geni_code is 2 and am_type is protogeni and am_code is 2 and 'output' starts with '*** ERROR: mapper' then give up
  • If geni_code is 2 and am_type is protogeni and am_code is 2 and value is 'Could not map to resources' then give up
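
A minimal sketch of the two candidate rules, assuming the error arrives as a dict shaped like the one omni logs in the description ('output', 'code', 'value'). The function name `is_fatal_mapper_error` is hypothetical, and the prefix checked is '*** ERROR: mapper' as it appears in the logged output. This is not the check that was eventually committed, only an illustration of the options.

```python
def is_fatal_mapper_error(result):
    """Return True if an AM API error struct matches either give-up rule.

    result: dict with 'code', 'output', and 'value' keys, as in the
            ERROR line that omni logs for a failed createsliver.
    """
    code = result.get('code') or {}
    # Both rules require the same code triple.
    if not (code.get('geni_code') == 2
            and code.get('am_type') == 'protogeni'
            and code.get('am_code') == 2):
        return False
    output = str(result.get('output') or '').lstrip()
    # Option 1: the output text begins with the mapper error banner.
    if output.startswith('*** ERROR: mapper'):
        return True
    # Option 2: the value field carries the generic mapping failure text.
    return result.get('value') == 'Could not map to resources'


# Abbreviated from the error logged in the description.
example = {
    'output': '*** ERROR: mapper: Reached run limit. Giving up.\n',
    'code': {'am_type': 'protogeni', 'geni_code': 2, 'am_code': 2},
    'value': 'Could not map to resources',
}
print(is_fatal_mapper_error(example))
```

Either rule alone matches the error above. Checking both is slightly more tolerant if the aggregate changes one of the two strings.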

Changed 6 years ago by lnevers@bbn.com

comment:2 Changed 6 years ago by Aaron Helsinger

This particular error is now detected and we bail out, as of commit 8b6191a.

comment:3 Changed 6 years ago by lnevers@bbn.com

Using gcf-2.4-rc2:

This error does not cause stitcher to exit, but it should:

12:18:48 INFO     stitch.Aggregate: Got AMAPIError doing createsliver ig-gpo-ig-utah-9 at 
<Aggregate urn:publicid:IDN+emulab.net+authority+cm>: AMAPIError: Error from Aggregate: 
code 2. protogeni AM code: 2: *** ERROR: mapper: Reached run limit. Giving up.
seed = 1378962054
Physical Graph: 440
Calculating shortest paths on switch fabric.
Virtual Graph: 2
Generating physical equivalence classes:368
Type precheck:
Type precheck passed.
Node mapping precheck:
Node mapping precheck succeeded
Policy precheck:
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 0 remain.
All nodes are fixed.  No annealing.
Done
   BEST SCORE:  5.1 in 0 iters and 0.110285 seconds
With 1 violations
Iters to find best score:  0
Violations: 1
  unassigned:  0
  pnode_load:  0
  no_connect:  1
  link_users:  0
  bandwidth:   0
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
Nodes:
client-utah interconnect-instageni
server-gpo ion
End Nodes
Edges:
linksimple/link/server-gpo:0,client-utah:0 Mapping Failed
End Edges
End solution
Summary:
interconnect-instageni 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=interconnect-vm
    ?+ram: used=128 total=0
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
ion 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=bbgenivm
    ?+ram: used=128 total=0
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
Total physical nodes used: 2
End summary
ASSIGN FAILED:
Type precheck passed.
Node mapping precheck succeeded
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 0 remain.
All nodes are fixed.  No annealing.
Done
   BEST SCORE:  5.1 in 0 iters and 0.110285 seconds
  unassigned:  0
  pnode_load:  0
  no_connect:  1
  link_users:  0
  bandwidth:   0
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
.
12:18:48 WARNING  stitcher: Stitching failed but will retry: Circuit reservation 
failed at <Aggregate urn:publicid:IDN+emulab.net+authority+cm> (AMAPIError: Error 
from Aggregate: code 2. protogeni AM code: 2: *** ERROR: mapper: Reached run limit. Giving up.
seed = 1378962054
Physical Graph: 440
Calculating shortest paths on switch fabric.
Virtual Graph: 2
Generating physical equivalence classes:368
Type precheck:
Type precheck passed.
Node mapping precheck:
Node mapping precheck succeeded
Policy precheck:
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 0 remain.
All nodes are fixed.  No annealing.
Done
   BEST SCORE:  5.1 in 0 iters and 0.110285 seconds
With 1 violations
Iters to find best score:  0
Violations: 1
  unassigned:  0
  pnode_load:  0
  no_connect:  1
  link_users:  0
  bandwidth:   0
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
Nodes:
client-utah interconnect-instageni
server-gpo ion
End Nodes
Edges:
linksimple/link/server-gpo:0,client-utah:0 Mapping Failed
End Edges
End solution
Summary:
interconnect-instageni 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=interconnect-vm
    ?+ram: used=128 total=0
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
ion 1 vnodes, 0 nontrivial BW, 0 trivial BW, type=bbgenivm
    ?+ram: used=128 total=0
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
Total physical nodes used: 2
End summary
ASSIGN FAILED:
Type precheck passed.
Node mapping precheck succeeded
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 0 remain.
All nodes are fixed.  No annealing.
Done
   BEST SCORE:  5.1 in 0 iters and 0.110285 seconds
  unassigned:  0
  pnode_load:  0
  no_connect:  1
  link_users:  0
  bandwidth:   0
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
.). Try again from the SCS
12:18:48 WARNING  stitcher: Had reservation at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am
12:18:48 INFO     stitch.Aggregate: Doing deletesliver at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am
12:19:44 WARNING  stitcher: Deleted reservation at https://boss.instageni.gpolab.bbn.com:12369/protogeni/xmlrpc/am
12:19:44 INFO     stitcher: Calling SCS for the 5th and last time...
12:19:44 INFO     stitcher: Pausing for 120 seconds for Aggregates to free up resources...

comment:5 Changed 6 years ago by lnevers@bbn.com

Stitcher should exit on this InstaGENI error; waiting does not resolve the topology error:

16:23:40 ERROR    omni:  {'output': 'Could not verify topo', 'code': {'protogeni_error_log': 'urn:publicid:IDN+emulab.net+log+aa2fdf79351e00a44bdc0fbe276708af', 'am_type': 
'protogeni', 'geni_code': 2, 'am_code': 2, 'protogeni_error_url': 'https://www.emulab.net/spewlogfile.php3?logfile=aa2fdf79351e00a44bdc0fbe276708af'}, 
'value': 0}
16:23:40 INFO     stitch.Aggregate: Got AMAPIError doing createsliver lnstitch at 
<Aggregate urn:publicid:IDN+emulab.net+authority+cm>: AMAPIError: Error from 
Aggregate: code 2. protogeni AM code: 2: Could not verify topo.

comment:6 Changed 5 years ago by lnevers@bbn.com

Resolution: fixed
Status: assigned → closed

This is no longer an issue. Closing ticket.
