Opened 8 years ago

Closed 8 years ago

#33 closed (fixed)

Failure to allocate resource while attempting to create sliver

Reported by: lnevers@bbn.com Owned by: somebody
Priority: major Milestone: IG-EXP-3
Component: AM Version: SPIRAL4
Keywords: vm support Cc:
Dependencies:

Description (last modified by lnevers@bbn.com)

Background: Listresources showed a large number of pcvm slots available before the test:

  • pc5 had 97 slots
  • pc3 had 100 slots

Test sequence:

  1. Created one sliver named 25vmslice1 with 25 VMs without problems, with the following allocation: 10 VMs on pc5, 10 VMs on pc3, and 5 VMs on pc4.
  2. Created a second sliver named 25vmslice2 with 25 VMs, which caused the following error:
Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+25vmslice2 expires on 2012-05-26 14:05:32 UTC
Asked https://boss.utah.geniracks.net/protogeni/xmlrpc/am/2.0 to reserve resources. No manifest Rspec returned. *** ERROR: mapper: Reached run limit. Giving up.
seed = 1338050345
Physical Graph: 6
Calculating shortest paths on switch fabric.
Virtual Graph: 25
Generating physical equivalence classes:6
Type precheck:
Type precheck passed.
Node mapping precheck:
Node mapping precheck succeeded
Policy precheck:
Policy precheck succeeded
Annealing.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  18.71 in 49000 iters and 0.307011 seconds
With 1 violations
Iters to find best score:  48288
Violations: 1
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   1
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
Nodes:
VM pc5
VM-0 pc5
VM-1 pc1
VM-10 pc5
VM-11 pc2
VM-12 pc1
VM-13 pc2
VM-14 pc3
VM-15 pc3
VM-16 pc3
VM-19 pc2
VM-2 pc1
VM-20 pc5
VM-21 pc5
VM-22 pc1
VM-23 pc2
VM-24 pc3
VM-26 pc3
VM-27 pc3
VM-3 pc2
VM-4 pc5
VM-5 pc3
VM-6 pc3
VM-7 pc3
VM-9 pc5
End Nodes
Edges:
linksimple/lan0/VM:0,VM-0:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linksimple/lan1/VM-0:1,VM-1:0 intraswitch link-pc5:eth2-procurve2:(null) (pc5/eth2,(null)) link-pc1:eth1-procurve2:(null) (pc1/eth1,(null))
linksimple/lan2/VM-1:1,VM-2:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linksimple/lan24/VM-9:0,VM-20:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linksimple/lan25/VM:1,VM-9:1 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linksimple/lan26/VM-9:2,VM-10:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linksimple/lan27/VM-10:1,VM-11:0 intraswitch link-pc5:eth2-procurve2:(null) (pc5/eth2,(null)) link-pc2:eth1-procurve2:(null) (pc2/eth1,(null))
linksimple/lan28/VM-11:1,VM-12:0 intraswitch link-pc2:eth1-procurve2:(null) (pc2/eth1,(null)) link-pc1:eth1-procurve2:(null) (pc1/eth1,(null))
linksimple/lan29/VM-12:1,VM-13:0 intraswitch link-pc1:eth1-procurve2:(null) (pc1/eth1,(null)) link-pc2:eth1-procurve2:(null) (pc2/eth1,(null))
linksimple/lan3/VM-2:1,VM-3:0 intraswitch link-pc1:eth1-procurve2:(null) (pc1/eth1,(null)) link-pc2:eth1-procurve2:(null) (pc2/eth1,(null))
linksimple/lan30/VM-13:1,VM-14:0 intraswitch link-pc2:eth1-procurve2:(null) (pc2/eth1,(null)) link-pc3:eth2-procurve2:(null) (pc3/eth2,(null))
linksimple/lan31/VM-14:1,VM-15:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linksimple/lan32/VM-15:1,VM-16:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linksimple/lan4/VM-3:1,VM-4:0 intraswitch link-pc2:eth1-procurve2:(null) (pc2/eth1,(null)) link-pc5:eth2-procurve2:(null) (pc5/eth2,(null))
linksimple/lan5/VM-4:1,VM-5:0 intraswitch link-pc5:eth2-procurve2:(null) (pc5/eth2,(null)) link-pc3:eth2-procurve2:(null) (pc3/eth2,(null))
linksimple/lan54/VM-16:1,VM-26:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linksimple/lan55/VM-27:1,VM-26:1 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linksimple/lan58/VM-24:0,VM-27:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linksimple/lan59/VM-23:0,VM-24:1 intraswitch link-pc2:eth1-procurve2:(null) (pc2/eth1,(null)) link-pc3:eth2-procurve2:(null) (pc3/eth2,(null))
linksimple/lan6/VM-5:1,VM-6:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linksimple/lan65/VM-20:1,VM-21:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linksimple/lan66/VM-21:1,VM-19:0 intraswitch link-pc5:eth2-procurve2:(null) (pc5/eth2,(null)) link-pc2:eth1-procurve2:(null) (pc2/eth1,(null))
linksimple/lan67/VM-19:1,VM-22:0 intraswitch link-pc2:eth1-procurve2:(null) (pc2/eth1,(null)) link-pc1:eth1-procurve2:(null) (pc1/eth1,(null))
linksimple/lan68/VM-22:1,VM-23:1 intraswitch link-pc1:eth1-procurve2:(null) (pc1/eth1,(null)) link-pc2:eth1-procurve2:(null) (pc2/eth1,(null))
linksimple/lan7/VM-6:1,VM-7:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linksimple/lan72/VM-6:2,VM-16:2 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linksimple/lan73/VM-5:2,VM-15:2 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linksimple/lan74/VM-4:2,VM-14:2 intraswitch link-pc5:eth2-procurve2:(null) (pc5/eth2,(null)) link-pc3:eth2-procurve2:(null) (pc3/eth2,(null))
linksimple/lan75/VM-3:2,VM-13:2 trivial pc2:loopback (pc2/null,(null)) pc2:loopback (pc2/null,(null)) 
linksimple/lan76/VM-2:2,VM-12:2 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linksimple/lan77/VM-1:2,VM-11:2 intraswitch link-pc1:eth1-procurve2:(null) (pc1/eth1,(null)) link-pc2:eth3-procurve2:(null) (pc2/eth3,(null))
linksimple/lan78/VM-0:2,VM-10:2 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linksimple/lan79/VM-10:3,VM-21:2 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linksimple/lan80/VM-11:3,VM-19:2 trivial pc2:loopback (pc2/null,(null)) pc2:loopback (pc2/null,(null)) 
linksimple/lan81/VM-12:3,VM-22:2 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linksimple/lan82/VM-13:3,VM-23:2 trivial pc2:loopback (pc2/null,(null)) pc2:loopback (pc2/null,(null)) 
linksimple/lan83/VM-14:3,VM-24:2 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linksimple/lan84/VM-15:3,VM-27:2 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
End Edges
End solution
Summary:
procurve2 0 vnodes, 2800000 nontrivial BW, 0 trivial BW, type=(null)
pc3 9 vnodes, 400000 nontrivial BW, 1100000 trivial BW, type=pcvm
    400000 link-pc3:eth2-procurve2:(null)
pc5 7 vnodes, 600000 nontrivial BW, 700000 trivial BW, type=pcvm
    600000 link-pc5:eth2-procurve2:(null)
pc1 4 vnodes, 700000 nontrivial BW, 300000 trivial BW, type=pcvm
    700000 link-pc1:eth1-procurve2:(null)
    ?+virtpercent: used=0 total=100
    ?+cpu: used=0 total=2666
    ?+ram: used=0 total=3574
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
pc2 5 vnodes, 1100000 nontrivial BW, 300000 trivial BW, type=pcvm
    1000000 link-pc2:eth1-procurve2:(null)
    100000 link-pc2:eth3-procurve2:(null)
    ?+virtpercent: used=0 total=100
    ?+cpu: used=0 total=2666
    ?+ram: used=0 total=3574
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
Total physical nodes used: 4
End summary
ASSIGN FAILED:
Type precheck passed.
Node mapping precheck succeeded
Policy precheck succeeded
Annealing.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  18.71 in 49000 iters and 0.307011 seconds
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   1
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0

Change History (12)

comment:1 Changed 8 years ago by lnevers@bbn.com

Another instance of the assignment failure occurred while running a test of 10 experiments with 10 VMs each. The test was started with plenty of resources available (pc5 had 97 slots and pc3 had 100 slots). Here is the sequence of events and the allocation based on sliverstatus:

  1. Setup 1st sliver 10vmslice1 - OK - Allocation: 10 VMs on pc3
  2. Setup 2nd sliver 10vmslice2 - OK - Allocation: 10 VMs on pc5
  3. Setup 3rd sliver 10vmslice3 - OK - Allocation: 10 VMs on pc5
  4. Setup 4th sliver 10vmslice4 - OK - Allocation: 10 VMs on pc5
  5. Setup 5th sliver 10vmslice5 - OK - Allocation: 10 VMs on pc3
  6. Setup 6th sliver 10vmslice6 - OK - Allocation: 10 VMs on pc3
  7. Setup 7th sliver 10vmslice7 - OK - Allocation: 10 VMs on pc4
  8. Setup 8th sliver 10vmslice8 - OK - Allocation: 10 VMs on pc2
  9. Setup 9th sliver 10vmslice9 - FAIL - Error below:
  Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+10vmslice9 expires on 2012-05-26 14:48:30 UTC
Asked https://boss.utah.geniracks.net/protogeni/xmlrpc/am/2.0 to reserve resources. No manifest Rspec returned. *** ERROR: mapper: Reached run limit. Giving up.
seed = 1337957296
Physical Graph: 5
Calculating shortest paths on switch fabric.
Virtual Graph: 11
Generating physical equivalence classes:5
Type precheck:
Type precheck passed.
Node mapping precheck:
Node mapping precheck succeeded
Policy precheck:
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 1 remain.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  11.75 in 17000 iters and 0.140798 seconds
With 1 violations
Iters to find best score:  1
Violations: 1
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   1
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
Nodes:
VM-1 pc1
VM-10 pc1
VM-2 pc1
VM-3 pc1
VM-4 pc1
VM-5 pc1
VM-6 pc1
VM-7 pc1
VM-8 pc1
VM-9 pc1
lan/Lan pc1
End Nodes
Edges:
linklan/Lan/VM-1:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-2:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-3:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-4:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-5:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-6:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-7:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-8:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-9:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-10:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
End Edges
End solution
Summary:
pc1 11 vnodes, 0 nontrivial BW, 1000000 trivial BW, type=pcvm
    ?+virtpercent: used=0 total=100
    ?+cpu: used=0 total=2666
    ?+ram: used=0 total=3574
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
Total physical nodes used: 1
End summary
ASSIGN FAILED:
Type precheck passed.
Node mapping precheck succeeded
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 1 remain.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  11.75 in 17000 iters and 0.140798 seconds
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   1
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
 
INFO:omni: ============================================================

comment:2 Changed 8 years ago by lnevers@bbn.com

On 5/25/12 11:40 AM, Leigh Stoller wrote:

bandwidth: 1

So here is the issue: creating a 10-node mesh of 100Mb links requires an aggregate bandwidth of 1Gb. Not because you are actually going to use that, but because the resource mapper cannot make any assumptions in the absence of other information.

This is why you get to provide a bandwidth in your rspec, to inform the mapper what you really want to do.

Diving deeper for those who are interested: this is a lan of containers on the same physical node, and there is some limit to the amount of traffic that can be sent over the loopback device between containers. At some point the physical node will no longer be able to keep up, so we set a limit on what you can ask for. At the moment that number is set lower than it probably should be (at 400Mb).

Bottom line: I bumped that to 1Gb, which should allow your rspec to map. These nodes are pretty beefy, so I imagine they can keep up.

Lbs
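The arithmetic in the quoted explanation can be sketched as follows. This is a minimal illustration, not ProtoGENI code; the 400Mb and 1Gb caps are the values quoted above, and the "one LAN of co-located containers" model is an assumption based on this comment:

```python
# Sketch (not ProtoGENI source) of the loopback-bandwidth reasoning above:
# a LAN of n co-located containers, each with a b-Mbps interface, demands
# n * b Mbps of aggregate "trivial" (loopback) bandwidth on the physical host.

def trivial_bw_demand_mbps(num_vms: int, link_mbps: int) -> int:
    """Aggregate loopback bandwidth a one-LAN, one-host topology asks for."""
    return num_vms * link_mbps

OLD_CAP_MBPS = 400   # pre-bump limit quoted in the comment
NEW_CAP_MBPS = 1000  # after the bump to 1Gb

demand = trivial_bw_demand_mbps(10, 100)  # 10 VMs at the default 100Mb
print(demand, demand > OLD_CAP_MBPS, demand > NEW_CAP_MBPS)
# → 1000 True False
```

Under this model, 10 VMs at 100Mb each exceed the old 400Mb cap (hence the single "bandwidth: 1" violation in the assign output) but fit under the new 1Gb cap.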

The message reported to the experimenter is somewhat cryptic, and I missed the bandwidth violation, which was on line 25 of 145 lines (not including the omni output).

I realize we are still developing/debugging, but I am going to ask anyway... Are there plans to modify the results to provide more intuitive output?
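Until the output improves, the nonzero violation counter can be pulled out of the mapper output mechanically. The helper below is hypothetical (not an existing omni or assign feature); it just scans for the counter lines that appear in the logs above:

```python
# Hypothetical helper: scan assign's output for nonzero violation counters,
# so a single "bandwidth: 1" line isn't missed in ~145 lines of log.
import re

COUNTERS = ("unassigned", "pnode_load", "no_connect", "link_users",
            "bandwidth", "desires", "vclass", "delay", "trivial mix",
            "subnodes", "max_types", "endpoints")

def nonzero_violations(assign_output: str) -> dict:
    """Return {counter_name: count} for every counter with a nonzero value."""
    hits = {}
    for name in COUNTERS:
        m = re.search(rf"^\s*{re.escape(name)}:\s*(\d+)\s*$",
                      assign_output, re.MULTILINE)
        if m and int(m.group(1)) > 0:
            hits[name] = int(m.group(1))
    return hits

log = """Violations: 1
  unassigned:  0
  bandwidth:   1
  desires:     0
"""
print(nonzero_violations(log))
# → {'bandwidth': 1}
```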

comment:3 Changed 8 years ago by lnevers@bbn.com

Re-ran the 10 experiments with 10 VMs each, assuming that the configuration changes from last Friday would handle the bandwidth requirements. Before starting, verified that both shared nodes had 99 slots available.

Set up the first 8 experiments without problem. The createsliver for the 9th experiment (10vmslice9) failed with this error:

Asked https://boss.utah.geniracks.net/protogeni/xmlrpc/am/2.0 to reserve resources. No manifest Rspec returned. *** ERROR: mapper: Reached run limit. Giving up.
seed = 1338437282
Physical Graph: 4
Calculating shortest paths on switch fabric.
Virtual Graph: 11
Generating physical equivalence classes:4
Type precheck:
Type precheck passed.
Node mapping precheck:
Node mapping precheck succeeded
Policy precheck:
Policy precheck succeeded
Annealing.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  3 in 17000 iters and 0.10192 seconds
With 1 violations
Iters to find best score:  338
Violations: 1
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   0
  desires:     1
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
Nodes:
VM-1 pc3
VM-10 pc3
VM-2 pc3
VM-3 pc3
VM-4 pc3
VM-5 pc3
VM-6 pc3
VM-7 pc3
VM-8 pc3
VM-9 pc3
lan/Lan pc3
End Nodes
Edges:
linklan/Lan/VM-1:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linklan/Lan/VM-2:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linklan/Lan/VM-3:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linklan/Lan/VM-4:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linklan/Lan/VM-5:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linklan/Lan/VM-6:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linklan/Lan/VM-7:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linklan/Lan/VM-8:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linklan/Lan/VM-9:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
linklan/Lan/VM-10:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
End Edges
End solution
Summary:
pc3 11 vnodes, 0 nontrivial BW, 1000000 trivial BW, type=pcvm
Total physical nodes used: 1
End summary
ASSIGN FAILED:
Type precheck passed.
Node mapping precheck succeeded
Policy precheck succeeded
Annealing.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  3 in 17000 iters and 0.10192 seconds
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   0
  desires:     1
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
 

comment:4 Changed 8 years ago by lnevers@bbn.com

Here is the VM allocation for the 10 experiments with 10 VMs each, in case it is of interest to anyone:

  • 10vmslice1 - 10 VMs on pc5
  • 10vmslice2 - 10 VMs on pc3
  • 10vmslice3 - 10 VMs on pc5
  • 10vmslice4 - 10 VMs on pc3
  • 10vmslice5 - 7 VMs on pc5 + 3 VMs on pc3
  • 10vmslice6 - 2 VMs on pc5 + 6 VMs on pc3 + 2 VMs on pc1
  • 10vmslice7 - 10 VMs on pc2
  • 10vmslice8 - 10 VMs on pc4

comment:5 Changed 8 years ago by lnevers@bbn.com

Unable to create 1 experiment with 25 VMs when starting with the following available resources:

  • 100 pcvm slots available on shared node pc3
  • 100 pcvm slots available on shared node pc5
  • pc1, pc2, and pc4 not available

The attempt to create a sliver (25vmslice1) failed with the following error:

*** Type precheck failed!*** Type precheck failed!*** ERROR: mapper: Unretriable error. Giving up.
seed = 1338507038
Physical Graph: 4
Calculating shortest paths on switch fabric.
Virtual Graph: 25
Generating physical equivalence classes:4
Type precheck:
  *** 25 nodes of type pcvm requested, but only 20 available nodes of type pcvm found
*** Type precheck failed!
ASSIGN FAILED:
  *** 25 nodes of type pcvm requested, but only 20 available nodes of type pcvm found
*** Type precheck failed!

I expected to be able to get 25 nodes across the two shared nodes (pc3 and pc5).
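The mismatch between the advertised slot counts and the precheck's answer comes down to simple arithmetic. The sketch below just restates the numbers from this comment; the suggestion that 20 reflects a per-host mapping limit (20 = 2 hosts × 10) is an inference from the figures, not confirmed behavior:

```python
# Arithmetic behind the expectation: pcvm slots on the two shared hosts
# (assumption: they pool together) versus what the type precheck reported.
shared_slots = {"pc3": 100, "pc5": 100}  # from listresources before the test
requested = 25

expected_available = sum(shared_slots.values())
print(expected_available, expected_available >= requested)
# → 200 True : the request should fit by slot count alone

# The precheck instead reported only 20 available pcvm nodes. That happens
# to equal 2 hosts x 10, which would suggest a per-host limit on mappable
# VMs rather than the advertised slot count (unconfirmed inference).
reported_available = 20
print(reported_available >= requested)
# → False : hence "Type precheck failed"
```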

comment:6 Changed 8 years ago by lnevers@bbn.com

Description: modified (diff)

Backing off from testing!

With the following resources available:

  • 100 pcvm slots available on shared node pc3
  • 98 pcvm slots available on shared node pc5
  • pc1, pc2, and pc4 not available

Tried to create one sliver with 20 nodes (20vmslice1), which resulted in the same failure reported yesterday afternoon (desires: 1).

*** ERROR: mapper: Reached run limit. Giving up.
seed = 1338510636
Physical Graph: 4
Calculating shortest paths on switch fabric.
Virtual Graph: 21
Generating physical equivalence classes:4
Type precheck:
Type precheck passed.
Node mapping precheck:
Node mapping precheck succeeded
Policy precheck:
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 1 remain.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  5.42 in 17000 iters and 0.470627 seconds
With 1 violations
Iters to find best score:  1
Violations: 1
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   0
  desires:     1
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
Nodes:
VM-1 pc3
VM-10 pc5
VM-11 pc5
VM-12 pc3
VM-13 pc5
VM-14 pc5
VM-15 pc3
VM-16 pc5
VM-17 pc3
VM-18 pc5
VM-19 pc5
VM-2 pc5
VM-20 pc3
VM-3 pc3
VM-4 pc5
VM-5 pc3
VM-6 pc5
VM-7 pc3
VM-8 pc3
VM-9 pc3
lan/Lan pc5
End Nodes
Edges:
linklan/Lan/VM-1:0 intraswitch link-pc3:eth1-procurve2:(null) (pc3/eth1,(null)) link-pc5:eth1-procurve2:(null) (pc5/eth1,(null))
linklan/Lan/VM-2:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linklan/Lan/VM-3:0 intraswitch link-pc3:eth1-procurve2:(null) (pc3/eth1,(null)) link-pc5:eth1-procurve2:(null) (pc5/eth1,(null))
linklan/Lan/VM-4:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linklan/Lan/VM-5:0 intraswitch link-pc3:eth1-procurve2:(null) (pc3/eth1,(null)) link-pc5:eth1-procurve2:(null) (pc5/eth1,(null))
linklan/Lan/VM-6:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linklan/Lan/VM-7:0 intraswitch link-pc3:eth1-procurve2:(null) (pc3/eth1,(null)) link-pc5:eth1-procurve2:(null) (pc5/eth1,(null))
linklan/Lan/VM-8:0 intraswitch link-pc3:eth1-procurve2:(null) (pc3/eth1,(null)) link-pc5:eth1-procurve2:(null) (pc5/eth1,(null))
linklan/Lan/VM-9:0 intraswitch link-pc3:eth1-procurve2:(null) (pc3/eth1,(null)) link-pc5:eth1-procurve2:(null) (pc5/eth1,(null))
linklan/Lan/VM-10:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linklan/Lan/VM-13:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linklan/Lan/VM-14:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linklan/Lan/VM-15:0 intraswitch link-pc3:eth1-procurve2:(null) (pc3/eth1,(null)) link-pc5:eth1-procurve2:(null) (pc5/eth1,(null))
linklan/Lan/VM-16:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linklan/Lan/VM-17:0 intraswitch link-pc3:eth1-procurve2:(null) (pc3/eth1,(null)) link-pc5:eth1-procurve2:(null) (pc5/eth1,(null))
linklan/Lan/VM-18:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linklan/Lan/VM-19:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
linklan/Lan/VM-20:0 intraswitch link-pc3:eth1-procurve2:(null) (pc3/eth1,(null)) link-pc5:eth1-procurve2:(null) (pc5/eth1,(null))
linklan/Lan/VM-12:0 intraswitch link-pc3:eth1-procurve2:(null) (pc3/eth1,(null)) link-pc5:eth1-procurve2:(null) (pc5/eth1,(null))
linklan/Lan/VM-11:0 trivial pc5:loopback (pc5/null,(null)) pc5:loopback (pc5/null,(null)) 
End Edges
End solution
Summary:
procurve2 0 vnodes, 2000000 nontrivial BW, 0 trivial BW, type=
pc3 10 vnodes, 1000000 nontrivial BW, 0 trivial BW, type=pcvm
    1000000 link-pc3:eth1-procurve2:(null)
pc5 11 vnodes, 1000000 nontrivial BW, 1000000 trivial BW, type=pcvm
    1000000 link-pc5:eth1-procurve2:(null)
Total physical nodes used: 2
End summary
ASSIGN FAILED:
Type precheck passed.
Node mapping precheck succeeded
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 1 remain.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  5.42 in 17000 iters and 0.470627 seconds
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   0
  desires:     1
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
 

comment:7 Changed 8 years ago by lnevers@bbn.com

Really backing off now. :-)

Requesting one sliver with 15 VMs also generates the same failure when 198 slots are available.

comment:8 Changed 8 years ago by lnevers@bbn.com

Starting resources: 100 pcvm slots on pc3, 98 pcvm slots on pc5, and no available dedicated nodes.

=> Ran the scenario of 10 experiments with 10 VMs each:

Results: All slivers were successfully created.

Allocation: (90 VMs on pc3; 10 VMs on pc5)

  • sliver 10vmslice-1 = 10 VMs on pc3
  • sliver 10vmslice-2 = 10 VMs on pc3
  • sliver 10vmslice-3 = 10 VMs on pc3
  • sliver 10vmslice-4 = 10 VMs on pc3
  • sliver 10vmslice-5 = 10 VMs on pc5
  • sliver 10vmslice-6 = 10 VMs on pc3
  • sliver 10vmslice-7 = 10 VMs on pc3
  • sliver 10vmslice-8 = 10 VMs on pc3
  • sliver 10vmslice-9 = 10 VMs on pc3
  • sliver 10vmslice-10 = 10 VMs on pc3

Running some commands on allocated nodes.

comment:9 Changed 8 years ago by chaos@bbn.com

Luisa suggested I run some of the various monitoring commands I've been looking at while her test is running. Doing so now. No particular questions, just recording this for reference:

  • look for VLANs on the management switch:
    ProCurve Switch 2610-24# show vlans 
    
    ...
    
      VLAN ID Name                             | Status     Voice Jumbo
      ------- -------------------------------- + ---------- ----- -----
      1       DEFAULT_VLAN                     | Port-based No    No   
      10      control-hardware                 | Port-based No    No   
      11      control-alternate                | Port-based No    No   
    
    So there are no per-experiment VLANs on the control switch, as expected.
  • IG-MON-2 Step 1: see what state the nodes are in:
    • Three nodes are in reloading, which is suspicious, and I'll investigate in a bit if it's still true
    • Two nodes, pc3 and pc5, are in the shared-nodes experiment
  • IG-MON-3 Step 4: get information about running VMs:
    • https://boss.utah.geniracks.net/showpool.php also shows two shared nodes
    • https://boss.utah.geniracks.net/shownode.php3?node_id=pc3 shows that the following VMs are on pc3:
      pcvm3-87 	pgeni-gpolab-bbn-com/10vmslice-10
      pcvm3-85 	pgeni-gpolab-bbn-com/10vmslice-10
      pcvm3-84 	pgeni-gpolab-bbn-com/10vmslice-10
      pcvm3-83 	pgeni-gpolab-bbn-com/10vmslice-10
      pcvm3-71 	pgeni-gpolab-bbn-com/10vmslice-9
      pcvm3-72 	pgeni-gpolab-bbn-com/10vmslice-9
      pcvm3-73 	pgeni-gpolab-bbn-com/10vmslice-9
      pcvm3-74 	pgeni-gpolab-bbn-com/10vmslice-9
      pcvm3-75 	pgeni-gpolab-bbn-com/10vmslice-9
      pcvm3-76 	pgeni-gpolab-bbn-com/10vmslice-9
      pcvm3-77 	pgeni-gpolab-bbn-com/10vmslice-9
      pcvm3-78 	pgeni-gpolab-bbn-com/10vmslice-9
      pcvm3-79 	pgeni-gpolab-bbn-com/10vmslice-9
      pcvm3-80 	pgeni-gpolab-bbn-com/10vmslice-9
      pcvm3-81 	pgeni-gpolab-bbn-com/10vmslice-10
      pcvm3-89 	pgeni-gpolab-bbn-com/10vmslice-10
      pcvm3-88 	pgeni-gpolab-bbn-com/10vmslice-10
      pcvm3-86 	pgeni-gpolab-bbn-com/10vmslice-10
      pcvm3-82 	pgeni-gpolab-bbn-com/10vmslice-10
      pcvm3-70 	pgeni-gpolab-bbn-com/10vmslice-8
      pcvm3-61 	pgeni-gpolab-bbn-com/10vmslice-8
      pcvm3-62 	pgeni-gpolab-bbn-com/10vmslice-8
      pcvm3-63 	pgeni-gpolab-bbn-com/10vmslice-8
      pcvm3-64 	pgeni-gpolab-bbn-com/10vmslice-8
      pcvm3-65 	pgeni-gpolab-bbn-com/10vmslice-8
      pcvm3-66 	pgeni-gpolab-bbn-com/10vmslice-8
      pcvm3-67 	pgeni-gpolab-bbn-com/10vmslice-8
      pcvm3-68 	pgeni-gpolab-bbn-com/10vmslice-8
      pcvm3-69 	pgeni-gpolab-bbn-com/10vmslice-8
      pcvm3-90 	pgeni-gpolab-bbn-com/10vmslice-10
      pcvm3-54 	pgeni-gpolab-bbn-com/10vmslice-7
      pcvm3-55 	pgeni-gpolab-bbn-com/10vmslice-7
      pcvm3-56 	pgeni-gpolab-bbn-com/10vmslice-7
      pcvm3-57 	pgeni-gpolab-bbn-com/10vmslice-7
      pcvm3-58 	pgeni-gpolab-bbn-com/10vmslice-7
      pcvm3-59 	pgeni-gpolab-bbn-com/10vmslice-7
      pcvm3-60 	pgeni-gpolab-bbn-com/10vmslice-7
      pcvm3-53 	pgeni-gpolab-bbn-com/10vmslice-7
      pcvm3-52 	pgeni-gpolab-bbn-com/10vmslice-7
      pcvm3-51 	pgeni-gpolab-bbn-com/10vmslice-7
      pcvm3-46 	pgeni-gpolab-bbn-com/10vmslice-6
      pcvm3-47 	pgeni-gpolab-bbn-com/10vmslice-6
      pcvm3-48 	pgeni-gpolab-bbn-com/10vmslice-6
      pcvm3-49 	pgeni-gpolab-bbn-com/10vmslice-6
      pcvm3-50 	pgeni-gpolab-bbn-com/10vmslice-6
      pcvm3-45 	pgeni-gpolab-bbn-com/10vmslice-6
      pcvm3-44 	pgeni-gpolab-bbn-com/10vmslice-6
      pcvm3-43 	pgeni-gpolab-bbn-com/10vmslice-6
      pcvm3-42 	pgeni-gpolab-bbn-com/10vmslice-6
      pcvm3-41 	pgeni-gpolab-bbn-com/10vmslice-6
      pcvm3-1 	pgeni-gpolab-bbn-com/10vmslice-1
      pcvm3-2 	pgeni-gpolab-bbn-com/10vmslice-1
      pcvm3-3 	pgeni-gpolab-bbn-com/10vmslice-1
      pcvm3-4 	pgeni-gpolab-bbn-com/10vmslice-1
      pcvm3-5 	pgeni-gpolab-bbn-com/10vmslice-1
      pcvm3-6 	pgeni-gpolab-bbn-com/10vmslice-1
      pcvm3-7 	pgeni-gpolab-bbn-com/10vmslice-1
      pcvm3-8 	pgeni-gpolab-bbn-com/10vmslice-1
      pcvm3-9 	pgeni-gpolab-bbn-com/10vmslice-1
      pcvm3-10 	pgeni-gpolab-bbn-com/10vmslice-1
      pcvm3-11 	pgeni-gpolab-bbn-com/10vmslice-2
      pcvm3-12 	pgeni-gpolab-bbn-com/10vmslice-2
      pcvm3-13 	pgeni-gpolab-bbn-com/10vmslice-2
      pcvm3-14 	pgeni-gpolab-bbn-com/10vmslice-2
      pcvm3-15 	pgeni-gpolab-bbn-com/10vmslice-2
      pcvm3-16 	pgeni-gpolab-bbn-com/10vmslice-2
      pcvm3-17 	pgeni-gpolab-bbn-com/10vmslice-2
      pcvm3-18 	pgeni-gpolab-bbn-com/10vmslice-2
      pcvm3-19 	pgeni-gpolab-bbn-com/10vmslice-2
      pcvm3-20 	pgeni-gpolab-bbn-com/10vmslice-2
      pcvm3-21 	pgeni-gpolab-bbn-com/10vmslice-3
      pcvm3-22 	pgeni-gpolab-bbn-com/10vmslice-3
      pcvm3-23 	pgeni-gpolab-bbn-com/10vmslice-3
      pcvm3-24 	pgeni-gpolab-bbn-com/10vmslice-3
      pcvm3-25 	pgeni-gpolab-bbn-com/10vmslice-3
      pcvm3-26 	pgeni-gpolab-bbn-com/10vmslice-3
      pcvm3-27 	pgeni-gpolab-bbn-com/10vmslice-3
      pcvm3-28 	pgeni-gpolab-bbn-com/10vmslice-3
      pcvm3-29 	pgeni-gpolab-bbn-com/10vmslice-3
      pcvm3-30 	pgeni-gpolab-bbn-com/10vmslice-3
      pcvm3-31 	pgeni-gpolab-bbn-com/10vmslice-4
      pcvm3-32 	pgeni-gpolab-bbn-com/10vmslice-4
      pcvm3-33 	pgeni-gpolab-bbn-com/10vmslice-4
      pcvm3-34 	pgeni-gpolab-bbn-com/10vmslice-4
      pcvm3-35 	pgeni-gpolab-bbn-com/10vmslice-4
      pcvm3-36 	pgeni-gpolab-bbn-com/10vmslice-4
      pcvm3-37 	pgeni-gpolab-bbn-com/10vmslice-4
      pcvm3-38 	pgeni-gpolab-bbn-com/10vmslice-4
      pcvm3-39 	pgeni-gpolab-bbn-com/10vmslice-4
      pcvm3-40 	pgeni-gpolab-bbn-com/10vmslice-4
      
    • https://boss.utah.geniracks.net/shownode.php3?node_id=pc5 shows that the following VMs are on pc5:
      pcvm5-2 	pgeni-gpolab-bbn-com/ecgtest
      pcvm5-1 	pgeni-gpolab-bbn-com/ecgtest
      pcvm5-12 	pgeni-gpolab-bbn-com/10vmslice-5
      pcvm5-11 	pgeni-gpolab-bbn-com/10vmslice-5
      pcvm5-3 	pgeni-gpolab-bbn-com/10vmslice-5
      pcvm5-4 	pgeni-gpolab-bbn-com/10vmslice-5
      pcvm5-5 	pgeni-gpolab-bbn-com/10vmslice-5
      pcvm5-6 	pgeni-gpolab-bbn-com/10vmslice-5
      pcvm5-7 	pgeni-gpolab-bbn-com/10vmslice-5
      pcvm5-8 	pgeni-gpolab-bbn-com/10vmslice-5
      pcvm5-9 	pgeni-gpolab-bbn-com/10vmslice-5
      pcvm5-10 	pgeni-gpolab-bbn-com/10vmslice-5
      
    • So 90 VMs were placed on pc3, and 10 on pc5
    • Looking at an individual experiment, say 10vmslice-1:
      • Corroborates that all VMs are on pc3:
        Virtual Node Info:
        ID              Type         OS              Qualified Name
        --------------- ------------ --------------- --------------------
        VM-1 (pc3)      pcvm         OPENVZ-STD      VM-1.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        VM-10 (pc3)     pcvm         OPENVZ-STD      VM-10.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        VM-2 (pc3)      pcvm         OPENVZ-STD      VM-2.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        VM-3 (pc3)      pcvm         OPENVZ-STD      VM-3.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        VM-4 (pc3)      pcvm         OPENVZ-STD      VM-4.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        VM-5 (pc3)      pcvm         OPENVZ-STD      VM-5.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        VM-6 (pc3)      pcvm         OPENVZ-STD      VM-6.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        VM-7 (pc3)      pcvm         OPENVZ-STD      VM-7.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        VM-8 (pc3)      pcvm         OPENVZ-STD      VM-8.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        VM-9 (pc3)      pcvm         OPENVZ-STD      VM-9.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        
      • Note that, even on boss, those hostnames are not defined. I'll follow up with Leigh about that (it's an outstanding question I had):
        boss,[~],12:45(127)$ host VM-8.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
        Host VM-8.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net not found: 3(NXDOMAIN)
        
      • I note that MAC addresses are not listed here:
        Physical Lan/Link Mapping:
        ID              Member          IP              MAC                  NodeID
        --------------- --------------- --------------- -------------------- ---------
        Lan             VM-10:0         10.10.1.10                           pcvm3-2
        Lan             VM-1:0          10.10.1.1                            pcvm3-1
        Lan             VM-2:0          10.10.1.2                            pcvm3-3
        Lan             VM-3:0          10.10.1.3                            pcvm3-4
        Lan             VM-4:0          10.10.1.4                            pcvm3-5
        Lan             VM-5:0          10.10.1.5                            pcvm3-6
        Lan             VM-6:0          10.10.1.6                            pcvm3-7
        Lan             VM-7:0          10.10.1.7                            pcvm3-8
        Lan             VM-8:0          10.10.1.8                            pcvm3-9
        Lan             VM-9:0          10.10.1.9                            pcvm3-10
        
    • Looking at the vzhosts themselves:
      vhost2,[~],12:50(0)$ sudo vzlist -a
            CTID      NPROC STATUS    IP_ADDR         HOSTNAME
               1         15 running   -               VM-1.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
               2         15 running   -               VM-10.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
               3         15 running   -               VM-2.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
               4         15 running   -               VM-3.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
               5         15 running   -               VM-4.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
               6         18 running   -               VM-5.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
               7         15 running   -               VM-6.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
               8         15 running   -               VM-7.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
               9         15 running   -               VM-8.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
              10         15 running   -               VM-9.10vmslice-1.pgeni-gpolab-bbn-com.utah.geniracks.net
              11         15 running   -               VM-1.10vmslice-2.pgeni-gpolab-bbn-com.utah.geniracks.net
              12         15 running   -               VM-10.10vmslice-2.pgeni-gpolab-bbn-com.utah.geniracks.net
              13         15 running   -               VM-2.10vmslice-2.pgeni-gpolab-bbn-com.utah.geniracks.net
              14         15 running   -               VM-3.10vmslice-2.pgeni-gpolab-bbn-com.utah.geniracks.net
              15         15 running   -               VM-4.10vmslice-2.pgeni-gpolab-bbn-com.utah.geniracks.net
              16         15 running   -               VM-5.10vmslice-2.pgeni-gpolab-bbn-com.utah.geniracks.net
              17         15 running   -               VM-6.10vmslice-2.pgeni-gpolab-bbn-com.utah.geniracks.net
              18         15 running   -               VM-7.10vmslice-2.pgeni-gpolab-bbn-com.utah.geniracks.net
              19         15 running   -               VM-8.10vmslice-2.pgeni-gpolab-bbn-com.utah.geniracks.net
              20         15 running   -               VM-9.10vmslice-2.pgeni-gpolab-bbn-com.utah.geniracks.net
              21         15 running   -               VM-1.10vmslice-3.pgeni-gpolab-bbn-com.utah.geniracks.net
              22         15 running   -               VM-10.10vmslice-3.pgeni-gpolab-bbn-com.utah.geniracks.net
              23         15 running   -               VM-2.10vmslice-3.pgeni-gpolab-bbn-com.utah.geniracks.net
              24         15 running   -               VM-3.10vmslice-3.pgeni-gpolab-bbn-com.utah.geniracks.net
              25         15 running   -               VM-4.10vmslice-3.pgeni-gpolab-bbn-com.utah.geniracks.net
              26         15 running   -               VM-5.10vmslice-3.pgeni-gpolab-bbn-com.utah.geniracks.net
              27         15 running   -               VM-6.10vmslice-3.pgeni-gpolab-bbn-com.utah.geniracks.net
              28         15 running   -               VM-7.10vmslice-3.pgeni-gpolab-bbn-com.utah.geniracks.net
              29         15 running   -               VM-8.10vmslice-3.pgeni-gpolab-bbn-com.utah.geniracks.net
              30         15 running   -               VM-9.10vmslice-3.pgeni-gpolab-bbn-com.utah.geniracks.net
              31         15 running   -               VM-1.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
              32         15 running   -               VM-10.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
              33         15 running   -               VM-2.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
              34         15 running   -               VM-3.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
              35         15 running   -               VM-4.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
              36         15 running   -               VM-5.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
              37         15 running   -               VM-6.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
              38         15 running   -               VM-7.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
              39         15 running   -               VM-8.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
              40         15 running   -               VM-9.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
              41         15 running   -               VM-1.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
              42         15 running   -               VM-10.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
              43         15 running   -               VM-2.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
              44         15 running   -               VM-3.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
              45         15 running   -               VM-4.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
              46         15 running   -               VM-5.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
              47         15 running   -               VM-6.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
              48         15 running   -               VM-7.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
              49         15 running   -               VM-8.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
              50         15 running   -               VM-9.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
              51         15 running   -               VM-1.10vmslice-7.pgeni-gpolab-bbn-com.utah.geniracks.net
              52         15 running   -               VM-10.10vmslice-7.pgeni-gpolab-bbn-com.utah.geniracks.net
              53         15 running   -               VM-2.10vmslice-7.pgeni-gpolab-bbn-com.utah.geniracks.net
              54         15 running   -               VM-3.10vmslice-7.pgeni-gpolab-bbn-com.utah.geniracks.net
              55         15 running   -               VM-4.10vmslice-7.pgeni-gpolab-bbn-com.utah.geniracks.net
              56         15 running   -               VM-5.10vmslice-7.pgeni-gpolab-bbn-com.utah.geniracks.net
              57         15 running   -               VM-6.10vmslice-7.pgeni-gpolab-bbn-com.utah.geniracks.net
              58         15 running   -               VM-7.10vmslice-7.pgeni-gpolab-bbn-com.utah.geniracks.net
              59         15 running   -               VM-8.10vmslice-7.pgeni-gpolab-bbn-com.utah.geniracks.net
              60         15 running   -               VM-9.10vmslice-7.pgeni-gpolab-bbn-com.utah.geniracks.net
              61         15 running   -               VM-1.10vmslice-8.pgeni-gpolab-bbn-com.utah.geniracks.net
              62         15 running   -               VM-10.10vmslice-8.pgeni-gpolab-bbn-com.utah.geniracks.net
              63         15 running   -               VM-2.10vmslice-8.pgeni-gpolab-bbn-com.utah.geniracks.net
              64         17 running   -               VM-3.10vmslice-8.pgeni-gpolab-bbn-com.utah.geniracks.net
              65         15 running   -               VM-4.10vmslice-8.pgeni-gpolab-bbn-com.utah.geniracks.net
              66         15 running   -               VM-5.10vmslice-8.pgeni-gpolab-bbn-com.utah.geniracks.net
              67         15 running   -               VM-6.10vmslice-8.pgeni-gpolab-bbn-com.utah.geniracks.net
              68         15 running   -               VM-7.10vmslice-8.pgeni-gpolab-bbn-com.utah.geniracks.net
              69         15 running   -               VM-8.10vmslice-8.pgeni-gpolab-bbn-com.utah.geniracks.net
              70         15 running   -               VM-9.10vmslice-8.pgeni-gpolab-bbn-com.utah.geniracks.net
              71         15 running   -               VM-1.10vmslice-9.pgeni-gpolab-bbn-com.utah.geniracks.net
              72         15 running   -               VM-10.10vmslice-9.pgeni-gpolab-bbn-com.utah.geniracks.net
              73         15 running   -               VM-2.10vmslice-9.pgeni-gpolab-bbn-com.utah.geniracks.net
              74         15 running   -               VM-3.10vmslice-9.pgeni-gpolab-bbn-com.utah.geniracks.net
              75         15 running   -               VM-4.10vmslice-9.pgeni-gpolab-bbn-com.utah.geniracks.net
              76         15 running   -               VM-5.10vmslice-9.pgeni-gpolab-bbn-com.utah.geniracks.net
              78         15 running   -               VM-7.10vmslice-9.pgeni-gpolab-bbn-com.utah.geniracks.net
              79         15 running   -               VM-8.10vmslice-9.pgeni-gpolab-bbn-com.utah.geniracks.net
              80         15 running   -               VM-9.10vmslice-9.pgeni-gpolab-bbn-com.utah.geniracks.net
              81         15 running   -               VM-1.10vmslice-10.pgeni-gpolab-bbn-com.utah.geniracks.net
              82         15 running   -               VM-10.10vmslice-10.pgeni-gpolab-bbn-com.utah.geniracks.net
              83         15 running   -               VM-2.10vmslice-10.pgeni-gpolab-bbn-com.utah.geniracks.net
              84         15 running   -               VM-3.10vmslice-10.pgeni-gpolab-bbn-com.utah.geniracks.net
              85         15 running   -               VM-4.10vmslice-10.pgeni-gpolab-bbn-com.utah.geniracks.net
              86         15 running   -               VM-5.10vmslice-10.pgeni-gpolab-bbn-com.utah.geniracks.net
              87         15 running   -               VM-6.10vmslice-10.pgeni-gpolab-bbn-com.utah.geniracks.net
              88         15 running   -               VM-7.10vmslice-10.pgeni-gpolab-bbn-com.utah.geniracks.net
              89         15 running   -               VM-8.10vmslice-10.pgeni-gpolab-bbn-com.utah.geniracks.net
              90         15 running   -               VM-9.10vmslice-10.pgeni-gpolab-bbn-com.utah.geniracks.net
      
      vhost1,[~],12:51(0)$ sudo vzlist -a
            CTID      NPROC STATUS    IP_ADDR         HOSTNAME
               1         15 running   -               virt1.ecgtest.pgeni-gpolab-bbn-com.utah.geniracks.net
               2         15 running   -               virt2.ecgtest.pgeni-gpolab-bbn-com.utah.geniracks.net
               3         15 running   -               VM-1.10vmslice-5.pgeni-gpolab-bbn-com.utah.geniracks.net
               4         15 running   -               VM-10.10vmslice-5.pgeni-gpolab-bbn-com.utah.geniracks.net
               5         15 running   -               VM-2.10vmslice-5.pgeni-gpolab-bbn-com.utah.geniracks.net
               6         15 running   -               VM-3.10vmslice-5.pgeni-gpolab-bbn-com.utah.geniracks.net
               7         15 running   -               VM-4.10vmslice-5.pgeni-gpolab-bbn-com.utah.geniracks.net
               8         15 running   -               VM-5.10vmslice-5.pgeni-gpolab-bbn-com.utah.geniracks.net
               9         15 running   -               VM-6.10vmslice-5.pgeni-gpolab-bbn-com.utah.geniracks.net
              10         15 running   -               VM-7.10vmslice-5.pgeni-gpolab-bbn-com.utah.geniracks.net
              11         15 running   -               VM-8.10vmslice-5.pgeni-gpolab-bbn-com.utah.geniracks.net
              12         17 running   -               VM-9.10vmslice-5.pgeni-gpolab-bbn-com.utah.geniracks.net
      
    • Next, I investigated a ping process that is running on pc3:
      vhost2,[~],12:59(0)$ pg ping
      20001     656004  468218  0 12:56 ?        00:00:00 csh -c ping 10.10.0.1
      20001     656012  656004  0 12:56 ?        00:00:00 ping 10.10.0.1
      chaos     659649  647640  0 12:59 pts/0    00:00:00 grep --color=auto ping
      
      vhost2,[~],12:59(0)$ sudo vzpid 656012
      Pid     CTID    Name
      656012  41      ping
      
    • From above, CTID 41 is VM-1.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
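The PID-to-container lookup above (`vzpid` gives the CTID, `vzlist -a` gives the CTID's hostname) can be sketched as a small parser. This is a sketch, not rack tooling: the sample rows are copied from the transcript above, and the five-column layout (CTID NPROC STATUS IP_ADDR HOSTNAME) is an assumption about this vzlist version.

```python
# Map a host PID to its container, combining `vzpid` output with `vzlist -a`.
# Sample rows copied from the vhost2 transcript above; column layout assumed.

VZLIST_SAMPLE = """\
      CTID      NPROC STATUS    IP_ADDR         HOSTNAME
        40         15 running   -               VM-9.10vmslice-4.pgeni-gpolab-bbn-com.utah.geniracks.net
        41         15 running   -               VM-1.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
"""

def ctid_to_hostname(vzlist_output: str) -> dict:
    """Parse `vzlist -a` output into a {ctid: hostname} map."""
    mapping = {}
    for line in vzlist_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 5:
            mapping[int(fields[0])] = fields[4]
    return mapping

# `sudo vzpid 656012` reported CTID 41 for the ping process:
print(ctid_to_hostname(VZLIST_SAMPLE)[41])
# → VM-1.10vmslice-6.pgeni-gpolab-bbn-com.utah.geniracks.net
```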
    • Used top to check whether the two hosts are performing well:
      • pc3:
        top - 13:01:14 up 1 day, 46 min,  1 user,  load average: 1.28, 2.24, 1.75
        Tasks: 2111 total,   3 running, 2108 sleeping,   0 stopped,   0 zombie
        Cpu(s):  0.9%us,  1.5%sy,  0.0%ni, 94.5%id,  3.1%wa,  0.0%hi,  0.0%si,  0.0%st
        Mem:  49311612k total,  7402468k used, 41909144k free,   446780k buffers
        Swap:  1050168k total,        0k used,  1050168k free,  2819928k cached
        
      • pc5:
        top - 13:02:00 up 1 day, 47 min,  1 user,  load average: 0.06, 0.22, 0.21
        Tasks: 492 total,   1 running, 491 sleeping,   0 stopped,   0 zombie
        Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.6%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
        Mem:  49311612k total,  1904952k used, 47406660k free,   151880k buffers
        Swap:  1050168k total,        0k used,  1050168k free,   906872k cached
        
    • So neither machine is particularly busy, and the large number of mostly idle processes on pc3 appears to have no significant performance impact.
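The task-count gap between pc3 (2111 tasks) and pc5 (492 tasks) is consistent with the vzlist dump above: container processes account for most of pc3's tasks. A rough back-of-the-envelope check (the container and per-container process counts are read off the dump; "host overhead" is an inference, not a measurement):

```python
# Sanity check: container processes explain most of pc3's 2111 tasks.
# 89 containers appear in the vzlist dump above (CTIDs 1-90, with 77 absent),
# most showing NPROC = 15. The remainder is host overhead, of the same order
# as pc5's 492 tasks.
containers_on_pc3 = 89
procs_per_container = 15
container_tasks = containers_on_pc3 * procs_per_container
print(container_tasks)  # → 1335 of the 2111 tasks reported by top
```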

comment:10 Changed 8 years ago by lnevers@bbn.com

=> Ran a scenario of 5 experiments with 20 VMs each, which failed on the 4th sliver with a "no_connect" violation.

Available resources before the start of the scenario:

  • 100 pcvm slots on pc3
  • 98 pcvm slots on pc5
  • no available dedicated nodes.

Test details:

  • 1st sliver 20vmslice-1 = OK = 10 VMs on pc3, 10 VMs on pc5
  • 2nd sliver 20vmslice-2 = OK = 10 VMs on pc3, 10 VMs on pc5
  • 3rd sliver 20vmslice-3 = OK = 10 VMs on pc3, 10 VMs on pc5
  • 4th sliver 20vmslice-4 = failed with this error:
    *** ERROR: mapper: Reached run limit. Giving up.
    seed = 1338542374
    Physical Graph: 4
    Calculating shortest paths on switch fabric.
    Virtual Graph: 21
    Generating physical equivalence classes:4
    Type precheck:
    Type precheck passed.
    Node mapping precheck:
    Node mapping precheck succeeded
    Policy precheck:
    Policy precheck succeeded
    Annealing.
    Doing melting run
    Reverting: forced
    Reverting to best solution
    Done
       BEST SCORE:  8.5 in 17000 iters and 0.050399 seconds
    With 10 violations
    Iters to find best score:  23
    Violations: 10
      unassigned:  0
      pnode_load:  0
      no_connect:  10
      link_users:  0
      bandwidth:   0
      desires:     0
      vclass:      0
      delay:       0
      trivial mix: 0
      subnodes:    0
      max_types:   0
      endpoints:   0
    Nodes:
    VM-1 pc3
    VM-10 pc3
    VM-11 pc3
    VM-12 pc5
    VM-13 pc5
    VM-14 pc5
    VM-15 pc5
    VM-16 pc5
    VM-17 pc5
    VM-18 pc3
    VM-19 pc3
    VM-2 pc3
    VM-20 pc3
    VM-3 pc3
    VM-4 pc3
    VM-5 pc5
    VM-6 pc5
    VM-7 pc5
    VM-8 pc3
    VM-9 pc5
    lan/Lan pc3
    End Nodes
    Edges:
    linklan/Lan/VM-1:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
    linklan/Lan/VM-2:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
    linklan/Lan/VM-3:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
    linklan/Lan/VM-4:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
    linklan/Lan/VM-5:0 Mapping Failed
    linklan/Lan/VM-6:0 Mapping Failed
    linklan/Lan/VM-7:0 Mapping Failed
    linklan/Lan/VM-8:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
    linklan/Lan/VM-9:0 Mapping Failed
    linklan/Lan/VM-10:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
    linklan/Lan/VM-13:0 Mapping Failed
    linklan/Lan/VM-14:0 Mapping Failed
    linklan/Lan/VM-15:0 Mapping Failed
    linklan/Lan/VM-16:0 Mapping Failed
    linklan/Lan/VM-17:0 Mapping Failed
    linklan/Lan/VM-18:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
    linklan/Lan/VM-19:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
    linklan/Lan/VM-20:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
    linklan/Lan/VM-12:0 Mapping Failed
    linklan/Lan/VM-11:0 trivial pc3:loopback (pc3/null,(null)) pc3:loopback (pc3/null,(null)) 
    End Edges
    End solution
    Summary:
    pc3 11 vnodes, 0 nontrivial BW, 1000000 trivial BW, type=pcvm
    pc5 10 vnodes, 0 nontrivial BW, 0 trivial BW, type=pcvm
    Total physical nodes used: 2
    End summary
    ASSIGN FAILED:
    Type precheck passed.
    Node mapping precheck succeeded
    Policy precheck succeeded
    Annealing.
    Doing melting run
    Reverting: forced
    Reverting to best solution
    Done
       BEST SCORE:  8.5 in 17000 iters and 0.050399 seconds
      unassigned:  0
      pnode_load:  0
      no_connect:  10
      link_users:  0
      bandwidth:   0
      desires:     0
      vclass:      0
      delay:       0
      trivial mix: 0
      subnodes:    0
      max_types:   0
      endpoints:   0
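The violation breakdown above is what identifies the failure cause: here no_connect (unmappable LAN links), versus the bandwidth violation in the original report. As a sketch, a small parser for that counter block, assuming the indented "name: count" layout shown in the log (the sample is abridged from the output above):

```python
import re

# Extract the nonzero violation counters from a mapper/assign log tail.
# Sample abridged from the 20vmslice-4 failure above; the regex assumes
# the indented "name: count" layout shown there.
MAPPER_TAIL = """\
Violations: 10
  unassigned:  0
  pnode_load:  0
  no_connect:  10
  link_users:  0
  bandwidth:   0
  trivial mix: 0
"""

def nonzero_violations(log: str) -> dict:
    """Return {category: count} for every indented counter that is > 0."""
    found = re.findall(r"^\s+([\w ]+?):\s+(\d+)\s*$", log, re.MULTILINE)
    return {name: int(count) for name, count in found if int(count) > 0}

print(nonzero_violations(MAPPER_TAIL))
# → {'no_connect': 10}
```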
     
    

comment:11 Changed 8 years ago by lnevers@bbn.com

After the dedicated nodes were restored, I re-ran the test that had just failed (the scenario of 5 experiments with 20 VMs each), and it now works. The following were allocated:

  • 1st sliver 20vmslice-1 = OK = 10 VMs on pc3, 10 VMs on pc5
  • 2nd sliver 20vmslice-2 = OK = 10 VMs on pc3, 10 VMs on pc5
  • 3rd sliver 20vmslice-3 = OK = 10 VMs on pc3, 10 VMs on pc5
  • 4th sliver 20vmslice-4 = OK = 10 VMs on pc3, 10 VMs on pc5
  • 5th sliver 20vmslice-5 = OK = 10 VMs on pc3, 10 VMs on pc1

comment:12 Changed 8 years ago by lnevers@bbn.com

Resolution: fixed
Status: new → closed

All test scenarios for 100 VMs have been completed. This ticket is being closed.

The 100 VM scenarios will be re-run if the default rack configuration that ships differs from the current one (2 shared nodes and 3 exclusive nodes).
