Opened 7 years ago

Closed 7 years ago

#32 closed (fixed)

Track 100 VM scenario findings

Reported by: lnevers@bbn.com
Owned by: somebody
Priority: major
Milestone: IG-EXP-3
Component: Experiment
Version: SPIRAL4
Keywords: vm support
Cc:
Dependencies:

Description

This ticket is being written to capture the findings for the planned IG-EXP-3 InstaGENI Single Site 100 VM Test. All issues captured are known and require no action at this time.

These are the 100 VM scenarios planned for testing:

  • Scenario 1: 1 Slice with 100 VMs
  • Scenario 2: 2 Slices with 50 VMs each
  • Scenario 3: 4 Slices with 25 VMs each
  • Scenario 4: 50 Slices with 2 VMs each
  • Scenario 5: 100 Slices with 1 VM each
  • Scenario 6: 10 Slices with 10 VMs each (note 1)

(note 1) This scenario was not in the original test plan, but it is being added based on input from Leigh Stroller.
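As a quick sanity check on the plan, the sketch below confirms that every scenario adds up to 100 VMs. The tuple layout and the helper name total_vms are illustrative choices, not part of the test plan itself.

```python
# Planned scenarios from this ticket: (scenario number, slices, VMs per slice).
SCENARIOS = [
    (1, 1, 100),
    (2, 2, 50),
    (3, 4, 25),
    (4, 50, 2),
    (5, 100, 1),
    (6, 10, 10),
]


def total_vms(slices, vms_per_slice):
    """Total number of VMs requested across all slices of a scenario."""
    return slices * vms_per_slice


for number, slices, per_slice in SCENARIOS:
    total = total_vms(slices, per_slice)
    print(f"Scenario {number}: {slices} x {per_slice} = {total} VMs")
```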

=> Scenario 1: 1 Slice with 100 VMs in a grid topology

Results: FAILED due to:

 100 nodes of type pcvm requested, but only 50 available nodes of type pcvm found

=> Scenario 2: 2 Slices with 50 VMs each, each sliver uses grid topology

Results: FAILED

Two test runs were completed. 1st try:

  • Created the first 50-node experiment; no error was reported.
  • sliverstatus reported "resource is busy; try again later" for about 5 minutes.
  • sliverstatus then reported the error:
    Failed to get SliverStatus on 2exp-50vm at AM https://boss.utah.geniracks.net/protogeni/xmlrpc/am/2.0: 
<Fault -32600: 'Internal Error executing SliverStatus'>

Results of the 2nd test run:

  1. createsliver for the first 50-node sliver completes without error.
  2. sliverstatus reports "resource is busy; try again later" for approximately 5 minutes.
  3. sliverstatus eventually fails with "<Fault -32400: 'XMLRPC Server Error'>".
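The busy-then-fail pattern seen in both runs can be watched for with a bounded polling loop. The following is a minimal sketch only: get_status is a hypothetical callable standing in for whatever issues the sliverstatus call and returns its reply as text, and the interval and timeout values are assumptions, not settings used in this test.

```python
import time

BUSY_MESSAGE = "resource is busy; try again later"


def poll_until_ready(get_status, interval_s=30.0, timeout_s=600.0, sleep=time.sleep):
    """Call get_status() until the reply is no longer 'busy'.

    get_status is a hypothetical callable returning the status reply as a
    string. Raises TimeoutError if the reply is still busy after timeout_s.
    """
    waited = 0.0
    while True:
        status = get_status()
        if BUSY_MESSAGE not in status:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"still busy after {waited:.0f} s")
        sleep(interval_s)
        waited += interval_s
```

Passing sleep as a parameter keeps the loop testable without real delays. Errors such as the XMLRPC faults above would surface from get_status itself and are deliberately not caught here.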

Additional test results will be captured as they are completed.

Change History (7)

comment:1 Changed 7 years ago by lnevers@bbn.com

=> Scenario 3: 4 Slices with 25 VMs each

Results:

  1. Created the first slice "4exp-25vm" and a sliver with 25 nodes in a grid topology; no problem was reported.
  2. Created the second slice and sliver "4exp-25vma", which failed with the following error:
       ASSIGN FAILED:
       *** 25 nodes of type pcvm requested, but only 20 available nodes of type pcvm found
       *** Type precheck failed!
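The failure above amounts to a simple capacity comparison: more nodes of a type were requested than were free. The sketch below illustrates that comparison only; it is not the mapper's actual implementation, and the function name type_precheck is hypothetical.

```python
def type_precheck(requested, available, node_type="pcvm"):
    """Return an error message if the request exceeds free capacity, else None.

    Illustrative only: mirrors the wording of the message in this ticket,
    not the real mapper code.
    """
    if requested > available:
        return (
            f"{requested} nodes of type {node_type} requested, "
            f"but only {available} available nodes of type {node_type} found"
        )
    return None


# The second 25-VM sliver, with 20 pcvm slots left:
print(type_precheck(25, 20))
```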
    

comment:2 Changed 7 years ago by lnevers@bbn.com

Ran Scenario 6: 10 Slices with 10 VMs each.

  1. Created 1st sliver in slice 10exp-10vm-1 -> no problem
  2. Created 2nd sliver in slice 10exp-10vm-2 -> no problem
  3. Created 3rd sliver in slice 10exp-10vm-3 -> no problem
  4. Created 4th sliver in slice 10exp-10vm-4 -> no problem
  5. Created 5th sliver in slice 10exp-10vm-5 -> failure below:
Asked https://boss.utah.geniracks.net/protogeni/xmlrpc/am/2.0 to reserve resources. 
No manifest Rspec returned. *** ERROR: mapper: Reached run limit. Giving up.
seed = 1337780864
Physical Graph: 3
Calculating shortest paths on switch fabric.
Virtual Graph: 11
Generating physical equivalence classes:3
Type precheck:
Type precheck passed.
Node mapping precheck:
Node mapping precheck succeeded
Policy precheck:
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 1 remain.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  11.75 in 17000 iters and 0.135319 seconds
With 1 violations
Iters to find best score:  1
Violations: 1
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   1
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
Nodes:
VM-1 pc1
VM-10 pc1
VM-2 pc1
VM-3 pc1
VM-4 pc1
VM-5 pc1
VM-6 pc1
VM-7 pc1
VM-8 pc1
VM-9 pc1
lan/Lan pc1
End Nodes
Edges:
linklan/Lan/VM-1:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-2:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-3:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-4:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-5:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-6:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-7:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-8:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-9:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
linklan/Lan/VM-10:0 trivial pc1:loopback (pc1/null,(null)) pc1:loopback (pc1/null,(null)) 
End Edges
End solution
Summary:
pc1 11 vnodes, 0 nontrivial BW, 1000000 trivial BW, type=pcvm
    ?+virtpercent: used=0 total=100
    ?+cpu: used=0 total=2666
    ?+ram: used=0 total=3574
    ?+cpupercent: used=0 total=92
    ?+rampercent: used=0 total=80
Total physical nodes used: 1
End summary
ASSIGN FAILED:
Type precheck passed.
Node mapping precheck succeeded
Policy precheck succeeded
Annealing.
Adjusting dificulty estimate for fixed nodes, 1 remain.
Doing melting run
Reverting: forced
Reverting to best solution
Done
   BEST SCORE:  11.75 in 17000 iters and 0.135319 seconds
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   1
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
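In the output above, the only nonzero violation counter is bandwidth. When scanning long mapper logs, a small parser can pull out just the nonzero counters. This is a sketch that assumes the "Violations:" block always has the indented "name: count" layout shown in this ticket; the function name nonzero_violations is hypothetical.

```python
import re

# Excerpt copied from the mapper output in this ticket.
SAMPLE = """\
Violations: 1
  unassigned:  0
  pnode_load:  0
  no_connect:  0
  link_users:  0
  bandwidth:   1
  desires:     0
  vclass:      0
  delay:       0
  trivial mix: 0
  subnodes:    0
  max_types:   0
  endpoints:   0
Nodes:
"""

COUNTER_LINE = re.compile(r"^\s+([A-Za-z_ ]+):\s+(\d+)\s*$")


def nonzero_violations(log_text):
    """Return {counter name: count} for nonzero counters after 'Violations:'."""
    counts = {}
    in_block = False
    for line in log_text.splitlines():
        if line.startswith("Violations:"):
            in_block = True
            continue
        if in_block:
            match = COUNTER_LINE.match(line)
            if not match:
                break  # first non-counter line ends the block
            counts[match.group(1)] = int(match.group(2))
    return {name: count for name, count in counts.items() if count}


print(nonzero_violations(SAMPLE))
```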
 

comment:3 Changed 7 years ago by lnevers@bbn.com

Ran Scenario 5: 100 Slices with 1 VM each

At the start of the test, available resources were based on the following listresources details:

  • pc3 sliver_type "emulab-openvz" for "pcvm" had type_slots="100" = 100 VMs possible
  • pc5 sliver_type "emulab-openvz" for "pcvm" had type_slots="97" = 97 VMs possible

As each 1 VM sliver was created, both pc3 and pc5 counters for available slots decreased.

=> Results: Success!!

Each of the 100 experiments had a node assigned, and login was verified on several of them.

Final allocation distribution:

  • 61 VMs on pc3
  • 39 VMs on pc5

comment:4 Changed 7 years ago by lnevers@bbn.com

Note that the allocations listed in the previous update are based on sliver manifests. The available slot count from listresources was as follows for each shared node:

  • pc5 = 58
  • pc3 = 39

Which is as expected based on sliver manifest counts.
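The arithmetic behind "as expected" can be written out directly: the starting type_slots value for each node minus the VMs placed on it (per the manifests) should equal the slot count reported afterwards. The dictionary names below are illustrative.

```python
# type_slots reported by listresources before the Scenario 5 run.
initial_slots = {"pc3": 100, "pc5": 97}

# VMs placed on each node according to the sliver manifests.
allocated = {"pc3": 61, "pc5": 39}

# Slots that should remain available on each node afterwards.
remaining = {node: initial_slots[node] - allocated[node] for node in initial_slots}

print(remaining)  # {'pc3': 39, 'pc5': 58}
```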

comment:5 Changed 7 years ago by lnevers@bbn.com

Ran Scenario 4: 50 Slices with 2 VMs each

=> Results: Success - Each of the 50 experiments had 2 nodes assigned, and login was verified on several of them. Final allocation distribution:

  • 58 VMs on pc3
  • 42 VMs on pc5

comment:6 Changed 7 years ago by lnevers@bbn.com

=> Re-ran Scenario 2: 2 Slices with 50 VMs each.

In the previous run, this test case failed with a "<Fault -32400: 'XMLRPC Server Error'>".

Verified this is no longer the case and that this scenario fails as expected due to configured rack resource allocation.

comment:7 Changed 7 years ago by lnevers@bbn.com

Resolution: fixed
Status: new → closed

Summary of the 100 VMs scenarios that have been completed with the InstaGENI rack configured to have 2 pcshared nodes:

=> Scenario 1: 1 Slice with 100 VMs

Results: - FAIL - Not allowed with current rack configuration.
Sliver failure reported: 100 nodes requested, but only 20 available.

=> Scenario 2: 2 Slices with 50 VMs each

Results: - FAIL - Not allowed with current rack configuration.
Sliver failure reported on 1st sliver: 50 nodes requested, but only 30 available.

=> Scenario 3: 4 Slices with 25 VMs each

Results: - FAIL - Not allowed with current rack configuration; only 3 slices were set up.
Allocation: pc3=30 VMs, pc5=30 VMs, pc1, pc4, pc2=5 VMs each
Sliver failure reported on 4th sliver: 25 nodes requested, but only 20 available.

=> Scenario 7: 5 slices with 20 VMs each

Results: - PASS - Allocation:pc3=50 VMs, pc5=40 VMs, pc1=10 VMs

=> Scenario 6: 10 Slices with 10 VMs each

Results: - PASS - Allocation:pc3=90 VMs, pc5=10 VMs

=> Scenario 4: 50 Slices with 2 VMs each

Results: - PASS - Allocation:pc3=58 VMs, pc5=42 VMs

=> Scenario 5: 100 Slices with 1 VM each

Results: - PASS - Allocation:pc3=61 VMs, pc5=39 VMs
