Version 1 (modified by Vic Thomas, 13 years ago)

Using Virtualization to Ease Emulab Upgrades

by Pat Gunn

Intuitions

  • Upgrading Emulab is hard and risky
  • We have sometimes had more downtime than we would like
    • Zero downtime and zero risk would be fantastic

Virtualisation as a solution?

  • Benefits:
    • Snapshots
    • Upgrades on a forked system
    • Grow the testbed more smoothly
    • Resilience against hardware failures
    • Easily upgrade hardware by allocating more resources from the resource pool
  • Issues:
    • Networking
      • Configuration? Getting the VLANs right
    • Performance
      • FreeBSD vs. Linux: FreeBSD needs recent CPU hardware features to be virtualised efficiently
      • General performance: will UDP performance be good enough? Will NFS performance?

Solution so far

  • VMware ESX
    • Expensive, but we already have the license
  • Boss and ops live in a cloud distant from the machines they manage. We might move them.
  • Systems connect through virtual ports on real switches
  • We can only safely snapshot nodes that are down (on Linux, this is slightly more flexible)
  • Performance tests so far show good UDP and TCP performance
    • We will see how this works in practice as we scale up
  • Imagined upgrade process:
    • Disable logins and testbed daemons
    • Take testbed down
    • Clone the system
    • Take testbed back up
    • Boot clones of boss/ops, isolated from real versions
    • Upgrade clones
    • If the upgrade is not successful, delete the clones and complain to Utah
    • If the upgrade is successful, shut down the old boss/ops, then massage database and experiment disk state changes into the upgraded boss/ops. We are not yet clear on this part.
  • Fallback upgrade process: like a normal upgrade, but with a very good backup beforehand
  • Other nice things:
    • Virtual switches, virtual power controllers?
    • Compute nodes in a cloud? (PlanetLab-esque?) Virtualisation technologies are an active area of research at CMU/PDL
    • Boss/ops can run on the same suitably powerful system
    • Storage separate from boss/ops nodes
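
The imagined upgrade process listed above can be sketched as a small dry-run planner. This is only an illustration of the ordering and the success/failure branch, not the scripts we actually run: the names UpgradePlan and run_clone_upgrade are hypothetical, no VMware or Emulab calls are made, and the final merge step is recorded as unresolved because that part is still unclear.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UpgradePlan:
    """Ordered record of the steps taken, so the run can be reviewed afterwards."""
    log: List[str] = field(default_factory=list)

    def step(self, description: str) -> None:
        # A real version would perform the action here; this sketch only records it.
        self.log.append(description)


def run_clone_upgrade(upgrade_clones: Callable[[], bool]) -> UpgradePlan:
    """Walk through the clone-based upgrade.

    upgrade_clones is a hypothetical callback that performs the upgrade on the
    isolated clones and returns True on success, False on failure.
    """
    plan = UpgradePlan()
    plan.step("disable logins and testbed daemons")
    plan.step("take testbed down")
    plan.step("clone boss/ops")
    plan.step("take testbed back up")
    plan.step("boot clones, isolated from real boss/ops")
    plan.step("upgrade clones")
    if upgrade_clones():
        plan.step("shut down old boss/ops")
        plan.step("merge database and experiment disk state (unresolved)")
    else:
        # The real boss/ops were never touched, so failure costs only the clones.
        plan.step("delete clones")
        plan.step("complain to Utah")
    return plan


if __name__ == "__main__":
    for line in run_clone_upgrade(lambda: False).log:
        print(line)
```

The point of the ordering is that the testbed is only down for the time it takes to clone. Everything risky happens on the clones while the real boss/ops are back in service.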

Our solution is in the very early stages of deployment. The new testbed we're building will have about 120 nodes, which is big enough to tell whether this approach is reasonable as a longer-term solution.

Mitch adds: We're looking into other virtualisation solutions that are more scriptable than VMware.