Changes between Initial Version and Version 1 of CmuLab-ApproachToUpgradingEmulabTestbed

Timestamp: 06/03/11 12:21:31 (13 years ago)
Author: Vic Thomas
Comment: --

+== Using Virtualization to Ease Emulab Upgrades ==
+   by  Pat Gunn
+=== Intuitions ===
+        * Upgrading Emulab is hard and risky
+        * We have sometimes had longer downtime than we would like
+        * Zero downtime/zero risk would be fantastic
+=== Virtualisation as a solution? ===
+        * Benefits:
+                * Snapshots
+                * Upgrades on a forked system
+                * Grow the testbed more smoothly
+                * Resilience against hardware failures
+                * Easily upgrade hardware by allocating more resources from the resource pool
+        * Issues:
+                * Networking
+                        * Configuration? Getting the VLANs right
+                * Performance
+                        * FreeBSD vs Linux - FreeBSD needs recent CPU hardware features to be efficiently virtualized
+                        * General performance - Will UDP performance be good enough? Good NFS performance?
+=== Solution so far ===
+        * VMware ESX
+                * Expensive, but we already have the license
+        * Boss and Ops live in a cloud distant from the machines they manage. We might move them.
+        * Systems connect through virtual ports on real switches
+        * We can only safely snapshot nodes that are down (this is slightly more flexible on Linux)
+        * Performance tests so far show good UDP and TCP performance
+                * We will see how this works in practice as we scale up
+        * Imagined upgrade process:
+                * Disable logins and testbed daemons
+                * Take testbed down
+                * Clone the system
+                * Take testbed back up
+                * Boot clones of boss/ops, isolated from real versions
+                * Upgrade clones
+                * If the upgrade is not successful, delete the clones and complain to Utah
+                * If the upgrade is successful, shut down the old boss/ops, then massage database and experiment disk state changes into the upgraded boss/ops? Not clear on this part.
+        * Fallback upgrade process: like a normal upgrade, but with a very good backup beforehand
+        * Other nice things:
+                * Virtual switches, virtual power controllers?
+                * Compute nodes in a cloud? (PlanetLab-esque?) Virtualisation technologies are an active area of research at CMU/PDL
+                * Boss/ops can run on the same suitably powerful system
+                * Storage separate from boss/ops nodes
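The imagined upgrade process listed above can be sketched as a small simulation. This is only an illustration of the ordering of the steps, not a working tool: `BossOps`, `clone`, and `upgrade_with_clone` are hypothetical names invented here, nothing in it calls a real Emulab or VMware interface, and re-enabling logins when the testbed comes back up is an assumption. The final state-migration step is left as a comment because the outline itself marks it as unresolved.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BossOps:
    """Toy in-memory stand-in for the boss/ops pair (hypothetical, not a real API)."""
    version: str = "old"
    up: bool = True
    logins_enabled: bool = True
    log: List[str] = field(default_factory=list)


def clone(system: BossOps) -> BossOps:
    """Copy a system, refusing if it is running (snapshots are only safe when down)."""
    if system.up:
        raise RuntimeError("refusing to clone a running system")
    return BossOps(version=system.version, up=False, logins_enabled=False)


def upgrade_with_clone(live: BossOps, upgrade: Callable[[BossOps], bool]) -> BossOps:
    """Walk through the outlined steps and return whichever system ends up serving."""
    live.logins_enabled = False
    live.log.append("disabled logins and testbed daemons")
    live.up = False
    live.log.append("took testbed down")

    copy = clone(live)
    live.log.append("cloned boss/ops")

    live.up = True
    live.logins_enabled = True  # assumption: users return while the clone is upgraded
    live.log.append("took testbed back up")

    copy.up = True  # the clone boots isolated from the live system
    if not upgrade(copy):
        # Failure path: discard the clone; the live system was never modified.
        live.log.append("upgrade failed; clones deleted")
        return live

    live.up = False
    live.logins_enabled = False
    live.log.append("upgrade succeeded; old boss/ops shut down")
    # Open question in the outline: database and experiment disk changes made on
    # the live system since the clone was taken would have to be merged here.
    copy.logins_enabled = True
    return copy
```

Passing a callable that returns `False` exercises the failure path: the untouched live system is returned and the clone is simply dropped, which is the property that makes the approach attractive.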
+Our solution is in the very early stages of deployment. The new testbed we're building will have about 120 nodes, big enough to tell whether this approach is reasonable as a longer-term solution.
+Mitch adds: We're looking into other virtualisation solutions that are more scriptable than VMware.