Changes between Initial Version and Version 1 of CmuLab-ApproachToUpgradingEmulabTestbed


Timestamp: 06/03/11 12:21:31
Author: Vic Thomas

== Using Virtualization to Ease Emulab Upgrades ==
by Pat Gunn

=== Intuitions ===
 * Upgrading Emulab is hard and risky
 * Upgrades have sometimes caused longer downtime than we would like
 * Zero downtime and zero risk would be ideal

=== Virtualisation as a solution? ===
 * Benefits:
   * Snapshots
   * Upgrades performed on a forked copy of the system
   * Smoother growth of the testbed
   * Resilience against hardware failures
   * Easy hardware upgrades by allocating more resources from the resource pool
 * Issues:
   * Networking
     * Configuration: getting the VLANs right
   * Performance
     * FreeBSD vs. Linux: FreeBSD needs recent CPU hardware features to be virtualized efficiently
     * General performance: will UDP and NFS performance be good enough?

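The open question about UDP performance can be probed with a quick throughput-and-loss test. The following is a minimal sketch, assuming Python 3; the helper name `udp_throughput` and its parameters are hypothetical, and this is not the test the CMU team actually ran. Both ends run on loopback so the example is self-contained; a meaningful measurement would put the sender and receiver on different hosts (for example a VM and a physical node), and a dedicated tool such as iperf is the usual choice for real numbers.

```python
import socket
import threading
import time


def udp_throughput(payload_size=1024, count=2000, host="127.0.0.1"):
    """Send `count` UDP datagrams to a local receiver and measure loss and rate.

    Hypothetical helper for illustration only: both ends run in this process
    on loopback, which says little about VM networking by itself.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind((host, 0))      # port 0: let the OS choose a free port
    receiver.settimeout(0.5)      # stop once no datagram arrives for 0.5 s
    target = receiver.getsockname()

    stats = {"received": 0, "bytes": 0, "first": None, "last": None}

    def receive():
        while True:
            try:
                data, _ = receiver.recvfrom(65535)
            except socket.timeout:
                return            # traffic has stopped
            now = time.perf_counter()
            if stats["first"] is None:
                stats["first"] = now
            stats["last"] = now
            stats["received"] += 1
            stats["bytes"] += len(data)

    thread = threading.Thread(target=receive)
    thread.start()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"x" * payload_size
    sent = 0
    for _ in range(count):
        try:
            sender.sendto(payload, target)   # UDP: no delivery guarantee
            sent += 1
        except OSError:
            pass  # e.g. a full local send queue; counted as not sent
    sender.close()

    thread.join()
    receiver.close()

    # Throughput as seen by the receiver, between first and last arrival.
    span = 0.0
    if stats["first"] is not None:
        span = stats["last"] - stats["first"]
    mbit = stats["bytes"] * 8 / span / 1e6 if span > 0 else 0.0
    loss = 1 - stats["received"] / sent if sent else 1.0
    return {
        "sent": sent,
        "received": stats["received"],
        "bytes": stats["bytes"],
        "loss": loss,
        "mbit_per_s": mbit,
    }
```

Calling `udp_throughput()` returns a dict; the `loss` fraction is arguably the more telling figure for a virtualized host, since UDP senders are never told when the receiver drops datagrams. Comparing the same probe VM-to-node against a node-to-node baseline would address the question above.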
=== Solution so far ===
 * VMware ESX
   * Expensive, but we already have a license
 * Boss and ops run in a cloud that is distant from the machines they manage; we might move them.
 * Systems connect through virtual ports on real switches
 * We can only safely snapshot nodes that are shut down (on Linux this is slightly more flexible)
 * Performance tests so far show good UDP and TCP performance
   * We will see how this holds up in practice as we scale up
 * Envisioned upgrade process:
   * Disable logins and testbed daemons
   * Take the testbed down
   * Clone the system
   * Bring the testbed back up
   * Boot the boss/ops clones, isolated from the production versions
   * Upgrade the clones
   * If the upgrade fails, delete the clones and complain to Utah
   * If the upgrade succeeds, shut down the old boss/ops and fold the database and experiment disk-state changes made since cloning into the upgraded boss/ops. (We are not yet clear on how to do this part.)
 * Fallback upgrade process: a normal upgrade, but with a very good backup taken beforehand
 * Other nice things:
   * Virtual switches and virtual power controllers?
   * Compute nodes in a cloud? (PlanetLab-esque?) Virtualisation technologies are an active area of research at CMU/PDL
   * Boss and ops could run on the same suitably powerful system
   * Storage separate from the boss/ops nodes

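The envisioned clone-and-upgrade process can be written down as a small orchestration routine, which makes the rollback path explicit. This is a minimal sketch under stated assumptions: the `Testbed` class, `snapshot`, and `upgrade_via_clone` are hypothetical in-memory stand-ins for whatever VMware or Emulab tooling would perform each step, not real Emulab or VMware APIs. Network isolation of the clones is assumed rather than modeled. The sketch also encodes the constraint that snapshots are only taken while the system is down, and because the notes leave the final state-merge step open, it is passed in as a caller-supplied function.

```python
from dataclasses import dataclass, field


@dataclass
class Testbed:
    """Hypothetical in-memory model of the boss/ops VMs (illustration only)."""
    version: str
    running: bool = True
    logins_enabled: bool = True
    log: list = field(default_factory=list)

    def step(self, msg):
        self.log.append(msg)


def snapshot(tb):
    # Per the notes, boss/ops are only snapshotted while shut down.
    if tb.running:
        raise RuntimeError("refusing to snapshot a running system")
    tb.step("snapshot taken")
    return Testbed(version=tb.version, running=False)


def upgrade_via_clone(prod, upgrade, merge_state):
    """Return whichever system should be production after the attempt."""
    prod.logins_enabled = False      # disable logins and testbed daemons
    prod.step("logins and daemons disabled")
    prod.running = False             # take the testbed down
    prod.step("testbed down")
    clone = snapshot(prod)           # clone while it is down
    prod.running = True              # bring production back up
    prod.logins_enabled = True
    prod.step("testbed back up")
    clone.running = True             # boot the clone (isolation assumed)
    try:
        upgrade(clone)               # upgrade the clone only
    except Exception as err:
        # Failure: drop the clone; production was never modified.
        prod.step(f"upgrade failed ({err}); clone discarded, report to Utah")
        return prod
    # Success: stop the old system and carry over changes made since cloning.
    prod.running = False
    merge_state(prod, clone)         # open question in the notes
    prod.step("old boss/ops shut down, state merged into clone")
    return clone
```

The key property is that a failed upgrade returns the untouched production system, so the worst case is the short outage needed to take the snapshot.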
Our solution is in the very early stages of deployment. The new testbed we're building will have about 120 nodes, which is big enough to tell whether this approach is reasonable as a longer-term solution.

Mitch adds: We're looking into other virtualisation solutions that are more scriptable than VMware.