== Using Virtualization to Ease Emulab Upgrades ==
by Pat Gunn

=== Intuitions ===
 * Upgrading Emulab is hard and risky
 * We have sometimes had longer downtime than we would like
 * Zero downtime/zero risk would be fantastic

=== Virtualisation as a solution? ===
 * Benefits:
   * Snapshots
   * Upgrades on a forked system
   * Grow the testbed more smoothly
   * Resilience against hardware failures
   * Easily upgrade hardware by allocating more resources from the resource pool
 * Issues:
   * Networking
     * Configuration? Getting the VLANs right
   * Performance
     * FreeBSD vs Linux - FreeBSD needs recent CPU hardware features to be efficiently virtualized
     * General performance - Will UDP performance be good enough? Good NFS performance?

=== Solution so far ===
 * VMWare ESX
   * Expensive, but we already have the license
 * Boss and Ops live in a cloud distant from the machines they manage. We might move them.
 * Systems connect through virtual ports on real switches
 * We can only safely snapshot nodes that are down (on Linux, this is slightly more flexible)
 * Performance tests so far show good UDP and TCP performance
   * We will see how this works in practice as we scale up
 * Imagined upgrade process:
   * Disable logins and testbed daemons
   * Take testbed down
   * Clone the system
   * Take testbed back up
   * Boot clones of boss/ops, isolated from real versions
   * Upgrade clones
   * If upgrade not successful, delete clones, complain to Utah
   * If upgrade successful, shut down old boss/ops, massage database and experiment disk state changes into upgraded boss/ops? Not clear on this part.
 * Fallback upgrade process: Like a normal upgrade, but with a very good backup beforehand
 * Other nice things:
   * Virtual switches, virtual power controllers?
   * Compute nodes in a cloud? (Planetlab-esque?) Virtualisation technologies are an active area of research at CMU/PDL
   * Boss/ops can run on the same suitably powerful system
   * Storage separate from boss/ops nodes

Our solution is in the very early stages of deployment.
The new testbed we're building will be about 120 nodes, big enough to tell whether this approach is reasonable as a longer-term solution.

Mitch adds: We're looking into other virtualisation solutions that are more scriptable than VMWare.
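
The imagined upgrade process listed under "Solution so far" can be read as a short sequence with one decision point. The sketch below is only an illustration of that control flow, not a tool we have: `UpgradeLog`, `run_upgrade`, and the `upgrade_clones` callback are hypothetical names, and nothing here talks to VMWare or Emulab. It records each step in order, and marks the final merge step as unresolved because the notes themselves say that part is not clear.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UpgradeLog:
    """Hypothetical helper: remembers the steps in the order they ran."""
    steps: List[str] = field(default_factory=list)

    def record(self, step: str) -> None:
        self.steps.append(step)


def run_upgrade(upgrade_clones: Callable[[], bool], log: UpgradeLog) -> bool:
    """Walk the imagined upgrade process.

    upgrade_clones is a hypothetical callback that performs the upgrade on
    the isolated clones and returns True on success.
    Returns True if the upgraded clones would replace the old boss/ops.
    """
    # The testbed is only down while the clone is taken.
    log.record("disable logins and testbed daemons")
    log.record("take testbed down")
    log.record("clone the system")
    log.record("take testbed back up")

    # The upgrade itself happens on clones, away from the live boss/ops.
    log.record("boot clones of boss/ops, isolated from real versions")
    log.record("upgrade clones")
    if not upgrade_clones():
        # Failure costs nothing: the real boss/ops were never touched.
        log.record("delete clones, complain to Utah")
        return False

    log.record("shut down old boss/ops")
    # Open question in the notes: how to carry over database and
    # experiment disk state that changed while the clones were upgraded.
    log.record("UNRESOLVED: merge database and experiment disk state")
    return True
```

Calling `run_upgrade(lambda: False, UpgradeLog())` exercises the failure branch, which ends with the clones deleted and the live system unchanged; that property is what makes the approach attractive compared with the fallback of a normal upgrade plus a very good backup.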