Title: A Control Plane for Experimenter Support in GENI

PIs: Tom Anderson and Arvind Krishnamurthy

Why is experimenter support important to GENI?

For GENI to be effective at enabling experimental network and distributed systems research, it must be accessible to the broadest set of researchers. We are particularly concerned about the lone PI at a lower-ranked research school with perhaps a single graduate student; will they have the resources to make productive use of GENI? Or will GENI be the sole playground of schools with large teams of researchers? If so, there will be a tremendous backlash against GENI and MREFCs in general.

Unfortunately, PlanetLab does not provide much helpful guidance here. While PlanetLab has a number of strengths, it is hard to use for the uninitiated. Users often find they need a local "expert guide" who has already tripped over the potholes of getting started. Writing and running a distributed "hello world" program should be a few lines of code, but instead it requires, among other things, clicking several hundred times on a web page to request sliver creation on each node, waiting for the system to install sliver access permissions on each node, figuring out how to install your own software execution environment on each of those nodes, building scripts to monitor node and process failures, and building a data distribution system to pipe logs back to the developer's workstation. This lack of effective tools places a high barrier to occasional use of PlanetLab, and addressing it in GENI is a high priority for the research community.

We seek to build a control plane toolkit that will reduce this startup cost to less than a day. Key steps will be to develop abstractions for job control and failure recovery, and to make them available through a shell, a programmable API, and a scripting language interface to the API. These interfaces target different types of users: new users, the existing PlanetLab user community, and experienced developers maintaining long-running services and desiring greater control. The control plane will smooth the development process by which new ideas go from initial experimentation to eventual deployment, by providing an abstraction layer that allows experimenters to migrate from locally available clusters to planetary-scale testbeds.

We have the following schedule for the deliverables:

6 Months (Feb '07): Develop and implement an API for experiment instantiation and system-wide job control using parallel exec. Develop a shell program and an interactive GUI that allow the user to interactively invoke the tasks provided by the API. At the end of the six-month period, demonstrate that a new user can quickly and easily deploy experiments on Emulab and PlanetLab using the shell.

12 Months (Aug '07): Develop support for using the API from scripting languages such as Python and Perl. Develop more advanced support for parallel process management, including suspend/resume/kill and automated I/O management. Also develop support for issuing system-wide commands in asynchronous mode, querying the status of previously issued asynchronous operations, and synchronizing subsequent operations with those initiated earlier. Provide support for simple forms of node selection, e.g., based on processor load or memory availability. Make all of the developed components available on both the Emulab and PlanetLab platforms. The goal is to provide the abstraction of "run this experiment on some set of nodes."
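To illustrate how this "run this experiment on some set of nodes" abstraction might look from the scripting interface, the following is a minimal Python sketch, assuming password-less ssh access to the nodes; the node names and the run_on_node()/run_on_all() helpers are hypothetical illustrations, not part of any existing toolkit API.

    # Hypothetical sketch of parallel exec across an experiment's nodes.
    # Assumes password-less ssh to each node; names below are illustrative.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    NODES = ["node1.emulab.net", "node2.emulab.net", "node3.planet-lab.org"]

    def run_on_node(node, command):
        """Run one command on one node over ssh and capture its output."""
        result = subprocess.run(
            ["ssh", node, command],
            capture_output=True, text=True, timeout=60,
        )
        return node, result.returncode, result.stdout

    def run_on_all(nodes, command):
        """Parallel exec: issue the same command to every node concurrently."""
        with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
            return list(pool.map(lambda n: run_on_node(n, command), nodes))

    if __name__ == "__main__":
        # Distributed "hello world": report hostname and load from each node.
        for node, status, output in run_on_all(NODES, "hostname && uptime"):
            print(node, "exit=%d" % status, output.strip())

In the envisioned toolkit, the same operation would also be available interactively from the shell and GUI, so that a new user need not write even this much code.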
18 Months (Feb '08): Develop support for permanent services. Develop mechanisms for continuously monitoring program state and for reporting exceptions and faults to the user. Develop support for deploying experiments on heterogeneous platforms such as VINI and GENI; this would include support for specifying network configuration, controlling (or injecting) network events, exporting information regarding network conditions, and providing finer control over the allocation of a diverse set of resources. The goal is to provide the abstractions of "keep this service running" for service developers (sketched below, after the schedule) and "run this experiment on a matching set of nodes/topologies" for network experimenters who require a specific workload.

24 Months (Aug '08): Interface the control plane to existing services and sensors. Integrate with CPU performance monitoring sensors such as Slicestat, CoTop, and Ganglia, and with iPlane, a network performance monitoring system that we are concurrently building. Also provide interfaces to different resource discovery and allocation mechanisms (such as SWORD, Bellagio, and SHARP) and to different content distribution systems (such as Bullet, BitTorrent, Coral, and CoDeeN), so that the user can switch among these services by changing an environment variable or a parameter in the API.

30 Months (Feb '09): Provide support for common design patterns that application developers use to recognize and overcome faults, such as transactional operations and process isolation. Develop mechanisms and tools for end hosts to subscribe to overlay services (VINI/GENI services), with support for interfacing at different levels of the protocol stack.

36 Months (Aug '09): Develop intrusive and non-intrusive techniques for monitoring program state and detecting abnormal behavior, along with debugging support such as single-stepping. Address scalability so that the control infrastructure can scale to hundreds or thousands of nodes without developing hotspots. Address network reliability by having the control plane use a resilient communication layer that routes control messages around network faults and hides transient connectivity problems.
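As a concrete illustration of the 18-month "keep this service running" abstraction, the following is a minimal Python sketch of one common supervision pattern: restart a long-running service whenever it fails and report each fault to the user. The keep_running() helper, its restart limit, and its fixed backoff are illustrative assumptions, not the toolkit's actual mechanism.

    # Hypothetical sketch of a "keep this service running" supervisor loop.
    # Restart policy and service command are illustrative assumptions.
    import logging
    import subprocess
    import time

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

    def keep_running(command, max_restarts=10, backoff=5):
        """Keep a service process alive, restarting it on abnormal exit."""
        for attempt in range(max_restarts + 1):
            logging.info("starting service (attempt %d): %s", attempt + 1, command)
            status = subprocess.Popen(command, shell=True).wait()  # run until exit
            if status == 0:
                logging.info("service exited cleanly; not restarting")
                return
            logging.warning("service failed with exit code %d", status)
            time.sleep(backoff)  # simple fixed backoff before the next restart
        logging.error("giving up after %d restarts", max_restarts)

    if __name__ == "__main__":
        keep_running("python3 my_service.py")  # hypothetical service command

In the full control plane, this loop would run per node, feed its fault reports into the continuous monitoring mechanisms described above, and expose the restart policy through the API rather than hard-coded parameters.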