File dev_control_plane.txt, 5.6 KB (added by peter.stickney@bbn.com, 14 years ago)

Control Plane

Title: A Control Plane for Experimenter Support in GENI
PIs: Tom Anderson and Arvind Krishnamurthy

Why is experimenter support important to GENI?  For GENI to be
effective at enabling experimental network and distributed systems
research, it must be accessible to the broadest set of researchers.
We are particularly concerned for the lone PI at a low-ranked research
school with perhaps a single graduate student; will they have the
resources to be able to make productive use of GENI?  Or will GENI be
the sole playground of the schools with large teams of researchers?
If so, there will be a tremendous backlash against GENI and MREFCs in

Unfortunately, PlanetLab does not provide us much helpful guidance.
While PlanetLab has a number of strengths, it is hard to use for the
uninitiated.  Users often find they need a local "expert guide" who
has already tripped over the potholes in getting started.  Writing and
running a distributed "hello world" program should be a few lines of
code, but instead requires, among other things, clicking several
hundred times on a web page to request sliver creation on each node,
waiting for the system to install sliver access permissions on the
node, figuring out how to install your own software execution
environment on each of those nodes, building scripts to monitor node
and process failures, building a data distribution system to pipe logs
back to the developer's workstation, etc.  This lack of effective
tools sets a high bar for occasional use of PlanetLab, and
addressing this in GENI is a high priority in the research community.

We seek to build a control plane toolkit that will reduce this startup
cost to less than a day.  Key steps will be to develop abstractions
for job control and failure recovery and make them available through a
shell, a programmable API, and a scripting language interface to the
API.  These different interfaces will target different types of users,
namely, the new user community, the existing PlanetLab user community,
and experienced developers maintaining long-running services and
desiring greater control.  The control plane will smooth the development
process by which new ideas go from initial experimentation to eventual
deployment, and it will do so by providing an abstraction layer that
allows experimenters to migrate from locally available clusters to
planetary-scale testbeds.

We have the following schedule for the deliverables:

6 Months (Feb '07): Develop and implement API for experiment
instantiation and system-wide job control using parallel exec.
Develop a shell program and an interactive GUI that allow the user to
interactively invoke the tasks provided by the API.  At the end of the
six month period, demonstrate that a new user can quickly and easily
deploy experiments on Emulab and PlanetLab using the shell.
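
As an illustration only, system-wide job control using parallel exec
might look like the following sketch.  The names parallel_exec,
run_on_node, and fake_run are hypothetical and are not part of the
planned API; run_on_node stands in for whatever mechanism (for example
an ssh wrapper) actually starts a command on a node.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_exec(nodes, run_on_node, max_workers=16):
    """Run one command on every node concurrently; return {node: outcome}.

    run_on_node is a hypothetical callable supplied by the caller; it is
    an assumption of this sketch, not an existing toolkit function.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {node: pool.submit(run_on_node, node) for node in nodes}
        for node, future in futures.items():
            try:
                results[node] = ("ok", future.result())
            except Exception as exc:
                # A failure on one node must not abort the other nodes.
                results[node] = ("failed", str(exc))
    return results


def fake_run(node):
    # Stand-in for a remote command; node "n2" simulates an unreachable host.
    if node == "n2":
        raise RuntimeError("unreachable")
    return "hello world from " + node


if __name__ == "__main__":
    outcomes = parallel_exec(["n1", "n2", "n3"], fake_run)
    for node in sorted(outcomes):
        print(node, outcomes[node])
```

Reporting a per-node outcome instead of raising on the first error
reflects the expectation that some nodes will be down in any large
deployment.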

12 Months (Aug '07): Develop support for using the API with scripting
languages, such as Python and Perl.  Develop more advanced support for
parallel process management, including suspend/resume/kill,
and automated I/O management.  Also develop support for issuing
system-wide commands in asynchronous mode, querying the status of
previously issued asynchronous operations, and synchronizing
subsequent operations with those initiated earlier.  Provide support
for simple forms of node selection, e.g., based on processor load or
memory availability.  Make all of the developed components available
on both the Emulab and PlanetLab platforms.  The goal is to provide
the abstraction of "run this experiment on some set of nodes."
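
A minimal sketch of two of these ideas follows: simple node selection
by load and free memory, and asynchronous operations that can be
queried and synchronized.  The names select_nodes and AsyncOps, and the
(load, free memory) layout of node_stats, are assumptions made for the
example rather than the toolkit's interface.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def select_nodes(node_stats, count, max_load=1.0, min_free_mb=0):
    """Pick up to `count` node names, least loaded first.

    node_stats maps a node name to (load_average, free_memory_mb); this
    layout is an assumption of the sketch.
    """
    eligible = [
        (load, name)
        for name, (load, free_mb) in node_stats.items()
        if load <= max_load and free_mb >= min_free_mb
    ]
    return [name for _, name in sorted(eligible)[:count]]


class AsyncOps:
    """Issue operations without blocking, then query or wait on them."""

    def __init__(self, max_workers=8):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._ops = {}

    def issue(self, op_id, func, *args):
        # Returns immediately; the operation runs in the background.
        self._ops[op_id] = self._pool.submit(func, *args)

    def status(self, op_id):
        future = self._ops[op_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() else "done"

    def synchronize(self, op_ids):
        # Barrier: return only once every listed operation has finished.
        wait([self._ops[op_id] for op_id in op_ids])

    def shutdown(self):
        self._pool.shutdown()
```

A caller would issue several operations, continue with other work, and
call synchronize before starting any step that depends on them.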

18 Months (Feb '08): Develop support for permanent services.  Develop
mechanisms for continuous monitoring of program state and reporting
exceptions and faults to the user.  Develop support for deploying
experiments on heterogeneous platforms such as VINI and GENI.  This
would include support for specifying network configuration,
controlling (or injecting) network events, exporting information
regarding network conditions, and providing more control over resource
allocation for a diversity of resources.  The goal is to provide the
abstractions of "keep this service running" for service developers and
"run this experiment on a matching set of nodes/topologies" to network
experimenters desiring a specific workload.
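
The "keep this service running" abstraction can be pictured as a small
supervisor loop.  The sketch below is illustrative: keep_running and
run_once are hypothetical names, and run_once is assumed to start the
service, block until it exits, and return its exit status.

```python
import time


def keep_running(run_once, max_restarts=3, backoff_seconds=0.0):
    """Restart a service each time it exits with a non-zero status.

    run_once is a caller-supplied callable (an assumption of this
    sketch).  Returns the number of restarts performed.  The loop ends
    on a clean exit (status 0) or once max_restarts is exhausted.
    """
    restarts = 0
    while True:
        status = run_once()
        if status == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        # Linear backoff so a crash loop does not hammer the node.
        time.sleep(backoff_seconds * restarts)


if __name__ == "__main__":
    # Simulated service: fails twice, then exits cleanly.
    statuses = iter([1, 1, 0])
    print(keep_running(lambda: next(statuses)))
```

In practice run_once could wrap subprocess.call on a local node, or a
remote invocation through the control plane.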

24 Months (Aug '08): Interface the control plane to existing services
and sensors.  Integrate with CPU performance monitoring sensors such
as Slicestat, CoTop, and Ganglia and also integrate with the IPlane, a
network performance monitoring system that we are concurrently
building.  Also provide interfaces to different resource discovery and
allocation mechanisms (such as SWORD, Bellagio, SHARP), and different
content distribution systems (such as Bullet, BitTorrent, Coral,
Codeen), so that the user can just change an environment variable or a
parameter in the API to use these services.
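
One way to picture switching services through an environment variable
is a small registry keyed by name.  Everything in this sketch is
hypothetical: the variable CP_DISTRIBUTOR, the registry, and the two
placeholder backends, which only return descriptions of what they would
copy and do not talk to any real system.

```python
import os


def _copy_directly(path, nodes):
    # Placeholder backend: describes a direct copy to each node.
    return ["direct:%s->%s" % (path, node) for node in nodes]


def _copy_via_swarm(path, nodes):
    # Placeholder backend: describes a swarm-style distribution.
    return ["swarm:%s->%s" % (path, node) for node in nodes]


# Hypothetical registry; real backends would wrap actual client code.
DISTRIBUTORS = {"direct": _copy_directly, "bittorrent": _copy_via_swarm}


def distribute(path, nodes, env=None):
    """Distribute a file using the backend named in CP_DISTRIBUTOR."""
    env = os.environ if env is None else env
    name = env.get("CP_DISTRIBUTOR", "direct")
    if name not in DISTRIBUTORS:
        raise ValueError("unknown distributor: %r" % name)
    return DISTRIBUTORS[name](path, nodes)
```

The experiment script calls distribute the same way in every case; only
the environment changes between runs.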

30 Months (Feb '09): Provide support for common design patterns that
application developers use for recognizing and overcoming faults, such
as using transactional operations and process isolation.  Develop
mechanisms and tools for end-hosts to subscribe to overlay services
(VINI/GENI services), with support for interfacing at different levels
of the protocol stack.
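
As one example of a transactional pattern, an operation can be applied
to every node or to none.  This is a sketch under stated assumptions:
transactional_apply, apply, and undo are hypothetical names, and undo
is assumed to reverse a successful apply on the same node.

```python
def transactional_apply(nodes, apply, undo):
    """Apply an operation to every node, or to none.

    If apply raises on any node, undo is called on the nodes that had
    already succeeded (most recent first) and the error is re-raised.
    """
    done = []
    try:
        for node in nodes:
            apply(node)
            done.append(node)
    except Exception:
        for node in reversed(done):
            undo(node)
        raise
    return done
```

Undoing in reverse order mirrors how nested setup steps are usually
unwound; it does not protect against a failure inside undo itself.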

36 Months (Aug '09): Develop intrusive and non-intrusive techniques
for monitoring program state, detecting abnormal behavior, and
supporting debugging operations such as single-stepping.  Address
scalability issues so that the control infrastructure can scale to
hundreds or thousands of nodes without developing hotspots.  Address
network reliability issues by having the control plane use a resilient
communication layer that routes control messages around network faults
and hides transient connectivity problems.
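
Routing control messages around network faults can be sketched as a
path search over an overlay graph that skips links known to be down.
The function name, the link representation, and the use of
breadth-first search are assumptions of this example; the proposal does
not specify the routing algorithm.

```python
from collections import deque


def route_around_faults(links, failed, src, dst):
    """Return a fewest-hop path from src to dst avoiding failed links.

    links and failed are iterables of (node, node) pairs treated as
    undirected.  Returns a list of nodes, or None if dst is unreachable.
    """
    down = {frozenset(link) for link in failed}
    neighbors = {}
    for a, b in links:
        if frozenset((a, b)) in down:
            continue
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    previous = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None
```

When the direct path is healthy the search returns it; when a link on
that path fails, the same call returns a detour through other overlay
nodes if one exists.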