Changes between Version 7 and Version 8 of WorryingSlivers

Timestamp: 12/08/11 10:08:17
Author: chase@cs.duke.edu
Comment: --

WorryingSlivers

-                      v7
+                      v8
 But if a virtual resource graph can be partitioned across aggregates, then it must also be possible to partition the graph within aggregates.   We can break the graph into named regions and use the aggregate API to talk about specific regions, passing the rspec for only the regions of interest.  We can let the aggregate handle any edges that cross region boundaries within the aggregate.
 If we can partition the graph into regions, how shall we decide where to set the boundaries?  How big shall we make the regions?  At one extreme there is a single region: this is the degenerate case represented by the v2.0 AM-API.   We can make any changes we want using a single API call, but we must pass rspec for the entire graph, even for a minor change.  If we introduce region boundaries, then we have a tradeoff.  With smaller regions, we need more requests to instantiate or update a given graph, but each request passes a smaller rspec.  With larger regions, we make fewer API calls to instantiate or update the graph, but the rspec documents are larger.
+If we can partition the graph into regions, how shall we decide where to set the boundaries?  How big shall we make the regions?  At one extreme there is a single region: this is the degenerate case represented by the v2.0 AM-API.   We can make the changes we want using a single API call, but we must pass rspec for the entire graph, even for a minor change.  If we introduce region boundaries, then we have a tradeoff.  With smaller regions, we need more requests to instantiate or update a given graph, but each request passes a smaller rspec.  With larger regions, we make fewer API calls to instantiate or update the graph, but the rspec documents are larger.
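The tradeoff can be illustrated with a rough sketch. It assumes, purely for illustration, that regions are of equal size and that the size of an rspec grows with the number of elements it describes. `partition_cost` is a hypothetical helper and is not part of any AM-API.

```python
import math
from typing import Tuple


def partition_cost(num_elements: int, region_size: int) -> Tuple[int, int]:
    """Return (API calls to instantiate the whole graph, max elements per rspec).

    Assumes equal-sized regions and one API call per region.
    """
    if num_elements <= 0 or region_size <= 0:
        raise ValueError("sizes must be positive")
    calls = math.ceil(num_elements / region_size)
    elements_per_rspec = min(region_size, num_elements)
    return calls, elements_per_rspec


if __name__ == "__main__":
    # A single region (the v2.0 AM-API case) versus progressively smaller regions.
    for size in (100, 10, 1):
        calls, per_rspec = partition_cost(100, size)
        print(f"region size {size}: {calls} calls, up to {per_rspec} elements per rspec")
```

Under these assumptions, a minor change to one element costs one call in every case, but the rspec passed with that call shrinks as the regions shrink.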
 == Slivers ==
 Finally we come to slivers.  What is a sliver?  A sliver is a region of a slice's virtual resource graph.  Either the graph is partitionable, or it is not.  If the graph is partitionable, and we choose to partition it, then we need a name for the partitions.  Sliver is a fine name.  But perhaps the community will insist on a different name.  (Virtual Resource Assembly?)
+Finally we come to slivers.  What is a sliver?  A sliver is a region of a slice's virtual resource graph.  A sliver API allows a client to operate on one region independently of other regions.
+There seem to be three classes of arguments "against" slivers.  First: the name sliver is confusing to people.  We can change the name, but it probably won't help.  Using a name that we're already using for something else (like resource, or component, both of which refer to physical substrate) might make the situation worse, rather than better.  I think sliver is a good name.
+The second common argument makes various assumptions about what others assume slivers to be, and then argues that system X is different.  Indeed, perhaps system X is different.
+The third argument is (in essence) that virtual resources are tightly coupled, and we can't operate on them independently, or that trying to do so might add undue complexity to the API, or might get us into trouble later if we discover unexpected dependencies.
+The third argument merits a response.  My response is: either the virtual resource graph is partitionable, or it is not.
 If the graph is not partitionable, then it is not partitionable across aggregates.  Then there are only slices, and each aggregate must receive the entire graph for the entire slice.   If we change any part of the graph, we must pass the new slice rspec to every aggregate participating in the slice, at a minimum.  We can make this approach work for demos, but it will not scale to large slices, and it will not accommodate dynamic slices.  And then somebody in another project will figure out how to describe a virtual resource graph in a way that makes it partitionable, and we will move forward again from there.
+I believe that we know how to describe the resources we care about as partitionable graphs.
+If the graph is partitionable across aggregates, then it is partitionable within aggregates.  Then the only question is whether the aggregate API permits any partitioning within an aggregate.  And why would the API prohibit an aggregate from grouping and organizing the virtual resources that it serves?  Why would the API prohibit an aggregate from breaking its virtual resources into slivers that can be operated on through sliver APIs?
+On the other hand, if the graph is partitionable across aggregates, then it is also partitionable within aggregates.  I hope and believe that the graph is partitionable.  Then the only question is whether the aggregate API permits any partitioning within an aggregate, at the aggregate's discretion.  And why would the API prohibit that?  Why would the API prohibit an aggregate from grouping and organizing the virtual resources that it serves?  Why would the API prohibit an aggregate from breaking its virtual resources into slivers that can be operated on through sliver APIs?
+But what is a sliver *really*?  What does a region of the graph represent?  I have been vague in talking about "virtual resource elements" and "entities" and "virtual resources".  It is an abstraction.   The Vision Slide says that GENI will support heterogeneous deeply programmable virtualized infrastructure resources.  What are those?  We do not know.  But they are heterogeneous, so there could be many different kinds.  And if the architecture is to have impact over more than a few years, then it must accommodate resources that have not been invented yet.  We do not know what these will look like.
+== Types of Slivers ==
+What we do know is that it must be possible to describe these new virtual resources using a semantic model, and that the description will be a graph of elements and edges representing relationships among the elements.  And if we have a stitching architecture that can propagate labels across edges, then the graph will be partitionable.    And if the graph is partitionable, then it will be convenient to partition it into regions in order to allow the possibility that we might use the aggregate API to operate on different regions independently of other regions.  For example, we can add virtual resources to a slice by attaching a new region, without changing anything about the graph as it exists.  And we can remove virtual resources from a slice by detaching a region, without changing anything about the rest of the graph as it exists.
+But what is a sliver *really*?  I have been speaking at a high level of abstraction of these groupings as "regions" of a graph describing any set of virtual resources.  But what does a region of the graph represent?  If we know something about a specific aggregate, we can see that these groupings correspond to well-understood resource abstractions that are meaningful to users of the aggregate.
 We also know that this abstract notion of "regions" must cover the virtual resource cases we already understand.  For example, a cloud site offers an infrastructure service that allows us to instantiate graphs of related virtual resource elements such as virtual CPU cores, memories, network interfaces, storage volumes, and virtual networks.  We can take a pen and draw regions around parts of this richly connected graph of virtual resource elements.   We can decide that the collection of elements adjacent to a memory constitutes a useful grouping.  We can draw a region encompassing all of those virtual elements and call it a "virtual machine" or "instance".  That is a reasonable choice of a region: it is the choice made by EC2-like cloud sites.  EC2 also draws regions around VLANs and calls them "security groups".  It considers storage volumes separately from virtual machine instances.
+Let's consider some virtual resource cases we already understand.  For example, a cloud site offers an infrastructure service that allows us to instantiate graphs of related virtual resource elements such as virtual CPU cores, memories, network interfaces, storage volumes, and virtual networks.  We can take a pen and draw regions around parts of this richly connected graph of virtual resource elements.   We can decide that the collection of elements adjacent to a memory constitutes a useful grouping.  We can draw a region encompassing all of the cores and virtual devices adjacent to a memory and call it a "virtual machine" or "instance".  That is a reasonable choice of a region: it is the choice made by EC2-like cloud sites.  EC2 also draws regions around VLANs and calls them "security groups".  It considers storage volumes separately from virtual machine instances.
 The [https://ben.renci.org/ Breakable Experimental Network] is another interesting case.  BEN is a network substrate with a multi-layer topology.   We can allocate virtual network topologies from BEN.  Given a virtual network topology that is planar (forget about layers for now), there are many reasonable ways to partition the network into connected regions.   What is important about a region is that it offers some connectivity service among a set of locations.   The aggregate might choose to expose more or less information about its internal structure.  [http://www.science.uva.nl/research/sne/ndl People who understand network description languages] call this topology aggregation.  But it is up to the network aggregate whether it allows its clients to create subnetworks separately and then stitch them together.  Most advanced networks today permit only creation of paths from point A to point B.  If the aggregate does support multi-point network topologies, then it is up to the client how it chooses to use those primitives.   It may be useful for a client to build and evolve a virtual network one piece at a time, or it might be simpler to create a static network in one shot and then leave it alone.
+The [https://ben.renci.org/ Breakable Experimental Network] is another interesting case.  BEN is a network substrate with a multi-layer topology.   We can allocate virtual network topologies from BEN.  Given a virtual network topology that is planar (forget about layers for now), there are many reasonable ways to partition the network into connected regions.   What is important about a region is that it offers some connectivity service among a set of locations.   The aggregate might choose to expose more or less information about its internal structure.  [http://www.science.uva.nl/research/sne/ndl People who understand network description languages] call this topology aggregation.  But it is up to the network aggregate whether it allows its clients to create subnetworks separately and then stitch them together, and it is up to the client how it chooses to use those primitives.   It may be useful for a client to build and evolve a virtual network one piece at a time, or it might be simpler to create a static network in one shot and then leave it alone.
+== Sliver Types ==
+These examples show that a given virtual resource service incorporates its own groupings of the virtual resource element graph into regions (slivers), and these groupings may allow useful operations on a sliver other than creating it and releasing it.  EC2 separates networks, storage volumes, and virtual machines, and as a result it can offer primitives to attach and detach storage volumes to/from virtual machines, and attach/detach virtual machines to/from networks.   These are specific examples of generalized stitching, but these groupings can also support other useful verbs, like cloning storage volumes or suspending virtual machines.
 At one level of abstraction, we can speak of these groupings as "regions" of a graph describing any set of virtual resources.  But if we know something about a specific aggregate, we can see that these groupings correspond to well-understood resource abstractions that are meaningful to users of the aggregate.  For example, EC2 separates networks, storage volumes, and virtual machines.  As a result, it can offer primitives to attach and detach storage volumes to/from virtual machines, and attach/detach virtual machines to/from networks.   These are specific examples of generalized stitching, but these groupings can also support other useful verbs, like cloning storage volumes or suspending virtual machines.
+Thus virtual resources have types that define what we can say about them and do to them.    An aggregate could provide supplementary type-specific operations on slivers, in addition to common operations supported by the base sliver API.   Of course, some virtual resources are programmable, and programs running on them may also expose interfaces and operations.  But in general those interfaces are above the virtual resource management layer and are outside our scope of concern.
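The idea of a base sliver API with supplementary type-specific operations can be sketched with ordinary subtyping. Everything named here (`Sliver`, `VirtualMachineSliver`, `StorageVolumeSliver`, and their methods) is hypothetical and chosen only to mirror the EC2-style examples above. It is not the AM-API, ORCA, or EC2 interface.

```python
from abc import ABC, abstractmethod


class Sliver(ABC):
    """Base sliver: operations assumed to be common to every sliver type."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "active"

    def release(self) -> None:
        # Common operation: any sliver can be released.
        self.state = "released"

    @abstractmethod
    def describe(self) -> str:
        """Return a short type-qualified description of this sliver."""


class VirtualMachineSliver(Sliver):
    def describe(self) -> str:
        return f"vm:{self.name}"

    def suspend(self) -> None:
        # Type-specific verb: only meaningful for virtual machines.
        self.state = "suspended"


class StorageVolumeSliver(Sliver):
    def describe(self) -> str:
        return f"volume:{self.name}"

    def clone(self, new_name: str) -> "StorageVolumeSliver":
        # Type-specific verb: only meaningful for storage volumes.
        return StorageVolumeSliver(new_name)


if __name__ == "__main__":
    vm = VirtualMachineSliver("vm1")
    vol = StorageVolumeSliver("vol1")
    vm.suspend()
    copy = vol.clone("vol2")
    for sliver in (vm, vol, copy):
        sliver.release()  # the base operation works on every type
        print(sliver.describe(), sliver.state)
```

A client that knows only the base type can still create and release any sliver, while a client that knows the subtype can use the extra verbs.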
+These examples show that a given virtual resource service incorporates its own groupings of the virtual resource element graph into regions (slivers), and these groupings may allow useful operations on a sliver other than creating it and releasing it.    Thus virtual resources have types that define what we can say about them and do to them.    An aggregate could provide supplementary type-specific operations on slivers, in addition to common operations supported by the base sliver API.   In the past, some have seemed to argue that the impracticality of a one-size-fits-all sliver API undermines the whole dream of GENI.  But the notion of subtyping has been proven in many other contexts and should be comfortable here as well.
+But sliver is a very abstract abstraction.  There will be other kinds of slivers that don't look like these examples.  There will be aggregates whose mapping to slivers is unclear, including some (like OpenFlow) whose functions have little to do with resource allocation.
+The Vision Slide says that GENI will support heterogeneous deeply programmable virtualized infrastructure resources.  What are those?  We do not know.  But they are heterogeneous, so there could be many different kinds.  And if the architecture is to have impact over more than a few years, then it must accommodate resources that have not been invented yet.  We do not know what these will look like.
+Of course, some virtual resources are programmable, and programs running on them may also expose interfaces and operations.  But in general those interfaces are above the virtual resource management layer and are outside our scope of concern.
+What we do know is that it must be possible to describe these new virtual resources using a semantic model, and that the description will be a graph of elements and edges representing relationships among the elements.  And if we have a stitching architecture that can propagate labels across edges, then the graph will be partitionable.    And if the graph is partitionable, then it will be convenient to partition it into regions in order to allow the possibility that we might use the aggregate API to operate on different regions independently of other regions.  For example, we can add virtual resources to a slice by attaching a new region, without changing anything about the graph as it exists.  And we can remove virtual resources from a slice by detaching a region, without changing anything about the rest of the graph as it exists.  But we can't say in advance what the regions might represent, or what the various type-specific sliver APIs might be (except for the ones we understand now).
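A minimal sketch of attaching and detaching regions follows. It assumes that a slice's graph can be modeled as named regions plus edges that cross region boundaries. `SliceGraph`, `attach`, and `detach` are hypothetical names used only for illustration, and the element names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Set, Tuple


@dataclass
class SliceGraph:
    # region name -> names of the virtual resource elements inside it
    regions: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    # edges that cross region boundaries, as (region_a, region_b) pairs
    cross_edges: Set[Tuple[str, str]] = field(default_factory=set)

    def attach(self, name: str, elements: Iterable[str],
               edges: Iterable[Tuple[str, str]] = ()) -> None:
        """Add a new region without modifying any existing region."""
        if name in self.regions:
            raise ValueError(f"region {name!r} already exists")
        edges = [tuple(e) for e in edges]
        for a, b in edges:
            if name not in (a, b):
                raise ValueError("each edge must touch the new region")
            other = b if a == name else a
            if other not in self.regions:
                raise ValueError(f"unknown region {other!r}")
        self.regions[name] = frozenset(elements)
        self.cross_edges.update(edges)

    def detach(self, name: str) -> None:
        """Remove a region and only the edges that touch it."""
        del self.regions[name]
        self.cross_edges = {e for e in self.cross_edges if name not in e}


if __name__ == "__main__":
    g = SliceGraph()
    g.attach("vm1", {"cpu0", "mem0", "nic0"})
    g.attach("net1", {"vlan7"}, edges=[("vm1", "net1")])
    g.detach("net1")
    print(sorted(g.regions))
```

Each call names one region and describes only that region, so the description of the rest of the graph never has to be resent.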
+In the past, some have seemed to argue that the impracticality of a one-size-fits-all sliver API undermines the whole dream of GENI.  But the notion of subtyping has been proven in many other contexts and should be comfortable here as well.
 == The Boundary Between Software and Semantic Specifications ==
 …
 A key property of ORCA resource leases is that they expire if the client does not renew them.  That property is important for GENI, but is out of scope for this discussion.   The set of slivers in an ORCA lease may be changed in various ways when the lease is renewed (extended).  This is one way to grow and shrink slices in ORCA.  However, I now believe that the idea of multiple slivers per lease was a mistake.  It complicated the code and caused a lot of unnecessary debugging effort (in 2005), is useless for networks, and makes it impossible to change some slivers independently of other slivers if they are in the same lease.  In GENI we always use ORCA with one sliver per lease.  Used in this way, an ORCA lease is a pretty close analogue of a sliver.  One can grow slices by adding leases (slivers), and shrink slices by closing leases or allowing them to expire.
 Recently people have started saying that ORCA does not have UpdateSliver, but I am not sure if they are right because I still don't know what they mean by UpdateSliver.   ORCA defines another operation on a lease (sliver), called Modify, that has never yet been fully implemented.  Modify was intended as a hook for pluggable type-specific actions on the slivers in a lease.  One might think of it as sort of a kitchen-sink ioctl.  But this seems different from the UpdateSliver planned for the AM-API.  An ORCA slice can have many slivers at the same AM, and can create and release them independently, so the stated motivation for the AM-API UpdateSliver does not seem to apply. (?)
+Recently people have started saying that ORCA does not have the UpdateSliver function.  An ORCA slice can have many slivers at the same AM, and can create and release them independently, so ORCA has the function of UpdateSliver to grow or shrink a slice at an aggregate.  Also, a caller can change certain sliver parameters at lease extension time, which may cover other planned functions of UpdateSliver. ORCA defines another operation on a lease (sliver), called Modify, which has never yet been fully implemented.  Modify is intended as a hook for pluggable type-specific actions on the slivers in a lease.  One might think of it as sort of a kitchen-sink ioctl.  But this seems different from the UpdateSliver planned for the AM-API.