Changes between Version 2 and Version 3 of AaronHelsinger/GAPI_AM_API_DRAFT

03/22/12 13:43:58 (9 years ago)
Aaron Helsinger



  • AaronHelsinger/GAPI_AM_API_DRAFT

    v2 v3  
    4646 - ADOPTED: Change Set I3: !CreateSlivers return becomes a struct, adds sliver expiration
    4747 - Change Set I4: !CreateSlivers optionally does not start resources.
    48  - Withdrawn: [#ChangeSetJ:Proxyaggregatemanagersaresupported Change Set J]: Support proxy aggregates with 1 new option and 1 new !GetVersion entry
     48 - Withdrawn: [#Withdrawn:ChangeSetJ Change Set J]: Support proxy aggregates with 1 new option and 1 new !GetVersion entry
    4949 - ADOPTED: [#ChangeSetK:Standardizecertificatesandcredentials Change Set K]: Standardize certificate contents, etc
    5050  - Include a real serial number, holder email, holder uuid, and optionally authority URL in certificates
    402402The proposal here elicited concerns (the method !ActOnSlivers is an ioctl, and the states mix allocation and operational states).
    404 For a newer alternative proposal, see
     404A later alternative proposal, see
     406At the GEC13 coding sprint, a variant on the above was approved. It is documented here as Change Set F3.
     408A variant on the operational states proposal is defines as Change Set F4 and documented here:
    406410== Motivation ==
    409413== Change Set F1: Define Sliver States ==
     414'''This change is superceded by Change Sets F3 and F4.'''
    410416Currently the AM API defines several possible states as valid returns in !SliversStatus: {{{configuring}}}, {{{ready}}}, {{{unknown}}}, and {{{failed}}}. This change changes and expands that list of valid states, and explicitly defines the expected states after each AM API method call. Additionally, this change provides a mechanism for aggregates to supply their own states.
    412418The GENI AM API can be thought of as manipulating slivers. As such, each method potentially changes the state of 1 or more slivers. With the changes proposed here, several of the methods return a new {{{geni_status}}} field, whose value is one of the standard GENI sliver status values. Aggregates must use one of the standard GENI values for that return.
    414 {{{geni_status}}} legal values (case insensitive):
    415  - {{{uninitialized}}}: This is the state before any AM-local operation for this slice.
    416  - {{{ticketed}}}: The resources are reserved for the slice, but not currently provisioned for the slice. Slivers are {{{ticketed}}} after !GetTicket, !UpdateTicket, or after !UpdateSlivers. Note in particular that a slice may have some resources that are {{{ready}}} and others which are {{{ticketed}}} after an !UpdateSlivers call: we call the whole slice {{{ticketed}}} in this case.
    417  - {{{allocated}}}: The sliver(s) are currently provisioned for the slice, but not necessarily fully ready for experimental use (eg, not booted). This is the state after !RedeemTicket, or after !CreateSlivers with the {{{geni_donotstart}}} option.
    418  - {{{ready}}}: The resources are ready for experimental use, as in after !CreateSlivers completes any booting or starting. Similarly after !ActOnSlivers with the {{{start}}} command. Note that each of those methods starts a process that may take significant time to complete. During that time the sliver will not yet be {{{ready}}}.
    419  - {{{closed}}}: When the slice was previously provisioned resources, which have now expired or been de-allocated with !DeleteSlivers, we call the sliver {{{closed}}}. Note that this state is rarely seen in practice - aggregates do not respond in this API to queries about slices that do not currently have outstanding allocations or tickets.
    420  - {{{changing}}}: This is the state of a sliver in transition. For example, while a machine is booting (changing from {{{allocated}}} to {{{ready}}}). This state used to be known as {{{configuring}}}.
    421  - {{{shutdown}}}: This is the state of a sliver after the Shutdown operation - the sliver is still allocated to the slice, possibly still booted and configured for the slice, but is not available for experimental use. And administrator must intervene to recover or delete the slivers.
    422  - {{{failed}}}: When an operation fails leaving the sliver unusable and requiring administrative intervention, it will be marked {{{failed}}}.
    423  - {{{unknown}}}: If the aggregate does not know the state of a sliver, it will be marked {{{unknown}}}. This state may be transitive, or may require an admin to recover.
    425 As in previous versions of this API, the state of the full set of slivers in a slice at an aggregate is a roll-up of the states of each sliver. For each of {{{ticketed}}}, {{{allocated}}}, and {{{ready}}}, the set of slivers is only in that state if all individual slivers are in that state. If any sliver is {{{shutdown}}} or {{{failed}}} or {{{changing}}} (in order of decreasing precedence), then the set of slivers is in that state. If all slivers are {{{unknown}}} or {{{closed}}}, then the slice at this aggregate is {{{unknown}}} or {{{closed}}}.
    426  - If not all resources in the sliver/slice can be moved to the desired next state, then the call fails.
    427  - When moving from state 1 to 2, the slice is in state 1 until all slivers are in state 2 (EG moving from {{{ticketed}}}->{{{allocated}}}).
    429 Aggregates are free to ''also'' return an aggregate specific status - either in an AM-specifically-named entry, or in {{{am_specific_status}}}. Such values should be thought of as sub-states within the GENI state. For example, where the GENI state might be {{{changing}}}, the AM specific state might also be {{{imaging}}} or {{{booting}}}. Methods which accept a state (!ActOnSlivers) may accept either one of the {{{geni_status}}} values, or an aggregate specific value. Aggregates must document the meaning and use of aggregate specific status values.
    431 State changes by method:
    432  - !GetTicket: From {{{uninitialized}}} to {{{ticketed}}}
    433  - !UpdateTicket: From {{{ticketed}}} to {{{ticketed}}}
    434  - !ReleaseTicket: From {{{ticketed}}} to {{{uninitialized}}} (or {{{allocated}}} if this was an update)
    435  - !RedeemTicket: From {{{ticketed}}} to {{{allocated}}}
    436  - !CreateSlivers: From {{{uninitialized}}} to {{{allocated}}}
    437   - And then to {{{ready}}} via {{{changing}}} if the {{{geni_donotstart}}} option is not supplied
    438  - !UpdateSlivers: From {{{allocated}}} or {{{ready}}} to {{{ticketed}}}
    439  - !DeleteSlivers: From {{{ready}}} or {{{allocated}}} (or {{{changing}}}, etc) to {{{closed}}} (not {{{ticketed}}})
    440  - Shutdown: From {{{allocated}}} or {{{ready}}} to {{{shutdown}}}
    441 Note that {{{changing}}} or {{{unknown}}} may be a source state for any of these methods. Operations may fail, leaving a sliver {{{failed}}}, and operations may take time leaving a sliver {{{changing}}} for some time.
    443 Methods and state transitions as a picture:
    445 [[Image(sliver-states.jpg)]]
    447 {{{
    448 #!comment
    449  - {{{uninitialized}}} -> (!GetTicket) -> {{{ticketed}}} (you have a ticket)
    450   - and back via !ReleaseTicket
    451  - {{{ticketed -> (!UpdateTicket) -> {{{ticketed}}}
    452  - {{{ticketed -> (!RedeemTicket) -> {{{allocated}}} (you have slivers)
    453  - {{{uninitialized}}} -> (!CreateSlivers) -> {{{allocated}}} and then via {{{changing}}} to {{{ready}}}
    454  - {{{allocated}}} (or {{{ticketed}}} when you also have {{{allocated}}} slivers)->(!DeleteSlivers) -> {{{closed}}}
    455  - {{{allocated}}} (or some {{{allocated}}} and some {{{ticketed}}}}) -> (Shutdown) -> {{{shutdown}}}
    456  - {{{shutdown}}} -> [some operator action] -> {{{closed}}} or {{{allocated}}}
    457  - {{{allocated}}} -> (!UpdateSlivers) -> whole is called {{{ticketed}}}, some slivers are {{{allocated}}} and some {{{ticketed}}}
    458  - Some slivers {{{allocated}} and some {{{ticketed}}} -> (!UpdateTicket) -> {{{allocated}}}+{{{ticketed}}}
    459  - {{{allocated}}}+{{{ticketed}}} -> (!ReleaseTicket) -> {{{allocated}}}
    460  - {{{allocated}}}+{{{ticketed}}} -> (!RedeemTicket) -> {{{allocated}}}
    461 }}}
    463 Note: some resources may not require an explicit 'start' operation. In this case !CreateSlivers may leave some slivers {{{ready}}}, skipping right past {{{allocated}}}.
    465 Summary of changes:
    466  - {{{configuring}}} becomes {{{changing}}}, which can be used in many other cases, in returns from !SliversStatus
    467  - New states {{{uninitialized}}}, {{{ticketed}}}, {{{allocated}}}, {{{closed}}}, and {{{shutdown}}} are added
    468  - State transitions for each method are defined
    469  - {{{am_specific_status}}} optional return defined
    470  - {{{geni_status}}} is returned by !SliversStatus, !RedeemTicket, !UpdateSlivers, !CreateSlivers, and !ActOnSlivers (if all relevant change sets are adopted).
    472420== Change Set F2: !ActOnSlivers ==
     421'''This change is superceded by Change Sets F3 and F4.'''
    473423This change introduces a new method, providing a generic way to act on slivers in an AM or resource type specific way. This method shall be used to 'start' or 'stop' or 'restart' resources that have been allocated but not started by !CreateSlivers or !RedeemTicket. It may also be used to change the state of slivers (or their contained resources) in an aggregate or resource specific way. Some aggregates may use this method to change configuration details of allocated resources. This might include changing acceptable login keys.
    475 !ActOnSlivers takes a {{{command}}}, {{{urn}}}, {{{state}}}, and {{{options}}}. The method return is a struct that includes the {{{urn}}}, {{{geni_status}}} of the sliver(s), and any other AM and operation specific options. The URN may be a slice urn, meaning all slivers in that slice at this AM are effected. Or the URN may be a particular sliver URN. The {{{state}}} argument is one of the {{{geni_status}}} values, or an AM-specific value. The {{{state}}} meaning depends on the {{{command}}}, but typically indicates the desired or resulting new state of the sliver(s). If the AM wishes to return an aggregate specific sliver status, it should still return a valid {{{geni_status}}}, and use an additional entry to also return the aggregate specific state. The {{{command}}} argument is aggregate defined. This API does not specify how aggregates advertise valid commands.
    477 Three particular commands are specified however: {{{start}}}, {{{stop}}}, and {{{restart}}} (case insensitive). If an aggregate provides resources which require an explicit action to make {{{allocated}}} resources {{{ready}}} for experimenter use (booting, applying a configuration change) then the aggregate must make that operation available using these commands. These commands are used after !RedeemSlivers or when the {{{geni_donotstart}}} option is supplied to !CreateSlivers for example.
    479 For example, to start allocated resources:
    480 {{{
    481 Arguments:
    482   command = start
    483   urn = <slice or sliver urn>
    484   state = ready
    485   options = <none required>
    486 Result:
    487   urn = <same as input>
    488   geni_status = changing or ready on success
    489 }}}
    491 FIXME: After !UpdateSlivers, does {{{start}}} on the slice start only new stuff? How do changes to existing resources take effect? Does {{{restart}}} on the slice restart everything or only changed things? Must the experimenter selectively {{{restart}}} changed things and use {{{start}}} to start new things?
    493 Method signature:
    494 {{{
    495 struct ActOnSlivers(string command, string credentials[], string urn, string state, struct options)
    496 }}}
    498 Return struct:
    499 {{{
    500 {
    501  string urn=<urn of sliver or slice>,
    502  string geni_status=<new state of the slivers>,
    503  <other entries specific to the AM or resources - specifically am_specific_status>
    504 }
    505 }}}
     425== Change Set F3: Sliver Allocation States and methods ==
     426'''This change was discussed and adopted at the GEC13 Coding Sprint.'''
     427For meeting minutes, see: [wiki:GEC13Agenda/CodingSprint the GEC13 Coding Sprint agenda page].
     429 - We agreed to use two kinds of states: allocation states, and operational states. We put off discussion of operational states (ie is the node booted), noting however that this is critical.
     430 - We debated whether the API should specify a limited number of states, or allow for aggregate or resource specific states. We agreed that for allocation states, the API should define a limited set of states, while operational states might be more permissive.
     431 - We discussed the pros and cons of including a single all-in-one method to change allocation states, or a single method per desired transition. Rob Ricci noted at least 1 case where there are 2 paths between the same 2 allocation states with very different meaning. As a result, we agreed to use a separate method per allocation state change.
     433The key result of the discussion was agreement on 3 allocation states for slivers and enumeration of methods for transitioning between those states. We did not select names for the states or the methods. Here is a diagram with placeholder labels for methods and states, which illustrates the decisions described below.
     437We spent a long time discussing allocation states of slivers. For reference, we looked at [ Jon Duerig's slides] from the AM API session (unpresented). We finally agreed there are 2 or 3 or 4 allocation states for slivers, depending on how you count.
     438 1. Start (alternatively called 'null' or 'unallocated'). The sliver does not exist. This is the small black circle in typical state diagrams.
     439 2. Allocated (alternatively called 'offered' or 'promised'). The sliver exists, defines particular resources, and is in a sliver. The aggregate has not (if possible) done any time consuming or expensive work to instantiate the resources, provision them, or make it difficult to revert the slice to the state prior to allocating this sliver. This state is what the aggregate is offering the experimenter.
     440 X. ~~Accepted.~~ We chose NOT to include this intermediary state, occasionally called 'accepted', where the experimenter has accepted the aggregate's offer of resources, but the resources have still not been provisioned.
     441 3. Provisioned. The aggregate has started instantiating resources, and otherwise making changes to resources and the slice to make the resources available to the experimenter. At this point, operational states are valid to specify further when the resources are available for experimenter use.
     443Having ruled out the 'accepted' state as unnecessary, we were left with 3 states, the first being the 'null' state. We spent a long time clarifying the semantics of each state, but could not quite agree on names for these states. We took to referring to the states by number, leaving the honor of naming the states to the API documenter.
     445The key change is the addition of state 2, representing resources that have been allocated to a slice without provisioning the resources. This represents a cheap and un-doable resource allocation, such as we previously discussed in the context of tickets. This compares reasonably well to the 'transaction' proposal written up by Gary Wong ( When a sliver is created and moved into state 2, the aggregate produces a manifest RSpec identifying which resources are included in the sliver. This is something like the current !CreateSlivers, except that it does not provision nor start the resources. These resources are exclusively available to the containing sliver, but are not ready for use. In particular, allocating a sliver should be a cheap and quick operation, which the aggregate can readily un-do without impacting the state of slivers which are fully provisioned. For some aggregates, transitioning to this state may be a no-op.
     447States 2 and 3 have aggregate and possibly resource specific timeouts. By convention the state 2 timeout is typically short, like the {{{redeem_before}}} in ProtoGENI tickets, or the {{{commit_by}}} in Gary's transactions proposal. The state 3 timeout is the existing sliver expiration. If the client does not transition the sliver from state 2 to 3 before the end of the state 2 timeout, the sliver reverts to unallocated. If the experimenter needs more time, the experimenter should be allowed to request a renewal of either timeout.  Note that typically the sliver expiration time (timeout for state 3, provisioned) will be notably longer than the timeout for state 2, allocated.
     449The AM API does not yet have a method for moving from state 2, to state 3. State 3 is the state of the sliver allocation after the aggregate begins to instantiate the sliver. Note that fully provisioning a sliver may take noticeable time. This state also includes a timeout - the sliver expiration time (which is not necessarily related to the time it takes to provision a resource). !RenewSlivers extends this timeout. For some aggregates and resource types, moving to this state from state 2 (allocated) may be a no-op.
     451If the transition from one state to another fails, the sliver shall remain in its original state.
     453These are the only operational states supported by this API. Since the state transitions are finite, but include potentially multiple transitions between the same 2 states, this API uses separate methods to perform each state transition, rather than a single method for requesting a new state for the sliver. We did not agree on method names for these transitions (we agreed to leave it as an exercise for the API documenter). Logically however these methods are something like:
     454 1. !CreateSlivers moves 1+ slivers from unallocated (state 1)  to allocated (state 2). This method can be described as creating an instance of the state machine for each sliver. If the aggregate cannot fully satisfy the request, the whole request fails. This is a change from the version 2 !CreateSliver, which also provisioned the resources, and 'started' them. That is !CreateSlivers does 1 of the 3 things that it did previously.
     455 2. !DeleteSlivers moves 1+ slivers from either state 2 or 3, back to state 1. This is similar to the AM API version 2 !DeleteSliver.
     456 3. !RenewSomething (name TBD) requests an extended timeout for slivers in state 2 - the allocated but not provisioned state.
     457 4. !RenewSlivers requests an extended timeout for slivers in state 3 - the provisioned state, as before.
     458 5. !SomethingSlivers (name TBD) moves 1+ slivers from state 2 (allocated) to state 3 (provisioned). This is some of what version 2 !CreateSliver did. Note however that this does not 'start' the resources, or otherwise change their operational state. This method only fully instantiates the resources in the slice. This may be a no-op for some aggregates or resources.
     460These states apply to each sliver individually. Logically, the state transition methods then take a single sliver URN. For convenience, we agreed to allow a list of sliver URNs, or a slice URN as a simple alias for all slivers in this slice at this aggregate.
     462Since each method may operate on multiple slivers, each of these methods returns a list of structs as the value:
     464value = [
     465  {
     466   geni_sliver_urn,
     467   geni_allocation_status,
     468   geni_expires <time when the sliver expires from its current state>,
     469   <others AM or method specific>
     470   <Method 5 SomethingSlivers returns geni_operational_status>
     471  },
     472  ...
     476!CreateSlivers returns a single manifest RSpec, plus the above list of structs.
     478We spent a while discussing what it means for an aggregate to operate on multiple slivers at once. Can the aggregate partially succeed? What will experimenters want? We agreed that aggregates must be consistent across all these methods whether they are all or nothing, or support partial success.
     480These methods all take a new option (aggregates must support it, clients do not need to supply it):
     482   geni_atomic = True/False, default True
     484If true, the client is requesting that the aggregate either fully satisfy the request, moving all listed slivers to the desired state, or fully fail the request, leaving all slivers in their original state.
     485If the aggregate cannot guarantee all or nothing success or failure given the included slivers and resource types, the aggregate shall fail the request, returning an appropriate error code. If this option is false, then some slivers may transition to the new state, and some note. Aggregates must examine the return closely to know the state of their slivers.
     487[FIXME: Is this what we agreed to? Or should the default be False? Should there be a !GetVersion option for advertising the AM default?]
     489'''Note''': !CreateSlivers remains all or nothing (either the aggregate can allocate all desired resources as requested, or the call fails).
     491'''Note''': These calls are synchronous - when they return, the slivers shall be in their final state. In particular, the transition from state 2 to 3 (allocated to provisioned) should be quick. The resource that is now in the 'provisioned' state may take a long time to actually be ready for operational use (e.g. imaging and booting the node) -- this remains true as in version 2 after !CreateSliver.
     493!SliverStatus, where it currently includes {{{geni_status}}}, shall now return {{{geni_allocation_status}}} with one of the above defined values, and {{{geni_operational_status}}}. The values of {{{geni_operational_status}}} are still under discussion.
     495Currently, !SliverStatus returns a single {{{geni_status}}} for the entire slice at this aggregate. With this change, the top-level allocation status for the slice is not defined, and that field is not required.
     498== Change Set F4: Sliver Operational States and methods ==
     499This proposal was discussed on the geni-dev mailing list:
     501The canonical source for documentation on this proposal is here:
    507503= Adopted: Change Set G: Credentials are general authorization tokens. =