[[PageOutline]]

= GENI AM API Changes from version 2 to version 3 =

This page documents changes to the GENI Aggregate Manager API from [wiki:GAPI_AM_API_V2 version 2] to [wiki:GAPI_AM_API_V3 version 3]. It contains the text that was used to define and adopt the changes for version 3.

 * [wiki:GAPI_AM_API_V2 Version 2 of the GENI Aggregate Manager API]
 * [wiki:GAPI_AM_API_V3 Version 3 of the GENI Aggregate Manager API]
 * [wiki:GAPI_AM_API_DRAFT Draft changes to the GENI Aggregate Manager API] for future versions. This text was originally written there.

== Summary of Changes ==

This version of the AM API includes multiple changes since version 2 of the AM API.

For experimenters, a few things are worth noting:
 - The old `CreateSliver` operation has now been broken into 3 steps:
   - `Allocate` to reserve the resources
   - `Provision` to instantiate the resources, which may take time to complete
   - `PerformOperationalAction(geni_start)` to start (e.g. boot) the resources, which also may take time to complete
 - Use the new intermediate `geni_allocated` state after `Allocate` to coordinate reservations across aggregates, e.g. to ensure another aggregate can give you nodes to be the other end of a requested link.
 - Multiple methods have been renamed, typically by removing the `Sliver` term from method names.
 - Sliver expiration is available in the return from multiple other methods, like `Provision`.
 - You no longer use `ListResources` to see the contents of your slice; use `Describe` instead. `ListResources` is only for the AM's Ad RSpec.
 - Experimenters can select when to start or stop resources, e.g. when to boot a VM. Consult the operational state machine in the AM's Ad RSpec, and use `PerformOperationalAction`.
 - SSH login names and keys should be available in manifest RSpecs in a standard format.
 - Slice name restrictions have been codified and standardized.
   - Slice names are <=19 characters, only alphanumeric plus hyphen (no hyphen in first character): `'^[a-zA-Z0-9][-a-zA-Z0-9]+$'` (the pattern constrains the characters only; the length limit is a separate check)

Tool developers should also be aware:
 - The `credentials` argument to methods is now a struct, including a type and version for each credential. AMs should advertise which credential types they accept. SAs should advertise which type they provide.
 - Aggregates may have their own operational states and actions. The Ad RSpec should define these, probably by `sliver_type`.

Listing of the Change Sets:
 - [#ChangeSetD:Sliver-specificoperations Change Set D: Slivers]: Change methods to clarify that there may be multiple slivers per slice at an AM, and to allow operating on individual slivers
 - [#ChangeSetF3:SliverAllocationStatesandmethods Change Set F3]: Sliver Allocation States and methods
 - [#ChangeSetF4:SliverOperationsMethod Change Set F4]: Method to perform Sliver Operational actions
 - [#ChangeSetF5:SliverOperationalStates Change Set F5]: Sliver Operational States
 - [#Adopted:ChangeSetG:Credentialsaregeneralauthorizationtokens. Change Set G]: Generalize the credentials argument, allowing ABAC support
 - Change Set I1: !SliversStatus return structure includes sliver expiration
 - Change Set I2: !SliversStatus return includes SSH logins/key for nodes that support SSH access
 - Change Set I3: !CreateSlivers return becomes a struct, adds sliver expiration
 - [#Adopted:ChangeSetK:Standardizecertificatesandcredentials Change Set K]: Standardize certificate contents, etc.
   - Include a real serial number, holder email, holder uuid, and optionally authority URL in certificates
   - Define slice ID as the UUID plus URN in slice certificates
   - Define slice name, sliver name, and user name restrictions, and similar for URNs
   - Publish schemas for credentials and certificates
 - [#ChangeSetM:NewMethodSignatures Change Set M]: New method signatures, incorporating all other adopted change sets

== Adopted Change Details ==

=== Change Set D: Sliver-specific operations ===

A slice may have multiple slivers at a single AM. Experimenters can operate on slivers independently, if the AM supports it. AMs define slivers as groups of resources, and give them locally unique sliver_urns for identifying that group of resources.

This change was briefly discussed at GEC13, and remains open for discussion. See http://lists.geni.net/pipermail/dev/2012-March/000593.html as well.

==== Motivation ====

This change set was discussed at the [http://groups.geni.net/geni/wiki/GEC12GeniAmAPI GEC12 AM API session].

The current AM API calls take a Slice URN, and operate on all resources under that label at the given aggregate - all the resources for that slice at the aggregate are allocated, renewed, and deleted together. There is no provision for releasing some of the resources allocated to the slice at that aggregate, or for adding new resources to the reservation for that slice at a given aggregate.

This ties closely to the precise definition of a Slice vs a Sliver. The current AM API methods imply that a sliver represents all resources at an aggregate for a given slice. However, this does not match the definition that previous GENI documents have used, nor the functionality that experimenters desire.

Previous GENI documents have used this definition: ''A sliver is the smallest set of resources at an aggregate that can be independently reserved and allocated. A given slice may contain multiple slivers at a single aggregate.
A sliver may contain multiple components.''

Given this definition, the current AM API methods in fact operate on a group of slivers. This change set provides a means for experimenters to operate on individual slivers within their slice at a given aggregate.

==== Define sliver ====

A Sliver is an aggregate-defined grouping of resources within a slice at this aggregate. Its URN identifies the sliver and can be used as an argument to methods such as Delete or Renew, and its status can be independently reported in the return from !SliversStatus. The AM defines 1 or more of these groupings to satisfy a given resource request for a slice. All reserved resources are directly contained by exactly 1 such sliver container, which is in precisely 1 slice.

Slivers are identified by an aggregate-selected URN. See other change proposals for details on standardizing such URNs.

==== Addressable Slivers ====

Considering the clarified sliver definition, several API names are misleading. This change proposal modifies those method names to clarify that they may work with multiple slivers. Additionally, some methods can logically operate on individual slivers: this change modifies those methods' arguments to allow specifying a particular sliver.

 1. Rename some existing methods to clarify that they act on 1+ slivers:
   - !CreateSliver -> Allocate, Provision (see below)
   - !RenewSliver -> Renew
   - !DeleteSliver -> Delete
   - !SliverStatus -> Status
 2. Some methods that take {{{slice_urn}}} now take a {{{urn}}} that may be a slice or sliver:
   - e.g. `Renew`, `Delete`, `Status`
   - AMs are responsible for distinguishing whether the request operates on a slice or a sliver (see [#Adopted:ChangeSetK:Standardizecertificatesandcredentials Change Set K], which defines how slice and sliver URNs differ).
   - AMs are free to refuse to `Renew`, `Delete`, or provide status on an individual sliver, if the local AM or that resource type does not support it.
   - AMs should return an error message if the operation is not supported.
   - See below for ways that aggregates advertise their supported behavior.
 3. Define new returns from !GetVersion, for specifying the semantics of operating on individual slivers. These returns are only required if the aggregate supports non-standard behavior. Aggregates that support the default behavior may omit these !GetVersion returns.
   - `geni_single_allocation`: When true (not the default), an AM requires that calls to `Describe`, `Allocate`, `Renew`, `Provision`, or `Delete` include either the slice URN or the URNs of all the slivers in the same state. If you attempt to run one of those operations on just some slivers in a given state, such an AM will return an error. For example, at an AM where `geni_single_allocation` is true you must `Provision` all `geni_allocated` slivers at once. If you supply a list of sliver URNs to `Provision` that covers only some of the `geni_allocated` slivers for this slice at this AM, then the AM will return an error. Similarly, such an aggregate would return an error from `Describe` if you request a set of sliver URNs that covers only some of the `geni_provisioned` slivers.
   - `geni_allocate`: A string, one of a fixed set of possible values. Default is `geni_single`. This option defines whether this AM allows adding slivers to slices at an AM (i.e. calling `Allocate()` multiple times, without first deleting the allocated slivers). Possible values:
     - `geni_single`: Performing multiple `Allocate`s without a delete is an error condition because the aggregate only supports a single sliver per slice or does not allow incrementally adding new slivers. This is the AM API v2 behavior.
     - `geni_disjoint`: Additional calls to `Allocate` must be disjoint from slivers allocated with previous calls (no references or dependencies on existing slivers).
       The topologies must be disjoint in that there can be no connection or other reference from one topology to the other.
     - `geni_many`: Multiple slivers can exist and be incrementally added, including those which connect or overlap in some way. New aggregates should strive for this capability.

Note that these options interact with `geni_best_effort`, defined in Change Set F3, which determines whether operations on a set of slivers (or a whole slice) must all succeed or fail together, or whether some slivers can succeed while others fail. The default is false: all slivers succeed or all fail. If the aggregate cannot guarantee all-or-nothing success or failure given the included slivers and resource types, the aggregate shall fail the request, returning an appropriate error code. If this option is true, then some slivers may transition to the new state, and some may not. Experimenters must examine the return closely to know the state of their slivers; such methods will return data about all requested slivers. Aggregates may optionally return `geni_error` for each sliver for which the operation failed, to indicate further details. Note that `Allocate` is always all-or-nothing.

It is expected that many aggregates will implement one of the following combinations of options:
 - `geni_best_effort` = true, `geni_allocate` = `geni_many`, `geni_single_allocation` = false (e.g. FOAM, !PlanetLab)
 - `geni_best_effort` = false, `geni_allocate` = `geni_disjoint`, `geni_single_allocation` = true (e.g. ProtoGENI)

=== Change Set F: Support AM and resource-type specific methods. ===

Define the control API (the AM API) as being about moving slivers through various states at an AM.

The proposal originally here elicited concerns (the method !ActOnSlivers is an ioctl, and the states mix allocation and operational states). A later alternative was proposed via email: http://lists.geni.net/pipermail/dev/2012-March/000721.html

At the GEC13 coding sprint, a variant on the above was approved.
It is documented here as [#ChangeSetF3:SliverAllocationStatesandmethods Change Set F3].

A variant on the operational states proposal is defined as Change Set F4 and documented here: https://openflow.stanford.edu/display/FOAM/GENI+-+PerformOperationalAction

==== Motivation ====

AM API methods logically change the state of the slivers at this AM. But the API is not clear about what experimenters should expect, and does not provide easy ways for experimenters to control when and how states change. In particular, there is no way to move slivers through states and change them in ways otherwise undefined by the API.

==== Change Set F3: Sliver Allocation States and methods ====

This change was discussed and adopted at the GEC13 Coding Sprint. For meeting minutes, see: [wiki:GEC13Agenda/CodingSprint the GEC13 Coding Sprint agenda page].
 - We agreed to use two kinds of states: allocation states, and operational states. We put off discussion of operational states (i.e. is the node booted), noting however that this is critical. See Change Set F4.
 - We debated whether the API should specify a limited number of states, or allow for aggregate or resource specific states. We agreed that for allocation states, the API should define a limited set of states, while operational states might be more permissive.
 - We discussed the pros and cons of including a single all-in-one method to change allocation states, or a single method per desired transition. There is at least 1 case where there are 2 paths between the same 2 allocation states with very different meaning. As a result, we agreed to use a separate method per allocation state change.

We agreed on 3 allocation states for slivers and an enumeration of methods for transitioning between those states.

[[Image(sliver-alloc-states3.jpg)]]

Allocation states:
 1. `geni_unallocated` (alternatively called 'null'). The sliver does not exist. This is the small black circle in typical state diagrams.
 2. `geni_allocated` (alternatively called 'offered' or 'promised'). The sliver exists, defines particular resources, and is in a slice. The aggregate has not (if possible) done any time-consuming or expensive work to instantiate the resources, provision them, or make it difficult to revert the slice to the state prior to allocating this sliver. This state is what the aggregate is offering the experimenter.
 3. `geni_provisioned`. The aggregate has started instantiating resources, and otherwise making changes to resources and the slice to make the resources available to the experimenter. At this point, operational states are valid to specify further when the resources are available for experimenter use.

{{{
#!comment
plus in select situations:
 4. `geni_failed`: A call to `Provision` failed for this sliver. The sliver is in indeterminate state. Check the `geni_error` return if provided, or call `Status`. The sliver may require operator action to recover it.
}}}

The key change is the addition of state 2, representing resources that have been allocated to a slice without provisioning the resources. This represents a cheap and reversible resource allocation, such as we previously discussed in the context of tickets. This compares reasonably well to the 'transaction' proposal written up by Gary Wong (http://www.protogeni.net/trac/protogeni/wiki/AM_API_proposals).

When a sliver is created and moved into state 2 (`geni_allocated`), the aggregate produces a manifest RSpec identifying which resources are included in the sliver. This is something like the current !CreateSliver, except that it neither provisions nor starts the resources. These resources are exclusively available to the containing sliver, but are not ready for use. In particular, allocating a sliver should be a cheap and quick operation, which the aggregate can readily undo without impacting the state of slivers which are fully provisioned. For some aggregates, transitioning to this state may be a no-op.
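As an illustration only, the three allocation states and the methods that move a sliver between them can be sketched as a small lookup table. The state names and method names come from this change set; the Python names (`AllocationState`, `TRANSITIONS`, `next_state`, `run`) are hypothetical and are not part of the AM API.

```python
from enum import Enum


class AllocationState(Enum):
    """The three sliver allocation states named in Change Set F3."""
    UNALLOCATED = "geni_unallocated"
    ALLOCATED = "geni_allocated"
    PROVISIONED = "geni_provisioned"


# Hypothetical transition table keyed by (method name, current state).
# Renew only extends a timeout, so the allocation state is unchanged.
TRANSITIONS = {
    ("Allocate", AllocationState.UNALLOCATED): AllocationState.ALLOCATED,
    ("Provision", AllocationState.ALLOCATED): AllocationState.PROVISIONED,
    ("Renew", AllocationState.ALLOCATED): AllocationState.ALLOCATED,
    ("Renew", AllocationState.PROVISIONED): AllocationState.PROVISIONED,
    ("Delete", AllocationState.ALLOCATED): AllocationState.UNALLOCATED,
    ("Delete", AllocationState.PROVISIONED): AllocationState.UNALLOCATED,
}


def next_state(method, current):
    """Return the state after calling `method`, or raise ValueError."""
    try:
        return TRANSITIONS[(method, current)]
    except KeyError:
        raise ValueError(
            f"{method} is not valid in state {current.value}"
        ) from None


def run(methods, start=AllocationState.UNALLOCATED):
    """Apply a sequence of method names to one sliver; return its final state."""
    state = start
    for method in methods:
        state = next_state(method, state)
    return state
```

For example, `run(["Allocate", "Provision"])` ends in `AllocationState.PROVISIONED`, while `next_state("Provision", AllocationState.UNALLOCATED)` raises `ValueError` because a sliver must be allocated before it can be provisioned. The sketch does not model timeouts, `geni_best_effort`, or operational states.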
States 2 and 3 (`geni_allocated` and `geni_provisioned`) have aggregate-specific and possibly resource-specific timeouts. By convention the `geni_allocated` state timeout is typically short, like the {{{redeem_before}}} in ProtoGENI tickets, or the {{{commit_by}}} in Gary's transactions proposal. The `geni_provisioned` state timeout is the existing sliver expiration. If the client does not transition the sliver from `geni_allocated` to `geni_provisioned` before the end of the `geni_allocated` state timeout, the sliver reverts to `geni_unallocated`. If the experimenter needs more time, the experimenter should be allowed to request a renewal of either timeout. Note that typically the sliver expiration time (timeout for state 3, `geni_provisioned`) will be notably longer than the timeout for state 2, `geni_allocated`.

State 3, `geni_provisioned`, is the state of the sliver allocation after the aggregate begins to instantiate the sliver. Note that fully provisioning a sliver may take noticeable time. This state also includes a timeout: the sliver expiration time (which is not necessarily related to the time it takes to provision a resource). `Renew` (!RenewSliver in AM API v2) extends this timeout. For some aggregates and resource types, moving to this state from state 2 (`geni_allocated`) may be a no-op.

If the transition from one state to another fails, the sliver shall remain in its original state.

These are the only allocation states supported by this API. Since the state transitions are finite, but include potentially multiple transitions between the same two states, this API uses separate methods to perform each state transition, rather than a single method for requesting a new state for the sliver.

 1. `Allocate` moves 1+ slivers from `geni_unallocated` (state 1) to `geni_allocated` (state 2). This method can be described as creating an instance of the state machine for each sliver. If the aggregate cannot fully satisfy the request, the whole request fails.
    This is a change from the version 2 !CreateSliver, which also provisioned the resources, and 'started' them. That is, `Allocate` does 1 of the 3 things that !CreateSliver did previously.
 2. `Delete` moves 1+ slivers from either state 2 or 3 (`geni_allocated` or `geni_provisioned`), back to state 1 (`geni_unallocated`). This is similar to the AM API version 2 !DeleteSliver.
 3. `Renew`, when given allocated slivers, requests an extended timeout for slivers in state 2 (`geni_allocated`).
 4. `Renew` can also be used to request an extended timeout for slivers in state 3 - the `geni_provisioned` state. That is, this method's semantics can be the same as !RenewSliver from AM API v2.
 5. `Provision` moves 1+ slivers from state 2 (`geni_allocated`) to state 3 (`geni_provisioned`). This is some of what version 2 !CreateSliver did. Note however that this does not 'start' the resources, or otherwise change their operational state. This method only fully instantiates the resources in the slice. This may be a no-op for some aggregates or resources.

When `Provision` fails for only some slivers, and the `geni_best_effort` option was supplied, the aggregate will return the status of each requested sliver individually. The `geni_allocation_state` for slivers that failed will remain `geni_allocated`. This typically suggests that the experimenter may retry the call. For some aggregates or resource types, the sliver may be 'dead', and `Provision` may never succeed. Experimenters should check `geni_error` for more information.

{{{
#!comment
When `Provision` fails for only some slivers, and `geni_best_effort` option was supplied, the aggregate will return the status of each requested sliver individually. The `geni_allocation_state` for slivers that failed may remain `geni_allocated`. This typically suggests that the experimenter may retry the call. The state may instead become `geni_unallocated`. This indicates that the sliver is being deleted, and all resource reservations under this sliver have been freed. Alternatively, the state may be `geni_failed`, indicating an error or refusal by the AM to complete the operation. An operator may need to intervene. Alternatively, the sliver may shortly self transition and become either `geni_allocated` again or `geni_unallocated`. Experimenters should check `geni_error` for more information.
}}}

These states apply to each sliver individually. Logically, the state transition methods then take a single sliver URN. For convenience, these methods accept a list of sliver URNs, or a slice URN as a simple alias for all slivers in this slice at this aggregate.

Since each method may operate on multiple slivers, each of these methods returns a list of structs as the value:
{{{
value = [
  {
    geni_sliver_urn: ,
    geni_allocation_status: ,
    geni_expires: