Changes between Version 10 and Version 11 of GEC13Agenda/CodingSprint


Timestamp: 03/23/12 14:35:54
Author: Aaron Helsinger
Comment: --


For more information, contact mailto:ignite@mozillafoundation.org.

== Coding Sprint Meeting Summary ==

=== Attendees ===
 - Rob Ricci (U Utah, ProtoGENI)
 - Jon Duerig (U Utah, ProtoGENI)
 - Matt Strum (U Utah, Flack)
 - Victor Orlikowski (Duke, Orca)
 - Prateek Jaipuria (Duke, Orca)
 - Jeff Chase (Duke, Orca)
 - Nick Bastin (!OpenFlow, FOAM)
 - Marshall Brinn (GPO)
 - Sarah Edwards (GPO)
 - Tom Mitchell (GPO)
 - Aaron Helsinger (GPO)
 - Max Ott (NICTA, OMF; arrived late)
 - Tom Rothe (Ofelia)
 - On phone: Gary Wong & Leigh Stoller (U Utah, ProtoGENI)
 - others

=== Summary ===
During the developers' portion of the Coding Sprint session, around 20 aggregate and tool developers actively debated proposed additions to the Aggregate Manager (AM) API and agreed on several key additions for the next version of the AM API, version 3. This session followed the Tuesday afternoon [wiki:GEC13Agenda/AMAPIRevisions AM API session], allowing further in-depth discussion of key topics. The group met for about 4 hours, first discussing details of proposals adopted during the AM API session, and then discussing additional proposals that had been postponed during that session.

We agreed on multiple changes to be included in AM API version 3. Details of these proposals, and new proposals for AM API version 3, will be discussed on the GENI developers mailing list. Proposals not resolved soon will be considered for inclusion in a later version of the AM API.

 - '''Change Set G''': Advertise and pass credentials with an explicit type and version.
We agreed on how aggregates will advertise supported credential types, and changed the credentials argument of each AM API method to be a list of structures with an explicit type and version.

 - '''Change Set I2''': Return SSH login keys in a standard way.
We agreed to adopt a modified version of the RSpec extension drafted by Jon Duerig (http://lists.geni.net/pipermail/dev/2012-March/000727.html), requiring aggregates that use SSH logins for resources to return valid login names and keys in the manifest RSpec. This extension supplies SSH login information to clients in a standard way.

 - '''Change Set K''': Use a UUID to uniquely identify slices over time.
Currently URNs identify slices, but they are not unique over time. This change adds UUIDs to slice identifiers. URNs remain the identifier for slices in AM API calls, and uniquely identify slices at any given moment. The UUID and URN together uniquely identify slices over time, and can be used for forensics or by authorization modules such as ABAC. UUIDs alone should not be used to identify slices; they should be used only in conjunction with the URN, which scopes the UUID to the authority that generated it. In effect, the UUID may only be used to distinguish between slices with identical URNs.

 - '''Change Set K''': Include UUID and email in slice and user certificates.
We adopted the proposal on [wiki:GAPI_AM_API_DRAFT#Changes the AM API Draft Changes wiki page] to include the UUID and email address in the subjectAltName of both slice and user certificates.
As with the use of UUIDs for slices, the user UUID should only be used in conjunction with the user URN. However, user URNs are required to be temporally and globally unique.

 - '''Change Set F''': Define sliver allocation states and state change methods.
We discussed Nick Bastin's proposal on sliver states and state change methods (http://lists.geni.net/pipermail/dev/2012-March/000730.html).

We had a long and useful discussion, which resulted in agreement on a variant of his ideas. We agreed to use two kinds of states: allocation states and operational states. These states apply to each sliver individually. In particular, we defined a new allocation state representing resources that have been allocated to a slice without being provisioned. This represents a cheap, easily undone resource allocation, such as we previously discussed in the context of tickets. It is similar to the 'transaction' proposal written up by Gary Wong (http://www.protogeni.net/trac/protogeni/wiki/AM_API_proposals).

We added 2 new methods to the API, and changed the arguments and return value of 4 other methods, to accommodate these new states. In particular, methods return the state of each affected sliver, and several methods take a list of sliver URNs or a slice URN. We also changed the semantics of !CreateSlivers in a way that touches on operational state: !CreateSlivers is intended NOT to 'start' resources (the meaning of 'start' is TBD). However, we have not yet defined the method for starting resources.

=== Session Details ===

==== Change Set G: Specifying credential types ====
This discussion follows the adoption [wiki:GEC13Agenda/AMAPIRevisions#MeetingSummary at the AM API session] of [wiki:GAPI_AM_API_DRAFT#Adopted:ChangeSetG:Credentialsaregeneralauthorizationtokens. Change Set G], allowing for multiple credential types in the AM API.
That change raises two problems: how aggregates advertise which credential types they support, and how aggregates determine which credential types they have been passed.

We agreed that aggregates must advertise what types of credentials they accept, so that clients know how to gain authorization for API methods.

Aggregates are required to return a new entry in !GetVersion:
{{{
geni_credential_types = <a list of structs>: [
  {
   geni_type = <string, case insensitive>,
   geni_version = <string containing an integer>,
   <other optional fields, e.g. a URL for more information, or a schema>
  }
]
}}}

We agreed that "sfa" slice credentials as defined before AM API version 3 will have type=`geni_sfa` and version=`2`. "sfa" slice credentials as of AM API version 3 will have type=`geni_sfa` and version=`3`.
ABAC credentials as of AM API version 3 will have type=`geni_abac` and version=`1`.

For example, an aggregate that accepts ABAC credentials, SFA slice credentials issued prior to AM API version 3, and SFA slice credentials from AM API version 3 would include this in !GetVersion:

{{{
geni_credential_types = [
  {
   geni_type = "geni_sfa",
   geni_version = "2"
  },
  {
   geni_type = "geni_sfa",
   geni_version = "3"
  },
  {
   geni_type = "geni_abac",
   geni_version = "1"
  }
]
}}}

Note that there might be multiple entries with the same {{{geni_type}}}, to support multiple versions of SFA credentials.
Note also that there might be multiple kinds of ABAC credentials: identity certificates, attribute certificates, references to those certificates, and bundles of those certificates. We briefly considered adding a sub_type field, or a flag for 'credentials' that are really references to credentials.
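The sketch below illustrates how a client might use this advertisement: it reads {{{geni_credential_types}}} from an already-decoded !GetVersion result and keeps only the credentials whose type and version the aggregate accepts. The helper names {{{accepted_types}}} and {{{pick_credentials}}} are hypothetical; {{{geni_type}}} is compared case-insensitively, as the struct definition requires, while version strings are compared exactly.

{{{#!python
# Hypothetical client-side helpers; names are illustrative only.


def accepted_types(getversion_value):
    """Return the (type, version) pairs advertised in geni_credential_types.

    geni_type is case insensitive, so it is normalized to lower case.
    """
    return {
        (entry["geni_type"].lower(), entry["geni_version"])
        for entry in getversion_value.get("geni_credential_types", [])
    }


def pick_credentials(getversion_value, available):
    """Keep only the credential structs whose type and version are accepted."""
    accepted = accepted_types(getversion_value)
    return [
        cred for cred in available
        if (cred["geni_type"].lower(), cred["geni_version"]) in accepted
    ]


# The advertisement from the example above, as decoded Python values.
example_version = {
    "geni_credential_types": [
        {"geni_type": "geni_sfa", "geni_version": "2"},
        {"geni_type": "geni_sfa", "geni_version": "3"},
        {"geni_type": "geni_abac", "geni_version": "1"},
    ]
}
}}}

Because several entries may share a {{{geni_type}}}, the pairs are kept together rather than indexing by type alone.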

===== Credentials argument =====

We then discussed whether any changes are required to allow AMs to understand the type of each supplied credential. Several people argued that simple heuristics in the AM would be sufficient, and that no changes are needed. But credentials may be in any format, including URLs that reference how to retrieve credentials, hashes that identify pre-cached credentials, etc. As a result, credentials may not be easily distinguishable by simply looking at them. If the API passed only the credentials themselves, each AM would have to apply heuristics to distinguish them, and those heuristics might differ.

Instead, we agreed that the API will require that credentials be explicitly typed. With this change, the credentials argument of each method is a list of structs:
{{{
credentials = [
   {
    geni_type = <string>,
    geni_version = <string>,
    geni_value = <string>,
    <others>
   }
]
}}}

Note that the value may be a credential, a URL, an XLink-compliant string, etc. Clients are required to identify the type of each credential they supply. Rather than requiring clients to apply similar heuristics, authorities are required to label the credentials they supply with the same type and version fields. Specifically, ProtoGENI representatives suggested that their slice authorities would issue credentials as these structures in the near future.
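On the aggregate side, explicit typing means the AM can dispatch on the declared type and version instead of sniffing the value. The following is a minimal sketch under that assumption; the handler registry and the {{{check_sfa}}}/{{{check_abac}}} placeholders are hypothetical, and real signature or proof checking is deliberately not shown.

{{{#!python
class UnsupportedCredential(Exception):
    """Raised when a credential's declared type/version is not accepted."""


def check_sfa(value):
    # Placeholder only: a real AM would verify the signed SFA credential.
    return value.startswith("<")


def check_abac(value):
    # Placeholder only: a real AM would evaluate the ABAC proof.
    return bool(value)


# Hypothetical registry keyed by (lower-cased geni_type, geni_version).
HANDLERS = {
    ("geni_sfa", "2"): check_sfa,
    ("geni_sfa", "3"): check_sfa,
    ("geni_abac", "1"): check_abac,
}


def validate_credentials(credentials):
    """Validate a list of typed credential structs.

    The declared type selects the handler, so no format-sniffing
    heuristics are needed and every AM interprets a value the same way.
    """
    results = []
    for cred in credentials:
        key = (cred["geni_type"].lower(), cred["geni_version"])
        handler = HANDLERS.get(key)
        if handler is None:
            raise UnsupportedCredential("unsupported credential %s/%s" % key)
        results.append(handler(cred["geni_value"]))
    return results
}}}

Keying the registry by the same (type, version) pairs that !GetVersion advertises would let both be generated from one table, keeping the advertisement and the accepted set consistent.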

==== Change Set I2: Return SSH logins in the RSpec ====
At [wiki:GEC13Agenda/AMAPIRevisions#MeetingSummary the AM API session], we agreed in principle to [wiki:GAPI_AM_API_DRAFT#ChangeSetsHI:Miscothermethodchanges Change Set I2], calling for SSH logins to be returned to the experimenter in a standard way. We agreed this should be an RSpec extension for use in manifest RSpecs.

Jon Duerig drafted such an extension and disseminated it to the community via [http://lists.geni.net/pipermail/dev/2012-March/000727.html email to the dev list].

After a brief discussion, we agreed to adopt this extension. We noted that the extension claims to be general, but in practice is fairly SSH-specific. In particular, we noted that the manifest is not a secure way to supply private information like passwords, so this extension will primarily be used for transmitting SSH public keys.

The extension adds tags within the existing {{{<services> <login>}}} tag, supplying login names and one or more keys to be used with each login name. Each login name may optionally have a URN identifying the user.

We adopted the extension with three changes:
 1. Change 'key' to 'public-key'
 2. Change 'user' to 'ssh-user'
 3. Change 'urn' to 'user-urn'
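As an illustration of how a client might extract login information from a manifest, here is a sketch using Python's standard {{{xml.etree.ElementTree}}}. The element and attribute layout (an {{{ssh-user}}} element with {{{name}}} and {{{user-urn}}} attributes and {{{public-key}}} children inside {{{<login>}}}) is an assumption based only on the renamings above, not a copy of Jon Duerig's draft, and namespaces are omitted.

{{{#!python
import xml.etree.ElementTree as ET

# Hypothetical manifest fragment: the layout of <ssh-user> inside
# <services><login> is assumed for illustration; namespaces are omitted.
MANIFEST = """
<node client_id="pc1">
  <services>
    <login authentication="ssh-keys" hostname="pc1.example.net" port="22">
      <ssh-user name="alice" user-urn="urn:publicid:IDN+example.net+user+alice">
        <public-key>ssh-rsa AAAAB3Nza alice@laptop</public-key>
        <public-key>ssh-ed25519 AAAAC3Nz alice@desktop</public-key>
      </ssh-user>
    </login>
  </services>
</node>
"""


def ssh_logins(manifest_xml):
    """Collect (host, port, login name, user URN, public keys) tuples."""
    root = ET.fromstring(manifest_xml.strip())
    logins = []
    for login in root.iter("login"):
        for user in login.findall("ssh-user"):
            keys = [k.text.strip() for k in user.findall("public-key") if k.text]
            logins.append((
                login.get("hostname"),
                login.get("port"),
                user.get("name"),
                user.get("user-urn"),  # optional: None when absent
                keys,
            ))
    return logins
}}}

Only public keys are read, in line with the observation that the manifest is not a safe place for private information.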

FIXME: Did we agree to change the name of the schema/extension to specify that this is an SSH-specific extension?

Jon Duerig took the action to update this extension.

==== Change Set K: Use a UUID to uniquely identify slices over time ====
This change is the part of [wiki:GAPI_AM_API_DRAFT#Changes Change Set K] that was briefly discussed at the AM API session, but on which we could not reach agreement. After the AM API session, offline discussions seemed to bring us to consensus. We discussed why we can now agree, and then adopted this change.

Currently URNs identify slices, but they are not unique over time. This change adds UUIDs to slice identifiers. URNs remain the identifier for slices in AM API calls, and uniquely identify slices at any given moment. The UUID and URN together uniquely identify slices over time, and can be used for forensics or by authorization modules such as ABAC. UUIDs alone should not be used to identify slices; they should be used only in conjunction with the URN, which scopes the UUID to the authority that generated it. In effect, the UUID may only be used to distinguish between slices with identical URNs.

We discussed a couple of reasons for this limitation on the use of UUIDs:
 1. UUIDs are unique only to the generator (i.e. the slice authority), not globally. The URN encodes the authority (the slice authority in this case), so only the two fields together are actually unique (assuming, of course, that authority names are unique).
 2. There is an ABAC naming problem: who is allowed to assert attributes of this ID? It should only be the authority that created the ID, so we want to include the authority whenever referring to the ID. Again, the URN covers this. This is the web certificate authority problem: if someone hacks a small but trusted authority in, say, Denmark, they can issue credentials saying they are google.com, even if Google actually gets its domain name from some larger, more trustworthy authority elsewhere.

If UUIDs were centrally registered, such that authorities had to ensure they were globally unique and registered the associated URN, then such a service could map from UUID to URN, and from URN to a list of UUIDs, at most one of which should be currently active. However, we are not requiring any such service.

As far as the AM API is concerned, nothing changes: we still use the URN everywhere. This change defines the semantics of the UUID, such that the UUID is useful only for the narrow purpose of distinguishing two slices with identical URNs.
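A minimal sketch of these semantics, assuming an AM keeps a local history of slices for forensics: the identity is always the (URN, UUID) pair, and a UUID is never looked up on its own. The class names {{{SliceId}}} and {{{SliceHistory}}} are hypothetical.

{{{#!python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceId:
    """A slice identity that is unique over time: URN plus UUID.

    The URN scopes the UUID to the authority that generated it, so the
    UUID is never stored or compared on its own.
    """
    urn: str
    slice_uuid: uuid.UUID


class SliceHistory:
    """Hypothetical AM-side record of slices it has seen, e.g. for forensics."""

    def __init__(self):
        self._by_urn = {}  # URN -> list of SliceId, in the order first seen

    def record(self, urn, slice_uuid):
        """Record a (URN, UUID) pair and return its SliceId."""
        sid = SliceId(urn, uuid.UUID(str(slice_uuid)))
        history = self._by_urn.setdefault(urn, [])
        if sid not in history:
            history.append(sid)
        return sid

    def incarnations(self, urn):
        """All distinct slices that have used this URN over time."""
        return list(self._by_urn.get(urn, []))
}}}

There is deliberately no index by UUID alone; such an index would reintroduce the authority-scoping problem described above.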

==== Change Set K: Include UUID and email address in slice and user certificates ====

This is the part of Change Set K that we did not discuss at the AM API session.

After a very brief discussion, we adopted the proposal on [wiki:GAPI_AM_API_DRAFT#Changes the wiki page] to include the UUID and email address in the subjectAltName of both slice and user certificates.
These fields may appear in any order, comma-separated. As with the use of UUIDs for slices, the user UUID should only be used in conjunction with the user URN. However, user URNs are required to be temporally and globally unique.

For both UUIDs and email addresses, the 'COPY' tag is not supported.

Note that the slice and user email addresses are addresses for contacting the responsible party: the slice owner or creator, and the user, respectively. These may be aliases.
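The sketch below splits a subjectAltName into its URN, UUID, and email parts. It assumes the OpenSSL-style text rendering ({{{URI:urn:publicid:...}}}, {{{URI:urn:uuid:...}}}, {{{email:...}}}); that encoding, and the function name, are assumptions rather than part of the adopted text. Entries are accepted in any order, as agreed above.

{{{#!python
import uuid


def parse_subject_alt_name(san):
    """Split a subjectAltName text into URN, UUID and email fields.

    Assumes the OpenSSL-style rendering, e.g.
    'URI:urn:publicid:IDN+..., URI:urn:uuid:..., email:...';
    entries may appear in any order.
    """
    fields = {"urn": None, "uuid": None, "email": None}
    for entry in san.split(","):
        kind, _, value = entry.strip().partition(":")
        kind = kind.lower()
        if kind == "uri" and value.lower().startswith("urn:uuid:"):
            fields["uuid"] = uuid.UUID(value[len("urn:uuid:"):])
        elif kind == "uri" and value.startswith("urn:publicid:"):
            fields["urn"] = value
        elif kind == "email":
            fields["email"] = value
    return fields
}}}

Consistent with the rule above, a caller would treat {{{fields["uuid"]}}} as meaningful only together with {{{fields["urn"]}}}.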

==== Change Set F: Define sliver allocation states and state change methods ====

At the AM API session, we briefly discussed [wiki:GAPI_AM_API_DRAFT#ChangeSetF:SupportAMandresource-typespecificmethods. Change Set F]: sliver states and the proposed !ActOnSlivers method. Nick Bastin discussed and then circulated an alternative proposal (http://lists.geni.net/pipermail/dev/2012-March/000730.html).

At the coding sprint, we had a long and useful discussion of Nick's proposal, which resulted in agreement on a variant of his ideas. This discussion touched on a number of areas. First, there were several overall questions:
 - We agreed to use two kinds of states: allocation states and operational states. We put off discussion of operational states (i.e. is the node booted?), noting however that they are critical.
 - We debated whether the API should specify a limited number of states, or allow for aggregate- or resource-specific states. We agreed that for allocation states the API should define a limited set of states, while operational states might be more permissive.
 - We discussed the pros and cons of a single all-in-one method to change allocation states versus a separate method per desired transition. Rob Ricci noted at least one case where there are two paths between the same two allocation states with very different meanings. As a result, we agreed to use a separate method per allocation state change.

The key result of the discussion was agreement on 3 allocation states for slivers, and an enumeration of methods for transitioning between those states. We did not select names for the states or the methods. Here is a diagram with placeholder labels for methods and states, which illustrates the decisions described below.

[[Image(sliver-alloc-states.jpg)]]

We spent a long time discussing allocation states of slivers. For reference, we looked at [https://groups.geni.net/geni/attachment/wiki/GEC13Agenda/AMAPIRevisions/JDuerig-AMAPI-TransactionsAndUpdate.pdf Jon Duerig's slides] from the AM API session (not presented there). We finally agreed that there are 2, 3, or 4 allocation states for slivers, depending on how you count.
 1. Start (alternatively called 'null' or 'unallocated'). The sliver does not exist. This is the small black circle in typical state diagrams.
 2. Allocated (alternatively called 'offered' or 'promised'). The sliver exists, defines particular resources, and is in a slice. The aggregate has not (if possible) done any time-consuming or expensive work to instantiate the resources, provision them, or otherwise make it difficult to revert the slice to its state prior to allocating this sliver. This state is what the aggregate is offering the experimenter.
 X. ~~Accepted.~~ We chose NOT to include this intermediate state, occasionally called 'accepted', in which the experimenter has accepted the aggregate's offer of resources, but the resources have still not been provisioned.
 3. Provisioned. The aggregate has started instantiating resources, and otherwise making changes to resources and the slice to make the resources available to the experimenter. At this point, operational states become valid, to specify further when the resources are available for experimenter use.

Having ruled out the 'accepted' state as unnecessary, we were left with 3 states, the first being the 'null' state. We spent a long time clarifying the semantics of each state, but could not quite agree on names for them. We took to referring to the states by number, leaving the honor of naming them to the API documenter.

The key change is the addition of state 2, representing resources that have been allocated to a slice without being provisioned. This represents a cheap, easily undone resource allocation, such as we previously discussed in the context of tickets. It is similar to the 'transaction' proposal written up by Gary Wong (http://www.protogeni.net/trac/protogeni/wiki/AM_API_proposals). When a sliver is created and moved into state 2, the aggregate produces a manifest RSpec identifying which resources are included in the sliver. This is something like the current (version 2) !CreateSliver, except that it neither provisions nor starts the resources. These resources are exclusively available to the containing sliver, but are not ready for use. In particular, allocating a sliver should be a cheap and quick operation, which the aggregate can readily undo without impacting the state of slivers that are fully provisioned. For some aggregates, transitioning to this state may be a no-op.

States 2 and 3 have aggregate-specific, and possibly resource-specific, timeouts. By convention the state 2 timeout is typically short, like the {{{redeem_before}}} in ProtoGENI tickets, or the {{{commit_by}}} in Gary's transactions proposal. The state 3 timeout is the existing sliver expiration. If the client does not transition the sliver from state 2 to state 3 before the state 2 timeout ends, the sliver reverts to unallocated. If the experimenter needs more time, the experimenter should be allowed to request a renewal of either timeout. Note that the sliver expiration time (the timeout for state 3, provisioned) will typically be notably longer than the timeout for state 2, allocated.
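To make the timeout rules concrete, here is a small sketch of one sliver's allocation state and timeout. It assumes a sliver reverts to state 1 when the timeout for its current state passes (for state 3 this is the ordinary sliver expiration); the class and method names are hypothetical, and aggregate policy limits on renewals are not modeled.

{{{#!python
from datetime import datetime, timedelta, timezone
from enum import IntEnum


class AllocState(IntEnum):
    """Placeholder state numbers from the discussion; names are still TBD."""
    UNALLOCATED = 1
    ALLOCATED = 2      # allocated but not provisioned
    PROVISIONED = 3


class Sliver:
    """Hypothetical model of one sliver's allocation state and its timeout."""

    def __init__(self, urn):
        self.urn = urn
        self.state = AllocState.UNALLOCATED
        self.expires = None

    def allocate(self, now, hold):
        # The state 2 timeout is typically short (cf. redeem_before, commit_by).
        self.state = AllocState.ALLOCATED
        self.expires = now + hold

    def provision(self, now, lifetime):
        # The state 3 timeout is the ordinary sliver expiration.
        if self.state != AllocState.ALLOCATED:
            raise ValueError("only allocated slivers can be provisioned")
        self.state = AllocState.PROVISIONED
        self.expires = now + lifetime

    def renew(self, now, new_expires):
        """Extend the timeout of the current state; policy limits not modeled."""
        if self.state == AllocState.UNALLOCATED or new_expires <= now:
            return False
        self.expires = new_expires
        return True

    def check_timeout(self, now):
        """Revert to unallocated once the current state's timeout has passed."""
        if self.state != AllocState.UNALLOCATED and now >= self.expires:
            self.state = AllocState.UNALLOCATED
            self.expires = None
        return self.state


# Usage: a sliver that is allocated but never provisioned reverts to state 1.
t0 = datetime(2012, 3, 23, tzinfo=timezone.utc)
s = Sliver("urn:publicid:IDN+am.example+sliver+1")  # hypothetical URN
s.allocate(t0, timedelta(minutes=10))
s.check_timeout(t0 + timedelta(minutes=11))
}}}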

The AM API does not yet have a method for moving from state 2 to state 3. State 3 is the state of the sliver allocation after the aggregate begins to instantiate the sliver. Note that fully provisioning a sliver may take noticeable time. This state also includes a timeout: the sliver expiration time (which is not necessarily related to the time it takes to provision a resource). !RenewSlivers extends this timeout. For some aggregates and resource types, moving to this state from state 2 (allocated) may be a no-op.

If the transition from one state to another fails, the sliver shall remain in its original state.

These are the only allocation states supported by this API. Since the set of state transitions is finite, but potentially includes multiple transitions between the same two states, this API uses a separate method to perform each state transition, rather than a single method for requesting a new state for the sliver. We did not agree on method names for these transitions (we agreed to leave that as an exercise for the API documenter). Logically, however, these methods are something like:
 1. !CreateSlivers moves 1+ slivers from unallocated (state 1) to allocated (state 2). This method can be described as creating an instance of the state machine for each sliver. If the aggregate cannot fully satisfy the request, the whole request fails. This is a change from the version 2 !CreateSliver, which also provisioned the resources and 'started' them. That is, !CreateSlivers does one of the three things that !CreateSliver did previously.
 2. !DeleteSlivers moves 1+ slivers from either state 2 or state 3 back to state 1. This is similar to the AM API version 2 !DeleteSliver.
 3. !RenewSomething (name TBD) requests an extended timeout for slivers in state 2, the allocated but not provisioned state.
 4. !RenewSlivers requests an extended timeout for slivers in state 3, the provisioned state, as before.
 5. !SomethingSlivers (name TBD) moves 1+ slivers from state 2 (allocated) to state 3 (provisioned). This is part of what the version 2 !CreateSliver did. Note, however, that this does not 'start' the resources or otherwise change their operational state. This method only fully instantiates the resources in the slice. This may be a no-op for some aggregates or resources.
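The transitions listed above can be sketched as a table keyed by method name. Method names are the placeholders from the list, states are the placeholder numbers 1 to 3, and {{{apply_transition}}} is a hypothetical helper. It never mutates the caller's state, so a failed transition leaves the sliver in its original state, as required above.

{{{#!python
# Placeholder method names and state numbers from the list above (names TBD).
TRANSITIONS = {
    "CreateSlivers": ({1}, 2),       # unallocated -> allocated
    "DeleteSlivers": ({2, 3}, 1),    # allocated or provisioned -> unallocated
    "RenewSomething": ({2}, 2),      # extend the state 2 timeout
    "RenewSlivers": ({3}, 3),        # extend the sliver expiration
    "SomethingSlivers": ({2}, 3),    # allocated -> provisioned (not started)
}


class TransitionError(Exception):
    """The requested transition is not allowed, or the AM's work failed."""


def apply_transition(method, state, do_work=lambda: None):
    """Return the allocation state after `method`, or raise TransitionError.

    `do_work` stands in for the aggregate's real work. The caller's state
    is never modified here, so on failure the sliver keeps its old state.
    """
    allowed_from, target = TRANSITIONS[method]
    if state not in allowed_from:
        raise TransitionError("%s is not allowed from state %d" % (method, state))
    try:
        do_work()
    except Exception as exc:
        raise TransitionError("%s failed: %s" % (method, exc)) from exc
    return target
}}}

Keying the table by method, rather than by (from, to) pair, leaves room for two methods that connect the same pair of states with different meanings.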

These states apply to each sliver individually. Logically, then, the state transition methods take a single sliver URN. For convenience, we agreed to also allow a list of sliver URNs, or a slice URN as a simple alias for all slivers in that slice at this aggregate.

Since each method may operate on multiple slivers, each of these methods returns a list of structs as the value:
{{{
value = [
  {
   geni_sliver_urn,
   geni_allocation_status,
   geni_expires <time when the sliver expires from its current state>,
   <other AM- or method-specific fields>
   <Method 5 SomethingSlivers also returns geni_operational_status>
  },
  ...
]
}}}

!CreateSlivers returns a single manifest RSpec, plus the above list of structs.
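A sketch of how an aggregate might expand the URN argument and build the per-sliver return list. It assumes the aggregate can look up a slice URN in a hypothetical {{{slivers_by_slice}}} map, uses the placeholder state numbers for {{{geni_allocation_status}}}, and passes {{{geni_expires}}} through as an opaque string, since its format is not specified here.

{{{#!python
def resolve_targets(urns, slivers_by_slice):
    """Expand the URN argument into a list of sliver URNs.

    A single slice URN is an alias for all of that slice's slivers at this
    aggregate; anything else is taken to be a list of sliver URNs.
    `slivers_by_slice` (slice URN -> sliver URNs) is hypothetical AM state.
    """
    if len(urns) == 1 and urns[0] in slivers_by_slice:
        return list(slivers_by_slice[urns[0]])
    return list(urns)


def sliver_results(slivers, include_operational=False):
    """Build the per-sliver list of structs returned by each method."""
    value = []
    for s in slivers:
        entry = {
            "geni_sliver_urn": s["urn"],
            "geni_allocation_status": s["allocation_state"],  # placeholder 1-3
            "geni_expires": s["expires"],  # expiry of the *current* state
        }
        if include_operational:
            # Method 5 (SomethingSlivers) also reports operational status.
            entry["geni_operational_status"] = s["operational_status"]
        value.append(entry)
    return value
}}}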

We spent a while discussing what it means for an aggregate to operate on multiple slivers at once. Can the aggregate partially succeed? What will experimenters want? We agreed that each aggregate must be consistent across all these methods in whether it is all-or-nothing or supports partial success.

These methods all take a new option (aggregates must support it; clients need not supply it):
{{{
   geni_atomic = True/False, default True
}}}
If true, the client is requesting that the aggregate either fully satisfy the request, moving all listed slivers to the desired state, or fully fail the request, leaving all slivers in their original state.
If the aggregate cannot guarantee all-or-nothing success or failure given the included slivers and resource types, the aggregate shall fail the request, returning an appropriate error code. If this option is false, then some slivers may transition to the new state and some may not. Clients must examine the return value closely to know the state of their slivers.
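The sketch below shows one way an aggregate could honor {{{geni_atomic}}}. It assumes that all-or-nothing can be guaranteed exactly when the aggregate has a way to roll back each sliver ({{{undo_one}}}); that rule, the function name, and the callbacks are all hypothetical. With {{{atomic=True}}} and no rollback available, the request is refused up front rather than risking a partial result.

{{{#!python
class AtomicityUnavailable(Exception):
    """The AM cannot guarantee all-or-nothing behavior for this request."""


def transition_slivers(states, target, do_one, undo_one=None, atomic=True):
    """Move each sliver in `states` (URN -> state) to `target`.

    do_one(urn) performs the real work and returns True on success;
    undo_one(urn) reverses a successful do_one. Returns (all_ok, new_states).
    Assumption: all-or-nothing is only possible when undo_one is available.
    """
    if atomic and undo_one is None:
        # Refuse up front instead of risking a partial result.
        raise AtomicityUnavailable("geni_atomic requested but cannot be guaranteed")
    new_states = dict(states)
    done = []
    for urn in states:
        if do_one(urn):
            new_states[urn] = target
            done.append(urn)
        elif atomic:
            for prev in reversed(done):
                undo_one(prev)           # roll back earlier successes
            return False, dict(states)   # every sliver keeps its original state
    # Non-atomic: report per-sliver outcomes; the client must inspect them.
    return all(new_states[u] == target for u in states), new_states
}}}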

[FIXME: Is this what we agreed to? Or should the default be False? Should there be a !GetVersion option for advertising the AM default?]

'''Note''': !CreateSlivers remains all or nothing (either the aggregate can allocate all desired resources as requested, or the call fails).

'''Note''': These calls are synchronous: when they return, the slivers shall be in their final state. In particular, the transition from state 2 to state 3 (allocated to provisioned) should be quick. A resource that is now in the 'provisioned' state may take a long time to actually be ready for operational use (e.g. imaging and booting the node); this remains true, as it was in version 2 after !CreateSliver.

!SliverStatus, where it currently includes {{{geni_status}}}, shall now return {{{geni_allocation_status}}} with one of the above defined values, and {{{geni_operational_status}}}. The values of {{{geni_operational_status}}} are still under discussion.

Currently, !SliverStatus returns a single {{{geni_status}}} for the entire slice at this aggregate. With this change, the top-level allocation status for the slice is not defined, and that field is not required.
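Because there is no longer a slice-level status, a client that wants a summary has to derive it from the per-sliver entries. This sketch assumes the !SliverStatus result has been decoded into a list of structs shaped like the method return value above; the helpers are hypothetical, and operational status values are left untouched since they are still under discussion.

{{{#!python
from collections import defaultdict


def group_by_allocation(sliver_statuses):
    """Map each allocation status to the URNs of slivers in that status."""
    groups = defaultdict(list)
    for entry in sliver_statuses:
        groups[entry["geni_allocation_status"]].append(entry["geni_sliver_urn"])
    return dict(groups)


def not_yet_provisioned(sliver_statuses, provisioned=3):
    """URNs of slivers not in the provisioned state (placeholder number 3)."""
    return [entry["geni_sliver_urn"] for entry in sliver_statuses
            if entry["geni_allocation_status"] != provisioned]
}}}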

==== Still to do ====

We may want to add a method for modifying a set of allocated-but-not-provisioned slivers, something like Change Set E's !UpdateTicket or Gary's proposal's !UpdateTransaction.

The allocation states proposal appears to include semantics sufficient to cover transactions, satisfying our desire for 'tickets'. The ProtoGENI team had agreed to float a proposal for such functionality, and should consider what other functionality or modifications to this proposal, if any, are required.

Note that we still intend to discuss new operational states. These will likely be hierarchical, allowing aggregates to use aggregate- or resource-specific states within the more general GENI-wide states. We will define separate methods for transitioning between operational states and for querying the operational state machine of a particular sliver. These states and methods will be used in particular for making a provisioned resource available for operational use (aka !StartSliver, for booting a node). Nick Bastin's proposal covers this, but still needs to be discussed.

We still want the ability to add or remove individual slivers from a slice. In particular, the ability to supply a single sliver URN to some of the methods described above suggests that this would be possible. The ProtoGENI team has agreed to float a proposal for supporting this.

We also want the ability to modify an existing sliver, aka !UpdateSlivers. The ProtoGENI team has agreed to float a proposal for supporting this.

We noted that these are still changes for AM API version 3. However, we will shortly close the set of changes for version 3. Further changes will be for a future AM API version 4.