Changes between Version 9 and Version 10 of GEC19Agenda/CodingSprintTutoring

03/26/14 11:22:09 (10 years ago)



  • GEC19Agenda/CodingSprintTutoring

    v9 v10  
    8181== Session Summary ==
     82=== Experimenters ===
     84=== Operations ===
     86=== Developers ===
     87 * '''Topic: MA signed speaks-for credentials'''
     88  - Max Ott proposed during the [wiki:GEC19Agenda/DeveloperTopics Developers Topics] session that in addition to user-signed speaks-for credentials, GENI should also allow MA-signed speaks-for credentials. Max thought this would allow for a better user experience because more of the authorization process could be automated by tools without the user remembering passphrases and following a complicated web UI.
     89  - Tom Mitchell offered to investigate what such a credential would look like and to propose an additional authorization strategy that would accept either MA-signed or user-signed speaks-for credentials.
     90  - Aggregate developers would need to add logic to accept MA-signed credentials.
     91  - Rob Ricci noted that the ProtoGENI federation would probably whitelist specific MAs rather than routinely accept all MA-signed speaks-for credentials. Their criteria would be that the MA signs credentials when the user directs and never automatically.
     92  - Further discussion will be handled by a small working group consisting of some aggregate developers and some tool developers.
     93  - Results will be published to
     94 * '''Topic: Scaling experiments'''
     95  - Marshall Brinn introduced a discussion of what tools are available to help an experimenter scale their experiment from small (one or two nodes at one site) to medium (several nodes at a few sites) to large (many nodes and many sites).
     96  - There was agreement that a key to scaling such experiments is defining the roles that resources play in a given configuration, with the members of each role being roughly homogeneous. That is, there are "head" nodes, "coordinator" nodes and "compute" nodes, and every element with the same role is configured similarly, i.e. with the same software (image or post-boot configured).
     97  - Ilya, Victor and Paul of RENCI suggested that an experimenter think of themselves as the sysadmin of their slice and use tried-and-true sysadmin tools such as [ Puppet] or [ Chef] to install their
     98  - Ilya also pointed out that they use Velocity, a program that generates scripts or configuration files by instantiating parameterized templates. Not everyone uses Velocity, but it is an idea worth considering.
     99  - Max described how !LabWiki uses Ruby code to control experiments. That code is parameterized by the set of slivers observed or provided, so a well-written script should be able to handle different topologies of different scopes.
     100  - Generalizing from some of these pieces, Rob suggested that the best way to build a scalable configuration was to write code that takes a set of scaling parameters and generates the appropriate artifacts for that scale. The program could write an RSpec, plus some configuration files (which nodes talk to which services) plus some experiment coordination files. In that way, the important thing that gets saved is really this generation code, and the artifacts can be easily regenerated at different scales and, as needed, distributed.
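Rob's suggestion of keeping the generator rather than its outputs can be sketched as follows. This is a minimal illustration, not anything presented at the session: the function names, the role names, the example site URNs and the role-file format are all hypothetical, and the request RSpec is reduced to bare `node` elements in the GENI RSpec v3 namespace.

```python
import xml.etree.ElementTree as ET

# GENI RSpec v3 namespace; everything else below is an illustrative assumption.
RSPEC_NS = "http://www.geni.net/resources/rspec/3"


def plan_nodes(sites, computes_per_site):
    """Return (client_id, site, role) tuples: one head node, then compute nodes."""
    nodes = [("head", sites[0], "head")]
    for s_idx, site in enumerate(sites):
        for n in range(computes_per_site):
            nodes.append((f"compute-{s_idx}-{n}", site, "compute"))
    return nodes


def build_rspec(nodes):
    """Render a simplified request RSpec with one <node> per planned node."""
    ET.register_namespace("", RSPEC_NS)
    root = ET.Element(f"{{{RSPEC_NS}}}rspec", {"type": "request"})
    for client_id, site, _role in nodes:
        ET.SubElement(
            root,
            f"{{{RSPEC_NS}}}node",
            {
                "client_id": client_id,
                "component_manager_id": site,
                "exclusive": "false",
            },
        )
    return ET.tostring(root, encoding="unicode")


def build_role_config(nodes):
    """Render a hypothetical role file that post-boot scripts could read."""
    return "".join(f"{client_id} role={role}\n" for client_id, _site, role in nodes)


if __name__ == "__main__":
    # Hypothetical aggregate URNs; change the parameters to regenerate at another scale.
    sites = [
        "urn:publicid:IDN+site-a.example+authority+cm",
        "urn:publicid:IDN+site-b.example+authority+cm",
    ]
    nodes = plan_nodes(sites, computes_per_site=2)
    print(build_rspec(nodes))
    print(build_role_config(nodes))
```

Only `sites` and `computes_per_site` change between a small and a large run; the RSpec and the role file are regenerated from them each time.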
     101 * '''Topic: Long-lived slices'''
     102  - An agreement was reached between slice authority developers and aggregate manager developers that long-lived slice credentials would be issued by slice authorities by adding a field indicating the long-lived duration (in days) to the existing slice credential format. Aggregates in the GENI federation will honor this additional field. Aggregates in other federations like ProtoGENI may require additional credentials before granting extended slice lifetimes (for example, by issuing their own credentials in the same format after proper AM-local authorization). See the [wiki:GAPI_AM_API_DRAFT#ChangeSetT:LongLivedSlices long-lived slices changeset proposal].
     103  - Long-lived slice credentials always involve an out-of-band process where people talk to people to get permission for long-lived slices. In the case of a slice authority, an experimenter would have to request permission of the slice authority administrators to enable the long-lived property on a slice.
     104 * '''Topic: Stitching'''
     105  - Ads should reflect what VLANs and bandwidth have been reserved
     106   - InstaGENI and ExoGENI say that is there now. It is not supported at MAX or ION.
     107  - VLAN tag negotiation should be supported, including APIv3 `Allocate` and `Provision`
     108   - Aggregates must accept a suggested tag of `any` and then pick a tag from the requested available range, reserving that tag temporarily and reporting the actual available range.
     109   - Rob noted that this is required only in general. In GENI today, all circuits either connect from a rack to ION or MAX, which do VLAN translation, or connect two InstaGENI racks (which will coordinate a free VLAN tag among themselves). The result is that if the tool allows the endpoint VLAN producer aggregates to select any tag from a range, no VLAN tag negotiation should be required.
     110   - Jon Duerig agreed to implement support for this in InstaGENI aggregates
     111   - Aaron Helsinger agreed to implement this in stitcher
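The `any` behaviour described above could look roughly like this on the aggregate side. It is a sketch under stated assumptions: ranges are taken to be strings such as `1000-1003,1010`, the lowest usable tag is chosen arbitrarily, and `parse_range` and `select_vlan` are hypothetical helper names, not part of any aggregate's code.

```python
def parse_range(text):
    """Parse a range string such as "1000-1003,1010" into a set of VLAN tags."""
    tags = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            tags.update(range(int(lo), int(hi) + 1))
        else:
            tags.add(int(part))
    return tags


def select_vlan(suggested, requested_range, locally_free):
    """Pick a tag for a request.

    Returns (chosen_tag, available_tags). available_tags is the set of tags
    that are both requested and free locally; chosen_tag is None when the
    request cannot be satisfied.
    """
    available = parse_range(requested_range) & set(locally_free)
    if not available:
        return None, available
    if suggested == "any":
        # Assumption: any deterministic choice is acceptable; take the lowest.
        return min(available), available
    tag = int(suggested)
    return (tag if tag in available else None), available
```

For example, `select_vlan("any", "1000-1003,1010", {1002, 1003, 1010, 2000})` picks tag `1002` and reports `{1002, 1003, 1010}` as the actual available range. The temporary reservation of the chosen tag is left out of the sketch.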
     112  - Aggregates should use common error codes for common errors
     113   - Use `VLAN_UNAVAILABLE` (code `24`) when you know that is the problem
     114   - Use `INSUFFICIENT_BANDWIDTH` (code `25`) when that is the issue
     115   - Use `SERVERERROR` (code `5`) when something internal to the server went wrong
     116   - Use `BADARGS` (code `1`) if the request is malformed in some way
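A small sketch of how an aggregate might map internal failures onto the four codes listed above. The code values come from the list; the exception classes, the `error_return` helper and the choice to treat `ValueError` and `KeyError` as malformed requests are assumptions, and the return struct is shown in the usual AM API `code`/`value`/`output` shape.

```python
from enum import IntEnum


class GeniCode(IntEnum):
    """Error codes named in the session summary."""
    BADARGS = 1
    SERVERERROR = 5
    VLAN_UNAVAILABLE = 24
    INSUFFICIENT_BANDWIDTH = 25


class VlanUnavailable(Exception):
    """Hypothetical internal error: no usable VLAN tag."""


class InsufficientBandwidth(Exception):
    """Hypothetical internal error: requested capacity not available."""


def error_return(exc):
    """Translate an exception into an AM API style return struct."""
    if isinstance(exc, VlanUnavailable):
        code = GeniCode.VLAN_UNAVAILABLE
    elif isinstance(exc, InsufficientBandwidth):
        code = GeniCode.INSUFFICIENT_BANDWIDTH
    elif isinstance(exc, (ValueError, KeyError)):
        # Assumption: parsing and lookup failures indicate a malformed request.
        code = GeniCode.BADARGS
    else:
        code = GeniCode.SERVERERROR
    return {
        "code": {"geni_code": int(code)},
        "value": "",
        "output": f"{code.name}: {exc}",
    }
```

Returning the specific code lets a tool such as stitcher decide whether to retry with another tag, reduce the requested bandwidth, or give up.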
     117  - The group agreed to update to [ version 2 of the stitching schema] by May. This requires coordination among the SCS, stitcher, and the aggregates.
     118  - Inconsistent expirations of parts of a circuit: Because of different AM local policies, parts of a stitched sliver may have different expirations.
     119   - Without an AM to AM protocol (as in chain mode), the tool must coordinate - as with other resources, except that VLANs are a scarce shared resource
     120   - Tools should extract the sliver expiration (from the return struct in APIv3, from the `expires` RSpec tag in v2 or from `SliverStatus`). Then tools can warn the user, or try to renew slivers to a consistent time.
     121    - Note that ION and MAX support shortening a reservation but most aggregates do not. The tool may need to renew resources until the maximum of the non-ION/MAX sliver expirations.
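The renewal logic a tool might apply can be sketched as below. It assumes the tool has already extracted one expiration per aggregate (from the APIv3 return struct, the `expires` RSpec tag or `SliverStatus`); the function names, the aggregate labels and the fallback used when every aggregate can shorten are hypothetical.

```python
from datetime import datetime, timezone


def renewal_target(expirations, can_shorten=()):
    """Choose a common expiration for all parts of a stitched circuit.

    expirations maps an aggregate name to its sliver expiration.
    Aggregates listed in can_shorten (for example ION and MAX) can be
    brought down to the target, so the target is the latest expiration
    among the aggregates that cannot shorten.
    """
    fixed = [t for am, t in expirations.items() if am not in can_shorten]
    # Assumption: if every aggregate can shorten, keep the latest expiration.
    return max(fixed) if fixed else max(expirations.values())


def out_of_sync(expirations, target):
    """Return the aggregates whose expiration differs from the target."""
    return sorted(am for am, t in expirations.items() if t != target)


if __name__ == "__main__":
    utc = timezone.utc
    exp = {
        "ig-site-a": datetime(2014, 4, 1, tzinfo=utc),
        "ig-site-b": datetime(2014, 4, 3, tzinfo=utc),
        "ion": datetime(2014, 4, 10, tzinfo=utc),
    }
    target = renewal_target(exp, can_shorten={"ion"})
    print("renew or shorten to", target.isoformat(), "at", out_of_sync(exp, target))
```

The tool can either warn the user about the aggregates returned by `out_of_sync` or issue renew calls to bring them to the target time.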
     122  - Stitch to all nodes at an aggregate (not just a single node)
     123   - This works at InstaGENI, with a few caveats:
     124    - You must explicitly bind 2+ nodes to different physical interfaces, to ensure you get a VLAN
     125     - You can do this without binding the VMs to specific VM servers by giving the node interfaces the component IDs `eth1` and `eth2` (incomplete URNs)
     126     - The SCS requires a patch before it can handle this
     127    - Currently, you cannot convert such a LAN into a stitched one later. You must request the stitched circuit at the same time, so that the VLAN tag comes from the pool of stitched VLAN tags.