
Version 12 (modified by griff@netlab.uky.edu, 10 years ago)

--

GEC19 Experimenter Drop-in, Operations Tutoring, Developers Coding Sprint

This is an informal help session for:

  • Experimenters: Get help with your experiments, get answers to your questions about GENI, etc.
  • Network operators/campus IT: Get answers to your questions about GENI, GENI Operations, etc.
  • Software developers: Discuss details of software integration, software issues that affect multiple control frameworks or tools, etc.

Schedule

Tuesday, 1:30pm - 3:30pm

Session Leaders

Experimenters: Sarah Edwards (GPO), Niky Riga (GPO)
Operations: Josh Smift (GPO)
Developers: Aaron Helsinger (GPO)

If you have any questions or comments before/after the tutorial, please find one of us!

Experimenters: Niky Riga, Sarah Edwards, GENI Project Office

Operations: Josh Smift, Chaos Golubitsky, GENI Project Office

Developers: Aaron Helsinger, GENI Project Office

Agenda / Details

Experimenters

The GPO will also hold a tutoring session for GENI experimenters, where experimenters can come and work on various online GENI tutorials. The list of available tutorials will be advertised in advance of the GEC. The GPO will answer questions and provide assistance. Experimenters with an idea they want to work on are encouraged to come do so in the same room, so that they can have support from the on-site GPO staff and other experimenters.

Operations

GPO operations staff will also be available to help GENI ops/admin folks who'd like a hand getting set up with GENI credentials, the Omni command-line tool, etc. This can be useful if you're running a GENI aggregate, so you can create a slice and some slivers, and test for yourself whether the aggregate manager is working as you expect. We'll also be happy to discuss any other topics of interest to the ops/admin community.
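As a rough illustration of that workflow, the Python sketch below lists the Omni invocations an operator might run to check an aggregate: create a slice, query the aggregate with getversion, reserve resources with createsliver, check them with sliverstatus, and clean up with deletesliver. The AM URL, slice name, and RSpec file name are hypothetical placeholders, and the sketch assumes Omni is installed as omni.py with a working omni_config; check the Omni documentation for the options your version supports.

```python
import subprocess
from typing import List

# Hypothetical placeholders: replace with your aggregate, slice, and RSpec.
AM_URL = "https://am.example.net:12346/"
SLICE = "opstest"
RSPEC = "request.rspec"


def smoke_test_commands(am_url: str, slice_name: str, rspec_file: str) -> List[List[str]]:
    """Return, in order, the omni.py invocations for a basic aggregate check."""
    return [
        # Create the slice at the slice authority named in omni_config.
        ["omni.py", "createslice", slice_name],
        # Ask the aggregate which API versions and RSpec formats it supports.
        ["omni.py", "-a", am_url, "getversion"],
        # Reserve the resources described in the request RSpec.
        ["omni.py", "-a", am_url, "createsliver", slice_name, rspec_file],
        # Report the status of the new slivers.
        ["omni.py", "-a", am_url, "sliverstatus", slice_name],
        # Release the resources when the check is done.
        ["omni.py", "-a", am_url, "deletesliver", slice_name],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first non-zero exit status."""
    for argv in commands:
        print("$", " ".join(argv))
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the plan only; call run_all(...) instead to execute it.
    for argv in smoke_test_commands(AM_URL, SLICE, RSPEC):
        print(" ".join(argv))
```

Keeping command construction separate from execution lets an operator review the plan before anything is reserved at the aggregate.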

Developers

This software development session provides an opportunity for GENI engineers to collaborate in real time on a particular software or documentation issue. The topic(s) will be selected well in advance of the conference, based on need and key party availability. Expected topics may include:

  • The GENI Aggregate Manager API: Adopt some proposals that have already begun to be implemented (N: Add AM type to GetVersion, O: Change Sliver name legal characters, P: Speaks For, Q: Update Users and SSH keys)
  • The GENI Aggregate Manager API: Support Long Lived Slices (Change Proposal T)
  • Stitching
    • Common error codes and typical error messages
    • Advertising actual availability of VLANs and bandwidth
    • Support for VLAN tag negotiation: through AM API v3, or a part of it?
    • Tool support for discovering possible circuits
    • Stitching all machines on a LAN at an AM to the same VLAN (aka Stitching to Aggregates)
    • Service for client tool workflow support (make the stitcher tool a service)
    • Stitch points (stitching to arbitrary non-GENI resources)
  • TSM: Tools to Scale Up Experiments
    • How can we help experimenters start with small experiments and grow from there?
  • The GENI Aggregate Manager API: Advertise the aggregate URN(s) in GetVersion (change proposal S)
  • Updates on other topics raised at the GEC 18 coding sprint, like long lived slices and updating SSH keys on nodes
  • Other topics that come up during the GEC or that participants in the room want to discuss
  • Updates and issues on the roll out of Speaks For to aggregates, tools, and clearinghouses, including use of ABAC credentials
  • An update on the state of the Clearinghouse API and implementations at the GENI Clearinghouse, ProtoGENI, and support in Europe

Session Summary

Experimenters

Several experimenters came and worked on their experiments and hands-on tutorials.

Operations

Developers

  • Topic: MA signed speaks-for credentials
    • Max Ott proposed during the Developers Topics session that in addition to user-signed speaks-for credentials, GENI should also allow MA-signed speaks-for credentials. Max thought this would allow for a better user experience because more of the authorization process could be automated by tools, without requiring the user to remember passphrases or follow a complicated web UI.
    • Tom Mitchell offered to investigate what such a credential would look like and propose an additional authorization strategy that would allow either MA-signed or user-signed speaks-for credentials.
    • Aggregate developers would need to add additional logic to allow for MA-signed credentials.
    • Rob Ricci noted that the ProtoGENI federation would probably whitelist specific MAs rather than routinely accept all MA-signed speaks-for credentials. Their criterion would be that the MA signs credentials only when the user directs it, never automatically.
    • Further discussion will be handled by a small working group consisting of some aggregate developers and some tool developers.
    • Results will be published to dev@geni.net
  • Topic: Scaling experiments
    • Marshall Brinn introduced a discussion of what tools are available to help an experimenter scale their experiment from small (one or two nodes at one site) to medium (several nodes at a few sites) to large (many nodes and many sites).
    • There was agreement that a key to scaling such experiments is the definition of roles that resources play in a given configuration that are somewhat homogeneous. That is, there are "head" nodes and "coordinator" nodes and "compute" nodes and each element of the same role is configured similarly, i.e. with the same software (image or post-boot configured).
    • Ilya, Victor and Paul of RENCI suggested that an experimenter think of themselves as the sysadmin of their slice and use tried-and-true sysadmin tools such as Puppet or Chef to install their
    • Ilya also pointed out that they use Velocity, a program that generates scripts or configuration files by instantiating parameterized templates. Not everyone uses Velocity, but it is an idea worth considering.
    • Max described how LabWiki uses Ruby code, parameterized by the set of slivers observed or provided, to control experiments, so a well-written script should be able to handle different topologies of different scopes.
    • Generalizing from some of these pieces, Rob suggested that the best way to build a scalable configuration was to write code that takes a set of scaling parameters and generates the appropriate artifacts for that scale. The program could write an RSpec, plus some configuration files (which nodes talk to which services) plus some experiment coordination files. In that way, the important thing that gets saved is really this generation code, and the artifacts can be easily regenerated at different scales and, as needed, distributed.
  • Topic: Long-lived slices
    • An agreement was reached between slice authority developers and aggregate manager developers that long-lived slice credentials would be issued by slice authorities by adding a field indicating the long-lived duration (in days) to the existing slice credential format. Aggregates in the GENI federation will honor this additional field. Aggregates in other federations like ProtoGENI may require additional credentials before granting extended slice lifetimes (for example, by issuing their own credentials in the same format after proper AM-local authorization). See the long-lived slices changeset proposal.
    • Long-lived slice credentials always involve an out-of-band process where people talk to people to get permission for long-lived slices. In the case of a slice authority, an experimenter would have to request permission of the slice authority administrators to enable the long-lived property on a slice.
  • Topic: Stitching
    • Advertisement RSpecs should reflect which VLANs and how much bandwidth have been reserved
      • InstaGENI and ExoGENI say this is supported now. It is not supported at MAX or ION.
    • VLAN tag negotiation should be supported, including APIv3 Allocate and Provision
      • Aggregates must accept a suggested tag of "any" and then pick a tag from the requested available range, reserving that tag temporarily and reporting the actual available range.
      • Rob noted that this is required only in the general case. In GENI today, every circuit either connects a rack to ION or MAX, which do VLAN translation, or connects two InstaGENI racks, which coordinate a free VLAN tag among themselves. As a result, if the tool allows the VLAN-producing endpoint aggregates to select any tag from a range, no VLAN tag negotiation should be required.
      • Jon Duerig agreed to implement support for this in InstaGENI aggregates
      • Aaron Helsinger agreed to implement this in stitcher
    • Aggregates should use common error codes for common errors
      • Use VLAN_UNAVAILABLE (code 24) when you know that is the problem
      • Use INSUFFICIENT_BANDWIDTH (code 25) when that is the issue
      • Use SERVERERROR (code 5) when something internal to the server went wrong
      • Use BADARGS (code 1) if the request is malformed in some way
    • The group agreed to update to version 2 of the stitching schema by May. This requires coordination among the SCS, stitcher, and the aggregates.
    • Inconsistent expirations of parts of a circuit: Because of different AM local policies, parts of a stitched sliver may have different expirations.
      • Without an AM-to-AM protocol (such as chain mode), the tool must coordinate the expirations, as it does for other resources, except that VLANs are a scarce shared resource.
      • Tools should extract the sliver expiration (from the return struct in APIv3, from the expires RSpec tag in v2 or from SliverStatus). Then tools can warn the user, or try to renew slivers to a consistent time.
        • Note that ION and MAX support shortening a reservation, but most aggregates do not. The tool may therefore need to renew resources to the maximum of the non-ION/MAX sliver expirations.
    • Stitch to all nodes at an aggregate (not just a single node)
      • This works at InstaGENI, with a few caveats:
        • You must explicitly bind 2+ nodes to different physical interfaces, to ensure you get a VLAN
          • You can do this without binding the VMs to specific VM servers by giving the node interfaces the component IDs eth1 and eth2 (incomplete URNs)
          • The SCS requires a patch before it can handle this
        • Currently, you cannot make such a LAN stitched later - you must request the stitched circuit at the same time, so that the VLAN tag comes from the pool of stitched VLAN tags.
  • Topic: Swapping Slices Out
    • Emulab had a facility to "swap an experiment out" so that its resources could be reused by other experiments
      • Running applications/services in the experiment would be lost.
      • However, the experiment could be swapped back in at a later date.
      • After swapping in, all the resources and files would be restored to their pre-swap status.
      • You were not guaranteed to get the same resources again.
    • A similar facility would be nice for GENI slices
      • Swapping in/out is nice if the user is not working in their slice continually
        • Setting up an experiment is often time consuming. Being able to restore the state of a swapped experiment can save a lot of setup time.
      • GENI users are often frustrated (and surprised) by the fact that their slice expired and deleted all their resources (and files)
    • Issues: (While everyone saw the value of such a service, several issues were raised).
      • What entity is responsible for swapping the slice out? What entity is responsible for swapping a slice in?
      • Where will the state be stored while it is swapped out?
      • Many experiments require specific resources (or resources from specific aggregates). What happens if those resources are not available at swap in time?
      • How does the swap in process know how/where to restore state?
      • We can't restart processes that were running, can we?
      • To prevent a user from losing state when their slice is about to expire, should the system automatically swap it out for them?
    • Alternatives:
      • Can the user just save their state away? Because running processes/services do not survive swap out/in, a GENI swap mechanism would only be saving the state which the user can do themselves.
      • Can we write scripts that save user state away for users?
      • Instead of swapping a user's experiment before it expires, why not just extend the expiration time? (This defeats the purpose of an expiration time).
    • No conclusions were reached and the topic was tabled for now.
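Rob's suggestion in the Scaling experiments topic, that the saved artifact should be code which takes scaling parameters and regenerates the RSpec and configuration files, can be sketched in Python as below. It emits a GENI v3 request RSpec with one head node and a configurable number of compute nodes on a single LAN, plus a hypothetical config mapping each compute node to the head node. The "default-vm" sliver type, the node naming scheme, and the config format are assumptions for illustration only; a real experiment would add component_manager_ids, images, and whatever files its coordination tools expect.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

RSPEC_NS = "http://www.geni.net/resources/rspec/3"


def node_roles(num_compute: int) -> List[Tuple[str, str]]:
    """Return (client_id, role) pairs: one head node plus num_compute workers."""
    nodes = [("head", "head")]
    nodes += [(f"compute{i}", "compute") for i in range(num_compute)]
    return nodes


def build_rspec(num_compute: int, sliver_type: str = "default-vm") -> str:
    """Build a request RSpec that puts every node on one LAN (assumed layout)."""
    rspec = ET.Element("rspec", {"type": "request", "xmlns": RSPEC_NS})
    iface_ids = []
    for client_id, _role in node_roles(num_compute):
        node = ET.SubElement(rspec, "node",
                             {"client_id": client_id, "exclusive": "false"})
        # Nodes are configured uniformly; here they all share one sliver type.
        ET.SubElement(node, "sliver_type", {"name": sliver_type})
        iface_id = f"{client_id}:if0"
        ET.SubElement(node, "interface", {"client_id": iface_id})
        iface_ids.append(iface_id)
    # One LAN joining every node's interface.
    link = ET.SubElement(rspec, "link", {"client_id": "lan0"})
    for iface_id in iface_ids:
        ET.SubElement(link, "interface_ref", {"client_id": iface_id})
    return ET.tostring(rspec, encoding="unicode")


def build_config(num_compute: int) -> Dict[str, str]:
    """Hypothetical config: each compute node is told which coordinator to contact."""
    return {cid: "head" for cid, role in node_roles(num_compute) if role == "compute"}


if __name__ == "__main__":
    print(build_rspec(3))
    print(build_config(3))
```

Because both artifacts are derived from num_compute, moving from a two-node test to a larger run means changing one argument and regenerating, rather than editing XML by hand.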
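For the Stitching discussion of VLAN tag negotiation, the aggregate-side rule (accept a suggested tag of "any", then pick a tag from the requested range) and the agreed error codes can be combined in a short Python sketch. The range-string format, the in-memory free pool, and the "lowest free tag" policy are assumptions; the sketch does not reserve the tag or report the available range back in a manifest, which a real aggregate would also have to do.

```python
from typing import Optional, Set, Tuple

# Codes from the error-code list in the Stitching notes; 0 means success.
SUCCESS = 0
BADARGS = 1
VLAN_UNAVAILABLE = 24


def parse_range(spec: str) -> Set[int]:
    """Turn a range string such as "3747-3749,3760" into a set of tags."""
    tags: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            tags.update(range(lo, hi + 1))
        else:
            tags.add(int(part))
    return tags


def choose_tag(suggested: str, requested_range: str,
               free_pool: Set[int]) -> Tuple[int, Optional[int]]:
    """Return (code, tag): SUCCESS and a tag, or an error code and None."""
    try:
        candidates = parse_range(requested_range) & free_pool
    except ValueError:
        return BADARGS, None  # malformed range string
    if suggested != "any":
        try:
            wanted = int(suggested)
        except ValueError:
            return BADARGS, None  # suggested tag is neither "any" nor a number
        if wanted in candidates:
            return SUCCESS, wanted
        return VLAN_UNAVAILABLE, None
    if not candidates:
        return VLAN_UNAVAILABLE, None
    # Hypothetical policy: take the lowest free tag so results are deterministic.
    return SUCCESS, min(candidates)
```

For example, choose_tag("any", "3747-3749", {3748, 3749}) picks 3748, while an empty intersection yields VLAN_UNAVAILABLE rather than a generic server error.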
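For the inconsistent-expiration problem in the Stitching topic, here is a tool-side sketch in Python. Given each aggregate's sliver expiration (assumed to be already extracted from the APIv3 return struct, the RSpec, or SliverStatus) and the set of aggregates that allow shortening a reservation, it picks the latest expiration among aggregates that cannot shorten and proposes moving every other sliver to that time. Aggregate names are hypothetical, and the sketch ignores per-aggregate limits on how far a renewal may extend, which a real tool would need to respect.

```python
from datetime import datetime, timedelta
from typing import Dict, Set


def consistent(expirations: Dict[str, datetime]) -> bool:
    """True when every sliver in the stitched circuit expires at the same time."""
    return len(set(expirations.values())) <= 1


def renewal_plan(expirations: Dict[str, datetime],
                 can_shorten: Set[str]) -> Dict[str, datetime]:
    """Map each aggregate whose sliver should move to its new expiration.

    Aggregates that cannot shorten a reservation set a lower bound, so the
    target is the latest of their expirations; every other sliver is renewed
    (or, where allowed, shortened) to that target.
    """
    if not expirations:
        return {}
    fixed = [t for am, t in expirations.items() if am not in can_shorten]
    # If every aggregate can shorten, any common time works; use the earliest.
    target = max(fixed) if fixed else min(expirations.values())
    return {am: target for am, t in expirations.items() if t != target}


if __name__ == "__main__":
    now = datetime(2014, 3, 18, 12, 0)  # arbitrary example time
    exp = {
        "ion": now + timedelta(days=1),  # hypothetical aggregate names
        "ig-site-a": now + timedelta(days=5),
        "ig-site-b": now + timedelta(days=3),
    }
    if not consistent(exp):
        print("warning: stitched slivers expire at different times")
    for am, when in renewal_plan(exp, can_shorten={"ion"}).items():
        print(f"renew {am} until {when.isoformat()}")
```

Warning first and renewing second mirrors the two options in the notes: a tool can simply tell the user, or it can try to bring the slivers to a consistent time.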