Version 16 (modified by Aaron Helsinger, 9 years ago)

GEC22 Developer Roundtable

This is an informal session for GENI developers to discuss details of software integration, and software issues that affect multiple control frameworks or tools. Note that this session is separate from the parallel experimenter and operations drop-in session.

Schedule

Thursday, 9.00am - 10.30am & 11.00am - 12.30pm

Session Leaders

Tom Mitchell
GPO
Aaron Helsinger
GPO

Agenda / Details

This software development session provides an opportunity for GENI developers to collaborate informally. Topics will be chosen from those raised by the GENI developers in attendance. The agenda is currently in discussion on the dev at geni.net mailing list. Candidate topics are detailed below.

Cross Slice Stitching

We had an interesting conversation about this topic at GEC21. Running a service in a slice, such as ChoiceNet (or another FIA architecture) or VTS, requires connecting multiple slices. Today, that requires using shared VLANs. Is there a better way?

  • Does a slice provision a special 'port'?
  • Does a slice that wants to accept connections do something at reservation time? After the resources are provisioned?
  • How are clients authorized to connect?
  • Note that some kinds of cross slice connections are possible now, but somewhat difficult.

Cross Testbed Federation

The Open MultiNet discussions covering GENI/FIRE collaboration, translating RSpecs and NDL based ontologies, and related topics could be a substantive conversation. Check out http://open-multinet.info/.

  • On what level should testbeds federate?

Resource Queries

The AM API allows tools to ask for everything that an aggregate has (optionally filtered to what is currently available). But it provides no standard way to ask for only what the tool wants; the tool must parse the full response to find what it wants. This capability would better support tools trying to help with topology embedding.

  • Could we support an option (e.g. geni_query) in some query language (SPARQL?), allowing aggregates to support richer resource queries?
  • Should this be a separate and optional service, or part of the AM API?
  • How would an aggregate advertise the ontology or internal namespace it uses, to allow formulating reasonable queries?
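
To illustrate the burden described above, here is a minimal sketch of what a tool does today: fetch the whole advertisement and filter it locally. The advertisement fragment, the URNs, and the function name find_nodes are made up for illustration; the element names follow the GENI v3 RSpec schema as we understand it, and real advertisements are far larger.

```python
import xml.etree.ElementTree as ET

RSPEC_NS = {"r": "http://www.geni.net/resources/rspec/3"}

# Made-up advertisement fragment, for illustration only.
SAMPLE_AD = """<rspec xmlns="http://www.geni.net/resources/rspec/3" type="advertisement">
  <node component_id="urn:publicid:IDN+example.net+node+pc1" exclusive="true">
    <hardware_type name="pc"/>
    <available now="true"/>
  </node>
  <node component_id="urn:publicid:IDN+example.net+node+pc2" exclusive="true">
    <hardware_type name="pc"/>
    <available now="false"/>
  </node>
  <node component_id="urn:publicid:IDN+example.net+node+vm1" exclusive="false">
    <hardware_type name="pcvm"/>
    <available now="true"/>
  </node>
</rspec>"""


def find_nodes(ad_xml, hardware_type, only_available=True):
    """Return component_ids of advertised nodes with the given hardware type."""
    root = ET.fromstring(ad_xml)
    matches = []
    for node in root.findall("r:node", RSPEC_NS):
        types = {hw.get("name") for hw in node.findall("r:hardware_type", RSPEC_NS)}
        if hardware_type not in types:
            continue
        avail = node.find("r:available", RSPEC_NS)
        is_available = avail is not None and avail.get("now") == "true"
        if only_available and not is_available:
            continue
        matches.append(node.get("component_id"))
    return matches


if __name__ == "__main__":
    print(find_nodes(SAMPLE_AD, "pc"))
```

A server-side query option would move this filtering into the aggregate, so the tool would receive only the matching resources.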

Note that this is a small piece of a potential other topic: allowing RDF format RSpecs. But that larger topic has gotten some push-back.

Update Regression Tests

Sometimes after software or system updates, existing pieces of basic functionality are broken. Are there basic regression tests that we can do as part of developing any updates and during upgrades to ensure that experimenters don't have to do our regression testing? Are there automated test frameworks we can use to do these tests?

  • test all AM API functions
  • test basic disk images
  • test standard RSpecs
  • ?

ListResources without authorization

Currently the AM API ListResources call requires both a valid certificate (authentication) and a user credential (authorization). To the extent that Advertisements are not user specific, the credential is arguably not required. If we made the credential optional, then tools (with a valid and trusted GENI certificate) could pre-fetch Advertisements for use by multiple users.
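
As a rough sketch of the pre-fetching idea, a tool could keep one cached Advertisement per aggregate and serve it to all of its users. Everything here is hypothetical: the class name, the fetch callable (standing in for a credential-free ListResources call), and the five-minute maximum age are assumptions, not part of any GENI tool.

```python
import time


class AdvertisementCache:
    """Cache one advertisement per aggregate and share it across users."""

    def __init__(self, fetch, max_age_seconds=300.0, clock=time.monotonic):
        # fetch is a hypothetical callable: aggregate URL -> Ad RSpec text.
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}  # aggregate URL -> (fetch time, Ad RSpec text)

    def get(self, aggregate_url):
        """Return a cached advertisement, refetching it once it is too old."""
        now = self._clock()
        entry = self._entries.get(aggregate_url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]
        ad = self._fetch(aggregate_url)
        self._entries[aggregate_url] = (now, ad)
        return ad
```

The injectable clock only exists to make the expiry logic easy to exercise without waiting.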

Dynamically add bandwidth

For initial experiments, a small topology is appropriate for stitched links. But later experimenters may want to use a larger capacity link. Could we support an AM APIv3 PerformOperationalAction command to increase the capacity of stitched links? Do the networks support changing capacity without reprovisioning the links?

Session Summary

At the Developer’s Roundtable there was excellent representation and participation from developers associated with many of the core GENI projects. The first and largest discussion topic was cross slice stitching. Several options were discussed and approaches were narrowed. Both ExoGENI and InstaGENI developers have paths they will investigate to expose this capability to experimenters in the near future.

Other discussion topics included cross testbed federation, regression testing, resource queries, public resource advertisements, and advertising available bandwidth.

Detailed Summary

Attendees included developers representing, among others, ProtoGENI/InstaGENI, ExoGENI, GENI Desktop, GIMI, jFed, FOAM/VTS, GPO, and the SCS (from the University of Utah, RENCI, University of Kentucky, University of Massachusetts-Amherst, iMinds, Barnstormer Softworks, BBN Technologies, and University of Maryland / MAX).

Stitch to LAN

A past topic of discussion has been the ability to stitch together LANs at 2 different aggregates into a single broadcast LAN, allowing more than one node at an aggregate to communicate out a single stitched VLAN. This capability works today in ExoGENI when reserving through the ExoSM or ExoGENI APIs. Ilya Baldin of RENCI/ExoGENI says that this ability will be exposed through the GENI AM API soon.

Rob Ricci of the University of Utah / InstaGENI sees no reason why this cannot be done at their aggregates, and will plan to make this enhancement.

A related topic is the ability to stitch to an existing shared VLAN from outside the aggregate. However, this requires that the same VLAN tag number be available on the external interface, so it is unlikely to work at InstaGENI.

Note however that you can attach nodes to an existing shared LAN.

Cross Slice Stitching

We next discussed cross slice stitching: connecting 2 slices at layer 2. As background, note that we have the notion of a 'shared VLAN'. This is a VLAN that gets a name and that is marked public, allowing anyone to connect to it. At InstaGENI, there is a PerformOperationalAction to convert a newly allocated LAN into a shared VLAN.

Brecht Vermeulen and Paul Ruth noted that currently you can allocate a VM to act as a bridge between a stitched link and a local LAN.

Niky Riga noted that currently, when you share a LAN, the bandwidth limit on the original LAN is not enforced on other slices that connect to the LAN. Rob Ricci replied that this is difficult to fix correctly, but they can set the bandwidth limit from the original LAN on the new connection. Nick Bastin noted that AL2S is best effort, so bandwidth settings are fairly meaningless. Ilya Baldin noted that at ExoGENI, you won't get more bandwidth than you request, but AL2S might give you less than you requested.

We discussed support for shared VLANs at ExoGENI. Currently, ExoGENI does not offer this functionality. However, Ilya Baldin noted that ExoGENI is working on the sliver-modify function. ExoGENI will expose this as PerformOperationalAction (without implementing the rest of AM API version 3). With support for POA, ExoGENI can expose the ability to reboot a node (geni_restart), modify the bandwidth on a link, add an interface to a node, add a post boot script, etc. Additionally, this could be how ExoGENI adds support for sharing VLANs.

Rob Ricci and Nick Bastin of Barnstormer Softworks agreed on a plan to support a 'restricted shared VLAN' at InstaGENI/ProtoGENI. In this model, a "server slice" would make a call on the local aggregate to tell the aggregate that it is willing to accept "client slice" connections on a local LAN that belongs to the server slice. This is in fact just the existing PerformOperationalAction to make a LAN into a 'shared VLAN'. The change proposed here is that the POA would take an additional option through which the server slice could ask for a 'restricted' shared VLAN; the client would then be required to supply a credential in CreateSliver requests in order to make a connection to the server.

Ilya Baldin noted that connecting 2 slices is an operation at a single aggregate, where 1 slice asks to connect to a 'server' slice, which has to authorize the connection in advance. Mechanically, the connection could be node-to-LAN or LAN-to-LAN. We noted that this can be boiled down to the client/server model Rob and Nick proposed.

Proposal: Add to the existing POA geni_sharelan a new option, restricted, with default value false (old behavior). When true, a new credential is required when requesting a connection to the created shared VLAN. The POA method will return in this case a GENI SFA credential with owner <user calling the method> and target <sliver of the shared VLAN, or the shared VLAN in some way; contents are not specified but should be sufficient for the aggregate to authorize the call>.

Note that shared VLAN names are scoped within the AM and must be unique within the AM.

The server slice aggregate manager (the AM at which the shared VLAN was created) should include the shared VLAN (whether restricted or not) in the advertisement RSpec for the aggregate, indicating whether this shared VLAN is restricted or not. To do so, we need a new attribute on the shared-vlan RSpec extension (or a new extension).

Proposal: add a new optional attribute, restricted, to the existing shared-vlan extension, with type xml:boolean and default value false.
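
A minimal sketch of how a tool might read the proposed attribute from an advertisement follows. The proposal does not say which element of the shared-vlan extension carries the attribute; placing it on the available element, the namespace URI, and the sample VLAN names are all assumptions made for illustration. The parser accepts the XML Schema boolean lexical forms (true, false, 1, 0) and treats a missing attribute as false, matching the proposed default.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the shared-vlan extension; check the schema in use.
SV_NS = "http://www.geni.net/resources/rspec/ext/shared-vlan/1"

# Made-up advertisement fragment; attribute placement is an assumption.
SAMPLE_AD = """<rspec xmlns="http://www.geni.net/resources/rspec/3"
       xmlns:sv="http://www.geni.net/resources/rspec/ext/shared-vlan/1"
       type="advertisement">
  <sv:rspec_shared_vlan>
    <sv:available name="open-lan"/>
    <sv:available name="service-lan" restricted="true"/>
  </sv:rspec_shared_vlan>
</rspec>"""


def parse_xml_boolean(text, default=False):
    """Parse an XML Schema boolean, returning default when it is absent."""
    if text is None:
        return default
    value = text.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError("not an XML Schema boolean: %r" % text)


def shared_vlans(ad_xml):
    """Map each advertised shared VLAN name to its restricted flag."""
    root = ET.fromstring(ad_xml)
    result = {}
    for elem in root.iter("{%s}available" % SV_NS):
        result[elem.get("name")] = parse_xml_boolean(elem.get("restricted"))
    return result


if __name__ == "__main__":
    print(shared_vlans(SAMPLE_AD))
```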

Slices desiring to connect to this restricted shared VLAN negotiate with the server slice. The server slice must delegate the shared VLAN credential to the client slice user. Then the client slice user must include this extra credential in the call to CreateSliver or Allocate. The aggregate therefore receives 5 key pieces of information:

  1. the client slice (URN is explicit argument to createsliver),
  2. the place in the client slice where the connection should be made (in the request RSpec),
  3. the server slice (may be in the additional credential, or else should be implicit given the owner of the shared VLAN),
  4. the sliver ID of the shared VLAN (either explicitly in the credential or implicitly in the name of the shared VLAN in the request RSpec), plus
  5. explicit authorization by the owner of the server slice that this client slice can connect to this restricted shared VLAN.

The aggregate can then create a LAN for the client slice that connects to the specified shared VLAN, allowing traffic to flow freely between the two slices.
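
The decision the aggregate makes with these pieces of information might look roughly like the sketch below. The class and field names are hypothetical, and the credential is modeled as an already-verified record: signature checking and walking the delegation chain are assumed to happen elsewhere, and the real credential contents were explicitly left unspecified in the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SharedVlan:
    name: str
    sliver_urn: str
    server_slice_urn: str
    restricted: bool = False  # proposed default: old, unrestricted behavior


@dataclass(frozen=True)
class VlanCredential:
    # Hypothetical view of a credential that has already been verified.
    target_urn: str  # the shared VLAN (or its sliver) the credential covers
    owner_urn: str   # the user holding the credential after delegation


def may_connect(vlan: SharedVlan, client_user_urn: str,
                credential: Optional[VlanCredential] = None) -> bool:
    """Decide whether a client may attach to the given shared VLAN."""
    if not vlan.restricted:
        return True  # plain shared VLANs remain open to anyone
    if credential is None:
        return False  # restricted VLANs require the extra credential
    return (credential.target_urn == vlan.sliver_urn
            and credential.owner_urn == client_user_urn)
```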

Note that there could also be an additional PerformOperationalAction command to modify an existing 'client' slice to connect a LAN belonging to that slice to one of these 'restricted shared VLANs'. We did not specify the syntax for such an operation, but Ilya suggested ExoGENI might support such an operation.

Ezra Kissel of Indiana University and Niky Riga of the GENI Project Office requested that there be a way to request a shared VLAN as part of the initial request RSpec. The goal is to make it easier to get a shared VLAN, and also to avoid having both VMs placed on the same host, which makes the link trivial and not sharable.

Rob Ricci replied that while this is possible, this is a larger change and therefore not something he is willing to commit to do at this time.

However, Brecht Vermeulen of iMinds / jFed / Fed4Fire noted that there is an InstaGENI RSpec extension that allows you to request that a link be VLAN tagged. This makes the link non-trivial: the VMs are spread across 2 physical hosts and the link is forced to go via the hardware switch (and therefore it can be OpenFlow controlled). It also makes the link sharable.

To use this, include

<vlan_tagging xmlns="http://www.protogeni.net/resources/rspec/ext/emulab/1" enabled="true"/>
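
As a sketch of how a tool could add this option programmatically, the snippet below inserts the element into every link of a request RSpec. Placing the element directly under link is our assumption about where the extension belongs, and the sample request and the function name enable_vlan_tagging are made up for illustration.

```python
import xml.etree.ElementTree as ET

RSPEC_NS = "http://www.geni.net/resources/rspec/3"
EMULAB_NS = "http://www.protogeni.net/resources/rspec/ext/emulab/1"

# Made-up two-VM request, for illustration only.
SAMPLE_REQUEST = """<rspec xmlns="http://www.geni.net/resources/rspec/3" type="request">
  <node client_id="vm1"><interface client_id="vm1:if0"/></node>
  <node client_id="vm2"><interface client_id="vm2:if0"/></node>
  <link client_id="lan0">
    <interface_ref client_id="vm1:if0"/>
    <interface_ref client_id="vm2:if0"/>
  </link>
</rspec>"""


def enable_vlan_tagging(request_xml):
    """Return the request with vlan_tagging enabled on every link."""
    # Keep readable prefixes when the document is serialized again.
    ET.register_namespace("", RSPEC_NS)
    ET.register_namespace("emulab", EMULAB_NS)
    root = ET.fromstring(request_xml)
    tag = "{%s}vlan_tagging" % EMULAB_NS
    for link in root.iter("{%s}link" % RSPEC_NS):
        if link.find(tag) is None:  # do not add the element twice
            ET.SubElement(link, tag, {"enabled": "true"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(enable_vlan_tagging(SAMPLE_REQUEST))
```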

Actions:

  • Experiment Support will document this option in a wiki page.
  • Jacks should add support for this option.

Marshall Brinn asked how a 'server' slice could identify and distinguish among the traffic from different 'client' slices when all traffic arrives on the same shared LAN.

Rob Ricci proposed that the aggregate manager could sign the manifest RSpec of the 'client' slice, and this could be passed to the 'server' slice. Rob proposed that the hand-off of this signed manifest would be done directly between the two slices, not mediated by the aggregate. Additionally, Rob proposed that ProtoGENI / InstaGENI would begin signing all manifest RSpecs as returned by all AM API calls. (Note that this change will need to be tested to ensure compatibility with existing tools. The XML-DSIG signature is a new child element under rspec, so the resulting signed XML passes rspeclint and can be loaded in Jacks.)
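
For tool authors checking compatibility, the following sketch separates the signature from the resource elements of a manifest. It only locates the Signature child by its XML-DSIG namespace; it does not verify anything, and the sample manifest (including its placeholder signature body) is made up.

```python
import xml.etree.ElementTree as ET

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"

# Made-up manifest; the Signature content is a placeholder, not a real signature.
SAMPLE_MANIFEST = """<rspec xmlns="http://www.geni.net/resources/rspec/3" type="manifest">
  <node client_id="vm1" sliver_id="urn:publicid:IDN+example.net+sliver+1"/>
  <Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
    <SignedInfo/>
    <SignatureValue>placeholder</SignatureValue>
  </Signature>
</rspec>"""


def split_manifest(manifest_xml):
    """Return (resource elements, signature elements) of a manifest."""
    root = ET.fromstring(manifest_xml)
    prefix = "{%s}" % DSIG_NS
    signatures = [child for child in root if child.tag.startswith(prefix)]
    resources = [child for child in root if not child.tag.startswith(prefix)]
    return resources, signatures


if __name__ == "__main__":
    resources, signatures = split_manifest(SAMPLE_MANIFEST)
    print(len(resources), "resource element(s),", len(signatures), "signature(s)")
```

A tool that iterates over all children of rspec without checking namespaces is the kind of case the compatibility testing mentioned above would need to catch.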

Since this discussion, ProtoGENI has begun implementing this feature, and a formal proposal has been posted to the Draft AM API changes wiki.

Cross Federation Policy and Integration

There are increasing movements to formalize the relationship among the various federations and testbeds. Specifically, this includes providing mechanisms for testbeds and aggregates to apply both federation level policies (differing by which federation a user comes from) and local policy.

Rob Ricci: Eventually CloudLab will need federation level policies - using which MA a user comes from. For now, this will be a case-by-case thing.
Ilya Baldin: Jeff Chase of Duke is working on more authorization policies for Orca and ExoGENI.
Rob: GENI funding is dwindling. GENI should stick with fixing problems we have.
Jim Griffioen: Local AMs should be allowed to have local policies. As the number of users grows, this may become an issue.

Sarah Edwards: There is no monitoring API to know where an experimenter has reservations.
Heidi Dempsey: Monitoring does this. The University of Kentucky team has added some needed pieces. We'll be trying to get that out.

Decision: No action.

ListResources without a User Credential

Tools would like to tell users what resources are available where. Doing so currently requires that the tool use the user's certificate to get a user credential, with which to query the aggregates. Tools cannot get their own credential, and aggregates require a credential (per the AM API).

Rob Ricci: We will make the user credential optional. We may even try supporting ListResources without a certificate.
Ilya Baldin: We can make the user credential optional.
Jim Griffioen: What about a service to store all such information?
Ilya: We built a service like that based on XMPP: you can subscribe to slices or AMs.

Since then, this change to the AM API has been documented on the GENI Wiki: http://groups.geni.net/geni/wiki/GAPI_AM_API_DRAFT#ChangeSetAD:DonotrequireausercredentialinListResources

Action: Modify all aggregates such that ListResources does not require a user credential when called outside of a slice context.
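
On the aggregate side, the rule in this action item could be as small as the sketch below. It assumes AM API v2 semantics, where a ListResources call is in a slice context when the geni_slice_urn option is present; the function names are hypothetical, and credential validation itself is out of scope.

```python
def credential_required(options):
    """A user credential is needed only when a slice is being queried."""
    return bool(options.get("geni_slice_urn"))


def check_list_resources(credentials, options):
    """Reject slice-context calls that arrive without any credential."""
    if credential_required(options) and not credentials:
        raise PermissionError("a credential is required to list a slice's resources")
    return True
```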

Advertising Available Bandwidth

One source of stitching failures is insufficient available bandwidth. If users and tools knew what bandwidth was currently available, they could tailor requests accordingly.

Ilya Baldin: Having this information in the Ad may be an issue with the RSpec converter. However, this information is relatively useless; it becomes stale almost immediately.
Niky Riga: This would still be better than nothing.
Rob Ricci: This is almost certainly something we can do.

But the full answer is monitoring.

Action: ExoGENI and InstaGENI teams will make the actual current available bandwidth available in the Ad RSpec when the geni_available option is supplied to ListResources. This information should be updated in both the main body link and in the stitching extension.
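
Once the Ad carries current values, a tool could tailor a request along the lines of this sketch. It reads only the main body link (the capacity attribute on property elements); the stitching extension is not handled, the sample link and its numbers are made up, and units are whatever the aggregate advertises. As noted in the discussion, the value may already be stale by the time the request is submitted.

```python
import xml.etree.ElementTree as ET

NS = {"r": "http://www.geni.net/resources/rspec/3"}

# Made-up advertisement fragment, for illustration only.
SAMPLE_AD = """<rspec xmlns="http://www.geni.net/resources/rspec/3" type="advertisement">
  <link component_id="urn:publicid:IDN+example.net+link+stitch1">
    <property source_id="a" dest_id="b" capacity="500000"/>
    <property source_id="b" dest_id="a" capacity="200000"/>
  </link>
</rspec>"""


def advertised_capacity(ad_xml, link_id):
    """Return the smallest advertised capacity of a link, or None if unknown."""
    root = ET.fromstring(ad_xml)
    for link in root.findall("r:link", NS):
        if link.get("component_id") != link_id:
            continue
        capacities = [int(p.get("capacity"))
                      for p in link.findall("r:property", NS)
                      if p.get("capacity")]
        return min(capacities) if capacities else None
    return None


def clamp_request(requested, available):
    """Lower a requested capacity to what is advertised, when that is known."""
    if available is None:
        return requested
    return min(requested, available)
```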

Shared VLAN Name in Manifest

Thijs Walcarius of jFed/iMinds raised the issue that when a slice's LAN has been made shared using PerformOperationalAction, the shared LAN name is not present in the slice's manifest, where it would be very useful.

Rob Ricci replied that they would discuss this internally at InstaGENI.

Regression Testing at Aggregates

Niky Riga introduced the topic: In the operations session, the rack teams indicated that often they rely on experimenter bug reports to identify problems. But experimenters aren't good at testing.
Ilya Baldin noted that they could do good testing if they had more funding.
Niky: What about doing regression testing for things that we've run into before? What about using jFed testing?
Ilya: The # of scenarios to test is very large. We have a regression suite.
Sarah Edwards: Start by testing the tutorial RSpecs.
Jim Griffioen: Load is a problem, and is harder to test.
Tom Mitchell: We test using Selenium Grid.
Heidi: This isn't a GENI thing. And this isn't a developer specific problem. We should use jFed and the University of Kentucky. Tell us the use cases to test.
Jim G: There is a verification function in GENI Desktop to check interfaces. There is no script to launch the verification, but there could be.

Thijs Walcarius: In jFed, if an automatic test fails for a couple days in a row we want to notify users.

Decision: GPO Experiment Support will gather use cases, starting with their standard tutorials, and work with the jFed team to have these regularly tested, notifying key players of failures.
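
The notification rule Thijs described (alert only after a test has failed several runs in a row) can be sketched as follows. This is not jFed code; the class name and the threshold of 2 consecutive failures are assumptions chosen for illustration.

```python
class FailureTracker:
    """Track consecutive failures per test and flag when to notify."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self._streaks = {}  # test name -> current run of consecutive failures

    def record(self, test_name, passed):
        """Record one run; return True exactly when a notification is due."""
        if passed:
            self._streaks[test_name] = 0
            return False
        streak = self._streaks.get(test_name, 0) + 1
        self._streaks[test_name] = streak
        # Notify once, when the streak first reaches the threshold.
        return streak == self.threshold
```

Notifying only when the streak first reaches the threshold avoids repeating the same alert on every later run while the test stays broken.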

Query Aggregate Resources

Aaron Helsinger introduced the topic.

You can ask an aggregate for all resources, and then parse the response. Or with APIv3, you can just try your reservation and see if it works. But it would be nice to let the aggregate parse its own resource listing and find the resource you want for you. Marshall Brinn of the GPO has prototyped doing this using RDF and shown that the AM API supports this.

Marshall asks: Is there a demand?
Brecht noted: If you are dealing with simple resources, like nodes & links, then a query mechanism is overkill. But if you start talking about different kinds of sensors, then parsing the Ad is a pain. European testbeds have more special devices. This is why we're trying to do this kind of research.

The variety of resources available in GENI may increase with time. The group decided that GENI should follow what the Europeans do.

Other Discussion Points

Update: ExoGENI could support this if an AM API interface were put on top of their existing slice_modify and sliver_modify functionality.

Sarah: no tool supports adding resources to an existing slice. That'd be nice.
Aaron: Omni supports Allocate and Update, to the extent it is supported by aggregates.

Hussam: In GENI Desktop, users can add a little topology to an existing slice by doing allocate/provision/poa.

Niky: Can we update the OpenFlow controller on a switch?
Ilya: We could pass the information to Flowspace Firewall.
Heidi: I doubt they support modifying the controller.
Ilya: It could work if you restart the slice; I don't know the size of the disruption.

Ezra Kissel noted that it is still hard to have standard images cross rack types. InstaGENI image requirements are complicated. Without installing InstaGENI software in the image, you don't get InstaGENI functionality.