wiki:ClusterBProjReview20090630Notes


--

PlanetLab (Larry Peterson)

Main modules from the SFA are the Slice Manager Interface, the Registry Interface, and the O&M Interface. The first two will be standardized; it's thought that the third will be defined per-aggregate. Two interfaces, currently XML-RPC, going to support WSDL/SOAP.

The Slice Facility Interface (sfi) calls into the Slice Manager and Registry. Behind the Slice Manager will be some number of aggregates; right now PlanetLab is the only aggregate. There's a skeletal version for anyone who has a PlanetLab-codebase aggregate. The Aggregate Manager interface is the same as the Slice Manager interface. If you're running a MyPLC-based system, their code will run on top of it. Collectively this is based on the slice-facility architecture.

Two versions: SFA-full and SFA-lite. SFA-lite is SFA-full minus credentials, so you can build an aggregate that depends on the front-end having authenticated the caller.

Have SFI, Slice Manager, Registry, Aggregate Manager. WSDL interface, and a lite version (from Enterprise GENI).

Minimal ops to implement for the Aggregate Manager interface for your aggregate -- GetResources() returns RSpecs, CreateSlice() allocates a slice, DeleteSlice(), ListSlices(). Plus a minimal RSpec, which can be aggregate-specific (basically an XSD; basically what you need as an argument to CreateSlice()).
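
As a rough illustration of how small that interface can be, here is a hypothetical SFA-lite-style sketch in Python over XML-RPC. The four method names come from the notes above; the RSpec contents, port, and in-memory storage are invented for illustration and are not the real PlanetLab implementation.

{{{#!python
# Hypothetical minimal Aggregate Manager exposing the four operations above.
# SFA-lite style: no credential checking; the front end is trusted to have
# authenticated the caller. Everything beyond the method names is made up.

from xmlrpc.server import SimpleXMLRPCServer

SLICES = {}  # slice name -> RSpec it was created with

MINIMAL_RSPEC = """<RSpec aggregate="example-agg">
  <node id="node1" type="planetlab-vserver"/>
  <node id="node2" type="planetlab-vserver"/>
</RSpec>"""

def GetResources():
    """Return an RSpec advertising what this aggregate offers."""
    return MINIMAL_RSPEC

def CreateSlice(slice_name, rspec):
    """Allocate a slice described by the (aggregate-specific) RSpec."""
    SLICES[slice_name] = rspec
    return True

def DeleteSlice(slice_name):
    SLICES.pop(slice_name, None)
    return True

def ListSlices():
    return list(SLICES.keys())

if __name__ == "__main__":
    server = SimpleXMLRPCServer(("localhost", 8001), allow_none=True)
    for fn in (GetResources, CreateSlice, DeleteSlice, ListSlices):
        server.register_function(fn)
    server.serve_forever()
}}}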

John Hartman What's the deal on SFA lite?

You trust the Slice Manager and Registry to do security.

The PLC API is supported by each component manager. The aggregate (PLC) has a back door into the component, using a private interface.

The Ticket call and tickets are implemented; you usually go through CreateSlice(), not GetTicket().

Chip Elliot How do you create topologies (e.g. for GpENI, MAX)?

Aggregates return a set of RSpecs, which are opaque to Slice Manager. User picks from the set of RSpecs, Slice Manager passes them back down. Larry's assumption is that the RSpecs will have the information about how to create a topology. You'll go and edit the XML file (either directly or via a tool) to put together the RSpec set that you want.
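
A hypothetical user-side sketch of that flow, under the same assumptions as the Aggregate Manager sketch above (XML-RPC transport, method names from the notes; the endpoint URL and slice name are placeholders):

{{{#!python
import xmlrpc.client

# Talk to a Slice Manager (placeholder endpoint).
slice_manager = xmlrpc.client.ServerProxy("http://localhost:8001/")

advertised = slice_manager.GetResources()   # RSpecs, opaque to the Slice Manager
selected = advertised                       # in practice, edit the XML here (directly or via a tool)
slice_manager.CreateSlice("example_slice", selected)
}}}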

Guido Appenzeller FlowSpecs will need some opt-in description of how this aggregate connects to the internet at large.

We don't have opt-in at the moment.

The set { sfi, Slice Manager, Registry } is the clearinghouse, from my perspective. It's the portal that the user sees. There is no central record of slices -- you can talk to an aggregate directly, so there is no single point where you can find a list of all allocated slices.

Aaron Falk Clearinghouse is supposed to have the definitive record of who did what when; it's the place where researchers and resources meet.

Keep in mind that aggregates can hand out resources independently, so there is no definitive record of who did what (and who is doing what).

As a first approximation, we are in the business of debugging aggregates and our code as aggregates plug in to the framework.

VINI is an ongoing MRI (?) project. It has developed a new kernel and uses the PlanetLab control framework. PlanetLab will eventually adopt the VINI kernel. The VINI kernel lets people own their own IP stack; it should be available on PlanetLab within the next three months.

Outside the people in this room, only the Europeans (PlanetLab Europe) are using the GENIwrapper interface.

I don't understand which resources are "GENI resources". There are resources provided by aggregates, and agreements between users and aggregates -- and possibly peering agreements between organizations controlling resources, etc. But GENI doesn't own anything itself.

Aaron Falk Looking at your milestones, they are marked as complete, looks good.

Chip Elliot As we review milestones they may look archaic because we drew them up a year ago.

Aaron Falk Want layer 2 connections available to researchers ("Establish direct connectivity between project facilities and Internet2").

Heidi Picher Dempsey Intent was you could specify end-to-end VLANs on Internet2.

We'll provide end-to-end IP, based on last week's meeting.

Chip Elliot Can you stitch the dataplane so you can run non-IP traffic?

It may be a layer 2 connection, but it'll be tunneled over IP.

Chip Elliot Two major goals for spiral 1: control framework and cross-aggregate slices, and end-to-end layer 2 VLANs.

Aaron Falk Another topic Heidi and Aaron talked about: the desire to keep the control plane for GENI on GENI resources. The control plane will be over IP; if the control plane stops working, everything falls over.

Access to the control plane is via IP. For the resources underneath, it depends on how you connect to them.

Heidi Picher Dempsey We had hoped that all of the aggregates would be one hop away from Internet2, and run everything over GENI.

This strikes me as an overambitious and possibly counterproductive requirement.

Guido Appenzeller We currently have less faith in our infrastructure than in the internet. I'd rather run my control traffic over the internet.

Chip Elliot Do you envision Princeton being connected to Internet2 and/or NLR at layer 2?

It's out of our control, but it's possible.

Chip Elliot GPO would prefer that everyone went direct.

The rest of the milestones seem reasonable.

GUSH (Jeannie Albrecht)

Couple of students working this summer. Starting to work with GENI wrapper, have worked with PLC interface for a while. Another student working on a GUI. Teaching distributed systems in the fall, getting students to use it.

Aaron Falk Do you have users outside?

Yes, and it's taking up a lot of time. GUSH is a giant C++ package at the moment, takes some real work to build it, so we provide statically compiled binaries that may or may not work on a given platform. Have PLUSH and GUSH users, wish I could get them working together. Looking at other resource types to connect to -- sensors, wireless, DTN. We've done visualization in the past, it'd be some work to pull it together.

Larry Peterson VINI uses an RSpec for topology.

OK, GUSH can handle this.

Aaron Falk Is there a visualization engine -- maybe from Indiana -- for network topologies that GUSH can use?

Peter O'Neil Tool called Atlas.

Google Maps API is somewhat restrictive (complex?) and ever-changing, so it's a pain to keep up with it. Right now the GUI is a simple OpenGL app, you can connect to a GUSH and view remotely.

Vic Thomas Jeannie's milestones and status look great.

Aaron Falk All projects need to cooperate with the GMOC and the security project. They are doing data collection. It's helpful to provide users a view into the health of the GENI system as a whole, so gathering statistics and exporting some operational state is a good thing.

MAX/MANFRED (Peter O'Neil)

A lot of our work over the course of the year is to keep the GENI effort in sync with our other efforts, I2 DCN, GLIF, clusters, etc.

Chip Elliot The GPO's view is that we'd like to fit into that bigger picture.

We had as a milestone the expectation of being able to do things at an optical level -- the expectation that we'd be able to set up VLANs at a wave level. The technology doesn't support this yet, we're too early -- wavelength-selectable switches are just now becoming available and not yet really affordable, so we couldn't do it. John Jacob understood; we're not going to get dinged for it, it's still on our schedule, just pushed out. We have a number of PlanetLab nodes running on our metropolitan network (both owned by MAX and by others in the area we serve). We have not done any outreach to the organizations providing these nodes.

Chip Elliot Now that you have some experience with it, does the notion of an aggregate make sense?

Basically, yes.

James Sterbenz Do you have something you consider a unified PlanetLab DCN aggregate?

Chris Tracy and Jarda have running code, you're probably interested in getting at it.

Chip Elliot A hypothesis from a year ago was that you could instantiate virtual machines and provision guaranteed bandwidth between them. You're close to this, right?

We are, soon.

James Sterbenz Bandwidth guarantees are at a VLAN level? Our switches don't support bandwidth caps, and we've already bought them, can't afford to buy fancier ones. We can do best effort, though.

Jon Turner At each site we have some spare ports on the SPPs; it makes sense to connect them to ProtoGENI, but they don't have any spare ports. They already have access to the wave. If it takes adding an extra module to those switches to increase the number of ports, can we shake the money loose?

Heidi Picher Dempsey We already did that once, if we do it again we'll be over budget. And we've pushed back twice, we need to do this soon or we won't get anything running by the end of the spiral.

Level3 is increasing prices; we have a limited number of cross-connects and would have to go out and pay real money to get more. There are monthly fees associated with each cross-connect.

Heidi Picher Dempsey End-to-end milestones have priority.

Chip Elliot The ideal would be GpENI, MAX, and SPP interoperating by Oct 1.

Chip Elliot Now that you've got PlanetLab integrated with your control framework, can we start doing this across the world? Say, optical people in Europe?

We have people using the code base (not the GENI part) around the world. We're working on getting more people using it, working on documentation.

Chip Elliot JGN2 and PlanetLab Japan would be another good place to pull things together at Spiral 2.

"GRMS" on the milestones is an anachronism, it isn't what we're doing.

We're actually good on the two milestones due, need to do the documentation. We're ready to start integration.

Aaron Falk Are you still OK with getting a service up and running on 09/01?

Yes, I think so. And our PlanetLab nodes will become public -- there are four or five. They are not currently public, we're the only ones who can use them (ISI East and MAX staff).

Chip Elliot We believe the way it should work is that an aggregate should affiliate with a clearinghouse.

Larry Peterson Federated aggregates using the PlanetLab control framework are allowed to say no, you can't get a slice on my node. (In this sense PlanetLab is an aggregate that is affiliated with the PlanetLab control framework.)

RAVEN (John Hartman)

We're interested in what is going on inside a slice, not in controlling slices.

Working on GENIwrapper integration with the RAVEN and Stork tools. Integrated the Stork repository with GENIwrapper -- if you want to upload a software package to our repo you can do it through GENIwrapper and use GENIwrapper authentication. (You don't have to log in with your PlanetLab login and password any more.) Have modified the SFI package to make it possible to upload packages to the Stork repository -- and then make them available to PlanetLab machines.

Chip Elliot What about packages that don't look like PlanetLab packages?

Can have different kinds of nodes, e.g. SPP nodes, and say this kind of node needs that kind of software. You can create a group of nodes that satisfies a CoMon query.
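
A purely illustrative sketch of building a node group from a CoMon-style query; the URL, CSV layout, and field names below are invented, so the real CoMon query interface would need to be consulted.

{{{#!python
import csv
import urllib.request

# Placeholder endpoint and field names, not the real CoMon service.
COMON_URL = "http://comon.example.org/status.csv"

def node_group(min_free_disk_gb=5.0):
    """Return the nodes with at least min_free_disk_gb of free disk."""
    data = urllib.request.urlopen(COMON_URL).read().decode()
    rows = csv.DictReader(data.splitlines())
    return [row["name"] for row in rows
            if float(row.get("disk_free_gb", 0)) >= min_free_disk_gb]
}}}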

Larry Peterson Chip, you want groupings that are independent of aggregate groupings. John Hartman has node groups, which seems to do what you want.

On slice management front, somewhat integrated with GUSH. Haven't demoed it. We're using GUSH inside the Stork tool, in a pub-sub system, so nodes that are part of a slice can see that there is an update to their package and can reinstall themselves.

Chip Elliot You're running a valuable service there... are you going to do it indefinitely?

We're trying to get out of the repository business, just deal with a database of URLs. People publishing packages have to put the packages up themselves, make them available, then publish the URL via Stork.

Working on some other stuff. Can create groups based on CoMon queries, but the code is kind of dusty, needs to be brought up to date. Rudimentary monitoring service to monitor what's going on inside a slice. Basically wanted to monitor the health of Stork; generalizing it and making it available.

Larry Peterson Is there stuff you'd like to see in CoMon that's not there?

We should talk about that.

Jeannie Albrecht We use CoMon -- if you cleaned up the API we'd take advantage of it.

Milestones look good.

Larry Peterson CoMon works by installing on a slice that spans the machines it is monitoring.

Larry Peterson Once we have this it'll be an environment that's richer than Amazon EC2. Using the same tools we can upload images, allocate slices, and run them.

Enterprise GENI (Guido Appenzeller)

Chip Elliot What does "integrate with switched VLAN in I2" mean?

We want to provide layer 2 to experimenters. Connect that in an Internet2 pop with a VLAN -- outside of OpenFlow. Demo for GEC6.

Jon Turner Do your NetFPGAs have any spare ports? That would be helpful.

If not, we'll figure out how to make some available.

Plan on integrating with PlanetLab "clearinghouse."

We've written our own Aggregate Manager; it speaks the lightweight protocol defined at the Denver meeting. It automatically discovers switches and network topology and reports them to the clearinghouse via RSpec. It can virtualize the Stanford OpenFlow network based on a reservation RSpec received from the clearinghouse.
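
As an illustration only (element and attribute names below are invented, not the project's actual RSpec schema), a discovered switch and link list could be serialized into a topology RSpec along these lines:

{{{#!python
import xml.etree.ElementTree as ET

def topology_rspec(switch_dpids, links):
    """Serialize discovered switches and links as a toy topology RSpec."""
    root = ET.Element("RSpec", aggregate="openflow-example")
    for dpid in switch_dpids:
        ET.SubElement(root, "switch", dpid=dpid)
    for src, dst in links:
        ET.SubElement(root, "link", src=src, dst=dst)
    return ET.tostring(root, encoding="unicode")

# e.g. topology_rspec(["00:01", "00:02"], [("00:01", "00:02")])
}}}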

Chip Elliot What is in your RSpecs? All we're looking for in Spiral 1 and Spiral 2 is VLANs and tunnels.

Goal for this spiral is "here are your three options to connect" and hard code them.

Chip Elliot In GEC6 we want to show experiments running -- not just demos.

Having meaningful experiments by November across aggregates will be difficult to do. Maybe demo experiments, not meaningful experiments.

Heidi Picher Dempsey For Spiral 2 for this project we were thinking about limited opt-in, one at Stanford one at Princeton.

We're working on a mechanism where first we put an opt-in end user on a VLAN, then when a user opts in we move them into the experiment's VLAN. The OpenFlow switches are installed, but production traffic is not using the OpenFlow ports (using the regular ports).

In the Gates building, 3A wing only: five switches (HP and NEC), 25 wireless APs, ~25 daily users. Phase 2 is all of the Gates building: 23 switches (HP ProCurve 5400), hundreds of users. Phase 3, 2H2009: Packard and CIS buildings as well, number of switches TBD (HP ProCurve 5400), >1000 users.

We have built our own toy clearinghouse to support the functions we need, rather than keep asking Larry to add functions.

Aaron Falk Is there a plan to get your needs covered by the PlanetLab design?

I'll get to that soon, hold on.

Expand OpenFlow substrate to 7-8 other campuses. Multiple vendors (Cisco, HP, NEC, Toroki, Arista, Juniper) have agreed to participate. Goal is to virtualize production infrastructure and allow experiments on this infrastructure via GENI. Goal is >100 switches, 5000 ports.

Fundamental integration challenge for GENI is that we have very different substrates. Types of nodes, switches, layer 1, 2, 3, 4, ... How do you define all of this in RSpecs? This affects the clearinghouse; how does it apply policy? Detect conflicts? Present options to users? Help users resolve conflicts? How do clearinghouses manage this complexity? How do they keep up with rapid change?

Substrates will drive clearinghouse requirements. At this point we couldn't define a stable RSpec, as we don't know what we need yet. So maybe we should have individual clearinghouses.

Larry Peterson Maybe what you mean is individual aggregates, not clearinghouses.

Chip Elliot Think of Amazon.com -- they don't know what they are selling, they have prices and pictures and descriptions.

Amazon.com can't sell airline tickets, too many options. Reserving a network slice is not commoditized, at least not yet. It'd be nice to have an Amazon.com like interface. Can't expect someone who writes a clearinghouse to manage all of the complexity of all of the substrates; the parties building the substrates should manage the complexity.

Larry Peterson But now users have 20 different UIs to deal with -- one for each substrate.

Chip Elliot We all agree that it's not the role of the clearinghouse to understand everything about everything.

Larry Peterson It'd be great if sfi had a GUI in front of it such that when a set of RSpecs came back they were presented as a list sorted by aggregate, and you could go through each aggregate picking what you want.
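
A minimal sketch of that presentation, assuming each returned RSpec carries an "aggregate" attribute (an assumption made here for illustration, not a defined part of any RSpec format):

{{{#!python
import xml.etree.ElementTree as ET
from collections import defaultdict

def group_by_aggregate(rspec_strings):
    """Group returned RSpec documents by their (assumed) aggregate attribute."""
    groups = defaultdict(list)
    for s in rspec_strings:
        root = ET.fromstring(s)
        groups[root.get("aggregate", "unknown")].append(s)
    return groups

def pick(groups):
    chosen = []
    for aggregate, rspecs in sorted(groups.items()):
        print(f"Aggregate {aggregate}: {len(rspecs)} resource set(s)")
        # a real GUI would let the user select here; this sketch takes them all
        chosen.extend(rspecs)
    return chosen
}}}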

Chip Elliot We really want you to integrate into a control framework. A year ago we didn't understand how difficult it would be; now we do. We'd like you to integrate, even if it's a very thin veneer.

Heidi Picher Dempsey Milestones are marked not done, this may be a difference of opinion. Maybe all we need is to close the loop, get some documentation, make the code public, etc.

GpENI (James Sterbenz)

Spent the first 9 months getting stuff up and running. Four nodes were up and demoed by the last GEC; we were cheating with one of them because one node's connectivity was broken at the time, but now it's running correctly. Since GEC4 we've been working on VINI. Just last week Katerina downloaded GENIwrapper and is starting to work with it. About seven or eight PlanetLab nodes. We are not affiliated with PlanetLab yet, but we would set up slices for people if they asked. We are running our own PLC.

Larry Peterson So when you affiliate, people who connect to the slice manager will gain access to your nodes.

DCN has been ported to our Netgear switch; we're waiting for an interface from Ciena, and then we'll put it on line. It'll be controlled by DRAGON eventually; we need to get a Ciena 4200, but that switch isn't available yet. We have the equipment but not the software to create dynamically established circuits -- although we don't have the ability to put in bandwidth limits. Currently configuring the year 2 switch, which will go to KU. The Ciena 4200 doesn't have DCN drivers at this time.

Aaron Falk This is shared responsibility, need to help PlanetLab control framework mature.

Right, we understand that, we have always said that as soon as it becomes available we'll grab it, and we've done that.

Jeannie Albrecht We use GENIwrapper, it's working pretty well.

Haven't really been in touch with the GMOC folks yet.

How to do dynamic circuits?

Larry Peterson Two approaches: either one Aggregate Manager controls the nodes and the network together, or you have one Aggregate Manager for each. Not sure you need to support both.

Planning to use MAX code / model.

SPP (Jon Turner)

Project goal is to acquire and deploy five SPP nodes in Internet2 POPs. Three nodes in Salt Lake City, Kansas City, and Washington DC. Houston and Atlanta will be added later. 10 x 1GbE, network processor subsystem, two GPP engines (server blades) that run the PlanetLab environment, separate control processor with a NetFPGA. User training and support, consulting and development of new code options. System software to support GENI-compatible control interface.

Deliverables: develop initial version of component interface software "matching GENI framework" and demonstrate on SPP nodes in WUSTL lab.

Aaron Falk How are resources partitioned?

On a node you run in a vserver. If you want a fast path on the network processor, you request the resources you want (bandwidth, ports, queues, ...). The important thing to understand is that a reservation isn't just "can I get this now" but also "can I get this tomorrow from 0200 to 0900?" -- although we don't currently have a way to retract resources once given. Every node has a standalone reservation system.
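
A minimal sketch of such a standalone per-node advance-reservation check, answering both "can I get this now?" and "can I get this tomorrow from 0200 to 0900?"; the single-capacity resource model and units are simplifications invented here:

{{{#!python
from dataclasses import dataclass

@dataclass
class Reservation:
    start: float   # seconds since the epoch
    end: float
    amount: int    # e.g. Mb/s of fast-path bandwidth

class NodeReservations:
    def __init__(self, capacity):
        self.capacity = capacity
        self.booked = []

    def available(self, start, end, amount):
        # Conservative check: treat every reservation overlapping the window
        # as if it coincided with the request for the whole window.
        overlapping = [r for r in self.booked if r.start < end and start < r.end]
        return sum(r.amount for r in overlapping) + amount <= self.capacity

    def reserve(self, start, end, amount):
        if not self.available(start, end, amount):
            return False
        self.booked.append(Reservation(start, end, amount))
        return True
}}}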

Peter O'Neil Wasn't there an issue six months ago about VLAN translation?

Will come back to that. The original deliverable was to deploy two SPP nodes; we're going to make it three. The plan is to make this initial deployment available by the end of Spiral 1, so it should be available to external researchers in the quarter after that. Architecture and design doc to the GPO by month 9, SPP component manager interface documentation to the GPO by month 12. Don't expect to have it all implemented by the end of Spiral 1, probably not until the end of Spiral 2.

The current version of the reservation system doesn't allow you to reserve external port numbers; we don't have a nice way to go to a vserver running in a GPE and force it to close a socket.

Larry Peterson Vserver reboot works...

Chip Elliot So you need to know the code that will run on the processor to make the reservation.

Flow monitoring, so you can tell which experiment is sending out packets to a host on the internet. Version 2 of Network Processor Datapath Software pushed back, toward the end of the year.

Y2: carryover from Y1; continue development of the component interface software (integrate fast-path and inter-node link bandwidth reservation into the GENI management framework, use RSpecs for making multi-node reservations), documentation, tutorials, sample applications using SPP fast paths, support.

If time permits: implement control software for NetFPGAs, terminate VLANs to MAX, GpENI, Stanford (?). Physical connection -- can we do it in time? What do VLANs connect to? Static or dynamic?

SPP nodes will not set up VLANs dynamically.

More if time permits: Cool demos at GECs (OpenFlow in a slice), demonstrating slice management with GUSH and RAVEN, eliminating IP tunnels where L2 connections exist, transition management to IU group, expand NP capabilities (Netronome 40 cores on low cost PCIx card).

Where else might we put SPP nodes other than Internet2 PoPs?

Peter O'Neil Maybe get some hosting space, we sometimes act as a RON and provide access to elsewhere. This can be lower cost, but not free. Campuses would be better, most RONs don't have space.

Can't get much cheaper than Internet2 (we're paying nothing) although it's effort to work with them.

Heidi Picher Dempsey Hosting centers are much cheaper than Internet2 space.

GPO Spiral 2 Vision

Spiral 1 had two goals -- control framework that controls a lot of stuff, and create end-to-end slices across aggregates. Spiral 2 only one big goal -- get continuous big real research projects running. This is where the rubber hits the road; is anyone interested? Can we really make it work?

Chip Elliot Operations becomes a big part of this, getting things up and running and keeping them running. This is a change from what you're used to.

Chip Elliot Spiral 2 is an opportunity to see which of the things we've developed are of interest to people. It's easy to build infrastructure that nobody uses; we want to see, of what we've built, what people want to use.

Aaron Falk Documentation, sample experiments, tutorials, users workshop. Want to help get researchers using the infrastructure. So here are some candidate ideas of what to include in your SoW for next year, to help us reach Spiral 2 goals. We're going to be pressing people toward Spiral 2 goals, prioritize based on the GENI goals. You might be interested in doing some work that'll be good for your aggregate or cluster, but priority will be given to tasks that further the Spiral 2 goals.

Chip Elliot Instrumentation and measurement: everyone has agreed that this environment will be really well instrumented. But we have very few efforts focused on this in Spiral 1.

Larry Peterson I see instrumentation largely as a slice issue.

Aaron Falk For many experiments you want the instrumentation to have minimal impact. Also want some extra-slice instrumentation, e.g. BER on a link that you're running over.

Chip Elliot Is it a researcher's job to resynchronize all of the clocks?

Larry Peterson There are useful common services -- logging, archiving, time synchronization, ... -- that we should provide. The work of instrumenting is then a slice issue.

Heidi Picher Dempsey After you've collected data, you want to be able to share it, too.

Larry Peterson MeasurementLab has three pieces: tools, embargoing data, platform.

Aaron Falk Negotiate with your system engineer to work out milestones.

Chip Elliot We also want identity management systems that are not tool-specific. We're advocating that we leverage other people's work for this. Currently recommending Shibboleth and InCommon for "single sign-on."

Larry Peterson Immediate reaction is that it's non-trivial programming effort.

Guido Appenzeller From our point of view, we trust the clearinghouse.

Guido Appenzeller By centralizing services we simplify things, right? It relieves you of a lot of the identity management work.

Chip Elliot I agree with that argument, but there is also the argument that there is benefit in being able to allocate and redeem tickets.

Aaron Falk If we go to a centralized trust model, will that preclude outsiders using these resources?

Chip Elliot But there will always be pairwise trust as well.

Chip Elliot Integration and interoperability. Integration has to continue, it won't all be working by October 1. How many control frameworks will there be when the dust settles? It's a big question for Spiral 2.

James Sterbenz Getting integrated with one control framework was hard -- how am I going to interoperate with multiple?

Larry Peterson ProtoGENI and PlanetLab share history in the SFA. Maybe we can bring them together, TIED as well.

Chip Elliot My view is that by October 1 we'll have enough experience with these control frameworks to determine what we can do. We may determine that nobody wants to do it. Or maybe we can unify things enough that not every aggregate will have to implement two or three interfaces.