wiki:GENIRacksHome/InstageniOpenQuestions

Version 1 (modified by lnevers@bbn.com, 12 years ago)

GPO InstaGENI Questions
Adam Slagell Questions
Nick Bastin Questions

This page captures design questions about the InstaGENI design, along with questions sent to instageni-design@geni.net. The discussions are recorded to track resolution. Questions are crossed out when answered. The person closing a question should add his or her name to the question with a brief explanation. Any document referred to by URL is attached to the page for historical reference. The following initials are used to attribute notes:

AB: Andy Bavier         CG: Chaos Golubitsky    LN: Luisa Nevers     RM: Rick McGeer           
AH: Aaron Helsinger     HD: Heidi Dempsey       MB: Mark Berman      RR: Robert Ricci         
AS: Adam Slagell        JS: Josh Smift          NR: Niky Riga        SS: Steve Schwab

GPO InstaGENI Questions

Questions Submitted for Design Review

Design Review GPO question 1:

We understand that you intend the racks to be full PG sites running an SA that is fully trusted. We do not want that SA to be fully trusted (beyond the rack), and we don't want it to be a GENI SA. Does your model require that? Put another way, GENI mesoscale aggregates currently trust pgeni.gpolab slices and users, Utah Emulab, and PLC, which is what we expect the InstaGENI racks to do. Either the racks would not use the full PG bundle, or IG rack certificates would not be in the PG bundle.

<AH> Rob: Federation decides if these are trusted SAs. Initial plan is to federate these with PG,
     and they will switch to the GENI CH. Consequences of not including: user certs/slice creds 
     from the rack are not trusted elsewhere. We probably want a separate bundle of AM root certs, 
     so SSL conn to AM can be verified.
<HD> Racks won't be part of the PG bundle unless they have federated with the clearinghouse.  We may need to 
     distribute a separate bundle of root certs so that, when you make a connection to the SA, you 
     can verify that its certificate is in that bundle of AM-only certs.  Those would be server certs instead 
     of CA certs.  The first step will be to use pgeni.gpolab for initial rack deployments.  The standard 
     install provides no access for local SAs.
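HD's idea of trusting specific AM server certs, rather than a CA, amounts to certificate pinning. Below is a minimal Python sketch. It assumes the separate bundle is kept as a set of SHA-256 fingerprints of DER-encoded AM server certificates. That storage format and the function names are hypothetical, not part of ProtoGENI or Omni.

```python
import hashlib
import socket
import ssl


def der_fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def is_trusted_am(der_cert: bytes, am_bundle: set) -> bool:
    """True if the peer's server certificate is one of the pinned AM certs."""
    return der_fingerprint(der_cert) in am_bundle


def fetch_server_cert(host: str, port: int) -> bytes:
    """Fetch the AM's server certificate without CA-chain validation.
    Trust is then decided by is_trusted_am(), not by a CA bundle."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)
```

A real client should check the pin on the same TLS connection it then uses to send credentials, rather than on a separate probe connection.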

Design Review GPO question 2:

Control plane switch paths. The drawings are somewhat confusing, so we may have this wrong, but it seems that the control plane for ProtoGENI and FOAM goes through the HP6600 OpenFlow-controlled switch. We expected the control plane to be separated from the 6600, so that if experimenters cause problems with that switch, the control plane will still be fine. Is there a reason to use the 6600 this way, given that it seems a higher-risk approach?

<AH>  Rob: stuff is confusing. Control plane traffic is supposed to go through
      the smaller switch
      Rick: I'll take responsibility for drawings. Data plane is through the
      big switch. Control plane and ILO are through the 'baby' switch.
      Rob: same switch does access to outside and control communications -
      sometimes in separate VLAN. Like ILO stuff. PL nodes use top of rack 
      switch to public internet to contact MyPLC.
      Rob: frisbee stuff is on the smaller switch
<HD> All control plane traffic goes through top of rack switch (small).  Dataplane 
     goes through 6600.  ILO also through baby switch.  What about PL control
     plane?  Rob says emulab switch is giving you access to outside and control 
     communications, so PG will send PL control out through that as well.  
     Frisbee etc. on smaller switch.

Design Review GPO question 3:

We believe that having PlanetLab nodes in an InstaGENI rack is optional and should not be the default configuration. Sites that want PlanetLab nodes should have a way to add them to the default configuration, which probably requires some engineering/operations coordination. Does this fit your model too? This model implies that there are two parallel Aggregate Managers running on a default InstaGENI rack: a ProtoGENI AM and a FOAM AM.

<AH> Andy: yes, exactly. There is an IG-PLC like the current PLC, just administered separately. 
     When site admin wants one, register it with IG-PLC, download ISO image (like 
     public PL), and brings that up as boot image, and PL SW stack takes over. It's 
     optional.
     Rick: IG-PLC running in Princeton (not on rack), PL node looks like just
     another experiment node.
     Rob: You create a slice that asks for a particular OS to boot this.
     We'll restrict the OS to certain slices, so we don't have people unintentionally
     bringing it up
     Rob: and work with credentials to make the slice not expire. Probably a global 
     IG-PL slice with a very long expiration
     Rob: OpenVZ is a different mechanism. There are 2 admin options for shared nodes:
     (1) OpenVZ, (2) PL. Having both options and the ability to go back and forth lets us see 
     which gets the most use and meets experimenter needs, and adjust allocations over time.
     Rob: if local admins want control on when PL comes up, they can have it 
<HD> Our assumptions are good.  InstaGENI PlanetLab central is setup and administered 
     separately from public planetlab.  When an admin at a site wants to bring up one 
     of the IG nodes as a PL node, he registers with PL, downloads the image and the 
     PL stack takes over and installs the node.  IG PLC will run off the rack (eg at Princeton.)  
     For a PL node, one creates a slice that has a particular OS.  Will restrict that OS to certain 
     slices, so people don't accidentally bring up PL nodes.  Also fiddle with expiration mechanisms.  
     Have a global IG PL slice with a far-away expiration date.  Admins have two options for these 
     shared nodes: one is a PG node, the other is a PL node.  Anyone can request slivers on them, and admins can go 
     back and forth between them.  Local admins can have that control if they want it.
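Rob's restriction, under which only certain slices may boot the PL image, is essentially an allowlist check at sliver-request time. Here is a Python sketch of that check. The image name, slice URN, and data class are hypothetical placeholders, not ProtoGENI identifiers.

```python
from dataclasses import dataclass

# Hypothetical names: the real image and slice identifiers are not given here.
PL_IMAGE = "IG-PLANETLAB"
PL_ALLOWED_SLICES = frozenset({"urn:publicid:IDN+instageni+slice+ig-pl"})


@dataclass(frozen=True)
class SliverRequest:
    slice_urn: str
    os_image: str


def may_boot(req: SliverRequest) -> bool:
    """Any slice may boot ordinary images. Only allowlisted slices may boot
    the PlanetLab image, so PL nodes are not brought up by accident."""
    if req.os_image != PL_IMAGE:
        return True
    return req.slice_urn in PL_ALLOWED_SLICES
```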

Design Review GPO question 4:

Will stitching code that will be supported for InstaGENI racks be the same as what is planned for ProtoGENI? If not, what are the differences?

<AH> Rob: same as PG Utah, like what they demo'ed to connect to KY.
<HD> Stitching code is same as in PG.  Rob says the same stuff that PG demoed for making slices to Kentucky.
<LN> Do we have an answer about the specifics for this? Did we get details about the individual examples (delay node, initscripts)?
<AH> To my knowledge we did not get an answer to what is different on these racks from general ProtoGENI. 
     For stitching in particular, Rob said it is the same as they currently have. So no answer to Q24 yet.
<LN> Moving delay node question to list to be posted on InstaGENI mail list.

Design Review GPO question 5:

The proposed plan for making sliver metadata available to GMOC seems reasonable. How are you planning to make time-series data on the health of components available? Are there other significant questions from the GMOC team?

<AH> Rob: Give GMOC SNMP access for some info
Rob: emails go to admins on things: should go to local admins and IG staff
JP: We want up/down status of slice, sliver, node, AM - as well as timeseries data
Rob: Up/down stuff is in DB & can have an interface for GMOC to query. Port counters
- maybe handle it same as now (select few IPs to do query-only SNMP state of switches)
JP: Good start. Then need to understand topology
Rob: For experimental plane, the Ad RSpec is best source of that data.
Currently allocated VLANs are in the DB on the node and go in the slice-specific manifests.
Should make all manifests available to the GMOC.
JP: We should talk more about that, formats, etc.
Rob: I'll send you example manifests. etc
JP: Emergency stop looks good. Didn't expect ILO, don't know how often I'll need it
Rob: We tested shutdown once, know it works
JP: Will each rack have a separate contact?
Rob: site surveys have contacts?
Rick: both technical & admin
Joe: backups can be added
JP: General support center contacts?
Heidi: Notify sites about the SNMP read requirement (heads up)
<HD> What is time-series data? Port packet counts? CPU utilization? Nodes available?
Things that were listed on the requirements wiki page. PG already maintains lots of info
about nodes up/free, how many VMs etc. For other PG nodes, they previously just gave
GMOC SNMP read access. PG currently monitors things like boot failures and sends emails to
admins, probably both local admins and InstaGENI staff and GMOC in the IG case. JP
wants up/down status of slice/sliver, node, and aggregate, as well as the time-series data we listed.
Port counters are probably the right way to go for select IP addresses. JP mentions that
there will have to be some understanding of topology (advertisement RSpec) to make the data
relatable to experiments/PIs/slices. Currently allocated VLANs go only in the manifest, not the advertisement.
Need to make all manifests avail to GMOC. JP and Rob will have to start comparing formats. Look
at manifest format for RSPEC. Rob will send examples. JP says emergency stop looks good and ILO
access is good. PG has tested that interface already. Will there be individual contacts for each
rack? Yes. The contact info is included on site surveys. Rick will update form to reflect SNMP access.
May need to contact some of the other sites. Prepare questionnaire to match acceptance tests?
That is Rick's intention.
<CG> * InstaGENI will plan to allow GMOC SNMP access to rack network devices
* InstaGENI will notify campuses that this access will be happening
* InstaGENI will work with GMOC for collection of other operationally interesting data from rack boss nodes
* InstaGENI will maintain a service which can be used to track a public IP address of interest to a slice,
for use in investigating problematic network behavior.
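Port counters polled over read-only SNMP are cumulative, so time-series data comes from differencing successive polls. The sketch below assumes 32-bit interface octet counters (IF-MIB ifInOctets/ifOutOctets). It also assumes the polling itself is done by whatever collector GMOC uses. The function names are hypothetical.

```python
COUNTER32_MOD = 2 ** 32  # IF-MIB ifInOctets/ifOutOctets are Counter32 values


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MOD) -> int:
    """Increase between two polls of a wrapping counter.
    Assumes the counter wrapped at most once between polls."""
    return (curr - prev) % modulus


def octet_rate_bps(prev: int, curr: int, interval_s: float,
                   modulus: int = COUNTER32_MOD) -> float:
    """Average bits per second between two octet-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return counter_delta(prev, curr, modulus) * 8 / interval_s
```

For the 64-bit ifHCInOctets/ifHCOutOctets counters, pass `modulus=2**64`. Polling interval matters on fast links: a Counter32 on a busy 10 Gb/s port can wrap in a few seconds, which would violate the single-wrap assumption.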

Design Review GPO question 6:

How do experimenters get access to their resources in the limited IP case where you need NAT (since experimenters don't have ops accounts)? Will outgoing connections work? Will they be able to run a web server?

<AH> Rob: In case of no NAT (lots of IPs) - physical nodes and OpenVZ containers get public IPs. 
     But NAT.... Run an SSH daemon on ops VM: we create account with their public key on that 
     ops machine. Manifest field for login will say ssh 1st to this host & then to your sliver.
     If limited IPs available, we'll allocate them first come first served for now - could 
     eventually make it an allocatable resource.
     Rob: The ability to log in to ops using remote account is something we're implementing now.
     Rob: This is also why we want to run PL: because it does a good job with sharing IPs. Still
     only 1 port 80, yes. But PL lets users bind own ports. Assuming many sites run PL, then if you
     are doing NAT and want a public server then use PL.
     Rob: Could do similar in OpenVZ, but have no plans to do so. Hence need for PL.
     Rick: most 1st set campuses have plenty of IPs.
     Rick: We have a lot more interest than 32 campuses. So we could demand a /8 if we need to.
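The login described above ("SSH first to this host, then to your sliver") maps onto OpenSSH's ProxyJump (`-J`) option, which has been available since OpenSSH 7.3. A small Python helper that builds the command is shown below. The host names are hypothetical, and the helper assumes the same username works on both the ops VM and the sliver.

```python
import shlex


def two_hop_ssh_command(user: str, ops_host: str, sliver_host: str,
                        sliver_port: int = 22) -> list:
    """Build an ssh argv that jumps through the ops VM to a NATed sliver.
    Assumes the same account name on the ops VM and on the sliver."""
    return ["ssh", "-J", f"{user}@{ops_host}",
            "-p", str(sliver_port), f"{user}@{sliver_host}"]


if __name__ == "__main__":
    # Hypothetical hosts: a public ops VM and a sliver behind NAT.
    cmd = two_hop_ssh_command("alice", "ops.rack.example.edu", "10.10.1.3")
    print(shlex.join(cmd))
```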
<HD> When no NAT is required, we give physical nodes and OpenVZ containers publicly routable 
     IP addresses.  If we have to do NAT, we run an SSH daemon on the VM that is Ops in the plan.
     When someone gets a sliver, we create an account with their public key on that ops machine. 
     The manifest you get back after creating the sliver is expressive enough to tell
     you how to SSH to that host and then SSH to your individual sliver.  If 
     only a limited number of IP addresses is available, they are allocated on a first-come, first-served 
     basis to the VMs on the rack.  New slivers will have to go through NAT after the
     limited IP addresses are used up.  Yes, this is different from how PG works now.  PL is 
     better for sharing a single public IP address, so if lots of people want to do that, 
     it makes more sense to run a PL node.  We will need to direct experimenters who want public 
     addresses to racks with PL nodes or to racks that have enough addresses and don't need NAT. 
     Rick doesn't think address limits are going to be a big issue for the places where they want to deploy racks.
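The first-come, first-served policy with NAT as the fallback can be sketched as a small allocator. This is illustrative only, not ProtoGENI's actual implementation. The class is hypothetical, and the addresses come from the 192.0.2.0/24 documentation range.

```python
import ipaddress


class PublicIPPool:
    """First-come, first-served allocation of a small public address pool.

    Slivers that arrive after the pool is exhausted get None, meaning they
    are reachable only through NAT (and the ops-VM SSH hop).
    """

    def __init__(self, addresses):
        self._free = [ipaddress.ip_address(a) for a in addresses]
        self._assigned = {}  # sliver id -> address

    def allocate(self, sliver_id: str):
        if sliver_id in self._assigned:  # idempotent for repeat requests
            return self._assigned[sliver_id]
        if not self._free:
            return None  # pool exhausted: fall back to NAT
        addr = self._free.pop(0)
        self._assigned[sliver_id] = addr
        return addr

    def release(self, sliver_id: str) -> None:
        addr = self._assigned.pop(sliver_id, None)
        if addr is not None:
            self._free.append(addr)
```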

Issues to track/follow up

These items are not questions, but should be tracked to verify delivery or resolution:

I.1. Need updated drawing for topology diagram.

I.2. In the limited IP case when you need to NAT, how do remote users get access (since they don't have an Ops account)?

Will outgoing connections work?

<HD> This is currently under development and somewhat limited (done on a first-come, 
     first-served basis); we need to track resolution (not part of questions).

I.3. (Req C.3.c) What is required to run iLO?
I.3.a. What types of clients are supported for iLO?
I.3.b. Is iLO connected to the management switch?

<HD> Need follow-up on the PDU and need to read the available documentation.

I.3. There is no remote PDU access, so there is no provision for remote reboot. What about non-iLO devices?

<HD>  There is a terminal interface that we will figure out how to connect, 
      need to follow up on software and wiring solution.

Misc. questions answered in design meeting

M.1. In the limited IP case when you need to NAT, how do remote users get access (since they don't have an Ops account)?

For example, will they be able to run a web server?
```
<HPD> No, use a PL node in the rack.
```

M.2. Confirm we get full (prototype) GENI stitching, as at PG-Utah:

Can reserve/provision a VLAN to VMs and bare metal nodes, both allocated from static pool and dynamically negotiated.
```
<HD> Yes in demo mode, closed.
```
- Will full GENI stitching work? Will IG racks fully support dynamic VLAN requests, as with the stitching extension? Supplying a VLAN tag and requesting translation to connect it to slice resources on the rack? (Req C.2.e)
```
<HP> Define "fully support dynamic"; follow up with Aaron.
<LN> What does "fully support dynamic VLAN" mean?  Is it the same as dynamic 
     vlan support or did you have something else in mind? 
<AH> I think we heard 'Yes' - same as the existing PG-Utah stitching functionality.
```

M.3 Explain the support story between Utah and Princeton sites

<HD> In the design review Andy Bavier said that they will run one MyPLC to support all InstaGENI racks;
     they did not specify the location. Need follow-up on the security and policy implications.
<HD> For PL upgrades, Andy asserted that the normal PL upgrade process would work for InstaGENI racks: they post an
     image that you can upgrade to whenever you like, using existing tools over the network.
<HD> Utah (PG) said they were going to support all key distribution and packaging for PG software, and when
     upgrades take place, they would handle them for each rack. 
<HD> We do not know the upgrade path for OpenFlow.  We need to follow up here to confirm with Nick (not at meeting).
<HD> Bottom line: there was a story, but not documented. We need to get documentation and verify.

M.4. Does the OS support the ability to modify the kernel (EG add new modules)?

<HD>  Kernel mods can't be done in either PG or PL containers.  PG has non-production-level support for using Xen as 
      a virtualization technology.  A bare metal node used as the kernel-mod resource is still a PG node.

M.5. (Req F.1) Please expand the required IP address requirements to detail more than minimum required.

<HD>  The IP address requirement is 256 addresses, but the rack can work with fewer. Sites that cannot provide 256 
      addresses can use the limited-IP solution or the NAT solution.
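The 256-address requirement corresponds to an IPv4 /24. The short Python helpers below compute the smallest block that covers a given address count, which is handy when a site can offer less than a /24. The function names are illustrative only.

```python
import ipaddress


def smallest_prefix_for(n_addresses: int) -> int:
    """Longest IPv4 prefix length whose block holds at least n_addresses."""
    if not 1 <= n_addresses <= 2 ** 32:
        raise ValueError("address count out of range for IPv4")
    return 32 - (n_addresses - 1).bit_length()


def addresses_in(prefix_len: int) -> int:
    """Number of addresses in an IPv4 block of the given prefix length."""
    return ipaddress.ip_network(f"0.0.0.0/{prefix_len}").num_addresses
```

For example, a site that can offer only a /25 (128 addresses) falls into the limited-IP case described above.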

M.6. The PlanetLab Integration section states: However it is possible to base the Linux environment offered to slices on different Linux distributions -- for example, a single node could support slices running several versions of Fedora, Ubuntu, and Debian Linux. Are the OSes listed run simultaneously?

<HD> Multiple versions of the same OS (e.g., Ubuntu) can be run, but there is no mixing 
     of OS types (e.g., Ubuntu and CentOS).

M.7. At previous GECs, InstaGENI was advertised as a rack that can have hardware added as needed, but this requires coordination to ensure that a given set of ProtoGENI images in the rack is supported on the new hardware. This should be made clear to site administrators before they attempt to purchase hardware for the rack, and it should be made clear that site administrators will lose some level of support from the InstaGENI team if they add arbitrary hardware into the rack.

<HD> Rob said that if a site installs the same image or the same set of existing hardware, 
     then it can be added.  But if it is different, then it is not supported and
     sites are on their own.
<HD> Need to generate a question for allowable and supported combinations.

GPO Design Questions

G.1. What PG functionality is not here or is different in the InstaGENI racks?

<RR> We are not planning to leave any ProtoGENI functionality off of the racks. 
     I expect there will be a smaller set of officially supported disk images 
     (two Linux, one FreeBSD). And, of course, we are going to be more conservative
     about updating the software on the racks than we are about the Utah ProtoGENI 
     site, so they will be a bit behind in terms of features.
     The issue about Linux distributions on shared nodes that I mentioned is actually 
     one of Emulab support vs. ProtoGENI support; it's a feature that Emulab supports, 
     but we haven't yet plumbed that support through to the ProtoGENI/GENI APIs. 
     We expect to do so.

G.1.a. Does InstaGENI intend to support delay nodes? Initscripts?

<RR> We will definitely support initscripts. We will also support 
     delay nodes, though I suspect they are not likely to see much 
     use on the racks.

G.2 (Req D.8) Does each rack provide a static (always-available) interface on each GENI mesoscale VLAN to be used for reachability testing of the liveness of the meso-scale connection to the rack?

<RR> We had not planned to do so; allowing VLANs on trunks when they 
     are not actually in use can result in stray traffic - for example,
     flooding during learning, spanning tree, and other switch protocols. 
     For this reason, we dynamically add and remove VLANs from trunked 
     ports (which is how we expect to get to the meso-scale VLANs in 
     most cases) as needed.
     However, if the GPO believes this requirement overrides concerns 
     about unnecessary traffic, we can meet it by having some 'permanent' 
     slivers on a shared node.
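The dynamic add/remove behavior Rob describes is essentially reference counting per VLAN on each trunk port. A hypothetical Python sketch follows. `add_to_trunk` and `remove_from_trunk` stand in for whatever switch-configuration calls the control software makes, which are not described here.

```python
from collections import defaultdict


class TrunkVlanManager:
    """Carry a VLAN on a trunk only while at least one sliver uses it."""

    def __init__(self, add_to_trunk, remove_from_trunk):
        self._users = defaultdict(set)  # VLAN id -> sliver ids using it
        self._add = add_to_trunk        # placeholder switch-config callback
        self._remove = remove_from_trunk

    def attach(self, vlan: int, sliver: str) -> None:
        if not self._users[vlan]:
            self._add(vlan)  # first user: allow the VLAN on the trunk
        self._users[vlan].add(sliver)

    def detach(self, vlan: int, sliver: str) -> None:
        users = self._users.get(vlan)
        if not users or sliver not in users:
            return
        users.remove(sliver)
        if not users:
            del self._users[vlan]
            self._remove(vlan)  # last user gone: prune the VLAN from the trunk

    def active_vlans(self) -> set:
        return set(self._users)
```

Under this model, the 'permanent' slivers Rob mentions would be attachments that are never detached. They keep a mesoscale VLAN on the trunk for reachability testing.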

G.3. Who gets access to the fileserver VM on each rack? All federated users? Or only local users? Does this mean all users of a site with limited connectivity must be local users? (See Q41)

<RR> All users of the federation who have created a sliver on the rack. 
     Our plan is that when any federated user creates a sliver, they will 
     be given a shell on the fileserver VM even if they are not local users. 
     (These accounts will probably have usernames that are not taken from 
     the URN like 'local' user accounts.)

G.4. The design states that users will be able to log in to the ops node; how will that work? Current ProtoGENI instances don't do that for non-local users, e.g. if I create a sliver at UKY with my GPO user credential, I don't get a login on the UKY ops system.

<RR> This and G.3 have the same answer as the 'ops' VM is the fileserver VM.

G.5. Document how the VLAN translation will work on the switches.

<RR> Joe and I will update the document on this point soon.

G.6. (Req C.2.b) Document the mechanism for connecting a single VLAN from a network external to the rack to multiple VMs in the rack.

<RR> I will update the document on this point soon.

G.7. (Req C.2.c) Document the mechanism for connecting a single VLAN from a network external to the rack to multiple VMs and a bare metal compute resource in the rack simultaneously.

<RR> The answer for this will be the same as for G.6, as we support the
     same mechanisms (either untagged ports or 802.1q tagging) for both
     VMs and bare metal hosts.

G.8. (Req C.2.e) Document the following:

G.8.a. Does InstaGENI support AM API options to allow both static (pre-defined VLAN number) and dynamic (negotiated VLAN number e.g. interface to DYNES, enhanced SHERPA) configuration options for making GENI layer 2 connections?

<RR> I'm not sure I fully understand this point: the GENI AM API does 
     not yet have any stitching support in it. We do have such support 
     in the ProtoGENI API, and have been working with Tom Lehman and 
     Aaron Helsinger regarding getting it into the GENI AM API.

G.8.b. Will InstaGENI racks fully support dynamic VLAN requests, as with the stitching extension?

<RR> This will vary from site to site, depending on what connectivity 
     the site provides. On the ProtoGENI side of things, yes, we will 
     support fully dynamic VLAN requests. We do expect, however, that
     some sites may only provide us with a static set of VLANs across 
     campus to some backbone connection point; this is, for example, 
     the situation with the current setup at Kentucky. 
<AH> Will the IG racks support the stitching extension published by Tom
     Lehman, including requesting and being granted a VLAN tag (whether
     dynamically allocated or allocated from a static pool), and also
     including requesting a specific VLAN tag (a request which may succeed or
     fail)?
     If the VLAN is from a static pool of pre-allocated VLANs: are VLANs
     always for a single slice? Or will the racks have provision for
     requesting a shared VLAN (as in the mesoscale VLANs)?
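Aaron asks about two request styles: "any tag from the pool" and "this specific tag, which may fail". Both can be modeled as below. This is a hypothetical sketch: the tag numbers are made up, and it models only single-slice VLANs, not shared mesoscale VLANs.

```python
class StaticVlanPool:
    """A site's pre-provisioned VLAN tags, each granted to one slice at a time."""

    def __init__(self, tags):
        self._free = sorted(set(tags))
        self._owner = {}  # tag -> slice URN

    def request(self, slice_urn: str, tag=None):
        """Grant a specific tag if it is free, or any free tag if tag is None.
        Returns None when the request cannot be satisfied."""
        if tag is None:
            if not self._free:
                return None
            tag = self._free.pop(0)
        elif tag in self._free:
            self._free.remove(tag)
        else:
            return None  # tag is in use or not in this site's pool
        self._owner[tag] = slice_urn
        return tag

    def release(self, tag: int) -> None:
        if self._owner.pop(tag, None) is not None:
            self._free.append(tag)
            self._free.sort()
```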

G.8.c. Supplying a VLAN tag and requesting translation to connect it to slice resources on the rack?

<RR> The switches in our racks do not support VLAN translation, so if 
     this question is primarily about VLAN translation, the answer is 
     no. However, it's certainly possible to connect external equipment 
     on an untagged port that happens to be in a different VLAN on our 
     rack switch and the other side.

G.9 (Req B.2.a) Have the GENI API Compliance/Acceptance tests been run on the InstaGENI rack?

<RR> Since it's the ProtoGENI AM, yes. As far as I know, they fully pass, 
     though Tom would know for sure. 
<AH> Well, the tests are distributed in the Omni/GCF release bundle exactly
     so that you can run them yourself. Last I heard there were some issues -
     RSpecs not passing rspeclint.
<TM> There are a couple of minor issues. I just pinged Jon D. on the protogeni-dev 
     list about an outstanding issue with the stitching schema. It's some minor 
     capitalization mismatches and a missing tag. Jon's very handy rspeclint tool 
     complains.
     There's also a minor issue that ProtoGENI will accept a manifest RSpec as 
     a request. I don't have the details on that one handy, but I can dig them up
     if need be. Sarah reports that this is a known issue, so maybe it has been 
     reported previously.
     The other issues with respect to the AM API compliance tests have been resolved 
     as far as I know.

G.10. Document the OS Version supported and any known OS restriction in an InstaGENI rack for bare-metal and VM systems.

<RR> Okay, we will document this. My current plan is to provide default
OS images for Ubuntu, CentOS, and FreeBSD. If you have any feedback
from experimenters on which Linux distributions they would prefer,
that would be helpful.
<MB> I don't claim to have made a proper study, but the most popular
requests appear to be Ubuntu and Fedora.
<RR> We've supported Fedora in the past, and still do to some extent.
The problem with Fedora is that the half-life of Fedora releases is
very short: we find that the package repositories for the old release go
away very quickly once new releases are made, which happens every 6 months.
This also means that security updates stop. It requires a lot of manpower to
keep up with the frequent Fedora releases (which are quite happy to change
very large subsystems every 6 months). Experimenters, too, are likely to have
to do significant work to get their code to work with either the latest version
of Fedora, or whatever slightly older version we support. It's also bad news for
shared nodes, which we'd like to update as rarely as we can. Ubuntu has 'long term
stable' releases, and CentOS also works around a longer release cycle.
<MB> Completely understood, but you asked what people are requesting. ;-)
I expect relatively few experimenters really _need_ Fedora, but that doesn't
stop them from asking.
<AS> I have a similar issue with Fedora. It quickly becomes out of date and insecure.
<JS> Would the people who request Fedora in fact be happy with the latest CentOS?
If they just mean "something Red Hat like", and think that means "Fedora", they might.
Conversely, will the Ubuntu people be happy with the latest LTS version,
or will they want something more cutting edge?
My guess is that these two go together somewhat: If people need the latest
Fedora, then they may want the latest (non-LTS) Ubuntu too; whereas if
they're happy with Ubuntu LTS, they'd probably also be happy with CentOS.
My two cents is that we should start with CentOS 6 and Ubuntu 12.04, and
go from there. (Among other things, Ubuntu 12.04 will be the latest and
greatest until October anyway. :^)
<NR> Is there any plan to support any Microsoft OS images, either in the bare-metal
nodes or in the VMs?
<RR> This has been covered before: we are not planning to provide any Windows
images ourselves. Sites that have a site license for Windows and would like
to make their own images are free to do so.
<NR> Just to clarify you are not planning to provide any Windows OS image for the
bare metal nodes, but sites that want to provide one for their rack are free
to do so.
<RR> Correct.
<NR> Would you support them in making the ProtoGENI image, or they are
on their own there as well?
<RR> We have this process documented on the Emulab website; it's a long one,
but a couple of other sites have done it. (We can't give them our images for
licensing reasons, even if they have a legitimate Windows license.)

G.11. Document custom image support, detailing usage, updates, and distribution.

<RR> I will update our design document soon with the information that I shared on the phone.

G.12. Does InstaGENI support software updates of experimental images? If supported, document the mechanisms.

<RR> For the images that we provide, updates will be provided after major 
     releases to the distribution, though the time to do so may vary depending 
     on how large the changes in the distribution are. (For example, major kernel 
     updates and changes to the initialization system can take a while to support.)
     If security fixes for 'remote root' exploits in software packages that are enabled 
     by default in our images are released, we will provide updated images in a timely 
     fashion, made available through the mechanisms documented for G.11.

G.13. Regarding hardware component's "no warranty", what does this mean for dead on arrival hardware?

<RR> This is one for someone from HP. 
<RM> Dead-on-arrival is handled via the same process as a support call.  i.e. call a 
     toll-free number (in the US) with the serial number and product number, and help
     the agent root-cause the problem to the minimal set of field-replaceable units (FRUs).  Functional 
     FRUs should arrive within a day or two and the non-functional FRUs will need to
     be returned in the packaging that the new unit arrived in.
     He confirmed this is the case even under the specific program we're using
     to buy the equipment.

G.14. Are there plans for Firewall/IP-tables on control node?

<RR> We plan to run a firewall in dom0 on the control node to protect the control VMs from the <<<Incomplete in email??>>>>

G.15 Are monitoring and software updates taking place via the Data plane?

<RR> No, these take place on the Control Plane. (Of course, if an experimenter wishes 
     to build a network on the Data Plane for monitoring traffic, they are free to do so.)

G.16. There were inconsistencies in the document about the NICs on the experimental nodes. What is the number of NICs on experimental nodes?

<RR> To avoid further possible inconsistencies, I will let someone from HP answer this.

G.17. Does the current PDU support the addition of more devices?

<RR> I'll also let HP field this one. 
<RM>   As configured with the 252663-B24 PDU, 5 compute nodes, and 1 each of control node, 
       internal switch, and external switch:
        a. Power: this is a 3.6 kVA unit. There should be a bit of power to spare, so yes.
        b. Outlets: 14 IEC 320 C-13 outlets, so yes.
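RM's answer is a simple capacity check against two budgets: outlets and apparent power. The Python sketch below takes the outlet count (14) and the 3.6 kVA rating from the note above. The per-device draw in the example is a placeholder, not a measured figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pdu:
    capacity_va: float  # rated apparent power, in VA
    outlets: int


def headroom(pdu: Pdu, loads_va: list) -> tuple:
    """Return (spare outlets, spare VA), given one apparent-power figure per device."""
    return pdu.outlets - len(loads_va), pdu.capacity_va - sum(loads_va)


if __name__ == "__main__":
    pdu = Pdu(capacity_va=3600.0, outlets=14)
    # 5 compute nodes + 1 control node + 2 switches; 300 VA each is a placeholder.
    loads = [300.0] * 8
    print(headroom(pdu, loads))
```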

G.18. (Req C.1.a) Scaling to 100 compute resources is not explicitly stated. Can InstaGENI support operations for at least 100 simultaneously used virtual compute resources?

<RR> Can we create 100 simultaneous slivers? Certainly. But I'm not sure this
is a useful way of looking at the issue; since VServers, OpenVZ, and LXC all
use the Linux kernel's scheduler, which is work conserving, this question
is more or less the same as asking "can you run 100 processes on a Linux machine?"
The answer of course is yes, but whether you can *usefully* run this many
processes depends entirely on the needs of those processes. (The control
software we are using scales well past 100 slivers, of course.)
We've picked Linux container based slicing for the nodes precisely because it has
the best scaling properties; overheads for each sliver are far lower than they
would be for, say, VMware or Xen. We believe that we will get the best possible
scaling from PlanetLab; I'll let someone from Princeton say more, but I believe
that running more than 100 slices on a single node is by no means uncommon for them.
<NB> I'll throw in my unsolicited $0.02 here - my experience with OpenVZ versus Xen is that
a PV guest is only 2-3% more expensive (in processor) than a container. Both of them
kill your disk, and this is usually where you get into trouble first (long before memory or CPU) -
running 100 workloads on 1 disk is simply impossible, but if you have 100 slivers and 98
of them are mostly idle, it'll work regardless of your solution. Of course, the upside of
Xen is that you also have the option of running HVM guests, something which is impossible in
containers (even some PV-suitable guests, like mininet, can't run in a container).
I'm not trying to argue against the use of containers - I think they have a time and a place,
and this very well may be one of them, but we need to evaluate experimenter use cases
(mostly, do they need Windows, Solaris, custom Linux kernels, etc.). More importantly, we should
realize that 100 active slivers on 5 disk spindles is going to be a potentially significant
performance problem depending on your definition of "active".
<AB> On PlanetLab nodes, we frequently see on the order of 100 "live" slivers (that occasionally
wake up and do something) and 10-20 "active" slivers (with a currently running process).
The typical PlanetLab node is much less powerful than an InstaGENI experiment node, so I
expect that an IG-PL node will be able to support significantly more "live" and "active" slivers.
Of course, as Rob points out, exactly how many depends on the workloads they are running.
<AB> I agree that we should evaluate experimental use cases and try to provide the tools that
experimenters need. However, in the first part of your note you seem to be saying that
there are no significant tradeoffs between VMs and containers. Of course we've written some
papers about this, but since this is the InstaGENI design list, I thought your note deserved
a reply. Hope it doesn't sound too defensive.
Virtualizing resources is already one of the main jobs of the OS, and so the philosophy behind
the container approach is that a simple and efficient way to provide better virtualization is
to leverage and extend existing OS mechanisms. Container kernels are simpler than hypervisors
since there are fewer layers in the stack, and so less overhead and fewer opportunities for
surprising cross-layer interaction (e.g., resource schedulers at different layers working at
cross-purposes). Some examples of ways that VServers are more *efficient* than Xen are:
* Xen reserves physical memory for each VM, potentially leading to underutilization.
VServers share memory best-effort and so can pack more containers into the same memory footprint.
* Each Xen image has its own separate file system, with potentially multiple copies of the
same file (e.g., libraries). VServers use CoW hard links to maintain a single copy of shared
files, conserving disk space and leading to better file caching.
* Xen network traffic flows through dom0, introducing overhead in network I/O relative to VServers.
I'm not saying that these limitations are fundamental to Xen, and work has definitely been done
to address each of these issues. My point is that containers give you certain efficiency benefits
almost for free, whereas you need to do something clever to get the same efficiency in Xen (like
virtualize the NIC in hardware and map the virtual queues directly to Xen VMs). Like everywhere
in systems, there is a tradeoff: to get the same efficiency in Xen, you need to add complexity.
The reason that efficiency has been important to PlanetLab is that more efficient = cheaper. I
think that PlanetLab was extraordinarily cheap given the utility that the research community has
gotten from it. In my opinion, one of the main reasons that PlanetLab succeeded was our best-effort
model of resource sharing -- even though we have taken a lot of flak for it over the years.
To sum up: if experimenters need to run their own OS or just find VMs easier to use, then VMs are
the right tool for the job. But in cases where both will do equally well from the experimenter's
standpoint, I think containers have some clear benefits over VMs.
<NB> Not that we want to get into some long discussion of containers vs. VMs, but I think this is
generally worth talking about as a question of what services we'd like to provide to experimenters.

> * Xen reserves physical memory for each VM, potentially leading to underutilization.
> VServers share memory best-effort and so can pack more containers into the same memory footprint.
Memory over-commit in Xen addresses this issue (and VMware has had similar functionality for much longer).

> * Each Xen image has its own separate file system, with potentially multiple
> copies of the same file (e.g., libraries). VServers use CoW hard links to maintain
> a single copy of shared files, conserving disk space and leading to better file caching.
Xen offers this functionality via thin provisioning and fast cloning with COW semantics
for multiple VMs running from the same base image. Obviously multiple VMs don't *have* to run
from the same base, but you could imagine encouraging experimenters to use a common image unless
they had a compelling reason to do otherwise.

> * Xen network traffic flows through dom0, introducing overhead in network I/O relative to VServers.
This is why I was emphasizing the use of PV kernels, not HVM. Even VServers use
a software bridge as their first stop for packets (at least if they're properly isolated),
and properly isolating a container from a network perspective is actually pretty hard (LXC
makes this easier; VServers and OpenVZ are both more complex, and even LXC has some breakdowns).

> To sum up: if experimenters need to run their own OS or just find VMs easier to use,
> then VMs are the right tool for the job. But in cases where both will do
> equally well from the experimenter's standpoint, I think containers have some clear benefits over VMs.
I am less convinced of "clear benefits", but I'll concede the point for purposes of
argument that of course there are some benefits to any given solution over another.
However, my main point was that I take issue with the implication that the mere choice
of a given technology gives you scalability benefits that cannot be achieved via other
methods. I think it's also fair to point out that containers have a higher administrative
burden on software updates than VMs do - if an experimenter wants to use Fedora 16 in a
VM that is a trivial operation (even in PV), but if they're in a container they have to
hope that the admin has been on top of OS updates for any of the OSes they care to use.
Also, while providing multiple distributions in a single container-based solution is possible,
there are significant drawbacks and administrative headaches that we should be clear about.
Because you're running a single kernel, your options for distribution runtimes are
restricted to those that will function with the specific kernel you have in place
(an issue that is going to be particularly interesting in the medium term as distributions
make differing choices about Linux 3.0 kernel support, but also relevant in the long term
given the tendency of various distributions to make custom kernel modifications that are
incompatible in their runtime environments).
I want to make it clear that I have no problem with containers, but I want to beat back a
bit of the "containers are fast and VMs are slow" that seems to be going around, as this
is simply not the case in a modern environment. I also think that containers have a higher
continuing administrative burden (presuming we are diligent in keeping up with distribution
updates and core kernel features), while VMs have a higher cost of entry on setting a system
up, but have less of an ongoing admin burden, and I would gladly take a higher bar to entry on
setup if it made maintenance of the racks at each site (particularly if scaled into the 100s)
significantly easier.
<AB> Definitely you are more familiar with the current status of Xen development than I am. There is
a lot of momentum behind Xen and I'm sure that its efficiency limitations are being / already
have been addressed. In my opinion PlanetLab has some solid reasons for sticking with containers
but we don't need to debate these here.
I think the main reason why InstaGENI is advocating containers for scalability (instead of Xen VMs)
is that the team members have built complete solutions that use containers. In the short term,
container-based virtualization on InstaGENI seems to be the most pragmatic solution. In the longer
term, I don't see anything that precludes us from deploying Xen VMs on InstaGENI if someone is willing to do it.

G.19. What is the plan for the transition from VServer to LXC? The current deployment plan lines up with the estimated availability of LXC. Please confirm or clarify.

<RR> I'll let Princeton take this one. 
<AB> Assuming that the LXC image is ready by the time we want to start deploying IG-PL nodes, 
     we will have no need for VServers and we will deploy the LXC image on InstaGENI.

G.20. Design document should provide much more detail about the topology and the handling of all traffic types (monitoring, updates, user, etc) in the topology.

<RR> Sure, we will update this in the design document.

G.21. Document the expansion hardware and image combinations that are supported for any InstaGENI site to implement.

<RR> To be clear, there are two things I said that we won't support:
     1) Sites making local changes to the control software
     2) Sites adding PCs to the rack which won't boot our standard images
     These are two separate issues: if a site has to build a custom image 
     to support nodes that they add, this usually does not require changes 
     to the control software, and they will not forgo our help in updating the control software.
     There is documentation about node requirements here:
     http://users.emulab.net/trac/emulab/wiki/HWRecommend
     I will add a link to the design document.

GPO OpenFlow Questions

(jbs - I'm content at this point with their design, and don't have any remaining questions.)

Questions submitted by Josh Smift:

I wanted to summarize a bit how we think OpenFlow will work in the InstaGENI racks, to make sure we're all on the same page, since the original design document was a little confusing on this score.

We think that the current plan is for the HP 6600 switch to run in hybrid mode, with three types of VLANs that are related to slivers:

(1) VLANs that are dedicated to a sliver, created and removed by ProtoGENI, much (exactly?) like existing ProtoGENI VLANs now.

(2) VLANs that are dedicated to a sliver, created and removed by PG, just like the previous kind, except OpenFlow-enabled, and pointing directly to an experimenter's controller. (See below for more about this.)

(3) VLANs that are shared between multiple slivers, OpenFlow-enabled, pointing to the InstaGENI FlowVisor in the rack as their controller, which will in turn be controlled by FOAM.

<RR> Yes, you have this exactly right.

One general question: Would it be better to run the switch in aggregation mode, with all VLANs OpenFlow-controlled, and implement the first type of VLAN by running a learning-switch controller on the control node? Nick is probably best positioned to talk about the pros and cons of this approach, but we weren't sure if it had been considered and rejected, or was still on the table, or not really considered yet, or what.

<RR> We've talked about this with Nick, and I think that hybrid mode is 
     far preferable. I can go into a lot of detail about this if you'd like,
     but the high level point is that experimenters who aren't choosing to
     run experiments with OpenFlow should not be affected by it. 
<JS> Well, this depends somewhat on how you define "affected", in that non-OF
     VLANs can still be affected if the switch's OF-controlled VLANs overwhelm
     the switch's CPU, but fair enough. I'm content to defer to Nick on this.
<RR> Yes, I do realize the OpenFlow users could potentially DoS the switch, 
     but it seems like we have to live with this if we want to offer OpenFlow
     and don't want to buy a separate switch for it. But we should still
     minimize the impact.
<AH> How does an experimenter in their request RSpec indicate that they want
     an OpenFlow controlled network?
     I assume that only in that case does PG send FOAM the manifest (or whatever 
     you use to send info about the allocation)?
     Does the manifest RSpec have sufficient information for the experimenter to
     know what flowspace is legal for them to request from FOAM?
     Do you and Nick in fact have a plan for how PG gets this information to FOAM?
     To the extent to which you have to add information to the request and manifest 
     to make this work, how does this relate to the RSpec changes that Ilia has 
     requested to support similar functionality in the ExoGENI racks 
     (see http://lists.geni.net/pipermail/dev/2012-January/000553.html )

About the second kind: On the design review call, Rob described a scenario in which an experimenter wanted a dedicated OpenFlow-controlled VLAN, but wanted to run their controller on one of the resources in their new sliver (e.g. on an OpenVZ container they were requesting). This would mean that they couldn't supply their controller info (IP address and TCP port) in their request rspec, because they obviously don't know that information for a resource they haven't gotten yet. Rob suggested that to accommodate this, ProtoGENI would configure the switch such that it ran a *listener* for the experimenter's VLAN, which the experimenter would tell their *controller* to connect to, rather than the more typical approach of configuring the *switch* to connect to the experimenter's controller.

I wasn't sure that this would actually work, and talked with Nick about it later in the week, and he confirmed that it won't. In particular, you can't actually do many OpenFlow actions when you talk to a listener like this (e.g. with dpctl); and controller software generally expects to receive connections from switches, not to connect actively to switches, so most (all?) existing controller software wouldn't work in this model.

There are other ways to solve the problem that Rob described; for example, there could be a way for the experimenter to indicate in their rspec that they want their controller to be an OpenVZ container that they're requesting, but they don't know its IP address, but once PG gives them the container, it should configure the switch to use that container's address (and either a specified TCP port or an arbitrary port or whatever). So this doesn't seem like a big problem; but we wanted to make sure that we were all on the same page about how it would work, and that there weren't any other obstacles to having the switch connect to the controller in the second scenario.

<RR> This comes as a surprise to me, since I've used OpenFlow with the switch 
     as a listener in the past. But that was a few years ago, and I guess things
     have changed?
     Anyhow, the alternate solution you have sketched out sounds good to me.
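The alternate approach Josh sketches can be illustrated in a few lines. This is a minimal sketch, not ProtoGENI code: it assumes a hypothetical request schema in which the controller is named either by a literal address or by the client ID of a node in the same request, and it renders the result as a `tcp:IP:PORT` target string (the form Open vSwitch uses for controller targets). How the target would actually be configured on the HP 6600 is not shown. The default port of 6633 is an assumption, chosen because it was the conventional OpenFlow controller port at the time.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ControllerSpec:
    """Controller for a dedicated OpenFlow VLAN (hypothetical request schema)."""
    address: Optional[str] = None         # literal IP, if the experimenter already has one
    node_client_id: Optional[str] = None  # or: a node in this request that will host the controller
    port: int = 6633                      # assumed default: the conventional OpenFlow port of the time


def resolve_controller(spec: ControllerSpec, allocated_ips: Dict[str, str]) -> str:
    """Return a 'tcp:IP:PORT' controller target once the sliver's nodes have addresses.

    allocated_ips maps each allocated node's client ID to its IP address,
    which the aggregate only learns after allocation.
    """
    if spec.address is not None:
        host = spec.address
    elif spec.node_client_id is not None:
        if spec.node_client_id not in allocated_ips:
            raise ValueError(f"controller node {spec.node_client_id!r} is not part of this sliver")
        host = allocated_ips[spec.node_client_id]
    else:
        raise ValueError("controller must give either an address or a node client ID")
    # The switch dials out to this target, so ordinary controller software works unchanged.
    return f"tcp:{host}:{spec.port}"
```

For example, `resolve_controller(ControllerSpec(node_client_id="ctrl"), {"ctrl": "10.1.2.3"})` returns `"tcp:10.1.2.3:6633"`. The point of the design is that the switch still initiates the connection, which is the direction existing controller software expects.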

(We're also not sure how common this situation will be, since another issue with having the controller be on a container in the rack is that if the rack doesn't have a public IP address, things outside the rack won't be able to connect to it... Our experience is that experimenters generally want to run a single controller to control all the OF resources in their slice, not a controller per switch (or even per rack), so they're going to want to run their controller on something with a stable public IP address.)

One other problem with the second scenario is that each VLAN is a separate OpenFlow instance on the switch, and we think that the switch can only handle a dozen or so instances at once before bogging down (CPU is the bottleneck, I believe). So, in practice, we may want to do something to discourage experimenters from reserving dedicated OpenFlow-controlled VLANs unless they're sure they really need them, or something. (I think that limitation may go away if the switch runs in aggregation mode, which is one of the advantages of doing it that way, but I'm not 100% sure.)

<RR> That's my understanding too. I don't think this is going to be a problem: 
     while some important experiments do require OpenFlow, a majority don't. 
<JS> Sure, but what I actually had in mind here was steering people who *do*
     want OpenFlow towards using a shared VLAN rather than a dedicated VLAN,
     unless they need a dedicated VLAN for some reason. Does that make sense?
<RR> Sure, I have no problem with steering people towards a shared OpenFlow VLAN.
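The steering policy agreed on here could look something like the following sketch. It is hypothetical: the three `VlanKind` values mirror the three VLAN types listed at the start of this section, and `MAX_OF_INSTANCES = 12` is only the "about a dozen" estimate from this discussion (which Nick considers conservative), not a measured HP 6600 limit. It also assumes that joining the existing shared VLAN does not create a new OpenFlow instance on the switch.

```python
from enum import Enum


class VlanKind(Enum):
    PLAIN = 1         # (1) dedicated VLAN, no OpenFlow
    OF_DEDICATED = 2  # (2) dedicated OpenFlow VLAN pointing at the experimenter's controller
    OF_SHARED = 3     # (3) shared OpenFlow VLAN controlled by the rack's FlowVisor/FOAM


# Assumed budget of per-VLAN OpenFlow instances on the switch ("about a dozen").
MAX_OF_INSTANCES = 12


def choose_vlan_kind(wants_openflow: bool, needs_dedicated: bool,
                     of_instances_in_use: int) -> VlanKind:
    """Pick a VLAN type, steering OpenFlow users to the shared VLAN by default."""
    if not wants_openflow:
        return VlanKind.PLAIN
    if not needs_dedicated:
        # Assumed: the shared VLAN already exists, so joining it uses no new switch instance.
        return VlanKind.OF_SHARED
    if of_instances_in_use >= MAX_OF_INSTANCES:
        raise RuntimeError("no OpenFlow instances left on the switch for a dedicated VLAN")
    return VlanKind.OF_DEDICATED
```

Raising an error, rather than silently falling back to the shared VLAN, reflects the idea that an experimenter who genuinely needs a dedicated VLAN should be told so rather than given something different.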

Moving on to the third kind: Rob said on the call that in this scenario, the sequence would be something like this:

(1) Experimenter requests a combination of ProtoGENI and network resources, indicating that they want their network resources to be OF-controlled on a shared VLAN.

(2) PG allocates the resources.

(3) PG tells FOAM that the experimenter is authorized to have the relevant network resources.

(4) PG tells the experimenter that to actually get those resources, they should create an OpenFlow rspec and submit it to FOAM.

Our question was: If PG is going to be talking to FOAM anyway (which seems like a fine idea), why doesn't it just allocate the resources then and there, rather than requiring the experimenter to write and submit a separate rspec? Telling FOAM "if this experimenter asks for this flowspace later, let them have it" doesn't seem like it would be any easier (on the software side) than just telling FOAM "give this experimenter this flowspace"... In fact, it seems like it'd be somewhat harder, since FOAM already has a complete interface for "give this experimenter this flowspace", and doesn't yet have a full implementation of the policy-engine style "if someone asks for something that looks like this later, give it to them" sort of interaction. And it's much easier for the experimenter, of course, if they can just get their resources immediately, rather than if they have to go write and submit a separate rspec (giving them a separate sliver, in a separate aggregate, with a separate expiration date, etc).

<RR> My thinking is that there are two things going on here: 
     the first is the connection of resources in a slice to a shared
     VLAN run using OpenFlow, and the second is allocation of flowspace in
     that VLAN. Flowspace is about much more than just the ports and VLANs 
     being controlled, so it makes sense to have FOAM / flowvisor manage that
     shared flowspace. That said, I want to give FOAM as much information as
     we can so that it can make choices like not giving slices port-based 
     flowspace for ports they don't own, etc. 
<JS> Yeah, I think this all makes sense at a high level; the question I want to
     get at is how that actually happens. The part that seems to me like it
     could vary is how much FOAM and ProtoGENI know about each
     other, and how they communicate with each other.
     Maybe the ProtoGENI AM in a given rack has special permission to ask FOAM
     to allocate flowspace on behalf of an experimenter; this could be
     flowspace that PG has figured out that the experimenter wants, or it could
     be based on explicit flowspace requests from the experimenter, or some
     combination of those things.
     Or, maybe PG has a way to tell FOAM about flowspace that an experimenter
     is entitled to ask for, but doesn't actually allocate it until the
     experimenter asks FOAM for it.
     Or maybe it goes the other way, and PG doesn't tell FOAM anything when a
     user creates a sliver; but when an experimenter asks FOAM for some
     flowspace, FOAM asks PG to confirm that the user should be allowed to have it.
     The first or last of those make somewhat more sense to my intuitive
     expectations than the middle one, but if you've got a solid plan for the
     middle one, and it has some advantage over the others, it doesn't seem
     obviously crazy. :^) But it might be worth talking through some of the
     pros and cons some time... Or, if you and Nick have already had this
     conversation, and can summarize, that'd be cool.
<RR> > Or, maybe PG has a way to tell FOAM about flowspace that an experimenter
     > is entitled to ask for, but doesn't actually allocate it until the
     > experimenter asks FOAM for it.
     This is along the lines I was thinking; nothing happens automatically at FOAM
     on creation of a sliver at the ProtoGENI level, ProtoGENI just communicates
     the details of that sliver to FOAM in case FOAM wants to use it as input to 
     its management of flowspace.
<JS> Ja, so I think my question is more for Nick: Is FOAM going to have an
     interface for ProtoGENI to do that? It doesn't yet, and it sounds like the
     converse of something we'd been talking about before, in which FOAM would
     have a way to ask other AMs (or a clearinghouse, or a stitching service,
     or something) for information that it wanted to use to make decisions. 
     Pushing that information to FOAM seems different, and my intuition is that
     it's architecturally less ideal -- it seems much better to create a general
     way for any AM to ask another AM for information that it might use to 
     allocate resources, than to create a one-off way for PG to push specific 
     information to FOAM.
     (And to touch on something Aaron just said: What the ExoGENI guys are
     doing isn't really either of those things, but more like having ORCA
     actually allocate the experimenter's flowspace itself (by talking to
     FlowVisor or FOAM).)
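Rob's preferred model (PG reports sliver details, and FOAM decides only when the experimenter actually asks) can be sketched as a FOAM-side check. Everything here is hypothetical: as Josh notes, FOAM has no such interface today, and the class and method names, the `(switch, port)` identifiers, and the slice URN strings are illustrative only. The check implements the one policy Rob names explicitly, not granting port-based flowspace for ports the slice does not own, extended in the same way to VLANs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Port = Tuple[str, int]  # (switch name, port number); hypothetical identifiers


@dataclass
class FlowspaceRequest:
    slice_urn: str
    ports: Set[Port]
    vlans: Set[int]


@dataclass
class SliverRegistry:
    """Hypothetical FOAM-side record of what ProtoGENI says each slice holds.

    Recording a sliver allocates nothing; the data is only consulted later,
    when the experimenter submits a flowspace request to FOAM.
    """
    ports: Dict[str, Set[Port]] = field(default_factory=dict)
    vlans: Dict[str, Set[int]] = field(default_factory=dict)

    def record_sliver(self, slice_urn: str, ports: Set[Port], vlans: Set[int]) -> None:
        # Called when PG reports a new sliver: store the facts, grant nothing.
        self.ports.setdefault(slice_urn, set()).update(ports)
        self.vlans.setdefault(slice_urn, set()).update(vlans)

    def problems(self, req: FlowspaceRequest) -> List[str]:
        """Return reasons to reject the request; an empty list means it is consistent."""
        issues = []
        for switch, number in sorted(req.ports - self.ports.get(req.slice_urn, set())):
            issues.append(f"port {switch}:{number} not held by {req.slice_urn}")
        for vlan in sorted(req.vlans - self.vlans.get(req.slice_urn, set())):
            issues.append(f"VLAN {vlan} not held by {req.slice_urn}")
        return issues
```

In this arrangement the registry is only one input: FOAM can still apply its own flowspace policy on top of the check, which matches Rob's description of PG's data as "input to its management of flowspace".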

Finally, one other OpenFlow-related question: If you have a bunch of resources in a rack -- say two bare-metal nodes and half a dozen OpenVZ containers -- and you want all of the traffic between all of them to be OpenFlow-controlled, can you do that? In particular, there's presumably some sort of virtual switch used by the OpenVZ containers; is it (or could it be) an OpenFlow-enabled Open vSwitch?

<RR> At this point, our plan is to force all traffic in this case out to the 
     physical switch, rather than try to run an openflow-enabled virtual switch
     on the hosts.
<CG> Does this work even if two VMs are using the same VLAN on the same
     physical interface on the host?  Sorry to ask what's probably a dumb
     question --- I just haven't personally figured out how to force the
     Linux network stack to do this, so wanted to double-check.
<RR> Yes, it does work; we have built a special virtual NIC driver just
     for this purpose. 
<JS> Sounds good.

Adam Slagell Questions

S.1. Aggregate Provider Agreement
s.1.a Updates Plan
S.1.a.1. ~~What pieces of the software stack do you take responsibility for?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> They take responsibility for all software and VMs, except the PL image updates
     will really be pushed out through the PL mechanism.

S.1.a.2. ~~How do you monitor for vulnerabilities and determine impact?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> They will follow the Emulab frequent update process.

S.1.a.3. ~~Do you plan to test patches or mitigations?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> Yes, at the Emulab InstaGENI rack first

S.1.a.4. ~~How do you push out patches and how fast?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> Most updates will be done by running scripts on the racks. An admin will have to login 
     to each rack to execute an update, they are not pushed out automatically. Turn-around 
     time is dependent upon criticality of the update.

S.1.b. Logging
S.1.b.1. ~~What logs do you keep?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> Logs are stored in a database on the control node. These include all allocation related 
     transactions. Local staff, the GMOC and InstaGENI central will be able to retrieve these 
     logs through a web interface on the control node. It will be password protected.
     There will also be a public interface like PlanetFlow to give a reduced view of these logs, 
     mapping IPs to slices.

S.1.b.2. ~~How long do you keep them?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> It is up to the individual sites, but by default until they run out of room.

S.1.b.3. ~~How do you protect the integrity of logs? Do you send them to a remote server?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> There are no integrity tools like Tripwire used, but logs are duplicated on the boss
     node, where users have no shell access. Also, the GMOC will be able to pull logs offsite, 
     as well as the CH in the future (which may be a push).

S.1.c. Administrative interfaces
S.1.c.1. Is there a separate, dedicated administrative interface?

<AS> All nodes have iLO remote admin cards to give console access. The control node iLO card 
     has a publicly routable IP, the rest can be accessed from the control node once connected 
     to the rack.
<AS> Is this on its own admin backplane in the rack?  
<RR> Access to iLO on the experiment nodes is done through the 'top of rack' 2610 switch, 
     the same one used for outside connectivity. This will be accomplished through a separate 
     VLAN that is not routed to outside, and has as its only members the boss VM on the control
     node, the iLOs for the experiment nodes, and the IP interface of the 6600 switch. This is 
     very much like the setup used at every Emulab site. 
<AS> iLO can use passwords or SSH keys. There will be different credentials for the local sites, 
     GMOC and InstaGENI central.
     There is also a command line and web server interface to control the protogeni software on 
     the AM IP.

S.1.c.2. ~~Do the local sites have access to this?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> Local sites have access to the admin interfaces, but their credentials only work for their rack.

S.1.c.3. ~~If so, do you use different credentials at different sites?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> Yes.

S.1.c.4. ~~What sort of authentication does an admin use?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> SSH keys or passwords

S.1.c.5. ~~Do you use root accounts. If so, can you audit escalation to root?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> They do not allow direct root login with SSH. Instead, admins must login as a user and sudo to act as root.

S.1.d. User credentials
S.1.d.1 ~~Are user credentials (e.g., private keys) stored anywhere, even temporarily?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> No

S.1.d.2 ~~If yes, then how are they protected and how long do they live?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> NA

S.1.e. Slice isolation
S.1.e.1. How are slices isolated?

<AS> VMs (or whole node allocation) and VLANs are the primary mechanisms. On the control node, Xen 
     is used for virtualization. OpenVZ (lightweight, more like BSD jails) is used for ProtoGENI VMs.
     OpenVZ provides containers that only allow a VM to see the VLANs it is supposed to. PlanetLab nodes 
     will use VServers or LXC, which provide similar security properties to OpenVZ.

<AS> Can a bare metal allocated node sniff traffic for other slices?      
     Probably not if switches are configured properly.       
<RR> Since the "control network", ie. the Internet-facing VLAN on the 2610 switch, is shared by all 
     nodes, it is possible that one could use various tricks such as ARP poisoning to snoop on others' 
     traffic. This would require intentional, malicious action on the part of the snooper, as normal 
     switch behavior would prevent it. On the "data plane network" (ie. the 6600), traffic will be 
     separated by VLANs and no snooping will be possible. This is the same level of guarantee currently 
     given by all Emulab sites. 
<SS> One other question: how often do log messages or "watch dog" messages get sent out from 
     the boss node on each rack? If normal log traffic stops, how long until someone 
     (instaGENI or GMOC) notices? (I'm assuming that it is hard to take control of boss or the
     2610 switch, but moderately simpler to DoS something (process, VMM, switch port) to make 
     it harder to log in/shut down. )
<RR> The stuff that we have set up with the GMOC right now (or, at least, a few years ago, I don't 
     know if they are still using it), is pull from their side. I don't know how often they poll. 
<RM> On the first point, the only connections to the experiment nodes that go through the 2610 
     are to the iLO ports...I believe that Rob suggested we could hook those up to the 6600 
     (we have the ports). The negative is that people would have to log in to the boss node to 
     access the iLO cards on the experiment nodes...
<SS> Hmmm... I thought that people would have to log into the boss node (in a VM) to reach 
     the experiment node iLO cards -- they don't have a public Internet IP address. I think 
     it is better to present the smallest attack surface to the public Internet, even if the
     iLO cards are a fairly low risk.
<RM> Up to Rob, but as far as I am concerned, np
<RR> My preference is to have private iLO addresses, as I expect they will mostly be useful 
     for admins (both local and remote).
     The main use case for iLO console access for experimenters is when setting up or 
     debugging a new disk image. My intention is to give the rack at Utah public iLO addresses,
     and encourage people who are making new images to do so here. 
<AS> I thought users could not log in to the boss node, which I considered a good thing.
<RR> This is a point where we need to be more precise in our language - I made a pass over
     the document trying to make sure it's consistent, but maybe I missed something.
     The control node is the physical PC that hosts a few VMs.
     The boss VM is one VM that runs on the control node, and this is where the database, 
     node boot services, etc. reside. Users will not have shells in this VM.
     The 'users' VM (might also be called 'ops' VM) is one on which users *will* be given 
     shells so that they can get to their files (it's also the fileserver for the rack) without 
     needing slices. This is the path through which users will get in if the site doesn't provide
     sufficient IP space for slivers to get their own public IP addresses. If we give users iLO
     access, it would be through this VM. 
<RM> The "users" VM is the equivalent of users.emulab.net, right?
<RR> Correct. (For others, you might also see it called 'ops' in some documents; this is a name 
     we've used to refer to it for a long time, but it's actually quite a confusing name, so
     we're trying to break this habit.)
<RM> I guess one good question is whether users should be able to manipulate
     the iLO cards of the experiment nodes, or whether we want to restrict that
     to the Boss VM.  Probably the latter...
<RR> Clearly, users should not have access to iLO on nodes that are being shared. My guess is that 
     the main use will be for exclusive nodes during image creation, since there are plenty of 
     things one could do with the kernel, etc. to break the network, and you want to be able to 
     at least see what's going on.

S.1.e.2 ~~Is there isolation in time for bare metal hosts, wiped OSes?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> Yes

S.1.e.3. ~~Are there weak points in the isolation?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> OpenVZ, VServers and LXC provide reasonable security isolation but not as strict as true VMs.

S.2. LLR Agreement
S.2.a. Opt-in users
S.2.a.1 Can opt-in users be mapped to a slice? For example, if two experiments were acting as ISPs and there was an issue with a particular opt-in user connecting to GENI through a given rack, could you determine which slice it was associated with?

<AS> This was the question I was struggling to word before I left the call.
<AS> What if it is a PlanetLab slice with opt-in users?                    
<RR> Yes, as long as the reporting party knows the IP address on the rack that is
     associated with the offending opt-in-user. For all public IP addresses, we 
     will provide a mechanism that maps that address to a particular slice; in the 
     case of PlanetLab shared nodes, a port number will also be required, as many 
     slices share the same public IP address. So if, for example, an opt-in user 
     is using a proxy in the rack to attack someone, or hosting objectionable material
     on a CDN hosted in the rack, knowing the address(es) used by the proxy or the CDN 
     will be sufficient to map it to a slice. This is the same level of guarantee provided 
     by PlanetLab.
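The address-to-slice mapping Rob describes, including the port number needed on shared PlanetLab nodes, amounts to a two-level lookup. This is a minimal sketch, assuming two hypothetical tables: one for public addresses dedicated to a single sliver, and one keyed by address and port for shared nodes. It ignores time, which a real lookup against the rack's logs would also need.

```python
from typing import Dict, Optional, Tuple


def slice_for(ip: str, port: Optional[int],
              dedicated: Dict[str, str],
              shared: Dict[Tuple[str, int], str]) -> Optional[str]:
    """Map a reported public address (and port, for shared nodes) to a slice name.

    dedicated: public IP -> slice, for slivers with their own address.
    shared:    (public IP, port) -> slice, for PlanetLab nodes where many
               slices share one IP.
    Returns None if the address is not in use by any slice.
    """
    if ip in dedicated:
        return dedicated[ip]
    if port is None:
        if any(shared_ip == ip for shared_ip, _ in shared):
            # Many slices share this address, so the IP alone is ambiguous.
            raise ValueError(f"{ip} is shared by several slices; a port number is required")
        return None
    return shared.get((ip, port))
```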

S.2.a.2. ~~Does your rack bridge to opt-in users like MNG?~~ (aslagell email 2/21/2012 5:04 PM )

<AS> No. Experiments may have opt-in users do this as part of their experiment, but this is tangential to the GENI rack design.

S.2.b. Attribution
S.2.b.1. Can you determine which slices are running on a given component?

<AS> Yes
<AS> Is this complicated when you run PL nodes which are outsourcing AM functions?   
<RR> It is only slightly more complicated. Since both ProtoGENI and PlanetLab have 
     functions that make this information available, and ProtoGENI will know which
     nodes are currently acting as PlanetLab nodes, any requests regarding these 
     nodes made to the ProtoGENI AM will be redirected to the MyPLC instance that
     will be set up for all the racks.

S.2.b.2. Can you map IP and timestamp info uniquely to a slice and how quickly?

<AS> Yes
<AS> How quickly?                                                              
<RR> As quickly as it takes to run a database query.
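To make "a database query" concrete: given an IP address and a timestamp, the lookup is a range query over address assignments. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `ip_assignments` table. The actual schema of the control node's database is not described on this page, so the table and column names are assumptions. An index on `(ip, start_ts)` keeps the query fast as the log grows.

```python
import sqlite3
from typing import List, Optional

SCHEMA = """
CREATE TABLE ip_assignments (
    ip        TEXT    NOT NULL,
    port      INTEGER,          -- NULL when the whole address belongs to one sliver
    slice_urn TEXT    NOT NULL,
    start_ts  INTEGER NOT NULL, -- Unix seconds when the assignment began
    end_ts    INTEGER           -- NULL while the assignment is still active
);
CREATE INDEX ip_time ON ip_assignments (ip, start_ts);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def slices_at(db: sqlite3.Connection, ip: str, ts: int,
              port: Optional[int] = None) -> List[str]:
    """Return the slice(s) that held ip (and port, if given) at time ts."""
    rows = db.execute(
        """SELECT slice_urn FROM ip_assignments
           WHERE ip = ? AND start_ts <= ? AND (end_ts IS NULL OR end_ts > ?)
             AND (? IS NULL OR port IS NULL OR port = ?)
           ORDER BY slice_urn""",
        (ip, ts, ts, port, port),
    ).fetchall()
    return [row[0] for row in rows]
```

Assignments are treated as half-open intervals, so an address handed from one slice to another at time T is attributed to the new slice at T.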

S.2.b.3. Would a site deploying a rack be able to give a clear set of IP addresses unique to the rack? This might be hard if it is bridging resources behind it with dynamic IPs, like opt-in user WiMAX handsets. (aslagell email 2/21/2012 5:04 PM )

<AS> Yes. And if they don't have enough IPs on site, the rack can NAT.

S.2.c. Clearinghouse policy
S.2.c.1. ~~Is there a mechanism to report resource allocation information back to the CH for policy verification?~~ (aslagell email 2/21/2012 5:04 PM)

<AS> They plan to support it when they know what that interface will look like.

Nick Bastin Questions

(jbs - These questions have all been addressed, either in a separate conversation between Nick and the InstaGENI team, in the design review meeting, or in follow-up e-mail.)

Design Document Quote: "Thus, the minimum number of routable IP addresses for the rack is expected to be three: one for the control node, one for the control node iLO card, and one for FOAM"

B.1. ~~How is FOAM provisioned/configured such that it needs an IP independent of the control node?~~ (jbs - the FOAM/FV VM on the control node will presumably have its own IP address)

B.2. ~~Why not use Open vSwitch with the OpenVZ environment to allow experimenters control over the network stack using openflow?~~ (jbs - Rob says that they're going to force traffic from the OpenVZ containers out to the switch, so experimenters can control it there)

NB Quote: Otherwise we locally re-create the short-circuit bridging problem that we have at some regionals, forcing experimenters to have containers on different nodes if they want control of the packets between them

B.3. ~~How many openflow instances do we think the ProCurve (E-Series) 6600 can support?~~ (jbs - Rob says "about a dozen", Nick says Rob got this info from him and it's probably conservative, depends on what the experimenters are doing)

B.4. ~~FOAM is not a controller, nor a hypervisor, and does not require 12GB of RAM~~ (jbs - 12 GB of RAM is for the entire control node; the FOAM+FV VM should probably get 3 or 4 GB)

NB Quote: FlowVisor, which isn't mentioned in the document, requires significantly more RAM than FOAM, but in this environment (with at most 6 datapaths, but this document describes only one) the memory footprint should be no more than the default 1.5GB, so FOAM+FlowVisor should be perfectly happy with 2-4GB.

B.5. Why not use aggregation mode with a default controller to provide connectivity to slices which don't care about using openflow (thus allowing openflow experimenters the use of multiple VLAN tags, which is impossible in hybrid mode)? (jbs - Rob says that he and Nick discussed the pros and cons, and concluded that hybrid mode was a better bet)

B.6. ~~I don't really understand anything in the "OpenFlow" section of the document~~ (jbs - Nick talked with the team and now understands the general idea)

B.6.a. ~~What is "external openflow provisioning"?~~ (jbs - unclear, but not important)

B.6.b. ~~Where is the scalability concern with a single controller?~~ (jbs - unclear, but not important)

NB Quote: A controller has no requirement to be implemented as a single process, or even on a single machine - nothing prevents a controller from being a distributed high-availability service, which is an implementation option for several available controllers

B.6.c. ~~What single "controller" is being used in InstaGENI as referenced in this section?~~ (jbs - unclear, but not important)

B.6.d. ~~Does this mean that all racks will use the same hypervisor? Or that there will be one hypervisor per rack?~~ (jbs - each rack will have a FOAM instance)

B.7. ~~It is in fact possible to use an L2-based networking model for vServers with distinct interfaces for each container, it's just not simple (although it only needs to be done once).~~ (jbs - true, but not relevant -- they're not planning to do it)

B.8. Node Control and Imaging: "likely users of this capability will be able to create slivers that act as full PlanetLab nodes or OpenFlow controllers." Why does a user need an entire node for an OpenFlow controller? (jbs - unclear, but irrelevant)

B.9. ~~Containers are not Virtual Machines (VMs) - at best they are Virtual Environments (VEs). We should not confuse this terminology lest we fail to manage the expectations of the people using them.~~ (jbs - agreed)

B.10. ~~How are these 3x1Gb data plane connections configured? Etherchannel, link-aggregation, distinct channels?~~ (jbs - Nick says Rob addressed this, "they're just 3 distinct 1gb interfaces")

B.11. ~~How is FOAM getting new certificate bundles?~~ (jbs - when there's a new cert, the FOAM admin for each rack, whoever that is, will install it; or someone could automate this; and this should happen only infrequently, and isn't hard in any case)

B.12. ~~Is each rack running its own clearinghouse which mints user certificates?~~ (jbs - no, as covered by Design Review Question 1)
