Changes between Initial Version and Version 1 of GENIRacksHome/InstageniOpenQuestions


Timestamp: 03/08/12 13:03:56
Author: lnevers@bbn.com
[[PageOutline]]

This page captures design questions for the InstaGENI rack design.  Also captured are questions sent to instageni-design@geni.net; discussions are recorded to track resolution. Questions are crossed out when answered. The person closing a question should add his/her name to the question with a brief explanation. Any document that is referred to with a URL is attached to the page for historical reference. Notes are attributed as follows:
{{{
AB: Andy Bavier         CG: Chaos Golubitsky    LN: Luisa Nevers     RM: Rick McGeer
AH: Aaron Helsinger     HD: Heidi Dempsey       MB: Mark Berman      RR: Robert Ricci
AS: Adam Slagell        JS: Josh Smift          NR: Niky Riga        SS: Steve Schwab
}}}

= GPO InstaGENI Questions =
== Questions Submitted for Design Review ==

'''Design Review GPO question 1''':

We understand you intend the racks to be full PG sites running an SA that is fully trusted.  We do not want that SA to be fully trusted (beyond the rack), and we do not want that SA to be a GENI SA.  Does your model require that?  Put another way, GENI mesoscale aggregates currently trust pgeni.gpolab slices and users and Utah Emulab and PLC, which is what we expect the InstaGENI racks to do.  The racks would not use the full PG bundle, and IG rack certificates would not be in the PG bundle.
{{{
<AH> Rob: Federation decides if these are trusted SAs. Initial plan is to federate these with PG,
     and they will switch to the GENI CH. Consequences of not including: user certs/slice creds
     from the rack are not trusted elsewhere. We probably want a separate bundle of AM root certs,
     so the SSL connection to an AM can be verified.
<HD> Racks won't be part of the PG bundle unless they have federated with the clearinghouse.  May need to
     distribute a separate bundle of root certs so that when you make a connection to the SA you
     can verify it against that bundle of just AM certs.  Those would be server certs instead
     of CA certs.  First step will be to use pgeni.gpolab for initial rack deployments.  Standard
     install is no access for local SAs.
}}}
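
As a concrete illustration of the separate trust bundle discussed above, here is a minimal Python sketch that verifies a rack AM's SSL certificate against a standalone bundle rather than the full PG bundle. The hostname, port, and bundle path are hypothetical placeholders, not actual InstaGENI values:
{{{
# Hedged sketch: verify an AM's TLS certificate against a separate
# bundle of AM certs only. All names below are assumptions.
import socket
import ssl

AM_HOST = "am.example-rack.instageni.net"       # hypothetical AM hostname
AM_PORT = 12369                                 # hypothetical AM port
AM_BUNDLE = "/usr/local/etc/geni-am-certs.pem"  # hypothetical cert bundle

# Trust only the AM bundle, not the system CA store.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.load_verify_locations(cafile=AM_BUNDLE)

with socket.create_connection((AM_HOST, AM_PORT)) as sock:
    # The handshake fails unless the server's cert chains to the bundle.
    with ctx.wrap_socket(sock, server_hostname=AM_HOST) as tls:
        print("verified AM certificate:", tls.getpeercert()["subject"])
}}}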

'''Design Review GPO question 2''':

Control plane switch paths.  The drawings are somewhat confusing, so we may have this wrong, but it seems that the control plane for ProtoGENI and FOAM is going through the HP 6600 !OpenFlow-controlled switch.  We expected the control plane to be separated from the 6600, so that if experimenters cause problems with that switch, the control plane will still be fine.  Is there a reason to use the 6600 this way, which seems like a higher-risk approach?

{{{
<AH>  Rob: stuff is confusing. Control plane traffic is supposed to go through
      the smaller switch.
      Rick: I'll take responsibility for the drawings. Data plane is through the
      big switch. Control plane and iLO are through the 'baby' switch.
      Rob: the same switch does access to the outside and control communications -
      sometimes in separate VLANs. Like iLO stuff. PL nodes use the top-of-rack
      switch to the public internet to contact MyPLC.
      Rob: Frisbee stuff is on the smaller switch.
<HD> All control plane traffic goes through the top-of-rack switch (small).  Data plane
     goes through the 6600.  iLO is also through the baby switch.  What about the PL control
     plane?  Rob says the Emulab switch is giving you access to outside and control
     communications, so PG will send PL control out through that as well.
     Frisbee etc. on the smaller switch.
}}}

'''Design Review GPO question 3''':

We believe that having !PlanetLab nodes in an InstaGENI rack is optional and should not be the default configuration.  Sites that want !PlanetLab nodes should have a way to add them to the default configuration, which probably requires some engineering/operations coordination.  Does this fit your model too? This model implies that there are two parallel Aggregate Managers running on a default InstaGENI rack--a ProtoGENI AM and a FOAM AM.
{{{
<AH> Andy: yes, exactly. There is an IG-PLC like the current PLC, just administered separately.
     When a site admin wants one, he registers it with IG-PLC, downloads an ISO image (like
     public PL), and brings that up as the boot image, and the PL SW stack takes over. It's
     optional.
     Rick: IG-PLC runs in Princeton (not on the rack); a PL node looks like just
     another experiment node.
     Rob: You create a slice that asks for a particular OS to boot this.
     We'll restrict the OS to certain slices, so we don't have people unintentionally
     bringing it up.
     Rob: and work with credentials to make the slice not expire. Probably a global
     IG-PL slice with a very long expiration.
     Rob: OpenVZ is a different mechanism. 2 admin options for shared nodes:
     (1) OpenVZ, (2) PL. Having both options and the ability to go back/forth lets us see
     which gets the most use/meets experimenter needs, and adjust allocations over time.
     Rob: if local admins want control over when PL comes up, they can have it.
<HD> Our assumptions are good.  InstaGENI PlanetLab Central is set up and administered
     separately from public PlanetLab.  When an admin at a site wants to bring up one
     of the IG nodes as a PL node, he registers with PL, downloads the image, and the
     PL stack takes over and installs the node.  IG PLC will run off the rack (e.g., at Princeton).
     For a PL node, one creates a slice that has a particular OS.  Will restrict that OS to certain
     slices, so people don't accidentally bring up PL nodes.  Also fiddle with expiration mechanisms.
     Have a global IG PL slice with a far-off expiration date.  Admins have two options for these
     shared nodes: one is a PG node, one is a PL node; anyone can request slivers on them and can go
     back and forth between them.  Local admins can have that control if they want it.
}}}

'''Design Review GPO question 4''':

Will the stitching code supported on InstaGENI racks be the same as what is planned for ProtoGENI?  If not, what are the differences?
{{{
<AH> Rob: same as PG Utah, like what they demoed to connect to KY.
<HD> Stitching code is the same as in PG.  Rob says the same stuff that PG demoed for making slices to Kentucky.
<LN> Do we have an answer about the specifics for this? Did we get details about the individual examples (delay node, initscripts)?
<AH> To my knowledge we did not get an answer to what is different on these racks from general ProtoGENI.
     For stitching in particular, Rob said it is the same as they currently have. So no answer to Q24 yet.
<LN> Moving the delay node question to the list to be posted on the InstaGENI mail list.
}}}

'''Design Review GPO question 5''':

The proposed plan for making sliver metadata available to GMOC seems reasonable.  How are you planning to make time-series data on the health of components available?  Are there other significant questions from the GMOC team?
{{{
<AH> Rob: Give GMOC SNMP access for some info.
     Rob: emails go to admins on things; they should go to local admins and IG staff.
     JP: We want up/down status of slice, sliver, node, AM - as well as time-series data.
     Rob: Up/down stuff is in the DB & we can have an interface for GMOC to query. Port counters
     - maybe handle them same as now (a select few IPs can do query-only SNMP reads of switch state).
     JP: Good start. Then need to understand topology.
     Rob: For the experimental plane, the Ad RSpec is the best source of that data.
     Currently allocated VLANs are in the DB on the node and go in the slice-specific manifests.
     Should make all manifests available to the GMOC.
     JP: We should talk more about that, formats, etc.
     Rob: I'll send you example manifests, etc.
     JP: Emergency stop looks good. Didn't expect iLO, don't know how often I'll need it.
     Rob: We tested shutdown once, know it works.
     JP: Will each rack have a separate contact?
     Rob: site surveys have contacts?
     Rick: both technical & admin.
     Joe: backups can be added.
     JP: General support center contacts?
     Heidi: Notify sites about the SNMP read requirement (heads up).
<HD> What is time-series data?  Port packet counts?  CPU utilization?  Nodes available?
     Things that were listed on the requirements wiki page.  PG already maintains lots of info
     about nodes up/free, how many VMs, etc.  For other PG nodes, they previously just gave
     GMOC SNMP read access.  Currently monitor things like failure to boot and send emails to
     admins--probably both local admins and InstaGENI staff and GMOC in the IG case.  JP
     wants up/down status of slice/sliver, node, aggregate, as well as the time-series data we listed.
     Port counters are probably the right way to go for select IP addresses.  JP mentions that
     there will have to be some understanding of topology (advert RSpec) to make the data
     relatable to experiments/PIs/slices.  Currently allocated VLANs go only in the manifest, not the advertisement.
     Need to make all manifests available to GMOC.  JP and Rob will have to start comparing formats.  Look
     at the manifest format for RSpec.  Rob will send examples.  JP says emergency stop looks good and iLO
     access is good.  PG has tested that interface already.  Will there be individual contacts for each
     rack?  Yes.  The contact info is included on site surveys.  Rick will update the form to reflect SNMP access.
     May need to contact some of the other sites.  Prepare a questionnaire to match acceptance tests?
     That is Rick's intention.
<CG>  * InstaGENI will plan to allow GMOC SNMP access to rack network devices
      * InstaGENI will notify campuses that this access will be happening
      * InstaGENI will work with GMOC for collection of other operationally interesting data from rack boss nodes
      * InstaGENI will maintain a service which can be used to track a public IP address of interest to a slice,
        for use in investigating problematic network behavior.
}}}
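
To make the "query-only SNMP state of switches" idea concrete, below is a minimal Python sketch of the kind of time-series polling GMOC might do. The switch hostname, community string, and interface index are assumptions; the real collection details were left open above:
{{{
# Hedged sketch: sample a switch port's octet counters periodically
# using net-snmp's snmpget. All names below are placeholders.
import subprocess
import time

SWITCH = "switch.example-rack.instageni.net"  # hypothetical rack switch
COMMUNITY = "public"                          # assumed read-only community

def port_octets(if_index):
    """Return (in_octets, out_octets) for one switch port."""
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", COMMUNITY, "-Ovq", SWITCH,
         "IF-MIB::ifInOctets.%d" % if_index,
         "IF-MIB::ifOutOctets.%d" % if_index],
        text=True)
    rx, tx = out.split()
    return int(rx), int(tx)

for _ in range(3):
    print(time.time(), port_octets(1))  # one time-series sample for port 1
    time.sleep(60)
}}}
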
-----
'''Design Review GPO question 6''':

How do experimenters get access to their resources in the limited-IP case where you need NAT (since experimenters don't have ops accounts)? Will outgoing connections work?  Will they be able to run a web server?
{{{
<AH> Rob: In the case of no NAT (lots of IPs), physical nodes and OpenVZ containers get public IPs.
     But with NAT: run an SSH daemon on the ops VM; we create an account with their public key on that
     ops machine. The manifest field for login will say ssh first to this host & then to your sliver.
     If limited IPs are available, we'll allocate them first come, first served for now - could
     eventually make them an allocatable resource.
     Rob: The ability to log in to ops using a remote account is something we're implementing now.
     Rob: This is also why we want to run PL: because it does a good job of sharing IPs. Still
     only 1 port 80, yes. But PL lets users bind their own ports. Assuming many sites run PL, if you
     are doing NAT and want a public server then use PL.
     Rob: Could do similar in OpenVZ, but have no plans to do so. Hence the need for PL.
     Rick: most of the first set of campuses have plenty of IPs.
     Rick: We have a lot more interest than 32 campuses. So we could demand a /8 if we need to.
<HD> When no NAT is required, we give physical nodes and OpenVZ containers publicly routable
     IP addresses.  If we have to do NAT, run an SSH daemon on the VM that is 'ops' in the plan.
     When someone gets a sliver, we create an account with their public key on that ops machine.
     The manifest field that you get back after you've created the sliver is sufficiently
     expressive to tell you how to SSH to a host and then SSH to your individual sliver.  If
     there are a limited number of IP addresses available, allocate on a first-come, first-served
     basis to the VMs that are on the rack.  New slivers will have to go through NAT after the
     limited IP addresses are used up.  Yes, this is different than how PG works now.  PL is
     better for sharing the single public IP address, so if lots of people want to do that,
     it makes more sense to run a PL node.  Will need to direct experimenters who want public
     addresses to go to racks with PL nodes or racks that have enough addresses and don't need NAT.
     Rick doesn't think address limits are going to be a big issue for the places where they want to deploy racks.
}}}
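
For the NAT case above, the login flow from the experimenter's side would look roughly like the sketch below: hop through the shared ops VM, then into the sliver. The hostnames and usernames are placeholders; the real values come from the sliver's manifest:
{{{
# Hedged sketch of reaching a NATed sliver via the 'ops' jump host.
import subprocess

OPS = "jdoe@ops.example-rack.instageni.net"  # account created with your pubkey
SLIVER = "jdoe@pcvm3-1"                      # NATed sliver, reachable from ops

# OpenSSH's ProxyJump (-J) connects through ops and then to the sliver.
subprocess.run(["ssh", "-J", OPS, SLIVER, "hostname"], check=True)
}}}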

== Issues to track/follow up ==

These items are not questions, but should be tracked to verify delivery or resolution:

I.1. Need an updated topology diagram.

I.2. In the limited-IP case when you need to NAT, how do remote users get access (since they don't have an ops account)? [[BR]]
      - Will outgoing connections work?
{{{
<HD> This is currently under development and somewhat limited, done on a first-come,
     first-served basis; need to track resolution (not part of questions).
}}}

I.3. (Req C.3.c)  What is required to run iLO?
I.3.a. What types of clients are supported for iLO?
I.3.b. Is iLO connected to the management switch?
{{{
<HD> Need follow-up on the PDU and need to read the available documentation.
}}}

I.4. There is no remote PDU access, so there is no provision for reboot. What about non-iLO devices?
{{{
<HD>  There is a terminal interface that we will figure out how to connect;
      need to follow up on the software and wiring solution.
}}}

== Misc. questions answered in design meeting ==

M.1. In the limited-IP case when you need to NAT, how do remote users get access (since they don't have an ops account)? [[BR]]
     - Will they be able to run a webserver, e.g.?
{{{
<HD> No, use a PL node in the rack.
}}}

M.2 Confirm we get full (prototype) GENI stitching, as at PG-Utah:
  - Can reserve/provision a VLAN to VMs and bare metal nodes, both allocated from a static pool and dynamically negotiated.
{{{
<HD> Yes in demo mode, closed.
}}}
   - Will full GENI stitching work? Will IG racks fully support dynamic VLAN requests, as with the stitching extension?
     Supplying a VLAN tag and requesting translation to connect it to slice resources on the rack? (Req C.2.e)
{{{
<HD> Define "fully support dynamic"; follow up with Aaron.
<LN> What does "fully support dynamic VLAN" mean?  Is it the same as dynamic
     VLAN support or did you have something else in mind?
<AH> I think we heard 'Yes' - same as the existing PG-Utah stitching functionality.
}}}

M.3 Explain the support story between the Utah and Princeton sites.
{{{
<HD> In the design review Andy Bavier said that they will run one MyPLC to support all InstaGENI racks;
     they did not specify the location - need follow-up on the security and policy implications.
<HD> Andy said the normal PL upgrade process would work for an InstaGENI rack: they post an
     image you can upgrade to whenever you like, using existing tools over the network.
<HD> Utah (PG) said they were going to support all key distribution and packaging for PG software; also, when
     upgrades take place, they would take over for each rack.
<HD> We do not know the upgrade path for OpenFlow.  We need to follow up here to confirm with Nick (not at meeting).
<HD> Bottom line: there was a story, but not documented. We need to get documentation and verify.
}}}

M.4.  Does the OS support the ability to modify the kernel (e.g., add new modules)?
{{{
<HD>  Can't do kernel mods in either PG or PL containers.  PG has non-production-level support for using Xen as
      a virtualization technology.  The bare metal node is still a PG node if they use it for the kernel-mod resource.
}}}

M.5. (Req F.1) Please expand the IP address requirements to detail more than the minimum required.
{{{
<HD>  256 IP addresses are required, though it can work with less. Sites that cannot provide 256
      addresses can provide a limited-IP solution or a NAT solution.
}}}

M.6. The !PlanetLab Integration section states: ''However it is possible to base the Linux environment offered to slices on different Linux distributions -- for example, a single node could support slices running several versions of Fedora, Ubuntu, and Debian Linux.''  Can the OSes listed run simultaneously?
{{{
<HD> Multiple versions of the same OS (e.g., Ubuntu) can be run, but there is no mixing
     of OS types (e.g., Ubuntu and CentOS).
}}}

M.7. At previous GECs, InstaGENI was advertised as a rack that can have hardware added as needed, but this requires coordination to ensure that a given set of ProtoGENI images in the rack is supported on the new hardware.  This should be made clear to site administrators before they attempt to purchase hardware for the rack, and it should be made clear that site administrators will lose some level of support from the InstaGENI team if they add arbitrary hardware into the rack.
{{{
<HD> Rob said that if a site installs the same image or same set of existing hardware,
     then it can be added.  But if it is different, then it is not supported and
     sites are on their own.
<HD> Need to generate a question about allowable and supported combinations.
}}}

== GPO Design Questions ==

G.1. What PG functionality is not present or is different in the InstaGENI racks?
{{{
<RR> We are not planning to leave any ProtoGENI functionality off of the racks.
     I expect there will be a smaller set of officially supported disk images
     (two Linux, one FreeBSD). And, of course, we are going to be more conservative
     about updating the software on the racks than we are about the Utah ProtoGENI
     site, so they will be a bit behind in terms of features.
     The issue about Linux distributions on shared nodes that I mentioned is actually
     one of Emulab support vs. ProtoGENI support; it's a feature that Emulab supports,
     but we haven't yet plumbed that support through to the ProtoGENI/GENI APIs.
     We expect to do so.
}}}

G.1.a. Does InstaGENI intend to support delay nodes? Initscripts?
{{{
<RR> We will definitely support initscripts. We will also support
     delay nodes, though I suspect they are not likely to see much
     use on the racks.
}}}

G.2 (Req D.8) Does each rack provide a static (always-available) interface on each GENI mesoscale VLAN to be used for reachability testing of the liveness of the mesoscale connection to the rack?
{{{
<RR> We had not planned to do so; allowing VLANs on trunks when they
     are not actually in use can result in stray traffic - for example,
     flooding during learning, spanning tree, and other switch protocols.
     For this reason, we dynamically add and remove VLANs from trunked
     ports (which is how we expect to get to the mesoscale VLANs in
     most cases) as needed.
     However, if the GPO believes this requirement overrides concerns
     about unnecessary traffic, we can meet it by having some 'permanent'
     slivers on a shared node.
}}}

G.3. Who gets access to the fileserver VM on each rack? All federated users? Or only local users? Does this mean all users of a site with limited connectivity must be local users? (See Q41)
{{{
<RR> All users of the federation who have created a sliver on the rack.
     Our plan is that when any federated user creates a sliver, they will
     be given a shell on the fileserver VM even if they are not local users.
     (These accounts will probably have usernames that are not taken from
     the URN like 'local' user accounts.)
}}}
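
Purely as an illustration of the parenthetical above, one scheme a rack could use to mint fileserver account names that are not taken from the URN is sketched below; the answer only says non-URN names will probably be used, so this particular scheme is an assumption:
{{{
# Hedged sketch: derive a stable local username from a federated user's
# URN without embedding the URN's name component in it.
import hashlib

def fileserver_username(urn):
    digest = hashlib.sha1(urn.encode()).hexdigest()[:8]
    return "fed_" + digest  # e.g. fed_5f3a9c1b

print(fileserver_username("urn:publicid:IDN+pgeni.gpolab.bbn.com+user+jdoe"))
}}}
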
G.4. The design states that users will be able to log in to the ops node; how will that work? Current ProtoGENI instances don't do that for non-local users; e.g., if I create a sliver at UKY with my GPO user credential, I don't get a login on the UKY ops system.
{{{
<RR> This and G.3 have the same answer, as the 'ops' VM is the fileserver VM.
}}}

G.5. Document how the VLAN translation will work on the switches.
{{{
<RR> Joe and I will update the document on this point soon.
}}}

G.6. (Req C.2.b) Document the mechanism for connecting a single VLAN from a network external to the rack to multiple VMs in the rack.
{{{
<RR> I will update the document on this point soon.
}}}

G.7. (Req C.2.c) Document the mechanism for connecting a single VLAN from a network external to the rack to multiple VMs and a bare metal compute resource in the rack simultaneously.
{{{
<RR> The answer for this will be the same as for G.6, as we support the
     same mechanisms (either untagged ports or 802.1q tagging) for both
     VMs and bare metal hosts.
}}}
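
To illustrate the 802.1q option mentioned in the answer to G.7, the sketch below creates a tagged subinterface on a bare-metal node so it can join an external VLAN; the NIC name and VLAN ID are placeholders:
{{{
# Hedged sketch: attach a bare-metal node's experimental NIC to an
# external VLAN via an 802.1q subinterface (Linux iproute2).
import subprocess

NIC, VLAN_ID = "eth1", 512  # assumed experimental NIC and VLAN tag
SUBIF = "%s.%d" % (NIC, VLAN_ID)

for cmd in (["ip", "link", "add", "link", NIC, "name", SUBIF,
             "type", "vlan", "id", str(VLAN_ID)],
            ["ip", "link", "set", SUBIF, "up"]):
    subprocess.run(cmd, check=True)
}}}
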
G.8. (Req C.2.e) Document the following:

G.8.a. Does InstaGENI support AM API options to allow both static (pre-defined VLAN number) and dynamic (negotiated VLAN number, e.g. interfacing to DYNES or enhanced SHERPA) configuration options for making GENI layer 2 connections?
{{{
<RR> I'm not sure I fully understand this point: the GENI AM API does
     not yet have any stitching support in it. We do have such support
     in the ProtoGENI API, and have been working with Tom Lehman and
     Aaron Helsinger regarding getting it into the GENI AM API.
}}}
G.8.b. Will InstaGENI racks fully support dynamic VLAN requests, as with the stitching extension?
{{{
<RR> This will vary from site to site, depending on what connectivity
     the site provides. On the ProtoGENI side of things, yes, we will
     support fully dynamic VLAN requests. We do expect, however, that
     some sites may only provide us with a static set of VLANs across
     campus to some backbone connection point; this is, for example,
     the situation with the current setup at Kentucky.
<AH> Will the IG racks support the stitching extension published by Tom
     Lehman, including requesting and being granted a VLAN tag (whether
     dynamically allocated or allocated from a static pool), and also
     including requesting a specific VLAN tag (a request which may succeed or
     fail)?
     If the VLAN is from a static pool of pre-allocated VLANs: are VLANs
     always for a single slice? Or will the racks have provision for
     requesting a shared VLAN (as in the mesoscale VLANs)?
}}}
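
For reference, an abbreviated sketch of the kind of stitching-extension request fragment under discussion follows, held in a Python string for illustration. The namespace, element names, and VLAN values reflect our reading of the draft extension and may not match the published schema; several intermediate elements are omitted:
{{{
# Hedged, abbreviated sketch of a stitching-extension request fragment.
STITCHING_FRAGMENT = """
<stitching xmlns="http://hpn.east.isi.edu/rspec/ext/stitch/0.1/">
  <path id="link0">
    <hop id="1">
      <!-- ...intermediate link/capability elements omitted... -->
      <suggestedVLANRange>516</suggestedVLANRange>           <!-- specific tag -->
      <vlanRangeAvailability>2-4094</vlanRangeAvailability>  <!-- or a static pool -->
    </hop>
  </path>
</stitching>
"""
print(STITCHING_FRAGMENT)
}}}
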
G.8.c. Supplying a VLAN tag and requesting translation to connect it to slice resources on the rack?
{{{
<RR> The switches in our racks do not support VLAN translation, so if
     this question is primarily about VLAN translation, the answer is
     no. However, it's certainly possible to connect external equipment
     on an untagged port that happens to be in a different VLAN on our
     rack switch and the other side.
}}}
G.9 (Req B.2.a) Have the GENI API Compliance/Acceptance tests been run on the InstaGENI rack?
{{{
<RR> Since it's the ProtoGENI AM, yes. As far as I know, they fully pass,
     though Tom would know for sure.
<AH> Well, the tests are distributed in the Omni/GCF release bundle exactly
     so that you can run them yourself. Last I heard there were some issues -
     RSpecs not passing rspeclint.
<TM> There are a couple of minor issues. I just pinged Jon D. on the protogeni-dev
     list about an outstanding issue with the stitching schema. It's some minor
     capitalization mismatches and a missing tag. Jon's very handy rspeclint tool
     complains.
     There's also a minor issue that ProtoGENI will accept a manifest RSpec as
     a request. I don't have the details on that one handy, but I can dig them up
     if need be. Sarah reports that this is a known issue, so maybe it has been
     reported previously.
     The other issues with respect to the AM API compliance tests have been resolved,
     as far as I know.
}}}
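
Since rspeclint keeps coming up, a sketch of wiring it into a routine check follows; the file name is a placeholder, and rspeclint's exact invocation may vary by version:
{{{
# Hedged sketch: run Jon D.'s rspeclint over an RSpec and report problems.
import subprocess

result = subprocess.run(["rspeclint", "ad-rspec.xml"],
                        capture_output=True, text=True)
if result.returncode != 0:
    # rspeclint complains about schema problems such as the
    # capitalization mismatches and missing tag mentioned above.
    print("RSpec failed validation:")
    print(result.stdout, result.stderr)
}}}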

G.10. Document the OS versions supported and any known OS restrictions in an InstaGENI rack for bare-metal and VM systems.
{{{
<RR> Okay, we will document this. My current plan is to provide default
     OS images for Ubuntu, CentOS, and FreeBSD. If you have any feedback
     from experimenters on which Linux distributions they would prefer,
     that would be helpful.
<MB> I don't claim to have made a proper study, but the most popular
     requests appear to be Ubuntu and Fedora.
<RR> We've supported Fedora in the past, and still do to some extent.
     The problem with Fedora is that the half-life of Fedora releases is
     very short: we find that the package repositories for the old release go
     away very quickly once new releases are made, which happens every 6 months.
     This also means that security updates stop. It requires a lot of manpower to
     keep up with the frequent Fedora releases (which are quite happy to change
     very large subsystems every 6 months). Experimenters, too, are likely to have
     to do significant work to get their code to work with either the latest version
     of Fedora, or whatever slightly older version we support. It's also bad news for
     shared nodes, which we'd like to update as rarely as we can. Ubuntu has 'long term
     stable' releases, and CentOS also works around a longer release cycle.
<MB> Completely understood, but you asked what people are requesting. ;-)
     I expect relatively few experimenters really _need_ Fedora, but that doesn't
     stop them from asking.
<AS> I have a similar issue with Fedora. It quickly becomes out of date and insecure.
<JS> Would the people who request Fedora in fact be happy with the latest CentOS?
     If they just mean "something Red Hat like", and think that means "Fedora", they might.
     Conversely, will the Ubuntu people be happy with the latest LTS version,
     or will they want something more cutting edge?
     My guess is that these two go together somewhat: If people need the latest
     Fedora, then they may want the latest (non-LTS) Ubuntu too; whereas if
     they're happy with Ubuntu LTS, they'd probably also be happy with CentOS.
     My two cents is that we should start with CentOS 6 and Ubuntu 12.04, and
     go from there. (Among other things, Ubuntu 12.04 will be the latest and
     greatest until October anyway. :^)
<NR> Is there any plan to support any Microsoft OS images, either on the bare-metal
     nodes or in the VMs?
<RR> This has been covered before: we are not planning to provide any Windows
     images ourselves. Sites that have a site license for Windows and would like
     to make their own images are free to do so.
<NR> Just to clarify: you are not planning to provide any Windows OS image for the
     bare metal nodes, but sites that want to provide one for their rack are free
     to do so.
<RR> Correct.
<NR> Would you support them in making the ProtoGENI image, or are they
     on their own there as well?
<RR> We have this process documented on the Emulab website; it's a long one,
     but a couple of other sites have done it. (We can't give them our images for
     licensing reasons, even if they have a legitimate Windows license.)
}}}
G.11. Document custom image support, detailing usage, updates, and distribution.
{{{
<RR> I will update our design document soon with the information that I shared on the phone.
}}}

G.12. Does InstaGENI support software updates of experimental images? If supported, document the mechanisms.
{{{
<RR> For the images that we provide, updates will be provided after major
     releases of the distribution, though the time to do so may vary depending
     on how large the changes in the distribution are. (For example, major kernel
     updates and changes to the initialization system can take a while to support.)
     If security fixes for 'remote root' exploits in software packages that are enabled
     by default in our images are released, we will provide updated images in a timely
     fashion, made available through the mechanisms documented for G.11.
}}}
G.13. Regarding the hardware components' "no warranty" status, what does this mean for dead-on-arrival hardware?
{{{
<RR> This is one for someone from HP.
<RM> Dead-on-arrival is handled via the same process as a support call, i.e. call a
     toll-free number (in the US) with the serial number and product number, and help
     the agent root-cause to the minimal set of field replaceable units.  Functional
     FRUs should arrive within a day or two, and the non-functional FRUs will need to
     be returned in the packaging that the new unit arrived in.
     He confirmed this is the case even under the specific program we're using
     to buy the equipment.
}}}
G.14. Are there plans for a firewall/iptables on the control node?
{{{
<RR> We plan to run a firewall in dom0 on the control node to protect the control VMs from the <<<answer incomplete in original email>>>
}}}
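
Since the emailed answer was cut off, the following is only a hedged sketch of what a dom0 firewall protecting the control VMs might look like; the bridge name and allowed ports are assumptions, not InstaGENI's actual rules:
{{{
# Hedged sketch: default-deny forwarding toward the control VMs,
# allowing only established flows plus SSH and HTTPS.
import subprocess

RULES = [
    ["iptables", "-P", "FORWARD", "DROP"],
    ["iptables", "-A", "FORWARD", "-m", "state",
     "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
    # 'xenbr0' is an assumed name for the bridge to the control VMs.
    ["iptables", "-A", "FORWARD", "-o", "xenbr0", "-p", "tcp",
     "-m", "multiport", "--dports", "22,443", "-j", "ACCEPT"],
]
for rule in RULES:
    subprocess.run(rule, check=True)
}}}
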
G.15 Are monitoring and software updates taking place via the data plane?
{{{
<RR> No, these take place on the control plane. (Of course, if an experimenter wishes
     to build a network on the data plane for monitoring traffic, they are free to do so.)
}}}
G.16. There were inconsistencies in the document about the NICs on the experimental nodes. What is the number of NICs on experimental nodes?
{{{
<RR> To avoid further possible inconsistencies, I will let someone from HP answer this.
}}}
G.17. Does the current PDU support the addition of more devices?
{{{
<RR> I'll also let HP field this one.
<RM>   As configured with the 252663-B24 PDU, 5 compute nodes, and 1 each of control node,
       internal switch, and external switch:
        a. Power - this is a 3.6 kVA unit; there should be a bit of power to spare, so yes.
        b. Outlets - 14 IEC 320 C-13 outlets, so yes.
}}}
G.18. (Req C.1.a) Scaling to 100 compute resources is not explicitly stated. Can InstaGENI support operations for at least 100 simultaneously used virtual compute resources?
{{{
<RR> Can we create 100 simultaneous slivers? Certainly. But I'm not sure this
     is a useful way of looking at the issue; since VServers, OpenVZ, and LXC all
     use the Linux kernel's scheduler, which is work conserving, this question
     is more or less the same as asking "can you run 100 processes on a Linux
     machine?" The answer of course is yes, but whether you can *usefully* run this many
     processes depends entirely on the needs of those processes. (The control
     software we are using scales well past 100 slivers, of course.)
     We've picked Linux container based slicing for the nodes precisely because it has
     the best scaling properties; overheads for each sliver are far lower than they
     would be for, say, VMware or Xen. We believe that we will get the best possible
     scaling from PlanetLab; I'll let someone from Princeton say more, but I believe
     that running more than 100 slices on a single node is by no means uncommon for them.
<NB> I'll throw in my unsolicited $0.02 here - my experience with OpenVZ versus Xen is that
     a PV guest is only 2-3% more expensive (in processor) than a container.  Both of them
     kill your disk, and this is usually where you get into trouble first (long before memory or CPU) -
     running 100 workloads on 1 disk is simply impossible, but if you have 100 slivers and 98
     of them are mostly idle, it'll work regardless of your solution.  Of course, the upside of
     Xen is that you also have the option of running HVM guests, something which is impossible in
     containers (even some PV-suitable guests, like mininet, can't run in a container).
     I'm not trying to argue against the use of containers - I think they have a time and a place,
     and this very well may be one of them, but we need to evaluate experimenter use cases
     (mostly, do they need Windows, Solaris, custom Linux kernels, etc.).  More importantly, we should
     realize that 100 active slivers on 5 disk spindles is going to be a potentially significant
     performance problem depending on your definition of "active".
<AB> On PlanetLab nodes, we frequently see on the order of 100 "live" slivers (that occasionally
     wake up and do something) and 10-20 "active" slivers (with a currently running process).
     The typical PlanetLab node is much less powerful than an InstaGENI experiment node, so I
     expect that an IG-PL node will be able to support significantly more "live" and "active" slivers.
     Of course, as Rob points out, exactly how many depends on the workloads they are running.
<AB> I agree that we should evaluate experimental use cases and try to provide the tools that
     experimenters need.  However, in the first part of your note you seem to be saying that
     there are no significant tradeoffs between VMs and containers.  Of course we've written some
     papers about this, but since this is the InstaGENI design list, I thought your note deserved
     a reply.  Hope it doesn't sound too defensive.
     Virtualizing resources is already one of the main jobs of the OS, and so the philosophy behind
     the container approach is that a simple and efficient way to provide better virtualization is
     to leverage and extend existing OS mechanisms.  Container kernels are simpler than hypervisors
     since there are fewer layers in the stack, and so less overhead and fewer opportunities for
     surprising cross-layer interaction (e.g., resource schedulers at different layers working at
     cross-purposes).  Some examples of ways that VServers are more *efficient* than Xen are:
       * Xen reserves physical memory for each VM, potentially leading to underutilization.
         VServers share memory best-effort and so can pack more containers into the same memory footprint.
       * Each Xen image has its own separate file system, with potentially multiple copies of the
         same file (e.g., libraries).  VServers use CoW hard links to maintain a single copy of shared
         files, conserving disk space and leading to better file caching.
       * Xen network traffic flows through dom0, introducing overhead in network I/O relative to VServers.
    I'm not saying that these limitations are fundamental to Xen, and definitely work has been done
    to address each of these issues.  My point is that containers give you certain efficiency benefits
    almost for free, whereas you need to do something clever to get the same efficiency in Xen (like
    virtualize the NIC in hardware and map the virtual queues directly to Xen VMs).  Like everywhere
    in systems, there is a tradeoff: to get the same efficiency in Xen, you need to add complexity.
    The reason that efficiency has been important to PlanetLab is that more efficient = cheaper.  I
    think that PlanetLab was extraordinarily cheap given the utility that the research community has
    gotten from it.  In my opinion, one of the main reasons that PlanetLab succeeded was our best-effort
    model of resource sharing -- even though we have taken a lot of flak for it over the years.
    To sum up: if experimenters need to run their own OS or just find VMs easier to use, then VMs are
    the right tool for the job.  But in cases where both will do equally well from the experimenter's
    standpoint, I think containers have some clear benefits over VMs.
<NB> Not that we want to get into some long discussion of containers vs. VMs, but I think this is
     generally worth talking about as what services we'd like to provide to experimenters.

     > * Xen reserves physical memory for each VM, potentially leading to underutilization.
     >   VServers share memory best-effort and so can pack more containers into the same memory footprint.
     Memory over-commit in Xen addresses this issue (and VMware has had similar functionality for much longer).

     > * Each Xen image has its own separate file system, with potentially multiple
     >   copies of the same file (e.g., libraries).  VServers use CoW hard links to maintain
     >   a single copy of shared files, conserving disk space and leading to better file caching.
     Xen offers this functionality via thin provisioning and fast cloning with CoW semantics
     for multiple VMs running from the same base image. Obviously multiple VMs don't *have* to run
     from the same base, but you could imagine encouraging experimenters to use a common image unless
     they had a compelling reason to do otherwise.

     > * Xen network traffic flows through dom0, introducing overhead in network I/O relative to VServers.
     This is why I was emphasizing the use of PV kernels, not HVM.  Even VServers are using
     a software bridge as their first stop for packets (at least if they're properly isolated),
     and properly isolating a container from a network perspective is actually pretty hard (LXC
     makes this easier; both VServers and OpenVZ are more complex, but even LXC has some breakdowns).

     > To sum up: if experimenters need to run their own OS or just find VMs easier to use,
     > then VMs are the right tool for the job.  But in cases where both will do
     > equally well from the experimenter's standpoint, I think containers have some clear benefits over VMs.
     I am less convinced of "clear benefits", but I'll concede the point for purposes of
     argument that of course there are some benefits to any given solution over another.
     However, my main point was that I take issue with the implication that the mere choice
     of a given technology gives you scalability benefits that cannot be achieved via other
     methods.  I think it's also fair to point out that containers have a higher administrative
     burden on software updates than VMs do - if an experimenter wants to use Fedora 16 in a
     VM that is a trivial operation (even in PV), but if they're in a container they have to
     hope that the admin has been on top of OS updates for any of the OSes they care to use.
     Also, while providing multiple distributions in a single containerized solution is possible,
     there are significant drawbacks and administrative headaches that we should be clear about.
     Because you're running a single kernel, your options for distribution runtimes are
     restricted to those that will function with the specific kernel you have in place
     (an issue that is going to be particularly interesting in the medium term as distributions
     make differing choices about Linux 3.0 kernel support, but also relevant in the long term
     given the tendency of various distributions to make custom kernel modifications that are
     incompatible in their runtime environments).
     I want to make it clear that I have no problem with containers, but I want to beat back a
     bit of the "containers are fast and VMs are slow" that seems to be going around, as this
     is simply not the case in a modern environment.  I also think that containers have a higher
     continuing administrative burden (presuming we are diligent in keeping up with distribution
     updates and core kernel features), while VMs have a higher cost of entry on setting a system
     up, but have less of an ongoing admin burden, and I would gladly take a higher bar to entry on
     setup if it made maintenance of the racks at each site (particularly if scaled into the 100s)
     significantly easier.
<AB> Definitely you are more familiar with the current status of Xen development than I am. There is
     a lot of momentum behind Xen and I'm sure that its efficiency limitations are being / already
     have been addressed. In my opinion PlanetLab has some solid reasons for sticking with containers,
     but we don't need to debate these here.
     I think the main reason why InstaGENI is advocating containers for scalability (instead of Xen VMs)
     is that the team members have built complete solutions that use containers. In the short term,
     container-based virtualization on InstaGENI seems to be the most pragmatic solution. In the longer
     term, I don't see anything that precludes us from deploying Xen VMs on InstaGENI if someone is willing to do it.
}}}
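
As a small aside on AB's CoW hard-link point, the essence of that sharing can be shown in a few lines; this is an illustration of the general technique, not PlanetLab's actual unification code, and the paths are placeholders:
{{{
# Hedged sketch: build a "container" file tree whose regular files are
# hard links into a shared template tree, so the disk and page cache
# hold one copy of each shared file.
import os

def link_tree(template, target):
    for root, dirs, files in os.walk(template):
        dest = os.path.join(target, os.path.relpath(root, template))
        os.makedirs(dest, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(dest, name))

link_tree("/srv/images/template-fs", "/srv/containers/slice42-fs")
}}}
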
G.19. What is the plan for the transition from VServer to LXC? The current deployment plan lines up with the estimated availability of LXC.  Please confirm or clarify.
{{{
<RR> I'll let Princeton take this one.
<AB> Assuming that the LXC image is ready by the time we want to start deploying IG-PL nodes,
     we will have no need for VServers and we will deploy the LXC image on InstaGENI.
}}}

G.20. The design document should provide much more detail about the topology and the handling of all traffic types (monitoring, updates, user, etc.) in the topology.
{{{
<RR> Sure, we will update this in the design document.
}}}

G.21. Document the expansion hardware and image combinations that are supported for any InstaGENI site to implement.
{{{
<RR> To be clear, there are two things I said that we won't support:
     1) Sites making local changes to the control software
     2) Sites adding PCs to the rack which won't boot our standard images
     These are two separate issues: if a site has to build a custom image
     to support nodes that they add, this usually does not require changes
     to the control software, and they will not forgo our help in updating the control software.
     There is documentation about node requirements here:
     http://users.emulab.net/trac/emulab/wiki/HWRecommend
     I will add a link to the design document.
}}}

== GPO !OpenFlow Questions ==

(jbs - I'm content at this point with their design, and don't have any remaining questions.)

Questions submitted by Josh Smift:

I wanted to summarize a bit how we think !OpenFlow will work in the InstaGENI racks, to make sure we're all on the same page, since the original design document was a little confusing on this score.

We think that the current plan is for the HP 6600 switch to run in hybrid mode, with three types of VLANs that are related to slivers:

(1) VLANs that are dedicated to a sliver, created and removed by ProtoGENI, much (exactly?) like existing ProtoGENI VLANs now.

(2) VLANs that are dedicated to a sliver, created and removed by PG, just like the previous kind, except !OpenFlow-enabled, and pointing directly to an experimenter's controller. (See below for more about this.)

(3) VLANs that are shared between multiple slivers, !OpenFlow-enabled, pointing to the InstaGENI FlowVisor in the rack as their controller, which will in turn be controlled by FOAM.
{{{
<RR> Yes, you have this exactly right.
}}}

One general question: Would it be better to run the switch in aggregation mode, with all VLANs !OpenFlow-controlled, and implement the first type of VLAN by running a learning-switch controller on the control node? Nick is probably best positioned to talk about the pros and cons of this approach, but we weren't sure if it had been considered and rejected, or was still on the table, or not really considered yet, or what. (A sketch of such a controller appears after the discussion below.)
{{{
<RR> We've talked about this with Nick, and I think that hybrid mode is
     far preferable. I can go into a lot of detail about this if you'd like,
     but the high-level point is that experimenters who aren't choosing to
     run experiments with OpenFlow should not be affected by it.
<JS> Well, this depends somewhat on how you define "affected", in that non-OF
     VLANs can still be affected if the switch's OF-controlled VLANs overwhelm
     the switch's CPU, but fair enough. I'm content to defer to Nick on this.
<RR> Yes, I do realize the OpenFlow users could potentially DoS the switch,
     but it seems like we have to live with this if we want to offer OpenFlow
     and don't want to buy a separate switch for it. But we should still
     minimize the impact.
<AH> How does an experimenter in their request RSpec indicate that they want
     an OpenFlow-controlled network?
     I assume that only in that case does PG send FOAM the manifest (or whatever
     you use to send info about the allocation)?
     Does the manifest RSpec have sufficient information for the experimenter to
     know what flowspace is legal for them to request from FOAM?
     Do you and Nick in fact have a plan for how PG gets this information to FOAM?
     To the extent to which you have to add information to the request and manifest
     to make this work, how does this relate to the RSpec changes that Ilia has
     requested to support similar functionality in the ExoGENI racks
     (see http://lists.geni.net/pipermail/dev/2012-January/000553.html )
}}}
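
For context on the aggregation-mode alternative discussed above, the learning-switch controller it would require is a small piece of code. Below is a minimal sketch using the Ryu framework and OpenFlow 1.0 (flow installation, VLAN handling, and timeouts omitted); it is illustrative only, not part of the InstaGENI design:
{{{
# Hedged sketch: minimal OpenFlow 1.0 learning switch in Ryu.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.lib.packet import ethernet, packet

class LearningSwitch(app_manager.RyuApp):
    def __init__(self, *args, **kwargs):
        super(LearningSwitch, self).__init__(*args, **kwargs)
        self.mac_to_port = {}  # dpid -> {mac: port}

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in(self, ev):
        msg, dp = ev.msg, ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        eth = packet.Packet(msg.data).get_protocol( ethernet.ethernet)
        table = self.mac_to_port.setdefault(dp.id, {})
        table[eth.src] = msg.in_port  # learn the sender's port
        out_port = table.get(eth.dst, ofp.OFPP_FLOOD)
        data = msg.data if msg.buffer_id == ofp.OFP_NO_BUFFER else None
        dp.send_msg(parser.OFPPacketOut(
            datapath=dp, buffer_id=msg.buffer_id, in_port=msg.in_port,
            actions=[parser.OFPActionOutput(out_port)], data=data))
}}}
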
About the second kind: On the design review call, Rob described a scenario in which an experimenter wanted a dedicated !OpenFlow-controlled VLAN, but wanted to run their controller on one of the resources in their new sliver (e.g. on an OpenVZ container they were requesting). This would mean that they couldn't supply their controller info (IP address and TCP port) in their request rspec, because they obviously don't know that information for a resource they haven't gotten yet. Rob suggested that to accommodate this, ProtoGENI would configure the switch such that it ran a *listener* for the experimenter's VLAN, which the experimenter would tell their *controller* to connect to, rather than the more typical approach of configuring the *switch* to connect to the experimenter's controller.

I wasn't sure that this would actually work, and talked with Nick about it later in the week, and he confirmed that it won't. In particular, you can't actually do many !OpenFlow actions when you talk to a listener like this (e.g. with dpctl); and controller software generally expects to receive connections from switches, not to connect actively to switches, so most (all?) existing controller software wouldn't work in this model.

There are other ways to solve the problem that Rob described; for example, there could be a way for the experimenter to indicate in their rspec that they want their controller to be an OpenVZ container that they're requesting, but they don't know its IP address, but once PG gives them the container, it should configure the switch to use that container's address (and either a specified TCP port or an arbitrary port or whatever). So this doesn't seem like a big problem; but we wanted to make sure that we were all on the same page about how it would work, and that there weren't any other obstacles to having the switch connect to the controller in the second scenario.
{{{
<RR> This comes as a surprise to me, since I've used OpenFlow with the switch
     as a listener in the past. But that was a few years ago, and I guess things
     have changed?
     Anyhow, the alternate solution you have sketched out sounds good to me.
}}}

(We're also not sure how common this situation will be, since another issue with having the controller be on a container in the rack is that if the rack doesn't have a public IP address, things outside the rack won't be able to connect to it... Our experience is that experimenters generally want to run a single controller to control all the OF resources in their slice, not a controller per switch (or even per rack), so they're going to want to run their controller on something with a stable public IP address.)

One other problem with the second scenario is that each VLAN is a separate !OpenFlow instance on the switch, and we think that the switch can only handle a dozen or so instances at once before bogging down (CPU is the bottleneck, I believe). So, in practice, we may want to do something to discourage experimenters from reserving dedicated !OpenFlow-controlled VLANs unless they're sure they really need them. (I think that limitation may go away if the switch runs in aggregation mode, which is one of the advantages of doing it that way, but I'm not 100% sure.)

{{{
<RR> That's my understanding too. I don't think this is going to be a problem:
     while some important experiments do require OpenFlow, a majority don't.
<JS> Sure, but what I actually had in mind here was steering people who *do*
     want OpenFlow towards using a shared VLAN rather than a dedicated VLAN,
     unless they need a dedicated VLAN for some reason. Does that make sense?
<RR> Sure, I have no problem with steering people towards a shared OpenFlow VLAN.
}}}
     672Moving on to the third kind: Rob said on the call that in this scenario, the sequence would be something like this:
     673
     674 * Experimenter requests a combination of ProtoGENI and network resources,indicating that they want their network resources to be OF-controlled on a shared VLAN.
     675 * PG allocates the resources.
     676 * PG tells FOAM that the experimenter is authorized to have the relevant network resources.
     677 * PG tells the experimenter that to actually get those resources, they should create an !OpenFlow rspec and submit it to FOAM.
     678
     679Our question was: If PG is going to be talking to FOAM anyway (which seems like a fine idea), why doesn't it just allocate the resources then and there, rather than requiring the experimenter to write and submit a separate rspec? Telling FOAM "if this experimenter asks for this flowspace later, let them have it" doesn't seem like it would be any easier (on the software side) than just telling FOAM "give this experimenter this flowspace"... In fact, it seems like it'd be somewhat harder, since FOAM already has a complete interface for "give this experimenter this
     680flowspace", and doesn't yet have a full implementation of the policy-engine style "if someone asks for something that looks like this later, give it to them" sort of interaction. And it's much easier for the experimenter, of course, if they can just get their resources immediately, rather than if they have to go write and submit a separate rspec (giving them a separate sliver, in a separate aggregate, with a separate expiration date, etc).
     681
     682{{{
     683<RR> My thinking here is that there are two things going on here:
     684     the first is the connection of resources in the a slice to a shared
     685     VLAN run using OpenFlow, and the second is allocation of flowspace in
     686     that VLAN. Flowspace is about much more than just the ports and VLANs
     687     being controlled, so it makes sense to have FOAM / flowvisor manage that
     688     shared flowspace. That said, I want to give FOAM as much information as
     689     we can so that it can make choices like not giving slices port-based
     690     flowspace for ports they don't own, etc.
     691<JS> Yeah, I think this all makes sense at a high level; the question I want to
     692     get at is how that actually happens. The part that seems to me like it
     693     could vary, is the question how much FOAM and ProtoGENI know about each
     694     other, and how they communicate with each other.
     695     Maybe the ProtoGENI AM in a given rack has special permission to ask FOAM
     696     to allocate flowspace on behalf of an experimenter; this could be
     697     flowspace that PG has figured out that the experimenter wants, or it could
     698     be based on explicit flowspace requests from the experimenter, or some
     699     combination of those things.
     700     Or, maybe PG has a way to tell FOAM about flowspace that an experimenter
     701     is entitled to ask for, but doesn't actually allocate it until the
     702     experimenter asks FOAM for it.
     703     Or maybe it goes the other way, and PG doesn't tell FOAM anything when a
     704     user creates a sliver; but when an experimenter asks FOAM for some
     705     flowspace, FOAM asks PG to confirm that the user should be allowed to have it.
     706     The first or last of those make somewhat more sense to my intuitive
     707     expectations than the middle one, but if you've got a solid plan for the
     708     middle one, and it has some advantage over the others, it doesn't seem
     709     obviously crazy. :^) But it might be worth talking through some of the
     710     pros and cons some time... Or, if you and Nick have already had this
     711     conversation, and can summarize, that'd be cool.
     712<RR> > Or, maybe PG has a way to tell FOAM about flowspace that an experimenter
     713     > is entitled to ask for, but doesn't actually allocate it until the
     714     > experimenter asks FOAM for it.
     715     This is along the lines I was thinking; nothing happens automatically at FOAM
     716     on creation of a sliver at the ProtoGENI level, ProtoGENI just communicates
     717     the details of that sliver to FOAM in case FOAM wants to use it as input to
     718     its management of flowspace.
     719<JS> Ja, so I think my question is more for Nick: Is FOAM going to have an
     720     interface for ProtoGENI to do that? It doesn't yet, and it sounds like the
     721     converse of something we'd been talking about before, in which FOAM would
     722     have a way to ask other AMs (or a clearinghouse, or a stitching service,
     723     or something) for information that it wanted to use to make decisions.
     724     Pushing that information to FOAM seems different, and my intuition is that
     725     it's architecturally less ideal -- it seems much better to create a general
     726     way for any AM to ask another AM for information that it might use to
     727     allocate resources, than to create a one-off way for PG to push specific
     728     information to FOAM.
     729     (And to touch on something Aaron just said: What the ExoGENI guys are
     730     doing isn't really either of those things, but more like having ORCA
     731     actually allocate the experimenter's flowspace itself (by talking to
     732     FlowVisor or FOAM).)
     733}}}
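
To make the last pattern JS describes concrete (FOAM asking PG to confirm a request), a check of that shape might look like the sketch below. It is purely schematic: neither FOAM nor ProtoGENI exposes such an interface today, and ListSliverInterfaces is an invented name.
{{{
#!python
# Schematic only -- "FOAM asks PG to confirm" does not exist yet.
# ListSliverInterfaces is a made-up stand-in for whatever query a ProtoGENI
# AM would expose about the ports/VLANs a slice's sliver occupies.
def authorize_flowspace(pg_am, slice_urn, matches):
    """Allow a flowspace request only on (port, vlan) pairs the sliver owns."""
    owned = set(tuple(i) for i in pg_am.ListSliverInterfaces(slice_urn))
    asked = set((m["port"], m["vlan"]) for m in matches)
    return asked <= owned
}}}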
     734Finally, one other !OpenFlow-related question: If you have a bunch of resources in a rack -- say two bare-metal nodes and half a dozen OpenVZ containers -- and you want all of the traffic between all of them to be !OpenFlow-controlled, can you do that? In particular, there's presumably some sort of virtual switch used by the OpenVZ containers; is it (or could
     735it be) an !OpenFlow-enabled Open vSwitch?
     736{{{
     737<RR> At this point, our plan is to force all traffic in this case out to the
     738     physical switch, rather than try to run an openflow-enabled virtual switch
     739     on the hosts.
     740<CG> Does this work even if two VMs are using the same VLAN on the same
     741     physical interface on the host?  Sorry to ask what's probably a dumb
      742     question --- I just haven't personally figured out how to force the
     743     Linux network stack to do this, so wanted to double-check.
     744<RR> Yes, it does work; we have built a special virtual NIC driver just
     745     for this purpose.
     746<JS> Sounds good.
     747}}}
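
For readers wondering how two guests on the same host and VLAN can be forced to talk via the physical switch at all, stock Linux has a rough analogue of what RR describes: macvlan interfaces in VEPA mode send all guest traffic out the physical NIC, with the upstream switch port hairpinning it back. This is only an approximation for illustration -- InstaGENI's virtual NIC driver is their own custom code -- and the interface names are assumptions.
{{{
#!python
# Illustration, not InstaGENI's mechanism: macvlan VEPA mode on a VLAN
# subinterface forces all inter-guest traffic out to the external switch.
import subprocess

def sh(cmd):
    subprocess.check_call(cmd.split())

# eth1.100 is an assumed, pre-existing VLAN subinterface on the data plane NIC
sh("ip link add link eth1.100 name c1-nic type macvlan mode vepa")
sh("ip link add link eth1.100 name c2-nic type macvlan mode vepa")
# c1-nic/c2-nic would be moved into the two containers; traffic between them
# now traverses the 6600 (which must hairpin it) instead of a local bridge.
}}}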
     748
     749= Adam Slagell Questions =
     750
     751S.1.  Aggregate Provider Agreement   [[BR]]
     752s.1.a Updates Plan   [[BR]]
     753S.1.a.1. ~~ What pieces of the software stack do you take responsibility for?  ~~ (aslagell email 2/21/2012 5:04 PM ) [[BR]]
     754{{{
     755<AS> They take responsibility for all software and VMs, except the PL image updates
     756     will really be pushed out through the PL mechanism.
     757}}}
     758S.1.a.2. ~~ How do you monitor for vulnerabilities and determine impact? ~~  (aslagell email 2/21/2012 5:04 PM )[[BR]]
     759{{{
      760<AS> They will follow the Emulab frequent update process.
     761}}}
     762
     763S.1.a.3. ~~ Do you plan to test patches or mitigations?  ~~  (aslagell email 2/21/2012 5:04 PM ) [[BR]]
     764{{{
     765<AS> Yes, at the Emulab InstaGENI rack first
     766}}}
     767
     768S.1.a.4. ~~ How do you push out patches and how fast? ~~  (aslagell email 2/21/2012 5:04 PM )    [[BR]]
     769{{{
      770<AS> Most updates will be done by running scripts on the racks. An admin will have to log in
      771     to each rack to execute an update; updates are not pushed out automatically. Turn-around
      772     time depends on the criticality of the update.
     773}}}
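
Scripted, that per-rack, admin-driven process could look like the minimal sketch below; the host names and the update command are invented placeholders, not the actual Emulab tooling.
{{{
#!python
# Hypothetical driver for the manual process described above: an admin's
# workstation logs in to each rack and runs its update script. All names
# here are placeholders.
import subprocess

RACKS = ["boss.rack1.example.edu", "boss.rack2.example.edu"]

for rack in RACKS:
    subprocess.check_call(
        ["ssh", "admin@" + rack, "sudo", "/usr/local/bin/rack-update"])
}}}
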
     774S.1.b. Logging   [[BR]]
     775S.1.b.1. ~~ What logs do you keep?  ~~  (aslagell email 2/21/2012 5:04 PM )  [[BR]]
     776{{{
      777<AS> Logs are stored in a database on the control node. These include all allocation-related
      778     transactions. Local staff, the GMOC, and InstaGENI central will be able to retrieve these
      779     logs through a password-protected web interface on the control node.
     780     There will also be a public interface like PlanetFlow to give a reduced view of these logs,
     781     mapping IPs to slices.
     782}}}
     783
     784S.1.b.2. ~~ How long do you keep them?  ~~  (aslagell email 2/21/2012 5:04 PM )  [[BR]]
     785{{{
     786<AS> It is up to the individual sites, but by default until they run out of room.
     787}}}
     788S.1.b.3. ~~ How do you protect the integrity of logs? Do you send them to a remote server?  ~~  (aslagell email 2/21/2012 5:04 PM )   [[BR]]
     789{{{
     790<AS> There are no integrity tools like tripwire used, but logs are duplicated on the boss
     791     node, where users have no shell access. Also, the GMOC will be able to pull logs offsite,
     792     as well as the CH in the future (which may be a push).
     793}}}
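
A sketch of what such an offsite pull could look like, with a cheap integrity layer added on the receiving side (since no tripwire-style tooling runs on the rack). The URL is a placeholder, and the unauthenticated fetch is a simplification; the real interface is password-protected.
{{{
#!python
# Illustrative GMOC-side pull. Hashing each pull gives some after-the-fact
# tamper evidence, though it is no substitute for host-side integrity tools.
import hashlib, urllib2

data = urllib2.urlopen("https://boss.rack1.example.edu/logs/allocations.csv").read()
open("allocations.csv", "w").write(data)
open("allocations.csv.sha256", "w").write(hashlib.sha256(data).hexdigest() + "\n")
}}}
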
     794
     795S.1.c. Administrative interfaces   [[BR]]
     796S.1.c.1. Is there a separate, dedicated administrative interface?   [[BR]]
     797{{{
     798<AS> All nodes have iLO remote admin cards to give console access. The control node iLO card
      799     has a publicly routable IP; the rest can be accessed from the control node once connected
     800     to the rack.
      801<AS> Is this on its own admin backplane in the rack?
     802<RR> Access to iLO on the experiment nodes is done through the 'top of rack' 2610 switch,
     803     the same one used for outside connectivity. This will be accomplished through a separate
     804     VLAN that is not routed to outside, and has as its only members the boss VM on the control
     805     node, the iLOs for the experiment nodes, and the IP interface of the 6600 switch. This is
     806     very much like the setup used at every Emulab site.
      807<AS> iLO can use passwords or SSH keys. There will be different credentials for the local sites,
      808     the GMOC, and InstaGENI central.
      809     There is also a command-line and web interface to control the ProtoGENI software on
      810     the AM IP.
     811}}}
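Concretely, since only the control node is publicly routable, reaching an experiment node's iLO from outside means hopping through it. A minimal sketch with placeholder addresses matching the topology described above:
{{{
#!python
# Hop through the boss VM to an iLO on the non-routed management VLAN.
# Host names and the 10.x address are assumptions.
import subprocess

subprocess.check_call([
    "ssh",
    "-o", "ProxyCommand=ssh admin@boss.rack1.example.edu -W %h:%p",
    "admin@10.1.1.11",   # an experiment node's private iLO address
])
}}}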
     812S.1.c.2. ~~ Do the local sites have access to this?~~  (aslagell email 2/21/2012 5:04 PM ) [[BR]]
     813{{{
      814<AS> Local sites have access to the admin interfaces, but their credentials only work for their rack.
     815}}}
     816S.1.c.3. ~~If so, do you use different credentials at different sites?~~  (aslagell email 2/21/2012 5:04 PM ) [[BR]]
     817{{{
     818<AS> Yes.
     819}}}
     820
     821S.1.c.4. ~~ What sort of authentication does an admin use? ~~  (aslagell email 2/21/2012 5:04 PM )  [[BR]]
     822{{{
     823<AS> SSH keys or passwords
     824}}}
      825S.1.c.5. ~~ Do you use root accounts? If so, can you audit escalation to root?~~  (aslagell email 2/21/2012 5:04 PM )   [[BR]]
     826{{{
      827<AS> They do not allow direct root login with SSH. Instead, admins must log in as a user and use sudo to act as root.
     828}}}
     829S.1.d. User credentials   [[BR]]
      830S.1.d.1 ~~ Are user credentials (e.g., private keys) stored anywhere, even temporarily? ~~  (aslagell email 2/21/2012 5:04 PM )     [[BR]]
     831{{{
     832<AS> No
     833}}}
     834S.1.d.2 ~~ If yes, then how are they protected and how long do they live? ~~  (aslagell email 2/21/2012 5:04 PM )     [[BR]]
     835{{{
     836<AS> NA
     837}}}
     838
     839S.1.e. Slice isolation   [[BR]]
     840S.1.e.1. How are slices isolated?   [[BR]]
     841{{{
      842<AS> VMs (or whole-node allocation) and VLANs are the primary mechanisms. On the control node, Xen
      843     is used for virtualization. OpenVZ (lightweight, more like BSD jails) is used for ProtoGENI VMs.
      844     OpenVZ provides containers that only allow a VM to see the VLANs it is supposed to. PlanetLab nodes
      845     will use VServers or LXC, which provide similar security properties to OpenVZ.
     846
      847<AS> Can a bare-metal allocated node sniff other slices' traffic?
     848     Probably not if switches are configured properly.       
     849<RR> Since the "control network", ie. the Internet-facing VLAN on the 2610 switch, is shared by all
     850     nodes, it is possible that one could use various tricks such as ARP poisoning to snoop on others'
     851     traffic. This would require intentional, malicious action on the part of the snooper, as normal
     852     switch behavior would prevent it. On the "data plane network" (ie. the 6600), traffic will be
     853     separated by VLANs and no snooping will be possible. This is the same level of guarantee currently
     854     given by all Emulab sites.
     855<SS> One other question: how often do log messages or "watch dog" messages get sent out from
      856     the boss node on each rack? If normal log traffic stops, how long until someone
     857     (instaGENI or GMOC) notices? (I'm assuming that it is hard to take control of boss or the
     858     2610 switch, but moderately simpler to DoS something (process, VMM, switch port) to make
      859     it harder to log in/shut down.)
     860<RR> The stuff that we have set up with the GMOC right now (or, at least, a few years ago, I don't
     861     know if they are still using it), is pull from their side. I don't know how often they poll.
     862<RM> On the first point, the only connections to the experiment nodes that go through the 2610
     863     are to the iLO ports...I believe that Rob suggested we could hook those up to the 6600
      864     (we have the ports). The negative is that people would have to log in to the boss node to
     865     access the iLO cards on the experiment nodes...
     866<SS> Hmmm... I thought that people would have to log into the boss node (in a VM) to reach
     867     the experiment node iLO cards -- they don't have a public Internet IP address. I think
     868     it is better to present the smallest attack surface to the public Internet, even if the
     869     iLO cards are a fairly low risk.
     870<RM> Up to Rob, but as far as I am concerned, np
     871<RR> My preference is to have private iLO addresses, as I expect they will mostly be useful
     872     for admins (both local and remote).
     873     The main use case for iLO console access for experimenters is when setting up or
      874     debugging a new disk image. My intention is to give the rack at Utah a public iLO address,
     875     and encourage people who are making new images to do so here.
      876<AS> I thought users could not log in to the boss node, which I considered a good thing.
     877<RR> This is a point where we need to be more precise in our language - I made a pass over
     878     the document trying to make sure it's consistent, but maybe I missed something.
     879     The control node is the physical PC that hosts a few VMs
     880     The boss VM is one VM that runs on the control node, and this is where the database,
     881     node boot services, etc. reside. Users will not have shells in this VM.
     882     The 'users' VM (might also be called 'ops' VM) is one on which users *will* be given
     883     shells so that they can get to their files (it's also the fileserver for the rack) without
     884     needing slices. This is the path through which users will get in if the site doesn't provide
     885     sufficient IP space for slivers to get their own public IP addresses. If we give users iLO
     886     access, it would be through this VM.
     887<RM> The "users" VM is the equivalent of users.emulab.net, right?
     888<RR> Correct. (For others, you might also see it called 'ops' in some documents; this is a name
     889     we've used to refer to it for a long time, but it's actually quite a confusing name, so
      890     we're trying to break this habit.)
     891<RM> I guess one good question is whether users should be able to manipulate
     892     the iLO cards of the experiment nodes, or whether we want to restrict that
     893     to the Boss VM.  Probably the latter...
     894<RR> Clearly, users should not have access to iLO on nodes that are being shared. My guess is that
     895     the main use will be for exclusive nodes during image creation, since there are plenty of
     896     things one could do with the kernel, etc. to break the network, and you want to be able to
     897     at least see what's going on.
     898}}}
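Given RR's point above that ARP poisoning on the shared control VLAN is the realistic snooping vector, a site could run a passive watchdog for it. A minimal sketch using scapy (a tool choice assumed here, not specified by the design) that flags an IP whose claimed MAC changes:
{{{
#!python
# Passive ARP-poisoning watchdog for the shared control VLAN (illustrative).
from scapy.all import ARP, sniff

seen = {}  # ip -> last MAC claimed for it

def check(pkt):
    if ARP in pkt and pkt[ARP].op == 2:  # op 2 = "is-at" (ARP reply)
        ip, mac = pkt[ARP].psrc, pkt[ARP].hwsrc
        if ip in seen and seen[ip] != mac:
            print "possible ARP poisoning: %s moved %s -> %s" % (ip, seen[ip], mac)
        seen[ip] = mac

sniff(filter="arp", prn=check, store=0)
}}}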
      899S.1.e.2 ~~ Is there isolation in time for bare-metal hosts, i.e., wiped OSes?  ~~  (aslagell email 2/21/2012 5:04 PM ) [[BR]]
     900{{{
     901<AS> Yes
     902}}}
     903
     904S.1.e.3. ~~ Are there weak points in the isolation? ~~  (aslagell email 2/21/2012 5:04 PM )  [[BR]]
     905{{{
     906<AS> OpenVZ, VServers and LXC provide reasonable security isolation but not as strict as true VMs.
     907}}}
     908
     909
     910
     911S.2. LLR Agreement   [[BR]]
     912S.2.a. Opt-in users   [[BR]]
      913S.2.a.1 Can opt-in users be mapped to a slice? For example, if two experiments were acting as ISPs and there was an issue with a particular opt-in user connecting to GENI through a given rack, could you determine which slice it was associated with?
     914{{{
     915<AS> This was the question I was struggling to word before I left the call.
     916<AS> What if it is a PlanetLab slice with opt-in users?                   
     917<RR> Yes, as long as the reporting party knows the IP address on the rack that is
     918     associated with the offending opt-in-user. For all public IP addresses, we
     919     will provide a mechanism that maps that address to a particular slice; in the
     920     case of PlanetLab shared nodes, a port number will also be required, as many
     921     slices share the same public IP address. So if, for example, an opt-in user
     922     is using a proxy in the rack to attack someone, or hosting objectionable material
     923     on a CDN hosted in the rack, knowing the address(es) used by the proxy or the CDN
     924     will be sufficient to map it to a slice. This is the same level of guarantee provided
     925     by PlanetLab.
     926}}}
     927 
     928S.2.a.2. ~~ Does your rack bridge to opt-in users like MNG? ~~  (aslagell email 2/21/2012 5:04 PM )  [[BR]]
     929{{{
      930<AS> No. Experiments may have opt-in users do this as part of their experiment, but this is tangential to the GENI rack design.
     931}}}
     932
     933S.2.b. Attribution   [[BR]]
     934S.2.b.1. Can you determine which slices are running on a given component?   [[BR]]
     935{{{
     936<AS> Yes
     937<AS> Is this complicated when you run PL nodes which are outsourcing AM functions?   
     938<RR> It is only slightly more complicated. Since both ProtoGENI and PlanetLab have
     939     functions that make this information available, and ProtoGENI will know which
     940     nodes are currently acting as PlanetLab nodes, any requests regarding these
     941     nodes made to the ProtoGENI AM will be redirected to the MyPLC instance that
     942     will be set up for all the racks.
     943}}}
     944
     945S.2.b.2. Can you map IP and timestamp info uniquely to a slice and how quickly?   [[BR]]
     946{{{
     947<AS> Yes
     948<AS> How quickly?                                                             
     949<RR> As quickly as it takes to run a database query.
     950}}}
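A toy version of that query, with an invented schema (the real rack database is Emulab's); the port column covers the PlanetLab shared-node case mentioned above, where many slices share one public IP:
{{{
#!python
# Invented schema, for illustrating the IP(+port)+timestamp -> slice lookup.
import sqlite3

db = sqlite3.connect("allocations.db")
db.execute("""CREATE TABLE IF NOT EXISTS allocations
              (slice_urn TEXT, ip TEXT, port INTEGER, start REAL, end REAL)""")
db.commit()

def slice_for(ip, when, port=None):
    row = db.execute(
        "SELECT slice_urn FROM allocations WHERE ip=? AND (port IS NULL OR port=?) "
        "AND start<=? AND (end IS NULL OR end>=?)", (ip, port, when, when)).fetchone()
    return row[0] if row else None
}}}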
     951
      952S.2.b.3. ~~ Would a site deploying a rack be able to give a clear set of IP addresses unique to the rack? This might be hard if it is bridging resources behind it with dynamic IPs, like opt-in user WiMAX handsets.  ~~  (aslagell email 2/21/2012 5:04 PM )   [[BR]]
     953{{{
     954<AS> Yes. And if they don't have enough IPs on site, the rack can NAT.
     955}}}
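For illustration, that NAT fallback amounts to a masquerade rule on the rack's gateway; the interface name and private range below are assumptions, since the document doesn't specify the implementation:
{{{
#!python
# Illustrative only: masquerade slivers' private addresses behind the rack's
# public IP. Interface and address range are assumptions.
import subprocess

subprocess.check_call(
    "iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -o eth0 -j MASQUERADE".split())
}}}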
     956
     957S.2.c. Clearinghouse policy   [[BR]]
      958S.2.c.1 ~~ Is there a mechanism to report resource allocation information back to the CH for policy verification? ~~  (aslagell email 2/21/2012 5:04 PM )  [[BR]]
     959{{{
     960<AS> They plan to support it when they know what that interface will look like.
     961}}}
     962
     963= Nick Bastin Questions =
     964
      965(jbs - These questions have all been addressed, via some combination of a separate conversation between Nick and the InstaGENI team, the design review meeting, and follow-up e-mail.)
     966
     967Design Document Quote: '' "Thus, the minimum number of routable IP addresses for the rack is expected to be three: one for the control node, one for the control node iLO card, and one for FOAM" ''
     968
     969B.1. ~~How is FOAM provisioned/configured such that it needs an IP independent of the control node?~~ (jbs - the FOAM/FV VM on the control node will presumably have its own IP address)
     970
     971B.2. ~~Why not use Open vSwitch with the OpenVZ environment to allow experimenters control over the network stack using openflow?~~ (jbs - Rob says that they're going to force traffic from the OpenVZ containers out to the switch, so experimenters can control it there)
     972
     973NB Quote:'' Otherwise we locally re-create the short-circuit bridging problem that we have at some regionals, forcing experimenters to have containers on different nodes if they want control of the packets between them ''
     974
     975B.3. ~~How many openflow instances do we think the ProCurve (E-Series) 6600 can support?~~ (jbs - Rob says "about a dozen", Nick says Rob got this info from him and it's probably conservative, depends on what the experimenters are doing)
     976
      977B.4. ~~FOAM is not a controller, nor a hypervisor, and does not require 12GB of RAM~~ (jbs - 12 GB of RAM is for the entire control node; the FOAM+FV VM should probably get 3 or 4 GB)
     978
     979NB Quote:'' !FlowVisor, which isn't mentioned in the document, requires significantly more RAM than FOAM, but in this
     980environment (with at most 6 datapaths, but this document describes only one) the memory footprint should be no more than the default 1.5GB, so FOAM+!FlowVisor should be perfectly happy with 2-4GB. ''
     981
     982B.5. ~~Why not use aggregation mode with a default controller to provide connectivity to slices which don't care about using openflow (thus allowing openflow experimenters the use of multiple VLAN tags, which is impossible in hybrid mode)~~ (jbs - Rob says that he and Nick discussed the pros and cons, and concluded that hybrid mode was a better bet)
     983
     984B.6. ~~I don't really understand anything in the "OpenFlow" section of the document~~ (jbs - Nick talked with the team and now understands the general idea)
     985
     986B.6.a. ~~What is "external openflow provisioning"?~~ (jbs - unclear, but not important)
     987
      988B.6.b. ~~Where is the scalability concern with a single controller?~~ (jbs - unclear, but not important)
     989
     990NB Quote:'' A controller has no requirement to be implemented as a single process, or even on a single machine - nothing prevents a controller from being a distributed high-availability service, which is an implementation option for several available controllers ''
     991
     992B.6.c. ~~What single "controller" is being used in InstaGENI as referenced in this section?~~ (jbs - unclear, but not important)
     993
      994B.6.d. ~~Does this mean that all racks will use the same hypervisor? Or that there will be one hypervisor per rack?~~ (jbs - each rack will have a FOAM instance)
     995
     996B.7. ~~It is in fact possible to use an L2-based networking model for vServers with distinct interfaces for each container, it's just not simple (although it only needs to be done once).~~ (jbs - true, but not relevant -- they're not planning to do it)
     997
      998B.8. ~~Node Control and Imaging: "likely users of this capability will be able to create slivers that act as full PlanetLab nodes or !OpenFlow controllers." Why does a user need an entire node for an !OpenFlow controller?~~ (jbs - unclear, but irrelevant)
     999
     1000B.9. ~~Containers are not Virtual Machines (VMs) - at best they are Virtual Environments (VEs). We should not confuse this terminology lest we fail to manage the expectations of the people using them.~~ (jbs - agreed)
     1001
      1002B.10. ~~How are these 3x1Gb data plane connections configured? Etherchannel, link-aggregation, distinct channels?~~ (jbs - Nick says Rob addressed this, "they're just 3 distinct 1gb interfaces")
     1003
     1004B.11. ~~How is FOAM getting new certificate bundles?~~ (jbs - when there's a new cert, the FOAM admin for each rack, whoever that is, will install it; or someone could automate this; and this should happen only infrequently, and isn't hard in any case)
     1005
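One sketch of the "someone could automate this" option: periodically fetch the trusted-root bundle and swap it into FOAM's trust store. The bundle URL and destination path are placeholders, not documented values.
{{{
#!python
# Hypothetical cert-bundle refresh for FOAM; run from cron. Paths and URL are
# assumptions, and FOAM would still need a restart/reload to pick up changes.
import os, urllib2

BUNDLE_URL = "https://example.net/genica.bundle"      # placeholder source
DEST = "/opt/foam/etc/ca-certs/genica.bundle"         # placeholder path

data = urllib2.urlopen(BUNDLE_URL).read()
open(DEST + ".tmp", "w").write(data)
os.rename(DEST + ".tmp", DEST)                        # atomic swap
}}}
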
      1006B.12. ~~Is each rack running its own clearinghouse that mints user certificates?~~ (jbs - no, as covered by Design Review Question 1)