[[PageOutline]]

These are numbered questions sent to exogeni-design@geni.net; discussions are captured to track resolution. Questions are crossed out when answered. The person closing a question should add his/her name to the question with a brief explanation. Any document that is referred to with a URL is attached to the page for historical reference. The notes use the following attributions:
{{{
AS: Adam Slagell           CE: Chip Elliot         IB: Ilia Baldine             JS: Josh Smift         NR: Niky Riga
AH: Aaron Helsinger        CG: Chaos Golubitsky    JC: Jeff Chase               LN: Luisa Nevers       VT: Vic Thomas
BV: Brad Viviano           HD: Heidi Dempsey       JM: Jonathan Mills           NB: Nick Bastin        TU: Tim Upthegrove
}}}

= GPO ExoGENI Questions =

 * 1. ~~Does ExoSM speak GENI API?~~ (nriga)
{{{
<NR> Yes, ExoSM is just like any Orca SM running in a rack and can
     be thought of as a GENI AM that can make reservations in racks
     as well as provide the network connecting resources on different
     racks. Per Ilia's comments, ExoSM can also give an experimenter
     resources from only one rack by making a bound request that
     binds all resources to a specific rack. My understanding is
     also that all topology information is available to the
     experimenter through the GENI API (listresources) only through
     the ExoSM and not through the rack-local Orca SMs.
}}}
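As a rough illustration of the `listresources` call mentioned above, here is a minimal Python sketch of a GENI AM API `ListResources` request over XML-RPC. The thread does not say which AM API version ExoSM speaks, so the option names (taken from AM API v2), the helper names, and any URL or certificate paths passed to `connect` are assumptions, not ExoGENI specifics.

```python
import ssl
import xmlrpc.client


def build_list_options(available_only=True, compressed=False):
    """Options struct for a ListResources call (AM API v2 names, assumed)."""
    return {
        "geni_available": available_only,
        "geni_compressed": compressed,
        # AM API v2 expects the caller to name the RSpec format it wants back.
        "geni_rspec_version": {"type": "GENI", "version": "3"},
    }


def connect(url, certfile, keyfile):
    """Open an XML-RPC proxy using the experimenter's cert (paths hypothetical)."""
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return xmlrpc.client.ServerProxy(url, context=ctx)


def list_resources(am, credentials, **kwargs):
    """Call ListResources on `am`, any object exposing the AM API methods."""
    return am.ListResources(credentials, build_list_options(**kwargs))
```

In use, `am` would be the proxy returned by `connect(...)` pointed at the ExoSM (for the full multi-rack topology) or at a rack-local SM; `credentials` is the list of credential strings the experimenter holds.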
 * 2. ~~Can you describe the ExoGENI software stack a bit more in the teleconf (Figure 7)?~~ (ahelsing)
   * 2a. ~~Is the AltG API the same as the Orca XMLRPC API at the SM?~~ (ahelsing)
{{{
<IB> Yes.
}}}
   * 2b. ~~Can you draw the software stack for the worker nodes in the same style as Figure 7 for comparison?~~ (ahelsing)
{{{
<IB> Worker nodes are either turned off (booted and installed using
     xCAT when needed) or run Centos 6.1 with OpenStack worker node
     configuration.
<JC> The cloud worker nodes also need a cloud node manager installed.
     This requires minor modifications for NEuca.  This is the thing
     that lets us create multiple interfaces on VMs and stitch them
     to other VLANs.
<JS> Do we understand what controls how many nodes are bare-metal and how many
     are available for VMs? Can this be adjusted on the fly? By whom?
<HD> The allocation question is a policy question, the mechanism should be defined later - both postponed.
<HD> Josh to get software stack for Worker node.

}}}
 * ~~3. Is Eucalyptus or OpenStack used for the compute resources?~~ (chaos)
{{{
<IB> OpenStack
}}}
   * ~~3a.  If OpenStack is being used, what testing or analysis convinced you to choose OpenStack?~~ (chaos)
{{{
<IB> We've done a performance comparison. OpenStack instances boot
     significantly faster (orders of magnitude) due to the use of
     COW for boot images. We can also see a path to making VM
     migration work between ExoGENI sites with OpenStack (could not
     figure it out with Eucalyptus).
}}}
 * ~~4. When will ExoGENI racks support xCAT-based bare-metal node allocation?~~ (chaos)
{{{
<IB> My hope by GEC13
}}}
 * ~~ 5. How do bare-metal images get vetted?~~ (hpd)
{{{
<IB> TBD
<HD> Same as S.27, closing this one, Adam will follow up.
}}}
   * ~~5a. Given that VM images are unvetted, why vet bare-metal images?~~
{{{
<IB> Security concerns. Also bare-metal images are harder to prepare.
     Mistakes will mean users occupy fairly limited bare-metal
     resources just learning how to boot them.
<HD> Same as S.27, closing this one, Adam will follow up.
}}}
 * ~~6. Can we have more detail about disk images:~~ (jbs)
   * ~~6a. How are central images selected? Is there a central repository?~~ (jbs)
{{{
<IB> For vetted images for bare metal. There may also be a small
     informal repo for sample VM images.
<JC> For 6a-6b-6c.   ImageProxy can fetch images given a URL.  So
     people can put their images anywhere.  We have software (pod)
     to make it easy for users to upload images and share them.
     The user interface is a little rough, and it is not quite
     deploy-ready, but it could be used.   Ilia's concern (I think)
     is that we don't have budget to run and manage a repository
     server with a lot of disk.  But GPO could certainly host it.
<AH> 13) Building new VM images takes work. (Q6)
     You have to add NEuca and maybe something OpenStack? This was hard with
     Eucalyptus, but maybe it is easier with OpenStack? Their answer that
     this is documented elsewhere isn't terribly re-assuring (since the
     Eucalyptus documentation wasn't enough for Tom). Do we want to check
     their list of images? Have them add 1 or 2? Have them collect/edit
     documentation for this process?
<CG> Well, OpenStack may be a lot better for this --- we simply don't know.
     IMHO, the right answer to "this is documented elsewhere" is, "great,
     then it should be easy for you to make a wiki page pointing to usable
     procedures elsewhere".
     Well, actually: i said that, and then thought about the RENCI integration
     of external documentation and internal for ORCA/NEuca, which i have not
     found all that readable when i've tried to use.  So maybe we'd rather
     have them duplicate the steps that an experimenter would use to create
     an image?  I'm not sure here.
<JS> I predict that we aren't planning to host a repository server. Are we? If
     not, do we think that someone else is? Do we want to push RENCI to do that?
<JS> Answer: RENCI has a central repository at http://geni-images.renci.org/images/,
     which ExoGENI will use too (or maybe a subdirectory, or some such). Images
     for that repository must be reviewed by RENCI, the GPO, or our delegate.
     All vetted bare-metal images will live here, and a small number of
     commonly-used VM images could be hosted here too. RENCI or GPO will put
     together a nicer index page (it's currently just an Apache DirectoryIndex
     listing, with no comments or explanations)
<IB> This is correct although as far as the bare-metal nodes are concerned
     the images will be cached in each rack and the booting will happen
     from there. I have put together a small page listing available VM
     images : https://geni-orca.renci.org/trac/wiki/neuca-images
<JB> I've also rephrased some of the questions a bit from their original forms.

}}}
   * ~~6b. Are there default images hosted at RENCI? What are they?~~ (jbs)
{{{
<IB> We have a few on http://geni-images.renci.org/images/
<JS> Answer: The exact images haven't been specified, but there isn't in
     principle any reason why we can't publish any images that we decide
     we want, within disk space limitations. We (GPO) will presumably use
     our rack to come up with some, probably focusing on modern and stable
     versions of Ubuntu and Fedora/CentOS.
<IB> Yes and we encourage multiple locations (URLs/web servers) from which
     the images are served. A directory listing them can be stored in one
     place.
}}}
   * ~~6c. Will RENCI also store some user images?~~ (jbs)
{{{
<IB> Only a few.
<JS> Answer: RENCI will only store experimenter-created images if they've
     been reviewed (see 6a), but ImageProxy can fetch and use an image
     from any experimenter-supplied URL, and RENCI has software that makes
     it easy for experimenters to upload images and share them, although
     it's not quite deployment-ready yet.
<IB> Yes. Duke team is working on POD (Persistent Object Depository) that
     can fulfill this role. I repeat that this is an optional component -
     a user can create an image and serve it from *any* web server.
}}}
   * ~~6d. Will there be instructions for building custom images?~~ (jbs)
{{{
<IB> For VMs yes, although basically OpenStack, Eucalyptus and
     Amazon have pretty extensive guides on how to do that.
<JS> Answer: RENCI will publish instructions for building VM images, and
     there are good general docs available from OpenStack, Eucalyptus, and
     Amazon too.
<IB> Here are the current instructions:
     https://geni-orca.renci.org/trac/wiki/NEuca-guest-configuration

}}}
   * ~~6f. Must the experimenter add NEuca? ~~ (hpd)
{{{
<IB> NEuca-py tools *should* be added to the image such that post
     boot configuration (IP address assignment to interfaces and
     post-boot scripts) would be done. Without it, bare interfaces
     will still be created based on NEuca INI script generated by
     ORCA for the desired topology and the user would have to
     manually configure them.
}}}
 * ~~10. Can we have more information about how the IP Address proxy options in the table on p. 4 work?  Do the proxies expose all ports or just ssh?~~ (jbs)
{{{
<IB> Right now only SSH. The plan is to add the ability for the
     user to ask to expose some port ranges in addition to that.
     It's on the todo list and is not complicated.
<AH> 12) They plan to NAT access to VMs, meaning that experimenter resources
     are only available via SSH or maybe in future specifically requested
     port ranges. (Q10 from original list)
     I think we want to know more here, and clarify our concerns and desires.
     Perhaps those 'future plans' are enough, but we need to know more (like
     a schedule).
<JS> 10. Can we have more information about how the IP Address proxy options   
     in the table on p. 4 work? Do the proxies expose all ports or just ssh?
     Ilia had said "Right now only SSH. The plan is to add the ability for the
     user to ask to expose some port ranges in addition to that. It's on the
     todo list and is not complicated."
     That sounds good; is there a timeframe for that?
     Just to make sure the goal is clear, the idea is that experimenters may
     want to run TCP or UDP services on their VMs, and make it possible for
     users to connect to those services via the Internet.
<JS> Answer: The plan is to add this ability, it's on the to-do list, and
     it'll be done by the time the first non-GPO/RENCI racks ship in April.
     (Which of the options in that table are you planning to go with? Or
     will this be a campus-by-campus decision? If the latter, which will
     you recommend? We prefer (C), which seems safe enough if the racks
     are behind a campus firewall, which we assume they will be.)
<IB> This is a campus-by-campus decision. We can deal with either B or C.
     If there are not enough public IP addresses, we have a proxy
     solution. If there are enough, they can be used as is.
<JB> 10. In the IP Address proxy options in the table in section 2.1, at the
     top of page 5 do the proxies expose all ports or just ssh? 
     (Experimenters may want to run TCP or UDP services on their VMs, and
     allow users to connect to those services via the Internet.)
     Answer: The plan is to add this ability, it's on the to-do list, and it'll
     be done by the time the first non-GPO/RENCI racks ship in April.
     ISSUE: Which of the options in that table are you planning to go with? Or
     will this be a campus-by-campus decision? If the latter, which will you
     recommend? We prefer (C), which seems safe enough if the racks are behind
     a campus firewall, which we assume they will be.
<IB> This is a campus-by-campus decision. We can deal with either B or C. If 
     there are not enough public IP addresses, we have a proxy solution. If 
     there are enough, they can be used as is. 
<JB> Ok, that sounds good.
     One other question about this: The ExoGENI racks will not expect that they
     have a dedicated IP subnet for these interfaces, which they need to route;
     but will instead expect that they'll connect to an existing IP subnet (or
     a newly-created one, I suppose), which the campus will route, right? (That
     sounds fine; I ask because it came up when we were deploying the starter
     racks in Chattanooga and Cleveland, so it may come up with campuses too.)
<IB> We don't require an entire subnet. A list of available IP addresses is enough.

}}}
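Ilia's rule of thumb above (use public addresses directly when the campus supplies enough of them, otherwise fall back to the proxy) can be sketched as a small helper. This is only an illustration: the function name, the mode labels, and the example addresses are hypothetical and do not come from ORCA or the ExoGENI design document.

```python
import ipaddress


def choose_access_mode(num_vms, available_ips):
    """Return "direct" if every VM can get its own public IP, else "proxy"."""
    # Parsing validates each entry and removes accidental duplicates.
    pool = {ipaddress.ip_address(ip) for ip in available_ips}
    return "direct" if len(pool) >= num_vms else "proxy"


# A campus hands over a plain list of addresses, not a whole subnet.
campus_ips = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
print(choose_access_mode(2, campus_ips))  # enough addresses for two VMs
print(choose_access_mode(8, campus_ips))  # too few, so the proxy is needed
```

The helper takes a list rather than a CIDR block to match the statement that a list of available IP addresses is enough.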
   * ~~10a. Do all outbound connections work for all table options~~ (jbs)
{{{
<IB> Not clear about the question
<JS> I think the question is: "For all the options in Table N (don't have the 
     number handy, but we should cite it), is it the case that there are no
     restrictions on outbound connections?"
<JS> The original 10a said:
     10a. Do all outbound connections work for all table options
     I clarified that what we were getting at here was:
     For all three options in that table, is it the case that there are no
     restrictions on outbound connections?
     We assume not, but wanted to check.
<JS> Answer: Correct, there are no restrictions; all outbound connections
     are permitted. (Although some could be blocked if we needed to for
     some reason.)
<IB> We will not block any outgoing connections on the racks. We cannot
     say anything for the campus.
}}}
   * ~~10b.  How does the proxy work for OpenFlow?~~ (jbs)
{{{
<IB> I don't think they are related.
<JC> Proxied IP connections go through the management net, so they
     don't touch the OF switch.
<JS> I think our concern was: If the FlowVisor is reaching out to experimenter
     controllers through the proxy, does that raise any issues? (Relative to
     the alternative of "the FlowVisor connects to experimenter controllers
     directly" -- which may in fact be what happens, if it's on the head node.)
<JS> The original 10b said:
     10b. How does the proxy work for OpenFlow?
     I clarified that what we were getting at here was:
     If the FlowVisor is reaching out to experimenter controllers through the
     proxy, does that raise any issues?
     If outbound connections are unrestricted, and performance of the proxy is
     good, then this is probably not an issue. But we wanted to raise the
     question because it's a situation where dataplane traffic uses the
     management network, so if the proxy was expected to only have to handle
     experimenter SSH, that might not be sufficient.
<JS> This is superseded by 10a and 10c: There are no proxy/firewall
     restrictions, and no performance issues, that are unique to OF/FV.
<JS> 10b. If the FlowVisor is reaching out to experimenter controllers
     through the proxy, does that raise any issues?
     Answers: If outbound connections are unrestricted, and performance of the
     proxy is good, then this is probably not an issue. We should make sure to
     test this carefully with the initial GPO and RENCI racks, since FlowVisor
     can generate a lot of control traffic.  
}}}
   * ~~10c.  What is the expected performance bottle-neck for proxying?~~ (jbs)
{{{
<IB> Packet forwarding is relatively cheap at reasonable rates. The
     bottleneck will be the connection to the campus network.
<JS> This gets to 10c:
     10c. What is the expected performance bottle-neck for proxying?
     Ilia had said "Packet forwarding is relatively cheap at reasonable rates.
     The bottleneck will be the connection to the campus network."
     Just to put some numbers on this, the theory is that the connection to the
     campus network will not be more than 1 Gb/sec, and we think that the proxy
     can go at least that fast?
<JS> 10c. What is the expected performance bottle-neck for proxying?                 
     Answer: We expect the connection to the campus network to be 1 Gbit or
     less, and that the proxy can go at least that fast.
<IB> The answer is above - we don't think the head node will be the bottleneck. It will be the campus connection.
}}}
 * ~~11. Is ExoGENI software essentially Orca software? How do they differ? ~~ (hpd)
{{{
<IB> Same
<JC> The software is ORCA (and associated stuff like ImageProxy and
     NEuca), but it is configured in a specific way, so we just say
     "ExoGENI" when we're talking about that configuration.
}}}
 * 12. ~~What happens to ExoGENI racks and/or rack functionality if RENCI suffers a network or service outage?~~ (ahelsing: watch this, but Ilia agreed to make deployment choices we wanted)
{{{
<IB> Should not be affected
<JC> For 12, 12b.  Also, old actors cannot see new actors.  Currently
     the actor registry uses SSL connections.  If RENCI goes off
     the net then a site or SM will not be able to restart.  AMs/SMs
     that are running will not be able to refresh their lists, so
     they won't accept any new actors.   If the registry issued
     certs (easy with ABAC), then this problem would go away, but
     it would be harder to revoke...
<JS> This contradicts 12a.
<AH> 1) Question 12: RENCI is a SPOF in your design, due to the RSpec   
     conversion service and Actor Registry.
     It appears that a couple (minor?) changes would mitigate this risk. Let
     us know if we're off base here.
}}}
   * 12a. ~~Will the absence of the RSpec/NDL conversion service mean RSpec-related requests will not work?~~ (ahelsing: RSpec converter being duplicated on all racks)
{{{
<IB> Yes. We can host alternative translators in a number of places
     if it is a concern. We can host a translator on every rack if
     needed and configure its SM to talk to that translator. It is
     a simple stateless web-service.
<AH> 2) RSpec conversion service is a SPOF. (Q 12a from GPO list)
     I think we'd like them to try running it elsewhere as well.
        a) Make the URL a configuration item in racks
        b) Test running it on the head node, to ensure no performance problems
        or library inconsistencies
        c) Consider running a backup version of the service somewhere. GPO?
<CG> I think Ilia said this service is stateless and there's no issue running
     it on the individual racks.  So i don't see any reason not to just run
     it on the individual racks, unless it's a serious resource hog.
<JS> This contradicts 12.
     So that's not "no functionality will be affected". :^p  (I don't think it's
     particularly important to call him on this, just mentioning it as a
     warning to us to keep our eyes open. :^)
<JS> I think we should ask them to have a translator for each SM, unless
     there's a significant cost to that (in which case we should ask them to
     clarify what the cost is).
<AH> a) Please install the RSpec conversion service on all racks, and make
     the URL for the conversion service be a configuration parameter. Be sure
     to test the load on the rack head node, once this and the OpenFlow
     pieces are running there.
<IB> No problems with 12 a or b - this is supported today and is a deployment-time decision. 
<AH> 12a&b is there now? (RSpec converter URL is a config param and actors
     communicate on the rack among themselves fine on restart) Great.
     If you are comfortable with this deployment choice (run the RSpec
     converter on all racks), then please plan on it.
<IB> 12a is there now because we have a way of statically specifying security
     associations between actors in a config file. The actor registry works on
     top of that filling in whatever is missing. So we can configure the ORCA
     actors in a rack to know about each other statically without relying on the
     registry and they will only learn from the registry about other racks.
<AH> Sounds great
}}}
   * 12b. ~~What impact will the lack of the ORCA Actor Registry have on racks?~~ (ahelsing: answered questions satisfactorily)
{{{
<IB> Everything will continue running. New actors will not be able to see old actors.
<AH> 4) Actor registry is a SPOF. (Q 12b from GPO list)
     This is less worrisome. New actors would be cut off. Racks cannot
     restart successfully.
<AH> 5) The actor registry shows topologies in NDL. Once Ad conversion works
     (GEC13 he says), we should ask them to include a link showing that in
     RSpec as well.
<AH> b) Please ensure that the 3 Orca actors on a rack can communicate with
    each other after rack reboot without re-talking to the Actor registry.
    IE a rack should work as a stand-alone GENI AM even if RENCI is inaccessible.
<IB> No problems with 12 a or b - this is supported today and is a deployment-time decision. 
<AH> 12a&b is there now? (RSpec converter URL is a config param and actors
     communicate on the rack among themselves fine on restart) Great.
     If you are comfortable with this deployment choice (run the RSpec
     converter on all racks), then please plan on it.
<IB> 12b is trivial since the converter service can be run anywhere and its
     location is a configuration parameter for the rack SM.
<AH> Sounds great
}}}
   * 12c.  ~~Any other impacts?~~ (ahelsing)
{{{
<IB> Can't think of any.
}}}
 * 13. ~~What would fail if the rack Orca XMLRPC interface were disabled?  What does the Orca XMLRPC feature do? Is it critical to the rack functions or just another way to use it?~~ (ahelsing)
{{{
<IB> It is another way to use it. Nothing would fail, but we would
     like to keep it. It is integral to the actor (SM) so there is
     no way for it to fail independently.
<JC> We may plan to add some new management functions through the
     XMLRPC interface, so the answer to this question might change.
}}}
 * 14. ~~Define ORCA AM Delegation to a broker further---is it double delegated?  How is it applied for local broker and ExoGENI broker?~~ (nriga)
{{{
<IB> Probably best to refer to
     https://geni-orca.renci.org/trac/wiki/orca-introduction
<JC> Double-delegated, but this would be site policy under site
     operator control.  A site could reserve resources for local
     use by not delegating them.  For example, they could buy more
     nodes and reserve them for local use.
}}}
   * ~~14a. If delegation to broker is a deployment time decision, what is the plan?~~ (nriga)
{{{
<IB> Delegation must occur for things to work, what is decided is
     how much to delegate. I'd say start with 50/50 for compute and
     probably 80/20 for vlans (local/global)
<NR> Resources at a local rack are delegated to *either* the local
     broker *or* to the ExoSM broker, i.e. the resources *are not*
     double delegated. The original split will be 50-50, i.e. 50%
     of compute resources, vlan tags etc, will be delegated to the
     local broker and 50% to the ExoSM broker. The percentage is
     configurable and each admin can decide on a different split.
     The reconfiguration probably requires changes in a couple of
     configuration files and a restart of some (??) software. Tom
     believes that this might be more complicated since the brokers
     have to address problems with existing tickets.
<AH> 7) ExoSM owns half the racks. We may end up preferring to go direct to
     the racks. (Q 14a)
     We should have them document the process of changing that allocation,
     maybe even try it once, to be sure this isn't terribly disruptive or hard.
}}}
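To make the proposed starting splits concrete, the arithmetic can be sketched as below. The rack size (10 worker nodes, 100 VLAN tags) and the function name are made up for illustration; only the 50/50 and 80/20 (local/global) percentages come from the discussion above, and the real split is set in ORCA configuration files rather than by code like this.

```python
def split_delegation(total_units, local_percent):
    """Split a rack's units between the local broker and the ExoSM broker."""
    if not 0 <= local_percent <= 100:
        raise ValueError("local_percent must be between 0 and 100")
    # Round down for the local share; the remainder goes to the ExoSM broker,
    # so every unit is delegated to exactly one broker (no double delegation).
    local = total_units * local_percent // 100
    return local, total_units - local


# Hypothetical rack: 10 worker nodes and 100 VLAN tags.
compute_local, compute_exosm = split_delegation(10, 50)
vlan_local, vlan_exosm = split_delegation(100, 80)
print(compute_local, compute_exosm)
print(vlan_local, vlan_exosm)
```

Because the percentage is a parameter, the same helper covers both Ilia's 80/20 suggestion for VLANs and the uniform 50/50 split Niky describes.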
 * ~~ 15. What will the flowspace look like for ExoGENI OpenFlow slivers? If the flowspace is based on VLAN tags, will this still be doable if the OpenFlow switch runs in hybrid mode? ~~ (hpd) (this is an adequate first cut answer and the use cases duplicate this - closing)
{{{
<IB> Here are FlowVisor commands (for two ports one vlan slice):
       $ fvctl addFlowSpace 00:c8:08:17:f4:a6:6a:00 10 "in_port=23,dl_vlan=151" "Slice:ilia2=4"
       $ fvctl addFlowSpace 00:c8:08:17:f4:a6:6a:00 10 "in_port=24,dl_vlan=151" "Slice:ilia2=4"
<JC> The term "hybrid mode" is not well-defined.
<AH> 11) The use of OpenFlow vs VLANs, and the capabilities of the switches,
     seems an open and messy question. Josh/Niky/Nick/? need to follow up
      probably.
         - implications of hybrid mode
         - way to do an OpenFlow onramp
         - options instead of a NOX controller per VLAN
         - ways to use OpenVSwitch to do clever things
         - ...
<JS> This is what I expected, and should work fine, although we haven't
     personally tried it much. We could, when we get bamboo up and running again.
<JS> Does he mean that we haven't defined it well, or agreed on a definition,
     or something? Or that we don't know for sure what *IBM* is going to implement?
     I think we have a good definition of "hybrid mode", although the
     definition is different on HP and NEC switches, than on what we think the
     IBM switches will do. But if he just means that we don't yet know for sure
     what IBM is going to do, then yes.
<JS> Do you mean that we haven't defined it well, or agreed on a definition, or    
     something? Or that we don't know for sure what *IBM* is going to implement?
     I think we have a good definition of what we think we mean by "hybrid
     mode", although the definition is different on HP and NEC switches, than
     on what we think the IBM switches will do. But if you just mean that we
     don't yet know for sure what IBM is going to do, then I agree that this is
     an area with some question marks.
<NB> The definition is exactly the same, for this switch, between NEC and
     IBM (same hardware, same software).  The fact that this switch is
     different from other NEC switches is beside the point.  In an
     OpenFlow world vendors are free to decide what they specifically mean
     by the word hybrid - all that hybrid means is that there is an
     openflow datapath instance and non-openflow instance on the same
     hardware, but how those instances interact is left to the vendor.  The
     two most common implementations of a "hybrid" mode are:
      * VLAN-based hybrid mode - the switch handles all traffic tagged in a
        certain VLAN with instructions from the openflow instance.  This often
        puts limitations on VLAN and QoS handling within the openflow instance.
      * Port-based hybrid mode - you literally just "slice" the switch as
        if it were more than one switch.  Traffic is divided between
        non-openflow and openflow instance based on what port it comes in on
        or goes out on.
     In both cases transitioning the boundary between the openflow and
     non-openflow datapaths is an implementation detail left to the vendor.
     The IBM switch actually supports both modes currently, but the
     non-openflow datapath in hybrid mode is incapable of anything more
     than L2 MAC learning.
     It's probably also worth mentioning that on this switch, while you can
     create your openflow instance with an id of 1 to 16 possibly leading
     you to believe that you could create up to 16 openflow instances, you
     cannot - you can only create 1 - if you make a new instance, it will
     replace the old one (per NEC).
<IB> We must be talking about different switches.
     The BNT switch we tested and is in our specs is a 10G/40G 48-port switch.
     NEC does NOT have it yet - I spoke to them about it.
     The NEC switch at GPO is not a BNT switch, since it is 1G/10G and BNT told
     me they do not have a 1G/10G implementation yet.
<NB> The BNT is an NEC PF 5820.  My comments apply to that switch (not
     other NECs that implement different hybrid modes), and of course to
     hybrid mode in general (just to make sure we were all on the same page
     about what was generally available).
<IB> The BNT switch I tested cannot do VLAN based OpenFlow (yes, you actually
     configure one VLAN to be OpenFlow, but a VLAN is used only as a port-grouping
     mechanism; its tag has no meaning). The only mode the BNT switch supports is
     port-based separation (and right now all ports have to be on that vlan; hybrid
     mode is coming).
<JS> So, just to (try to) close the loop on this, all I originally wanted to
     address was Jeff's comment

       15.  The term "hybrid mode" is not well-defined.

     and my belief is that we do in fact understand what "hybrid mode" means,
     even if we're still not entirely on the same page about whether the IBM
     switch is in fact a NEC PF 5820, or something else.
     Jeff, do you think there's still an open issue here about hybrid mode not
     being well-defined, or are you happy?
<JC> I was simply observing that there is no specification for "hybrid mode".
     I just want to be sure we define our terms.   We're all in agreement
     about that, right?  Nick said something about "different hybrid modes".
     As I recall, the original question was:
     15. What will the flowspace look like for ExoGENI OpenFlow slivers? If
     the flowspace is based on VLAN tags, will this still be doable if the
     OpenFlow switch runs in hybrid mode?
     I responded "hybrid mode is not well-defined" because I did not understand
     the second part of the question.  If the question is still live, could I ask
     you to restate it more concretely?  There seems to be some concern behind it,
     and I'm not sure what that concern is.
     More broadly, we're still trying to figure out what the implications are of
     BNT's hybrid mode, and how to use it.  I said something about that in my e-mail
     to this list on 1/12.  I said:
     With a better hybrid mode we might be able to stitch and use OpenFlow at the same
     time, but this kind of "real" (to me) hybrid mode is not in the foreseeable roadmaps
     for switch vendors.  The weak support for hybrid mode may turn out to be a pretty
     deep problem.  We're still working through the implications.   For example, Ilia has
     pointed out that we're not sure whether a controller can touch any traffic that enters
     and/or exits a non-OF-aware circuit provider or RON, since ports facing those networks
     weren't planning to be in OpenFlow mode.  But that's a separate issue.
<NB> The answer in this case, given different hybrid modes, is generally thus:
     * If a switch implements a port-based hybrid mode (basically
     splitting the hardware into two switches along physical boundaries),
     then the openflow datapath is usually fully capable (with whatever the
     ASIC supports anyhow), which means that you can slice on VLAN tag in flowvisor.
     * If a switch implements a VLAN-based hybrid mode (where the
     discriminant to which software path controls the forwarding is the
     VLAN tag) then generally the openflow datapath is *not* permitted to
     match on or modify VLAN tags, which means that you cannot slice on
     VLAN tag in flowvisor.
<IB> For the switches we have picked, the hybrid mode is port-based, so we should
     have no problems slicing on VLAN tag. I tested it on the existing implementation
     (all ports had to be in OpenFlow mode).
<NR> And just to close the loop on this, this was exactly our concern; whether the FV will be
     able to see/modify VLAN tags when the switch is running in hybrid mode.
     It sounds like the switches that are provisioned for the racks will support port-based
     hybrid mode and thus will allow you to create slivers in the FV based on VLAN tags.
     However, it is probably good to keep this in mind when you are talking with the vendor
     to ensure that this will be possible in the new firmware that will support the hybrid mode.
<JS> Hmm, really? We haven't tried this, but I had thought that something like
     this would work: Say you've got a VM server, which provisions a VM to each
     of two experimenters, and gives each of them a virtual dataplane interface
     on a different VLAN, such that packets leaving the VM are tagged with that
     VLAN; but those virtual interfaces share a physical interface, which is
     connected to a hybrid-mode switch...
     ...Ah, ok, right: So if you want to slice by those VLAN IDs in FlowVisor,
     that port on the switch has to be an *access* port, in an OpenFlow-
     controlled VLAN, so it doesn't strip off the VLAN tags as they come in,
     and sends the tagged packets off to the FlowVisor. This is what you get
     for free with a pure OpenFlow switch; and you can do it on an NEC IP8800,
     but we think you *can't* do it on an HP. (And I don't think we've tested
     performance on the NEC, have we?)
     Anyway, probably not relevant to ExoGENI, with port-based hybrid mode;
     apologies for the tangent. :^)
<NB> Just to continue this tangent for one email more...
     The HPs can do this, it's what they call aggregation mode.
<JS> Rephrasing the question: If the general model is that the flowspace
     is sliced based on VLAN tags, will this still work if the OpenFlow
     switch runs in hybrid mode?
     Answer: Yes, this will work, because in port-based hybrid mode, each
     port will either be OpenFlow-controlled or not, and the OF-controlled
     ones will not be part of a VLAN, they'll just be part of a datapath.
     (Note that this might *not* work, however, in VLAN-based hybrid mode,
     because then you've got VLAN tags within a VLAN. This can probably be
     made to work, but it may require additional/different configuration.
     Shouldn't be an issue, but we should keep it in mind in the unlikely
     event that something changes from what we expect about how the switch
     does hybrid mode.)
<IB> Our BNT/IBM switches will support port-based hybrid mode. We think it
     will work.
}}}
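The rule of thumb reached in the thread above can be written down as a minimal sketch. The names `HybridMode`, `can_slice_on_vlan_tag` and `slice_for_packet` are hypothetical and are not part of FlowVisor or any switch API; the logic only restates the two cases Nick describes, and real switches may behave differently.

```python
from enum import Enum


class HybridMode(Enum):
    """Hypothetical labels for the two hybrid modes discussed above."""
    PORT_BASED = "port-based"
    VLAN_BASED = "vlan-based"


def can_slice_on_vlan_tag(mode):
    """Return True if the OpenFlow datapath can generally match on VLAN tags.

    Port-based: the switch is split along physical ports, so the OpenFlow
    datapath is usually fully capable and can match on VLAN tags.
    VLAN-based: the VLAN tag selects which control path forwards a packet,
    so the OpenFlow datapath is generally not permitted to match on it.
    """
    return mode is HybridMode.PORT_BASED


def slice_for_packet(mode, vlan_to_slice, vlan_tag):
    """Pick the slice for a tagged packet, or None if no slice claims the tag."""
    if not can_slice_on_vlan_tag(mode):
        raise ValueError(
            "VLAN-tag slicing is not available in %s hybrid mode" % mode.value
        )
    return vlan_to_slice.get(vlan_tag)
```

For example, `slice_for_packet(HybridMode.PORT_BASED, {1750: "exp-a"}, 1750)` selects the slice `"exp-a"`, while the same call with `HybridMode.VLAN_BASED` raises an error.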
 * ~~ 16. In figure 1 (ExoGENI Rack overview), how do you expect the Dataplane link to Campus OpenFlow network to be set up and configured?   Will it require manual setup? Are there any implications?  Do we expect FOAM to be used to request and approve these OpenFlow connections? ~~ (hpd) (this is an adequate first cut answer and the use cases duplicate this - closing)
{{{
<IB> Affirmative on FOAM. We assume it is a connection (10G or
     downconverted to 1G) to some OF-enabled campus switch.
<JC> This link is optional, and it doesn't have to go to an OF
     network.  It can be used to pipe campus VLANs into the switch,
     for interconnection with slices under OpenFlow control.
<JS> I'll poke at the details of this more in my use case follow-up.
}}}
 * ~~ 17. Is Internet2 Dynamic Network System (DYNES) supported? ~~ (hpd)
{{{
<IB> DYNES uses ION (dynamic circuit service on Internet2). ION is
     supported because we support OSCARS - the software behind ION.
}}}
 * 19. Stitching support:
   * 19a. ~~Confirm ION/Sherpa/OSCARS all comes through the RENCI SM?~~ (ahelsing)
{{{
<IB> Yes
}}}
   * 19b. ~~Can I connect to racks without the RENCI SM?~~ (ahelsing: but see g)
{{{
<IB> If you have an external stitching tool.
}}}
   * 19c. ~~RENCI SM and other racks coordinate via ORCA private interfaces?~~ (ahelsing)
{{{
<IB> Yes
}}}
   * 19d. ~~What resources are allocated to the ExoSM?~~ (jbs)
{{{
<IB> See 14a plus all the intermediate network providers (LEARN,
     BEN, NLR, I2, ANI, NOX etc)
<JC> 'Allocated' is a strange term to use.  "visible" would be a
     better term.
<JS> I think there's a difference between "visible", in the sense of "the ExoSM
     is aware of them", and "allocated" or "delegated", in the sense of "ExoSM
     manages them, and the local SM doesn't".

     Side note: Is there any chance we can reconcile terminology here, or are
     we going to be talking about "GENI AM, by which we mean ORCA SM", and
     "ORCA AM, which is not a GENI AM at all", and so on, for the rest of this
     project? :^\
<AH> We should have them document the process of changing that allocation,
     maybe even try it once, to be sure this isn't terribly disruptive or hard.
<JS> I think we can assume that they'll document this, plan to try it out,
     and pester them if they don't document it. I think we're set here.
}}}
   * ~~19e. Can an experimenter go to racks separately for compute and then to the ExoSM just for the links? ~~ (hpd)
{{{
<IB> This mode is not supported
<JC> This is not supported BECAUSE it means that an AM could be
     asked to operate on the same slice by two different SMs/controllers,
     and ORCA does not support this.  (It is a limitation we probably
     should not have.)
}}}
   * ~~ 19f.  Who is writing the Internet2 AM? ~~(hpd) (RENCI wrote the code already - closing)
{{{
<IB> The code is already there, we need a physical connection (at
     StarLight would be best).
}}}
   * 19g. ~~Does a single-rack manifest expose the external VLAN tag so I can stitch?~~ (ahelsing: they'll do this by GEC13)
{{{
<IB> We'll work on it. We can add an external-facing port as part
     of an internal slice.
<AH> 1) They do not currently support stitching resources you get from a
     single rack. (Q 19g)
     We should ask them to:
       a) expose the VLAN tag they have allocated in the manifest
       b) accept a VLAN tag allocated elsewhere in their request, and use it if
       it is available (else fail)
     I think this is something we want to get them to do. Definitely (a),
     even better (b). Ilia said they would work on it. I want to press for it
     to be done by GEC13 and in the initial racks.
<AH> 19g) Please support stitching of rack resources by GENI tools for this 
     year's racks - ideally by GEC13.
     Specifically:
     - More important: Support a request to a rack for compute resources and
     a VLAN out, where the resulting manifest specifies the allocated VLAN,
     such that this can be stitched to the next aggregate in the network.
     - Less important: Accept a VLAN tag in a request for a VLAN that the
     next aggregate in the network has allocated, and try to use it at that
     rack if it is available (failing the request if it is not available is
     expected at aggregates that do not support VLAN translation).
<IB> 19g.1 will be available.
<AH> Do you think you'll have this circa GEC13? April? September?
<IB> 19g.1 We will try to get it by GEC13. It's not that much work.
<AH> Sounds great

<IB> 19g.2 less likely to be available soon (please show me an RSpec request for this).
<AH> 19g.2: The closest I have is the sample requests to the PG aggregate in
      gcf/examples/stitching/libstitch/samples:
       http://trac.gpolab.bbn.com/gcf/browser/examples/stitching/libstitch/samples/utah-equest.xml?rev=795c50b86faf82f0fa8696d80005424e0b2089af

      Assume you were specifying a VLAN tag to the PG AM to stitch to ION:
      Within the stitching extension, at hop 3, you would specify both
      vlanRangeAvailability and suggestedVLANRange of the allocated VLAN tag.
      Presumably you could do something additionally in the <interface>
      element within the <node client_id="left">
      As I said, this is lower priority.
<IB> 19g.2 we'll see
<AH> Sounds great
}}}
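The two behaviours requested in 19g can be sketched as one small selection routine. This is an illustration only, under the assumption that an aggregate tracks its free VLAN tags as a plain collection of integers; `choose_vlan` is a hypothetical helper and not an ORCA or GENI AM API call.

```python
def choose_vlan(available, suggested=None):
    """Pick a VLAN tag for a link leaving the rack.

    available: iterable of VLAN tags that are free at this aggregate.
    suggested: tag already allocated by the next aggregate, or None.
    """
    free = set(available)
    if suggested is not None:
        # 19g.2: a tag was allocated elsewhere; use it if free, else fail.
        if suggested not in free:
            raise LookupError("VLAN %d is not available" % suggested)
        return suggested
    # 19g.1: allocate any free tag; the caller reports it in the manifest
    # so that the next aggregate can stitch to it.
    if not free:
        raise LookupError("no VLAN tags available")
    return min(free)
```

Failing the request when the suggested tag is taken matches the behaviour the thread expects from aggregates that do not support VLAN translation.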
 * ~~20. Authorization: Orca APIs use what? Same as GENI? To what extent is this not exactly the same policies as the GENI APIs?~~(see TM comment below)
{{{
<IB> Almost same as GENI. Some validity checks are disabled.
<JC> I think Ilia interpreted this as a question about "Alt-G".
     Maybe it was.  As for the internal APIs: In the ExoGENI
     configuration every ORCA AM will trust every registry-approved
     SM to validate a request before passing it to the AM.  AMs
     only check that the SM is registry-endorsed.
<AH> 6) Orca authorization (for their private APIs) apparently use similar
     checks to GENI. (Q20)
     We should ask exactly what they changed, to be sure it isn't worrisome.
     I don't have any real worries here though.
<TM> This item can be closed. The ORCA APIs will effectively provide the same 
     authorization as the GENI APIs by accepting identity certificates from 
     known and trusted certificate authorities, namely the GPO and ProtoGENI 
     CAs. While the GENI AM API requires credentials, there is no impediment 
     to getting those credentials for any registered user today. It is nonsensical 
     to require ORCA to build out additional infrastructure to require and honor 
     those credentials through their own APIs.
<JS> Hmm, so I have a question about this: If having a GENI user certificate is
     all you need in order to get an ExoGENI sliver via the ORCA API, does that
     mean that you wouldn't necessarily have a GENI slice that contains your sliver?
     If that's not correct, and you do have a GENI slice: Where does that GENI
     slice come from?
     If that is correct, and you don't have a GENI slice: Is that ok, or will
     it cause other problems? (e.g. with things that assume that any allocated
     resource is part of a sliver, which is part of a slice, which is owned by
     a user -- if there's no slice, that chain may break down.)
<IB> It is correct that in that case there is no slice as far as SA is concerned. 
     This is where we get into the 'what is a sliver and what is a slice' argument. 
     What ORCA creates are in fact slices, not slivers. We just call them slivers 
     when GENI AM API is invoked. 
     I would say that anyone who cares about this, should use GENI AM API on ORCA SMs
     and these problems go away. You get weaker stitching in that case. Tradeoffs, as usual.
<JC> We agree on a base principle: ExoGENI will allocate resources only to registered 
     GENI users who have been granted rights by a GENI-approved trust root to allocate 
     resources on GENI.
     Is that enough of an answer to close down this item?
     In the short term, if the only way to get proof that a given user is authorized to 
     allocate resources on GENI is for that user to obtain CreateSliver rights to some 
     slice (any slice) on GENI, then that is what we will do.  Once we have that proof, 
     we can use it in various ways.
     Ideally there would be better ways to get that proof, and so the answer may change 
     if and when better ways become available.
     As for "GENI slice that contains your sliver", well...I think it's a long discussion 
     what that means, exactly.  Please, let's not have that discussion now. 
TO-ADD MORE LN
}}}
 * ~~21. What is the Actor registry used for? Is this an alternative non GENI way to authenticate inter-rack communications?~~ (hpd)
{{{
<IB> Yes, it is a way to manually approve actors joining into ExoGENI. There is no GENI way to authenticate inter-rack communications.
}}}
 * 22. ~~OpenFlow rack as onramp: when will this be supported?~~ (jbs)
{{{
<IB> Jeff Chase wants it yesterday. Realistically some time this year.
<JC> There is an MS student (Ke Xu ... Jessie) working in this area
     at Duke.  We are not sure what she can do yet.  We might be
     asking her to try some stuff on the GENI OF resources.   I
     think on-ramp is easy, but OpenFlow has to work.  That's the
     hard part.
<JS> I've heard the phrase "onramp" a couple of times, but don't know exactly
     what it means. Is it just use case 4? If not, is there a definition somewhere?
<JC> On-ramp is a stitch between private links owned by two different
     slices, by mutual consent of both slices. It is the moral equivalent
     of slices peering their virtual networks.
}}}
   * ~~ 22a. Are there conflicts between FOAM and Orca mechanisms to create FlowVisor rules? ~~ (hpd) This question is being replaced by the new GPO question 29.
{{{
<IB> Hopefully FlowVisor will flag it.
<JC> No.
<AH> 9) Orca uses FlowVisor directly, opening up the possibility of conflicts
     between FOAM and Orca.
     The solution would be for Orca to use FOAM, once a pluggable API there
     exists, but it doesn't yet. ''We need to keep an eye on this.''
<AH> I'm pretty sure it won't, which is a point in favor of ORCA -> FOAM -> FV
     rather than ORCA -> FV + FOAM -> FV.
<JS> I'm pretty sure FlowVisor won't -- in particular, there's nothing
     fundamentally wrong with creating flowspace rules in FV that describe
     overlapping flowspaces. But you usually don't get what you want,
     especially if "you" are multiple experimenters who aren't even aware of
     each other's slivers.

     To my mind, this is a point in favor of having ORCA talk to FOAM, rather
     than having both ORCA and FOAM talk directly to FlowVisor.

     That said, it's certainly possible for both ORCA and FOAM to find out the
     flowspace on the FlowVisor, and to use that to avoid allowing people to
     have overlapping flowspace. But they have to actually do that explicitly.
}}}
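The explicit overlap check Josh describes might look roughly like the sketch below. It is deliberately simplified: each flowspace rule is modelled as a dict of exact-match fields, a missing field is treated as a wildcard, and prefixes or ranges are ignored. The field names (`dl_vlan`, `in_port`) and the sliver names are illustrative assumptions, and nothing here calls FlowVisor, FOAM or ORCA.

```python
def flowspaces_overlap(a, b):
    """Return True if some packet could match both rules.

    Two rules are disjoint only if they pin the same field to different
    values; a field missing from a rule is treated as a wildcard.
    """
    shared = set(a) & set(b)
    return all(a[field] == b[field] for field in shared)


def find_conflicts(existing, candidate):
    """Names of existing slivers whose flowspace overlaps the candidate."""
    return sorted(
        name for name, rule in existing.items()
        if flowspaces_overlap(rule, candidate)
    )
```

A manager would run `find_conflicts` against the flowspace currently installed on the FlowVisor before adding a rule, and refuse the request if the returned list is not empty.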
 * ~~ 23. A few monitoring items are marked incomplete: Dates & plans? ~~ (hdempsey)
   * ~~23a pubsub event feed to GMOC (is GMOC ok with your plan?): (Chaos is tracking in GST [ticket:3369])~~
{{{
<IB> GEC13
<CG> Per Jon-Paul, GMOC originally proposed the pubsub model and
     will support it, details TBD.
<CG> FYI: GMOC has agreed to send someone to tomorrow's monitoring call who
     can talk about the pubsub proposal they made for RENCI.  So we should
     know more by then about what has been proposed for slice monitoring
     data submission (question 23A), what the timeframe is, etc, and ideally
     that will lead to us having a more intelligent opinion about whether we
     like it.
<CG> 23a. Submitting per-slice/sliver relational data to GMOC:
     This has been discussed a little bit on the monitoring@geni.net list.
     It sounds like both ExoGENI and GMOC can provide support to get an
     approach involving XMPP's pubsub protocol working.  GPO will work with
     GMOC to make sure that the per-slice/sliver data which is stored can
     be used across experiments with ExoGENI and non-ExoGENI pieces.
     So i think we are all set on this question.
<IB> I think we're all on the same page here.
<CG> I want to follow up again on these to make sure we are working from   
     the same assumptions.  If ExoGENI and GMOC can negotiate an approach
     using pubsub or direct nagios communication, and both sides can do the
     respective work to get that interface running, that's fine with GPO.
     Our desire here is just to have some useful data be submitted from each
     rack to GMOC (or polled from each rack by GMOC) reasonably often.
     However, the interface which already exists is the XML-over-HTTPS data
     submission API developed by GMOC.  This API is active and usable for
     time-series (operational measurement) data right now, and GPO and GMOC
     are working to make sure it will be ready for relational data (e.g. slice
     metadata) within the next couple of months.  ExoGENI racks will need to
     interface with these APIs as a minimum offering.
     Again, if ExoGENI and GMOC do the work between you to support something
     you both like better, that's very likely to be a fine substitute.
     Otherwise, the XML-based API is a "least common denominator" solution,
     and RENCI should submit data using it.
<IB> My main concern is GMOC's work so far has been outside of the GENI I&M 
     framework which may or may not be a good idea. I'm attempting to bring 
     everything under one roof by using the XMPP bus that will also be used for 
     GENI I&M to submit GMOC-relevant data and see if it flies.
<HD> Ilia,
     This is not going to work.  The GENI I&M framework is not fully implemented 
     yet,  and may not be for considerably longer than it takes for us to field 
     the racks this year.  The GMOC interface predates the I&M framework, and has
     already been in use in the mesoscale for well over a year.  No GMOC-relevant 
     operations data should be submitted via I&M, which is specifically for 
     experimenter-relevant data.  The GMOC has been part of the I&M project in order
     to make sure that it was possible to distribute operations data into the I&M 
     framework if there was demand for that among experimenters.
     It may be that the GMOC evolves their interface to be more like the I&M interface 
     for simplicity and ease of programming, especially for the aggregate providers.  
     However, we can't count on that for Spiral 4.
<CG> To clarify: we have no objection to the use of XMPP per se --- GMOC's
     generic interface for data submission uses XML sent via HTTPS, but, if
     data is collected or sent some other way, that's fine.  As Heidi said, we
     just want to see operational data transmission from each rack to GMOC's
     operational monitoring database during this spiral.  I believe using
     the existing data submission API is the most straightforward way to do
     that this year.  However, if the anticipated benefits of XMPP outweigh
     the extra work, and the work can be done this spiral, that's fine.
<IB> There are enough commonalities between what GMOC wants and what the 
     experimenters want that their work and I&M will converge.
     By March we should have an XMPP bus with GENI authn/authz available 
     (as part of our IMF project with Harry) via which we should be able 
     to make data available to GMOC and at the same time make it available 
     to anyone else with proper GENI credentials.
<CG> I can think of two concerns about using this approach in the short term:
 1. "Proper GENI credentials" sounds to me like an experimenter being
    able to get access to his own experiment's data.  That's not the
    same thing as operational monitoring data [1], which isn't going
    to be per-sliver, but rather might contain some amount of metadata
    about all slivers, and information which is not per se about slivers.
<IB> You're assuming experimenters only need access to data from their own 
     slices. I disagree. There are circumstances where an experiment in 
     GENI means looking at other experiments.
<CG> Do you have an implementation for allowing a non-experimenter
    operations group like GMOC read-only access to broader monitoring
    data using a GENI credential?
<IB> Working on it.
<CG>  2. On the GMOC end, there needs to be code to use this XMPP interface,
    acquire data, and put it into some operational database at
    http://gmoc-db.grnoc.iu.edu/.  If this is going to be done using
    a new interface rather than an existing one, someone will need to
    write that code.
<IB> We have example code that can get data from Pub/Sub. 
}}}
   * ~~23b thing with U Alaska from individual VMs~~: VMI would be nice to have, but is not critical for rack design, so i think we are satisfied with what we know here. (Chaos)
{{{
<IB> Need to ask them
<CG> Question 23B says "thing with U Alaska from individual VMs".  What does
     this question mean, and who asked it?  I am pretty sure it was not me,
     though a lot can happen in two days
<CE> I believe that RENCI wants to use the U. Alaska "virtual machine
     introspection" software to provide monitoring of what's happening
     inside individual VMs (based on my quick read of the design doc).
<CG> Ah, thanks Chip.  That's helpful, and, indeed, i see this in section
     4.3 of their proposal: <<Proposal text not enclosed here>>
     To me, this sounds useful if they can do it, and not critical if they
     can't.  Who put it on our list of design review questions, and what is
     your concern about it?
<HD> Aaron put it on our list and Ilia had no more information about it at
     this point--I think Aaron was just pointing out that the information
     about it was incomplete in the document, which is true.   I've talked
     to Brian from U of A a few times about the VM introspection software
     and think he would be a good collaborative addition to the team if he
     and Ilia work this out.  As you say, it won't be critical if they don't.
<AH> I put it on the list, and I just wanted a date. They said, in not so
     many words, 'we want to use this'. So I wondered when.
<VT> The VMI project has a milestone to demonstrate VMI in a Eucalyptus
     cluster environment at the March GEC.  They are working with Renci on
     getting this technology into Eucalyptus clusters that federate using
     the Orca framework (to be demonstrated in July).
}}}
   * 23c ~~Nagios interface to GMOC (is GMOC ok with your plan?)~~: (Chaos, GST [ticket:3369])
{{{
<IB> I don't know if GMOC is fully OK with it, but we prefer Nagios to the homegrown solution.
<CG> We don't yet know what will be doable for GMOC.  Mitch McCracken,
     the GMOC staffer who maintains the time-series data submission API,
     will be on our monitoring call tomorrow afternoon.  If you or Jonathan
     Mills or someone else from RENCI would like to be on that call and talk
     in more detail about what you'd like to do, what GMOC would need to do
     to support it, and why you prefer it, that would be a good next step.
     Mitch is new to maintaining the API and coming up to speed, so we won't
     make any decisions on the spot, but it would be a good forum for sharing
     information and understanding a bit better what you'd like to do.
     Let me know if you need more information about the call --- i know at
     least Jonathan has attended it before.
<CG> With GMOC's Mitch's permission, i asked RENCI to send someone to the
     Friday call to talk about operational time-series/event? data submission
     (question 23C).  Mitch's short answer was that he probably wants something
     more centralized than whatever RENCI is proposing, but he's interested
     in understanding more about what RENCI actually wants, and so am i,
     so hopefully they will show up and talk about it, and, again, we can
     use that to figure out whether we like their answers.
<IB> Jonathan Mills (who was present at the review) also attends the monitoring calls.
     He is in charge of modifying Nagios to our needs.
<JM> Yes, and I am planning to attend the next ExoGENI monitoring call.
<CG> At the monitoring call, Jonathan and Mitch agreed that it would
     be easy for RENCI to submit data from Nagios via the GMOC
     time-series data submission API.  They have started working
     on this on monitoring@geni.net.  I am satisfied that there is
     general agreement about what to do.
<CG> 23c. Nagios interface to GMOC:                                               
     This has also been discussed on monitoring@geni.net, and the consensus
     here is that ExoGENI and GMOC will work together to write a stub
     which sits on each rack's Nagios aggregator, and submits information
     from Nagios's status.dat file to GMOC via some data exchange format
     (probably the GMOC data submission API, but if ExoGENI and GMOC prefer
     something different, that is fine).  This sounds like it should not be
     a lot of work beyond what has already been done to get Nagios working
     on the racks, and it's already being worked on.  So that's great too.
<IB> I would suggest using the same XMPP pubsub mechanism as is used for
     manifests. They will also have access to the browser interface in Nagios.
<CG> As long as each rack is submitting its own operational time-series data
     directly to GMOC, i think whatever mechanism is easiest for Jonathan
     and Mitch is fine.  Since operational data might be used to help during
     an outage, we do want to make sure as much data submission as possible
     continues to work during an outage.
<JM> I have one thing to add, as an alternative way of getting the information
     out of Nagios itself, which is to directly query the Livestatus broker.
     Livestatus is a Nagios broker module which can be queried in various ways.
     Broker modules are loaded into Nagios when the daemon launches, and thus
     they have direct access to its internal memory tables.  Queries in this
     manner are the fastest because no intermediate action occurs (for instance,
     writing to status.dat is not necessary; neither is writing to a SQL db with NDO).
     Because it is reading object status directly from Nagios's memory, the results
     are always 100% up to date, with no time delay.  The broker module is already
     installed on any Nagios installation that I set up, because it is a required
     component of Check_MK.  It can be queried from either a TCP or Unix socket.
     Details can be found here:  http://mathias-kettner.de/checkmk_livestatus.html
     While this method of "getting at the data" has lots of upsides, it could require
     a rethinking of how the pubsub model would fit.  It necessarily shifts us from
     parsing/translating a file on disk (status.dat) to having to actively query something.
<IB> Querying/polling won't be a problem. This sounds like an interesting approach.
}}}
 * 24. RSpec support questions:
   * ~~24a. When will you support GENI v3 RSpecs - part of the GEC13 completion? ~~(hpd)
{{{
<IB> That's the goal. The differences are not that significant.
}}}
   * 24b. Which RSpec conversions will you support, and when? Can you send sample manifests and advertisements? When can we test?
{{{
<IB> Will send separately. Testing can be done now (Luisa has).
<AH> 3) Ilia offered sample manifests. We should ask for those - to start
     checking that they include what we expect. (Q 24b)
}}}

 * 25. Have you tested the performance of a single management node with a full load of running software (FV, OpenStack/Euca head, GENI services, monitoring, etc.)?  Or is FV on a separate VM?
{{{
<IB> Everything but the OpenFlow components. It's not much of a
     load. Supporting FV still an open question in terms of performance
     needed.
<HD> To Nick: Do you have enough info yet to know whether FV on a VM in 
     ExoGENI rack will be OK at this point?  Have you given Ilia any 
     more info about what FV needs for acceptable performance in GENI?
<NB> I have no further information from Ilia, and he has not requested any
     information from me in reference to this.  My understanding is that he
     is aware that they have not characterized the FlowVisor workload and
     still need to do so.  (I also have other concerns about software
     interaction and compatibility as expressed on the mailing list).
}}}
 * 26. On layer 2 dataplane connectivity testing: Do you envision a long running slice where we can allocate VLANs to test as needed? What happens if the AM is unreachable/down?
{{{
<IB> If AM is unreachable, you can't provision a VLAN. When the
     VLAN is up it should stay up regardless of AM status.
<JS> We should bang on this a little more, and understand whether our
    monitoring stuff will in fact be in a slice, or a non-GENI thing. (I don't
    feel strongly about it, and don't recall now if we concluded that we
    preferred one or the other.)
<CG> 26. Dataplane reachability testing:
   We think it would be a good idea to have two types of tests to go
   with the two types of VLANs:
   * Where the ExoGENI AM is used to provision a VLAN, we'd like to see
     a test which stands up a VLAN, verifies that it can be used,
     and reports to monitoring on whether that entire system (which
     includes the AM, of course) is healthy.  I believe you discussed
     doing something like this already: does what i just said sound
     similar to what you have in mind?
<IB> I think so. A simple reachability test would not be difficult to do,
     but currently is not a high priority.

<CG> * Where an ExoGENI rack is going to be connected to a static
     (long-standing) VLAN outside of the rack, e.g. to the shared
     mesoscale VLANs or to a longstanding L2 connection to non-rack
     resources at a particular site, we'd like to see a static test
     interface on each VLAN which could be used to verify connectivity.
     It would be ideal if the test interface were non-OpenFlow-controlled
     on the rack, so that it could be used entirely to test "is this
     link up?".  Does this seem reasonable?
<IB> yes, but not with the current version of the switch which is OpenFlow
     all-or-nothing. When we have the hybrid mode towards the end of the year
     this should be possible.
<CG> Good point --- if everything on the dataplane switch is
     OpenFlow-controlled, then a non-OF-controlled testpoint is not possible.
     However, i think it would still be possible to place a static test
     interface on a static VLAN which reports to e.g. FOAM.
<NB> The external VLAN connection can be established and tested regardless
     of the hybrid-ness of the switch.  If the goal is merely to establish
     whether a link on an external interface is up or down, this can be done
     within or without openflow, regardless of the state of the switch (the
     ability of a switch to determine port up/down status, electrical,
     protocol, or administrative, is independent of the openflow
     implementation).  If you are truly determined to require a
     non-openflow port to test (electrical?) connectivity, you can do that
     with the current BNT firmware as well (ports can be configured to be
     non-openflow, they just have no real features beyond that of a
     standard L2 learning switch, but those would suffice for this
     purpose), but there's no particular reason why this port can't be
     openflow controlled.
<IB> I think Chaos wanted to have an interface with an assigned IP address
     internal to the switch that can be used for L3 reachability testing.
<CG> L2 reachability testing, really.  Sorry if i've been unclear: the goal
     here is to have some tools to detect problems with shared VLANs.  If an
     ExoGENI rack participates in a core VLAN, but can't reach other things
     on that VLAN, then an experimenter might want to know something is wrong
     before trying to provision something attached to that VLAN on that rack.
     In practice, you don't want to provision too many different test
     resources, and there is a tradeoff between having a test which is most
     similar to an experiment ("if this test works, it is very likely that
     this resource is healthy enough for an experimenter to use") vs. having
     a test which gives you more information about what is wrong ("this test
     does not depend on OpenFlow, so, if this test fails, it tells us there is
     a connectivity problem caused by something other than OF on this rack").
     I think it's fine with us to have a test which runs in a sliver, but
     if we're testing a static VLAN to which an experimenter would connect
     by e.g. reserving resources using FOAM, the resource test should use a
     FOAM sliver.  An advantage to setting up a test interface which doesn't
     depend on a local sliver, is that people can use that interface for
     reachability testing without having to maintain that local sliver.  But,
     in fact, all of our core testing uses slivers somewhere --- there's no
     good way around that.  It just has to be feasible for that sliver to be
     used for frequent/automated testing.
<IB> The easiest way to test L2 reachability is to configure L3 addresses on
     two endpoints of a VLAN. With traditional switches you can configure an
     interface on a VLAN and assign it an IP address (internal to the switch).
     I thought that is what you meant. We do it here periodically (manually)
     between our switches.
<NB> Sure, I would use FlowVisor or FOAM for the first test (no sliver
     required to know what ports are up, or whether the datapath seems to
     be available at all), and SNMP for the second (also no sliver
     required).
     If you want this information centrally (via GMOC or something) we
     should probably offer a read-only FOAM monitoring API, since getting
     the information via the existing admin API seems like a bad idea, but
     that's a trivial problem.
<CG> Sorry i never got back to this.  This kind of approach is fine with us         
     for testing of static VLANs, and is indeed what we had in mind.
}}}
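The static-VLAN check discussed in question 26 (an L3 address on each end of a VLAN, pinged periodically) could be automated roughly as follows. This is only a sketch under stated assumptions, not something ExoGENI or GPO is known to run: the VLAN IDs and test addresses are made up, and it assumes a Linux `ping` that accepts `-c` and `-W`.

```python
import subprocess

# Hypothetical test endpoints: VLAN ID -> far-end test address on that VLAN.
TEST_ENDPOINTS = {101: "10.42.1.2", 102: "10.42.2.2"}


def ping_command(address, count=3, timeout_s=2):
    """Build the ping argv (Linux iputils-style flags are assumed)."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), address]


def check_all(endpoints, run=subprocess.call):
    """Ping each endpoint; an exit status of 0 is treated as reachable."""
    return {vlan: run(ping_command(addr)) == 0
            for vlan, addr in endpoints.items()}


def summarize(results):
    """Turn {vlan: reachable} into sorted 'vlan N: up/down' lines."""
    return ["vlan %d: %s" % (vlan, "up" if ok else "down")
            for vlan, ok in sorted(results.items())]


if __name__ == "__main__":
    for line in summarize(check_all(TEST_ENDPOINTS)):
        print(line)
```

The `run` parameter exists so the check can be exercised without a live network; a monitoring job would leave it at the default and forward the summary lines to whatever collector is in use.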
 * ~~27. We want to share the final ExoGENI rack parts list and rack diagram (when you finish it) on the GENI web site.  OK with you?~~ (Luisa Nevers, see notes below)
{{{
<IB> It's available here:
       https://docs.google.com/document/d/1hzleT6TNmiDb0YkkgqjXxPFJ37P4O6qApLmXgKJHBZQ/edit
<LN> Collected remaining information for the parts list.  See:                                           
     http://groups.geni.net/syseng/wiki/GENI-Infrastructure-Portal/GENIRacks#ExoGENISpecifications 
     Checked on wiring rack diagram and found from Brad Viviano that the diagram will be 
     available after the GPO rack is assembled.
<LN> Email exchange on Feb 13 to gpo-infra included a wiring diagram which is attached as file
     named Rack-diagram-wiring.xls
}}}
 * 28. How does a site admin control resources which have been allocated to ExoSM and are controlled centrally by RENCI?
{{{
<IB> ORCA configuration files. There is an actor configuration file
     (XML) and a resource description file (scary NDL-OWL).
}}}
 * ~~29. In the design review, ORCA indicated that they started a NOX instance that talks to FlowVisor to convey ExoGENI OpenFlow requests.  Nick Bastin said he would like to replace this with an API to FOAM. Nick and Ilia promised to follow up. This question should also address the conflicting FlowVisor requests captured in Q 22a.~~ (jbs)
{{{
<JS> Then there was 22a:                                             
     22a. Are there conflicts between FOAM and Orca mechanisms to create
     FlowVisor rules?
     I had said "To my mind, this is a point in favor of having ORCA talk to
     FOAM, rather than having both ORCA and FOAM talk directly to FlowVisor."
     Nick also liked this idea; have you guys talked about it any further?
<JS> 29. Will ORCA talk directly to FlowVisor, or to FlowVisor via FOAM?
     require some FOAM development work, but Nick is eager to work on this.
     ISSUE: Nick and the ORCA folks should talk about timeframes, to make sure
     that Nick can do what the ORCA side needs, in time for them to use it, but
     he doesn't think it would be a problem to get it done very quickly.
<IB> The implementation we have today talks to FlowVisor. We prefer this method 
     because it bypasses the need to create OF RSpec. Also, based on discussions 
     with Nick, he is unwilling currently to modify FOAM RSpec to what we need. 
<IB> That's not to say I'm blaming Nick - he has his reasons. FlowVisor at this 
     time presents what appears to me the most stable and easy to program interface. 
     As the code and RSpec evolve in the future we can revisit this question.
<JS> Hmm, I think there may be some confusion about this: I don't think that
     Nick was proposing that ORCA should write and submit rspecs to FOAM, but
     rather that ORCA would talk directly to a FOAM API. He plans to write a
     plugin API, which others (or he) can then use to write plugins to talk to
     FOAM via arbitrary custom APIs; most immediately, he says he'd be happy to
     write a custom FOAM API for ORCA. But rspecs don't enter into it at all in
     any case.
<IB> I may be mistaken about the current state of FOAM. I thought it supported 
     GENI AM API, and that requires RSpec. Is there another interface and what is it? 
<NB> The point here is there *can* be another interface (there are already
     4 API interfaces, soon to be 5 when we add GENI AM API v2), so there's
     not really any problem adding one for exogeni (or, alternatively, just
     making one that looks like the flowvisor XMLRPC interface).
<IB> I don't have a problem with that when it becomes available. 
<JS> The custom ORCA - FOAM API could look pretty much however you want. For
     that matter, it could be identical to the FlowVisor XMLRPC API -- but
     going through FOAM means that you can be aware of other FOAM slivers, and
     that FOAM is aware of ORCA-created slivers, for free.
<IB> I expect a fairly hard partitioning between label spaces that FOAM operates in 
     and ORCA operates in, so this may not be a serious issue, but it may be worth discussing.
<NB> The intention would be for FOAM to be aware of the sliver URNs if at
     all possible.  Obviously this wouldn't be possible out of the box if
     we merely emulated the FV XML-RPC API, but if we added an extra
     parameter or two to CreateSlice it would be relatively easy
     information to provide.
<IB> I don't think I followed that. Which sliver URNs? 

<JS> We (GPO) think this would be worth spending a ten minute phone call to
     talk about, and we'd like to help facilitate that (whether you end up
     going this route or not -- we just want to promote communication); any
     chance you'd be available later this afternoon? Or maybe Monday?  
<IB> That's fine. Next week is better. Please use Doodle.                            
<NB> The URNs for the slivers which have associated FlowVisor slices.
     (Such that if one asked FOAM, it would know about all the slivers
      which had resources allocated…failing that, at least user URNs).
<JS> I've created http://www.doodle.com/r58q8esy4vawn7q8 as a Doodle poll
     suggesting times this afternoon and tomorrow; I think Ilia and Nick are
     essential, and anyone else who's interested could listen in. (And anyone
     else who Ilia thinks is essential from the ExoGENI team -- Ilia, let me
     know if you have anyone else in mind.)

Note: The message below is in response to a much earlier comment from Ilia,
which stated:  <IB> I don't have a problem with that when it becomes available.

<NB> When which becomes available?  We're looking for some input here - is
     the path of least resistance to emulate the FV XML-RPC API, or should
     we develop something more specialized for exogeni?
<HD> To Nick: I've seen several emails on the exogeni-design list, and it
     sounds like you, Josh and Ilia are planning a teleconf this week.
     Do you think you'll be able to put enough effort into the discussions 
     to work out a rough agreement for a solution this week?
<NB> I believe this is conflating two issues:
     1) They have a separate software stack (their AM, not NOX) which
     communicates with FlowVisor outside the visibility of FOAM to allocate
     virtualized resources
     2) They have suggested using NOX to provide baseline control of their
     openflow resources for non-openflow experimenters.  I think many
     people (myself included) believe this is a bad idea, and we should
     explore precisely what they are trying to accomplish and how to best
     execute that.
   
     We are planning on discussing issue (1) this week, but there has been
     no further mention of issue (2).
<JS> Mm, I'd forgotten about that part. Perhaps because I feel like "they're
     planning to provide a service that we think is a bad idea" seems like less
     of a problem (we can ask them to turn off that service) than "they're not
     planning to do something that we think they'll need to do".
     Nick, should we be more concerned about this than we are?
<NB> I believe so.
     My general understanding is that because they can't run this switch in
     an "acceptable" hybrid mode (for varying values of whatever that is)
     they have identified a need to still be able to provide their
     non-openflow network service to experimenters, so they're going to run
     a controller to manage these slices and they've chosen NOX.  This is
     at least my understanding.
<JS> Ah, that sounds plausible. (I had lost the context here, and was thinking
     that they were talking about an even more optional service, which people
     who wanted to use OpenFlow could use if they didn't want to run their own
     controller; but in fact I think you're right, what they're talking about
     is something so that people who don't care at all about OpenFlow don't
     have to touch it at all.)
<NB> However, I believe they should run in this mode all the time - using
     hybrid datapaths only creates problems and limits functionality for
     the openflow portion of the network.  
<JS> Could be; that sounds like a conversation we can have with them when the
     switch firmware allows hybrid mode. If the experiment with pure OF mode
     has gone well enough until then, it might be an easy sell -- and so that's
     some incentive to see the pure-OF way work well.
<NB> That being said, I definitely
     don't think they should be running NOX to provide transport for
     non-openflow users - not particularly because this has anything to do
     with NOX so much as the fact that often when people say "run NOX" they
     mean run one of the NOX sample applications, which are not production
     applications (and certainly don't provide the functionality we would
     desire).  Given Ilia's apparent lack of interest in writing any code
     to work with the openflow side of their rack, I highly doubt that
     they're intending to write a custom app for NOX to facilitate their
     use case.
<JS> That all sounds likely to me. This isn't something that we have a lot of
     experience with, because we've mostly been focused on supporting
     experimenters who (a) want to do nifty things with OpenFlow, and are thus
     writing their own controllers; or (b) just want a learning-switch
     controller, but are doing things on such a small scale that NOX 'switch'
     or 'pyswitch' is good enough for their needs.
     You mentioned "production applications"; do you have any insight into what
     *would* fit that bill, but not cost a lot of money? (Or time spent
     convincing a vendor (like BigSwitch or NEC) to donate a production
     controller, or whatever.) Is there in fact a better off-the-shelf solution
     than NOX 'switch'?

Note: Comment below after meeting with Ilia, Josh and Nick.
<JS> We did! And the main thing that we concluded is that Ilia doesn't think
     there's time before GEC 13 to change how ORCA talks to FlowVisor, so we're
     not going to try to throw together an ExoGENI-specific API to FOAM
     immediately. Instead:

     * Nick will continue to work towards the planned API plug-in layer for
       FOAM, which he thinks will be done by GEC 13.
     
     * The first two racks (RENCI and BBN) will have ORCA talking directly to
       FlowVisor, with the understanding that this may have some issues.
     
     * We'll aim to shift to a model where ORCA will talk directly to FOAM,
       after GEC 13. (Or, talk more between now and then about whether this is
       a good idea -- Jeff raised some questions about this, and we generally
       agreed that what we all really want is for FlowVisor to have only one
       administrative master, but that it isn't fundamentally important whether
       that master is FOAM or ORCA... So if ORCA can do everything we need it
       to do -- including managing flowspace for non-GENI resources that aren't
       part of the ExoGENI rack -- then perhaps it makes sense to not run FOAM
       in an ExoGENI rack at all. But we think that this isn't a short-term
       solution, because it would require a way for those non-GENI resources to
       interact with ORCA to tell it what flowspace they wanted, and I don't
       think we have even an idea about what that would be.)
     
     So, with that, I think question 29 from the original list is answered: In
     the first two ExoGENI racks (RENCI and BBN), ORCA and FOAM will both talk
     directly to FlowVisor; we'll continue to discuss between now and GEC 13
     about how to narrow this down to having only one of them do that; and
     we'll aim to implement a single-master solution soon after GEC 13 (and
     definitely before any additional racks ship).

     Sound right? Anything else I missed or otherwise got wrong?

}}}

 * 30. External resources (like the mesoscale) have to be manually configured to be available. We should make a list of the resources that could connect and we want connected, and get them to build those in advance. Like the mesoscale.
{{{
<HD> Configuration for meso-scale is worth pursuing; we need answers to Josh's use cases first.
<IB> There will be a hard partitioning between resources controlled by the ExoGENI SM and 
other resources.  Changing the partitioning is pretty straightforward and not very disruptive.
<TU> We are initially planning to hand 10 VLANs that we have provisioned to our FrameNet 
endpoint and 1 VLAN that is provisioned to our ION endpoint to the ExoGENI team.  Initially, 
they will control these VLANs with the ExoGENI SM.  We can give them more VLANs later, 
assuming it is easy.  We only plan on provisioning a single OpenFlow VLAN to the ExoGENI rack 
in our lab (1750) to start. 
<TU> Currently we are thinking about provisioning extra special use OpenFlow VLANs from each 
mesoscale campus.  We will have to let the ExoGENI team know how to reach these VLANs once we 
actually provision them.  This should be as simple as provisioning VLANs down to the rack and 
letting the rack know the VLAN IDs.
<TU> We are also thinking about having mesoscale campuses have a set of non-OF controlled 
VLANs.  I think we should just be able to tell the ExoGENI team the VLAN ID and the endpoint 
(FrameNet, ION, etc) that the mesoscale campus uses, and then the ExoGENI SM should be able to 
connect to that VLAN.
}}}
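The "hard partitioning" Ilia describes in questions 29 and 30 amounts to keeping the VLAN sets owned by different controllers disjoint. A minimal sanity check might look like the following; the owner names and VLAN IDs are purely illustrative (the only figures taken from the discussion are the counts of 10 FrameNet VLANs and 1 ION VLAN), and nothing here reflects how ORCA actually stores its configuration.

```python
def overlaps(partitions):
    """Return {(owner_a, owner_b): [shared VLAN IDs]} for each overlapping pair."""
    owners = sorted(partitions)
    conflicts = {}
    for i, a in enumerate(owners):
        for b in owners[i + 1:]:
            shared = set(partitions[a]) & set(partitions[b])
            if shared:
                conflicts[(a, b)] = sorted(shared)
    return conflicts


# Hypothetical assignment: 10 FrameNet VLANs plus 1 ION VLAN handed to the
# ExoGENI SM, and one VLAN kept outside its control. All IDs are made up.
partitions = {
    "exogeni-sm": set(range(1000, 1010)) | {2000},
    "other": {3000},
}

if __name__ == "__main__":
    # An empty dict means the partitioning is clean.
    print(overlaps(partitions))
```

Running a check like this whenever VLANs are handed over would flag any ID that ends up claimed by two controllers.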


= Nick Bastin's Questions =

Network:
 * ~~ B-1. Why not use FOAM everywhere?~~ (hpd)
 * B-2. Why not run pure OpenFlow and slice on VLAN in FlowVisor w/translation at the rack edge?
 * ~~ B-3. How is IP space managed within the rack environment - can experimenters request more / specific IP space? ~~ (hpd) (Duplicate of question 10b)
 * B-4. The OpenFlow control channel looks to be extremely throughput constrained.
 * B-5(1). Does the switch not support the ENQUEUE action at all, or does it just not support all the openflow packet-queue structures?
{{{
<BV> B-5. Is there an IPMI connection from the head node to the
     management switch? If so I think that makes for 45 management
     switch ports used.
     o Worker node.  IBM x3650 with Virtual Media Key.  1 port for
       vKVM/IPMI/etc, 2 ports for 1GbE traffic.  Total of 30 (assuming
       10 worker nodes)
     o Head node.  IBM x3650 with Virtual Media Key.  1 port for
       vKVM/IPMI/etc, 8 ports for 1GbE traffic.  Total of 9.
     o iSCSI enclosure.  Redundant controllers, each with 2 ports.
       Total of 4.
     o Juniper VPN appliance.  1 WAN port, 1 LAN port.
     o PDU.  1 port (For 208V based PDU's)
    How many ports in total get used on the management switch will
    depend on the connectivity from each campus.  If for example
    we can ONLY get 1 1GbE connection from campus the total will
    be 47 (46 from above, plus campus into the management switch).
    That would be the worst case situation and leaves us 1 open
    1GbE port on the switch.
<NB> Ok, I was just working off of the table on page 4 of the design
     document that has 44 ports used on the management switch.  It's a
     little hard to reconcile that table with figures 1 and 2, as well as
     the text.  Figure 1 has a red line connecting the management switch
     "to campus layer 3 network", and figure 2 has a line connecting the
     management switch to the Juniper SSG5 (which is not in figure 1), and
     no other connection to an outside L3 resource.  The text in 2.1 states
     "The connections to the commodity Internet via the campus network is
     expected to serve management access by staff as well as experimenters"
     - I read this to mean that all control-plane access (management and
     experimenter) would be coming in over the SSG5.  So, I guess the new
     question is, is there a direct campus L3 connection to the management
     switch, as well as a connection to the SSG5?  Also, do you really mean
     that the SSG5 is connected twice to the management switch?  (I
     understand how that would work, I'm just trying to figure out if
     that's what you mean)
}}}
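Brad's per-device counts in the B-5 reply can be tallied to confirm the totals he quotes. The sketch below uses only the figures from his message; the 48-port switch size is an assumption inferred from "47 used, 1 open".

```python
# Management-switch ports per device type, as listed in the <BV> reply.
PORTS = {
    "worker nodes": 10 * (1 + 2),    # 10 workers: 1 vKVM/IPMI + 2 x 1GbE each
    "head node": 1 + 8,              # 1 vKVM/IPMI + 8 x 1GbE
    "iSCSI enclosure": 2 * 2,        # 2 controllers, 2 ports each
    "Juniper VPN appliance": 1 + 1,  # 1 WAN + 1 LAN
    "PDU": 1,                        # 208V-based PDUs only
}

SWITCH_PORTS = 48  # assumption: inferred from the message, not stated in it

rack_total = sum(PORTS.values())
with_campus = rack_total + 1         # worst case: one 1GbE campus uplink
free_ports = SWITCH_PORTS - with_campus

if __name__ == "__main__":
    print(rack_total, with_campus, free_ports)  # 46 47 1
```

The rack-only total comes to 46, one more than the 45 in the original question and two more than the 44 in the design document table that Nick mentions.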

Rack Configuration:
 * B-5(2). Is there an IPMI connection from the head node to the management switch? If so I think that makes for 45 management switch ports used.
 * B-6. I am concerned that the head node is underprovisioned for all the services it needs to run - 12GB of RAM seems low.
{{{
<BV> We don't have empirical evidence that 12GB of memory won't be
     enough. We felt it was a safe starting value, but ensured there
     are free DIMM slots if we need to expand to 24GB or 36GB.
     Although the cost of 2GB DIMM's vs 4GB's isn't significant,
     when multiplied out to 12-14 sites it was enough that we decided
     to start with 12GB.  If we decide later to move to 24GB, we'll
     expand future racks so they come from IBM that way.
<AH> 15) They haven't tested the head node, when the FlowVisor and FOAM are
     getting actively used, to check for performance problems. It's unclear
     if there is an issue here or not, but the only real solution appears to
     be to double the RAM - which they can do later if necessary.
<CG> They can do it later if necessary, but why not do it sooner?  I'm curious
     what the actual numbers involved here are: my personal experience has been
     that RAM is (a) cheap, and (b) always the thing you're short of.  I know
     they said they could send a tech on-site, but, for a new installation,
     12GB of RAM should be something like $150.  There's one head node per
     rack, and how many racks the first year?  Again, i don't know the actual
     numbers or tradeoffs, but i think it's very likely that this is cheap
     and may solve a real problem.  So IMHO they should just do it while it's
     early enough to never have to think about it again.
<NB> Fair enough.  I would also ask that we revisit the plan for all the
     software on the management node to be installed in the same OS
     instance - I really think this should be a virtualized environment
     (particularly because both FOAM and FlowVisor do not currently have
     RPM package builds).  This will put significant constraints on the
     software to use the same JVM versions, etc., or create an integration
     challenge to create separate environments for the software to run in.
}}}
 * ~~B-7. How is the head node configured - do the services run in their own VMs, or do they need to co-exist on the same OS instance?~~ (jbs)
{{{
<IB> The VM option remains open, however currently we are not seeing any
     software conflicts that would require that. VMs will take some
     performance overhead and they may make it more difficult to
     communicate between some elements of the software stack.

     We have already built most of the components on our OS of choice -
     CentOS 6.2 and we're not seeing any conflicts. Despite the fact the
     CentOS/RedHat is not always officially supported, there are usually
     instructions for advanced users on how to build the software that seem
     to work.
<JS> Aha, ok. It might mean that you have to do more ongoing work to track
     updates to those components, if new versions don't build as cleanly, but,
     as you say, we can revisit if it turns out to be a problem. I think it's
     probably fine to call this closed, but since it was originally on Nick's
     list, I wanted to give him (or anyone else with contrary opinions) a
     chance to chime in before I crossed it off.
<JS> B-7. How is the head node configured - do the services run in their own            
     VMs, or do they need to co-exist on the same OS instance?
     ISSUE: We (GPO) think it would be better if the head node ran VMs, so that
     the various software that needs to run there can run in a more isolated
     environment, on its preferred OS; but it sounds like that's not how RENCI
     is planning to do it at this point. If you prefer the all-in-one-OS
     approach, can you talk more (maybe fork off a separate thread) about why?
<IB> OK
<NB> At the very least we're likely to run into the need to move common
     services (like SNMP) to custom ports, but I'm also concerned about
     finding ourselves in a situation where we have conflicts in required
     JVMs or similar (FlowVisor already trips over some known issues in
     commonly distributed JVMs) or Python versions.
<IB> We use JREs downloaded from the Oracle site, not shipped with the distro. CentOS 6.2
     seems to be reasonably up to date with Python (2.6.6 is the stock version).
     Which components have SNMP interfaces on them?
<JS> My summary is that Ilia is optimistic that there won't be any issues,
     Nick is pessimistic that there will be, Ilia has said that they'll
     revisit if there are, and that this is fine with us for now.
<NB> Both FlowVisor and FOAM will have SNMP interfaces in the medium term.
     The suggested use case for most installations would be that they would
     disable the FV interface and just use the FOAM one if they were
     running both, but that will be more difficult if FOAM doesn't know
     detailed information about everything in FlowVisor.  Also, I'm not
     saying there are necessarily any problems right *now* with JVM/Python
     versions etc, but this will be an ongoing software qualification
     concern when individual components become available with new versions.
}}}
 * ~~ B-8. PDUs are also useful for remote management if a node gets completely bricked (such that IPMI is useless) - I would think that the marginal cost would be more than worth it.~~ (hpd) (We're helping RENCI work on this in the first couple of rack integration efforts.  Ticket #3354)
{{{
<BV> IBM doesn't offer switched PDUs with 120V on their standard
     Bill of Material.  The 208V units on their standard BoM are
     switched and monitored.  For the first 2 racks (RENCI and BBN)
     we are sticking with IBM's standard BoM because using
     non-standard BoM parts means the rack can't be assembled in the
     factory and has to go to the "Integration Center", which increases
     the lead time.  So for the BBN rack, we won't have switching.
     We hope for other sites that can only support 120V power we
     will be able to identify with IBM a reasonable switched PDU
     they can install.
<JS> I've forgotten, can we take a 208V unit? If so, then if that would get us
     a switched PDU, then it might be worth doing.
}}}

Resources:
 * ~~ B-9. Why not allow arbitrary bare-metal images? Is this any more dangerous than arbitrary VM images? ~~ (hpd) (Duplicate of question S.27)
{{{
<BV> As discussed briefly in the concall, the reason to not allow
     custom bare metal images is twofold.  1) The decrease in
     security because users will have direct access to the bare
     metal network interface which connects to the management switch.
     2) The complexity of creating a bare metal image means the
     user would have to have a system identical to the one inside
     the ExoGENI racks so they could load all the hardware drivers,
     etc.  I don't think we've ruled out the possibility 100% and
     if a user provides a compelling reason for why they need it,
     then we can consider it.  But I think we have enough on our
     plates with the initial deployment without adding this level
     of complexity on day one.
}}}
 * ~~B-10. Where is the storage for the running instances - on the worker nodes?~~ (hpd)
{{{
<BV> We will have the ability to provide storage either on the
     running worker or via NFS from the head.  Long-term plans
     include being able to provision raw iSCSI LUNs from the iSCSI
     unit with a slice and make those available as well.
}}}
 * B-11. What are the average IOPS available for each VM on a fully loaded (max running VMs) worker node?
{{{
<BV> Each worker has 2 hard drives.  1 146GB 10K RPM SAS and 1 600GB
     10K RPM SAS.  In the case of a VM worker, the OS (CentOS 6)
     will be installed on the 146GB drive and all the VM's storage
     will be installed on the 600GB drive.  In a bare metal install
     the user would have access to both and could use them as they
     saw fit.  The "standard" rating for a single 10K RPM SAS spindle
     is 180 IOPS.  There are 6 drive slots on each worker, we can
     add more spindles, but for each spindle we add, we remove 1
     worker because of the cost (i.e. 9 2.5" 600GB SAS spindles =
     about $4000, or the cost of a worker).

     In all the infrastructure designs it was a delicate balancing
     act between available funds and performance.  Our goal being
     to build something that was usable today but extensible for
     the future.  The first 2 racks are our on-the-job training.
     We fully expect that after these first 2 racks we will tweak
     the hardware configurations with IBM and hopefully have a
     smooth flow from IBM's integration center to the other sites
     for the remaining 10-12.

<NB> This seems optimistic - the latency of a 10k rpm spindle with a 2.5"
     platter is 3ms, and the IBM 5433 (the 600GB drive in question) has a
     4.2ms average read seek time (writes are slower, but we'll be
     optimistic here for the purposes of this discussion), which makes for
     ~139 IOPS (1 / 0.0072).  Of course, neither of these numbers are
     particularly useful if we don't have an idea of the workload - more on
     this below.
<NB> I've been doing some math on the back of some napkins and I think that
     might be a net positive tradeoff for total VM capacity based on a
     variety of workload calculations (although factoring bare metal into
     this makes that calculus more complicated).  I still have some work to
     do on this, so I'll followup later with my thoughts.

}}}
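
The spindle arithmetic in the B-11 thread above can be reproduced with a short sketch. It assumes the simple model from Nick's reply, in which each random I/O costs one average seek plus half a rotation; the function names and the VM count are illustrative assumptions, and the drive figures are the ones quoted in the thread, not measurements.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def spindle_iops(rpm, avg_seek_ms):
    """Estimate random IOPS for one spindle as 1 / (seek + rotational latency)."""
    service_time_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


def per_vm_iops(total_iops, running_vms):
    """Even split of a spindle's IOPS across VMs (assumes a uniform workload)."""
    return total_iops / running_vms


if __name__ == "__main__":
    # Figures quoted in the thread: 10K RPM spindle, 4.2 ms average read seek.
    iops = spindle_iops(rpm=10_000, avg_seek_ms=4.2)
    print(f"one 10K RPM spindle: ~{iops:.0f} IOPS")
    # Hypothetical load: 10 VMs sharing the single 600GB data drive.
    print(f"per VM with 10 VMs: ~{per_vm_iops(iops, 10):.1f} IOPS")
```

The model ignores caching, queueing, and sequential access, so it is a rough bound on random I/O rather than a prediction for any particular workload.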

= Adam Slagell's Questions =

Software/Firmware Update
 * ~~S.1 What part of the software stack does ExoGENI take responsibility for maintaining updates? Is there anything they don't?~~ (chaos, based on adam's comment)
{{{
<AS> Sounds like VM/BM images and all the software that comes with
     the racks. I didn't see any gaps or buyer bewares.
<IB> We will take care of software updates. The only buyer-beware concerns 
     the operation of FOAM - we don't want to be in the business of approving 
     user slices in FOAM and think this needs to be done by GPO or GPO delegate. 
}}}
 * S.2 Is there an automated update system? If so, how is integrity ensured?
{{{
<AS> Sounds like no.
<IB> Not at this time. The software is too diverse.
<AS> Maybe for the system images some sort of integrity verification 
     using digital signatures is feasible.
<IB> Currently the images for VMs go through such a verification -
     the user submits a URL and a SHA-1 hash of the image they want booted. 
     For bare-metal images if we add filesystem integrity verification, it
     can cover the images locally cached on the head node.
<AS> 1. Auto update system: There are no plans for an autoupdate system for
     the GENI racks. With a large and complex software stack and many racks 
     at many institutions, this could become problematic to keep up-to-date. 
     The quickest way to a security incident is to have out of date software.
     BTW, isn't there a GENI project (by Justin Cappos I think) that is
     supposed to help make getting secure and reliable updates easy?
}}}
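
The URL-plus-hash check that Ilia describes for VM images in S.2 can be sketched as follows. This is a minimal illustration, not ORCA's actual code: the function names are assumptions, and the download step is omitted so that only the comparison of a local file against the user-supplied SHA-1 is shown.

```python
import hashlib
import hmac


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, expected_sha1_hex):
    """True if the file's SHA-1 equals the user-supplied hash (case-insensitive)."""
    actual = sha1_of_file(path)
    return hmac.compare_digest(actual, expected_sha1_hex.strip().lower())
```

The same two functions could cover bare-metal images cached on the head node, which is the extension suggested in the thread; swapping `hashlib.sha1` for `hashlib.sha256` would be a one-line change if a stronger hash were wanted.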
 * S.3 Is there a service guarantee for updates? Say a FlowVisor vulnerability is found and a patch made. How quickly can you push out updates?
{{{
<IB> Since none of the GENI software I know runs as root, I think we can be 
     relatively lax about this. I would say 72 hours if it is a straight-forward 
     update that does not require significant reconfiguration and repackaging. 
<AS> 2. Vulnerability management: Any major system going out needs a plan for
     monitoring and investigating vulnerability impacts. The more complex the
     software stack and the more things that depart from a vanilla OS distribution,
     the harder this becomes. You need to (1) be aware of all potential vulnerabilities
     (challenging for a complex software stack), (2) test for exploitability, (3) determine
     impact, (4) test patch or mitigation, and (5) push out a solution all very rapidly.
     The previous comment in #1 really addresses just the last bit, and I see no
     vulnerability management plan into which you could insert it now.
<JM> There's been talk of several strategies, and no single solution will get it
     all done.  We will all know what the state of OS patching looks like, since 
     I have a Nagios/Check_MK plugin that essentially runs a 'yum check-update'.  
     It does this with the security plugin enabled.  The result is, for each host, 
     we will know how many updates it needs, and how many of those updates are
     security-related.  Of course, this only helps us with the base OS; it cannot
     address potential vulnerabilities in the GENI-ORCA-OpenStack-Neuca world.  
     Ilia will have to comment on the latter.
     In terms of stopping SSH brute force attacks, I think denyhosts is a good way 
     to go.  But our sshd is tcpwrappered by default anyway (set up by kickstart).  
     This kind of attack won't be an issue.
<AS> There's also the VM and bare metal images as well, right.
<IB> Regarding the VM images - since users are allowed to boot their own, 
     the main weapon we have there is the ability to match resources to slices 
     and shut down misbehaving resources. 
     Bare-metal images will be restricted to a small selection (size 1 initially). 
     The problem with frequently changing/updating those is that it makes repeatable 
     experimentation more difficult, e.g. if an experimenter expects a certain image 
     with certain versions of kernel, drivers and software and we continuously move that mark. 
     The GPO will need to weigh in on what is more important - repeatability or the 
     potential impact on security, because this is an important tradeoff we're talking 
     about here.
<AS> Good point.
<BV> Also, I'd like to add, based on conversations yesterday, neither VMs
     nor bare-metal servers will have direct internet access.  Our plan is
     to proxy all public IP traffic through the headnode at each site, using
     iptables.  This gives us the opportunity to shut down a site very quickly
     if there is a report of a problem, but keep the problem system running
     (VM or bare metal), so we can analyze what is going on and resolve the 
     issue with the experimenter.
<AS> I'm not sure what you are saying exactly here. Are they private IPs that 
     are NATed, are they going through an application layer gateway? What do 
     you mean by not direct?
<JS> Hmm, my impression is that if we wanted to create a new bare-metal image,
     we wouldn't necessarily delete old one(s), but rather that the list would
     grow over time.
     Ah, but that may not have been what you meant: Indeed, if we update an
     existing image to fix security problems, that would potentially have an
     impact on repeatability. I think we'd need to at least identify that the
     image had changed (e.g. by changing its name), so an experimenter would be
     aware of that, and could re-validate that their experiment still produced
     the same results after the change.
     We could also devise some way for the experimenter to capture the
     vulnerable image, so they could run it somewhere else if they felt the
     need. (Or just boot it up on an isolated system of their own so that they
     could look at it, or whatever.)
<AS> I was assuming you would add new images that you support over time, but 
     existing ones would get security patches as time goes by. Of course you'd
     want to enumerate them and specify how they differ, perhaps in /CHANGELOG.txt 
     or something.
<IB> I don't know that we can guarantee that a particular 'security' patch will 
     not affect the performance of one or other of the kernel subsystems thus 
     affecting repeatability.
<JS> Ja; I think "track, notify, and archive" is the right approach here.
<SS> I'd turn this around -- we know that many changes will affect performance, 
     sometimes in only minor ways, but sometimes in major ways.
     If space is not a problem, I'd plan to keep every old version of standard 
     images around. The naming convention is just a detail, but
     OS-version-exogeni-current might be the name that gets you the latest 
     patched/supported image, but the logs would show you the precise version 
     you got (OS-version-exogeni-x.y or -yyyy-mm-dd).
     If an experimenter is running a slice that is on a closed (virtual) network, 
     e.g. configured so that only a fixed set of well-known machines can reach it, 
     then it is possible to bring up even old images with security vulnerabilities 
     and repeat earlier test runs or collect new data using those older images.
     If that same experimenter wants to run on a slice that provides "service" 
     to some larger, open set of users (on campuses or wherever), then they are 
     going to appreciate having automatic support for getting the latest OS patches 
     into the base images.
     I'm going to guess that we will see both sorts of use cases, but more 
     "closed networks" first.
<AS> Sounds like a reasonable balance.

<CG> > The GPO will need to weigh in on what is more important - repeatability
     > or the potential impact on security, because this is an important
     > tradeoff we're talking about here.
     So, my two cents: in our lab, we do try to apply OS updates to our
     experimental images, the same way we would to any other nodes we run.
     I think having an update schedule which applies to experimental OS
     images for which standard patches are available, as well as for servers,
     is a good idea.  If you can flag your images with metadata saying when
     they were last updated, so that experimenters know, so much the better.
     And if it's possible to keep old images around in case someone has a
     special-case need for one, again, that's a feature.
     I agree that it's a tradeoff, but I think doing periodic updates of
     images is the better bet.
}}}
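
The "track, notify, and archive" approach discussed in S.3 (a `-current` alias for the latest patched image, with every dated build kept available) can be sketched as a small catalog. This is an illustration only: the class, the base name `centos-6.2`, and the dates are hypothetical, and the name layout follows the `OS-version-exogeni-yyyy-mm-dd` convention suggested in the thread.

```python
from datetime import date


class ImageCatalog:
    """Keeps every dated build of an image and resolves the '-current' alias."""

    def __init__(self):
        self._builds = {}  # base name -> sorted list of build dates

    def publish(self, base, build_date):
        """Record a new patched build; older builds are kept, never deleted."""
        builds = self._builds.setdefault(base, [])
        if build_date not in builds:
            builds.append(build_date)
            builds.sort()
        return f"{base}-exogeni-{build_date.isoformat()}"

    def resolve(self, name):
        """Map '<base>-exogeni-current' to the newest dated name.

        Any other name is returned unchanged, so an experimenter who pins a
        dated image keeps getting exactly that image. An unknown base raises
        KeyError.
        """
        suffix = "-exogeni-current"
        if not name.endswith(suffix):
            return name
        base = name[: -len(suffix)]
        latest = self._builds[base][-1]
        return f"{base}-exogeni-{latest.isoformat()}"


if __name__ == "__main__":
    catalog = ImageCatalog()
    catalog.publish("centos-6.2", date(2012, 3, 1))   # hypothetical build dates
    catalog.publish("centos-6.2", date(2012, 5, 15))
    print(catalog.resolve("centos-6.2-exogeni-current"))
```

Logging the resolved name rather than the alias is what lets an experimenter see which precise version a past run used.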
 * S.4  Will there be someone actively monitoring for vulnerabilities on the entire software stack, or is it best effort (e.g., we update all the problems we are told about by someone else)?
{{{
<IB> At this point there is no dedicated person. However our ACIS group
     (members of which are part of the operations staff) are usually aware
     of latest vulnerabilities as part of their data center responsibilities.
<AS> It may be worth doing google alerts on Bugtraq for all the software.
<IB> I'll ask our ACIS folks what they do today. 
}}}
Log Collection & Management

 * S.5 What do you log and how?
{{{
<IB> ORCA actor state transitions and handler execution outputs. We will log 
     entire manifests to make them available to GMOC. The manifest will be the 
     main vehicle for correlating substrate to slivers.
     There are syslogs on individual hosts as well. Other elements (FlowVisor, 
     FOAM) have their own logs.
<AS> 4. Logging: I think remote logging is a must for integrity and availability. 
     This should be for syslogs and AM transactions that are needed to maintain 
     accountability of actions. Some additional integrity checking on the hosts
     is nice, but icing on the cake.
<JM> The remote logging infrastructure is mostly complete.  There is a central 
     server, in a protected VLAN deep in the heart of RENCI, running rsyslog
     on CentOS 6.2.  It only accepts connections on a high-numbered port, using
     RELP, from control.exogeni.net.  The latter is a forwarder for all logs.  
     We have a simple LogAnalyzer web interface to the central rsyslog box (which 
     is syslog.exogeni.renci.org).  This is protected with SSL, Apache basic auth, 
     using LDAPS to authenticate to ldap.exogeni.net.  What remains to be done here 
     involves making all the nodes in each rack forward their messages appropriately. 
     And lastly, if there are any non-standard logs we need capturing (for instance, 
     OpenStack, Neuca, or ORCA logs), I'll need to create a template for handling them. 
}}}
 * S.6 Are remote copies logged?
{{{
<IB> Not at this time
}}}
 * S.7 Do you do anything special on the racks to maintain the integrity of the logs?
{{{
<IB> Not at this time
}}}
 * S.7.B What about other file integrity checking for config files and critical system files?
{{{
<IB> Has not been considered so far, but I think can be added. 
<AS> Also useful is minimizing setuid programs or watching for changes 
     to setuid bits.
<IB> Noted
<AS> 3. SetUIDs and configuration management: I think that it is good 
     that most things don't need to run as root on the racks, but the 
     number of setuid programs should be minimized too. Once you have the 
     list, I think xCat has decent configuration management utilities to 
     make sure security hardening policies like that persist across upgrades 
     and changes. If not, you should have a plan on how to make sure that 
     updates don't move you to a less secure state by modifying configuration 
     unintentionally.
}}}
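
Adam's suggestion in S.7.B to watch for changes to setuid bits can be sketched as a scan plus a comparison against a saved baseline. This is a minimal illustration under stated assumptions: the function names are hypothetical, the scan only looks at regular files, and how the baseline is stored and protected is left open.

```python
import os
import stat


def find_setuid_files(root):
    """Return sorted paths of regular files under root that have the setuid bit set."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(mode) and mode & stat.S_ISUID:
                found.append(path)
    return sorted(found)


def setuid_changes(baseline, current):
    """Compare two scans; return (added, removed) setuid paths."""
    before, after = set(baseline), set(current)
    return sorted(after - before), sorted(before - after)
```

Running the scan after each upgrade and alerting on a non-empty `added` list is one way to notice an update that moved a node to a less hardened state.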
 * S.8 Do you log enough to map timestamp/IP/port tuple to a particular slice?
{{{
<AS> Sounds like the information is there, though it may take
     some manual investigation, especially if NAT was involved.
<AH> 8) They haven't really worked out logging, but mostly hope to just send
     everything to GMOC and be done.
     This is probably just fine. This is essentially 'racks will log to a
     remote Logging API' which is consistent with recent architecture group
     discussions. We just need to (a) ensure we are asking for all the right
     bits of information, and (b) have them at least outline the algorithm
     for going through all those logs to get the information we really need
     (eg, what slice ID used IP X Port Y at time Z?)

     We should check more specifically on what is stored on the racks in
     terms of logs, if anything.
<IB> Manifests have the information. 
}}}
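
The lookup Aaron asks for in S.8 ("what slice ID used IP X Port Y at time Z?") can be outlined as a filter over records extracted from manifests. The record layout below is a hypothetical assumption, since the thread does not specify what fields the manifests carry; the port range is there to cover the NAT case Adam mentions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Allocation:
    """One address assignment extracted from a manifest (hypothetical layout)."""
    slice_id: str
    ip: str
    port_low: int    # inclusive; use 0-65535 when the whole IP belongs to the slice
    port_high: int   # inclusive
    start: datetime
    end: Optional[datetime]  # None while the sliver is still active


def slices_for(allocations: List[Allocation], ip: str, port: int,
               when: datetime) -> List[str]:
    """Return the slice IDs that held ip:port at time `when`."""
    matches = []
    for a in allocations:
        still_active = a.end is None or when < a.end
        if (a.ip == ip and a.port_low <= port <= a.port_high
                and a.start <= when and still_active):
            matches.append(a.slice_id)
    return matches
```

More than one returned slice ID for the same tuple would itself be worth flagging, since it suggests overlapping allocations or clock skew between log sources.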
 * S.8.B What if you bridge some other device into GENI through your AM but hide it behind your NAT? For example, could there be some campus device causing a problem, but show up as one of the IPs on your rack, but not actually be under your control? And in that case, could you determine from your logs the device and what slice it was a part of?
{{{
<IB> In the current architecture this is not really possible. The IP 
     addresses given to the rack are used by rack resources only.
}}}
 * ~~S.9 Can you easily tell what slices are running on a given rack? How about each node on a rack?~~ (chaos, based on adam's comment)
{{{
<AS> Sounds like that is not a problem
<IB> Yes, although we need to do better with respect to making 
     this information available in an easier form.
}}}
 * S.10 How long do you keep local copies of the logs?
{{{
<IB> Depends on the verbosity. Once manifests start getting published
     onto the XMPP bus, this will no longer be an issue, as a separate log
     repository can slurp them up and keep them in one place.
     The syslog logs probably should be configured to go to a central 
     syslog server in addition to having a local copy.
}}}
 * S.11 Is there a mechanism that could be used to send allocation log information back to the clearinghouse for global policy verification for slices?
{{{
<IB> XMPP bus - we want to use it as the means to make this data 
     available to multiple consumers.
}}}
Administrative Interfaces
 * S.12 What is the authentication mechanism for the VPN?
{{{
<IB> LDAP + possibly RADIUS slaved to LDAP (for switches)
<AS> LDAP would be for authorization, but what kind of credential
     would be used for authentication? Maybe I am missing something.
<IB> LDAP stores usernames and passwords (as well as groups, which
     would be used to partition rights). RADIUS can read LDAP.
}}}
 * S.13 Does being on the VPN on one rack get you to the admin interfaces of all the others, or is this one way from RENCI?
{{{
<IB> One way from RENCI 
}}}
 * S.13.B How does one authenticate to the admin interface (separate from the VPN)? Is it root login?
{{{
<IB> Depends on the device (e.g. a switch vs. a compute node). 
     We opt for sudo whenever possible.
}}}
 * S.14 Are the credentials used to authenticate to the admin interface different for each rack?
{{{
<IB> This has not been discussed or codified.
<AS> When architecting this, it would be good to strive for containment. So
     if one unscrupulous person with a GENI rack reverse engineers something, 
     it doesn't give them the credentials they would need to do bad things to 
     other racks. It can complicate initial setups but probably pays off in 
     the long run.
<IB> Noted
}}}
 * S.14.B What about within a rack, is the root or admin password the same for each node/device?
{{{
<IB> We tend to use the same password for all worker nodes currently. 
<AS> I think within a rack, all nodes of the same type could be considered 
     at the same level of trust and treated this way.
<IB> Noted
}}}
 * S.15 Is authentication for admins the same whether or not they login through the VPN or SSH into the head node?
{{{
<IB> LDAP will be the back end, so yes.
<AS> So again, I am confused. LDAP as I have seen it used is just for 
     authorization. There are still SSH keys or passwords or OTP tokens 
     for different accounts. 
<IB> LDAP stores usernames and passwords. SSH uses PAM on the end hosts 
     to talk to LDAP over SSL channel. Switches use RADIUS that is slaved 
     to LDAP directly.
<AS> OK, makes sense. Though I presume it is actually salt and hashes stored.
<IB> Yes, the passwords stored in LDAP are not plain-text. Typically an 
     MD-5 hash is used.
}}}
 * S.15.B Are the SSH credentials to the head node different for each rack or shared?
{{{
<IB> Same as S.14. I don't know that these two questions are different. 
<AS> So here I am talking about two separate racks installed at different 
     institutions. Would a password or key that a local admin used to SSH 
     into the head node at University X also let them do the same at 
     University Y?
<IB> It is likely we will use LDAP groups to partition users such that users
     are limited to specific racks. Root logins will likely be disabled
     (and we may disallow 'sudo su -' for most users).
}}}
 * S.16 How is accountability of actions recorded if there is more than one admin, or is it just a shared root login?
{{{
<IB> We tend to use sudo, so some of the commands and privilege
     escalations are logged.
}}}
 * S.17 Does the KVM for console access have a network interface that gives remote console access?
{{{
<AS> Sounds like NO.
<IB> No
}}}
 * S.18 What devices and interfaces can you see from the VPN interface?
{{{
<IB> All of them. 
}}}
 * S.18.B Does this differ for those logging in through the head node?
{{{
<IB> No. Head node access is a redundant means to do the same.
}}}
 * S.19 Would the hosting organization have a different admin interface?
{{{
<IB> No, just a different set of logins with different credentials. 
     Hosting organizations probably will not have VPN access.
}}}
 * S.20 Is the only authentication mechanism password-based, or are two-factor auth or SSH keys used?
{{{
<IB> Right now based on LDAP passwords only. 
<AS> Oh, so you are using LDAP to distribute something like the /etc/shadow 
     file? So here, we use LDAP just to essentially distribute /etc/passwd,
     but authentication is done through PAM with Kerberos or OTP. Am I 
     understanding this right, that LDAP does both for you, sort of like NIS?
<IB> Sort of. Except we don't distribute /etc/passwd - PAM talks to LDAP live
     and there usually is a caching daemon that caches the getpasswd entries 
     temporarily.
<AS> 5. Remote root access: It was not clear whether remote root login was 
     allowed anywhere. I read that sudo was used when possible, but I would 
     hope no sshd_config files allow remote root login.
<JM> root SSH is disabled by default in our kickstarts 
}}}
 * S.21 If SSH keys are used anywhere, are they stored unencrypted on any of these racks?
{{{
<AS> I suspect yes with xCat.
<IB> Yes.
}}}
 * S.22 If SSH keys are used, are they different for different racks?
{{{
<IB> We will probably generate different keys.
}}}
 * S.23 If passwordless SSH keys are used, can they be used multi-directionally? For example, if an xCat process needs to use them to do something on a less trusted part of the system, that other piece should not be able to use the same key to ssh back into the xCat manager.
{{{
<IB> xCAT uses only explicitly registered keys, so this can be avoided. 
     However we will disallow node-to-node logins as per:
     http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Disable_node_to_node_root_passwordless_access
}}}
 * S.24 Do the admin interfaces need to connect back to anywhere initiating outbound connections?
{{{
<IB> Not that I know of. 
}}}
 * S.25 What is meant by "Since ExoGENI slices have management network access via the commodity Internet, this is the default behavior." on pg 13? (Perhaps you will have explained this by now and can ignore)
{{{
<IB> This simply says that if you don't care about isolated connectivity
     between slivers, you always have the commodity Internet connecting them.
}}}

Isolation
 * ~~S.26 Are you tired yet? I am. :-)~~ (chaos, per adam's comment)
 * S.27 What is the vetting process for bare metal nodes?
{{{
<AS> Sounds like no process yet, but there is recognition that we
     don't want bare metal hosts to be able to sniff in promiscuous
     mode and break the nice isolation properties
<CG>  26. Dataplane reachability testing:
      We think it would be a good idea to have two types of tests to go
      with the two types of VLANs:
      * Where the ExoGENI AM is used to provision a VLAN, we'd like to see
        a test which stands up a VLAN, verifies that it can be used,
        and reports to monitoring on whether that entire system (which
        includes the AM, of course) is healthy.  I believe you discussed
        doing something like this already: does what I just said sound
        similar to what you have in mind?
      * Where an ExoGENI rack is going to be connected to a static
        (long-standing) VLAN outside of the rack, e.g. to the shared
        mesoscale VLANs or to a longstanding L2 connection to non-rack
        resources at a particular site, we'd like to see a static test
        interface on each VLAN which could be used to verify connectivity.
        It would be ideal if the test interface were non-OpenFlow-controlled
        on the rack, so that it could be used entirely to test "is this
        link up?".  Does this seem reasonable?
<IB> No process yet. 
<AS> 7. Image vetting: I think a process, or maybe a set of criteria, is 
     needed for vetting bare metal images. What are the requirements? Things
     such as an "inability to sniff traffic in promiscuous mode on the NICs" 
     would fit into such a list.
<MB> Is the example you propose below an actual proposed requirement or just 
     a for instance? I ask because the capabilities that come immediately to 
     my mind as wanting bare metal seem likely to want to do exactly this.
<AS> It was being proposed and an example. I think it is desirable to prevent 
     from a security perspective because it provides better isolation of slices. 
<SS> Most of these are non-controversial (at least to security folks!) but I
     didn't quite understand a couple points, maybe because I joined the review
     late and will admit to not reading all the messages.
     7. Image vetting: ... are GENI researchers going to be able to sudo root 
     on bare metal images? (I would have presumed yes, but maybe that isn't the model.)
<AS> I did not presume so because there was talk about state being preserved 
     between jobs/users. If they aren't wiping images between experiments and 
     users have root access, then there is a whole other security issue. 
<NB> > It was being proposed and an example. I think it is desirable to prevent
     > from a security perspective because it provides better isolation of slices.
     The ability to capture traffic from a promiscuous NIC on a bare-metal image 
     has no impact on slice isolation.  This is very much something that we should allow.
<AS> It depends. If it allows me to watch traffic on other slices and there is any
     expectation of privacy, then it does impact a form of isolation. If there is
     neither an expectation nor a promise of privacy, or switching would prevent one
     from seeing such traffic even if in promiscuous mode, then it isn't an
     issue. I don't know the answer to either of those questions, though.
<NB> The privacy question is a good one, and should be discussed, but isn't
     a factor here.  If you have bare metal, you have exclusive access to
     the switch port and can't capture traffic that belongs to another slice.

}}}
 * S.28 Are the bare metal hosts diskless?
{{{
<AS> No, they have 146GB FS for OS and 600GB FS for data. However,
     they are wiped clean and reinstalled from a fresh vetted image
     between allocations. State is gone.
<IB> We're still debating whether we want stateful or stateless bare-metal
     nodes. Both options are open.
<AS> The nice thing about stateless is that it is more like a white list. If there
     is state left behind, you have to always wonder if you thought of 
     everything that you need to clean up in between users.
<IB> This is still TBD and there are advantages to both. 
}}}
 * S.29 What are the main isolation mechanisms between slices?
{{{
<AS> VM hypervisors or wiped bare-metal systems isolate experiments
     at system level. At the network level, this is done with VLANs.
     The same VLAN won't have slivers from multiple slices.
<IB> Yes. VLANs have QoS associated with them wherever possible
     (rate and buffer size limits).
<AS> 6. Isolation between racks: Isolation between racks is important, 
     especially since these are distributed across the country. Reverse 
     engineering something at one rack should not result in some class-wide 
     vulnerability that affects all racks. Companies like IBM often like 
     to install things with default keys and passwords, and you really 
     need to make sure those are changed and individualized for different 
     racks. Any password hash on a rack off-site is accessible and potentially 
     crackable.
<SS> Most of these are non-controversial (at least to security folks!) but 
     I didn't quite understand a couple points, maybe because I joined the
     review late and will admit to not reading all the messages.
     6. ... "Any password hash on a rack off-site is accessible..." 
     ... I thought all these racks were getting installed in "well-known" 
     facilities. So while remote, they aren't exactly in physically unprotected
     locations, right?
<AS> I don't know how much we trust the administrators at the dozens 
     and eventually hundreds of sites. Might students be admins of some 
     of these racks? If it really is a small set of trusted admins with 
     racks on data center floors, then it is less of an issue. 
}}}
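
The network-level rule stated at the top of the S.29 thread (the same VLAN never carries slivers from more than one slice) lends itself to an automated audit. The sketch below assumes that sliver records can be reduced to hypothetical `(slice_id, vlan_id)` pairs; where those pairs come from is not specified in the thread.

```python
from collections import defaultdict


def vlan_conflicts(slivers):
    """Given (slice_id, vlan_id) pairs, return VLANs that carry more than one slice.

    The result maps each offending VLAN ID to the sorted slice IDs found on it;
    an empty dict means the isolation rule holds for the given records.
    """
    slices_by_vlan = defaultdict(set)
    for slice_id, vlan_id in slivers:
        slices_by_vlan[vlan_id].add(slice_id)
    return {vlan: sorted(ids)
            for vlan, ids in slices_by_vlan.items()
            if len(ids) > 1}
```

Several slivers from the same slice on one VLAN are not reported, since that is the expected case.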

Miscellaneous
 * S.30 For each rack, could the aggregate operator give a concrete block of IP addresses unique to it?
{{{
<AS> Sounds like this is a policy issue and could be made a part
     of the configuration guidelines for each rack. It is helpful
     for the LLR to be able to tell from IP if something is from a
     GENI rack and at which organization quickly.
<IB> A block or a list of addresses is fine
}}}
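
If each rack is given its own block or list of addresses, as agreed in S.30, the quick "which rack is this IP from?" check that Adam wants becomes a table lookup. The sketch below uses documentation-only prefixes and made-up rack names as placeholders; the real per-rack blocks are not given in the thread.

```python
import ipaddress

# Hypothetical allocation table: documentation-only prefixes (RFC 5737),
# not real rack blocks.
RACK_BLOCKS = {
    "rack-a.example.org": ipaddress.ip_network("192.0.2.0/25"),
    "rack-b.example.org": ipaddress.ip_network("198.51.100.0/24"),
}


def rack_for(ip, blocks=RACK_BLOCKS):
    """Return the rack whose block contains ip, or None if it is not a rack address."""
    address = ipaddress.ip_address(ip)
    for rack, network in blocks.items():
        if address in network:
            return rack
    return None
```

A list of individual addresses, which Ilia also allows, fits the same table by entering each one as a /32 network.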
 * S.31 Are any user credentials stored anywhere, even temporarily? If so how are they protected and how long do they live?
{{{
<AS> You argue this is not applicable in the white paper I think?
<IB> If ABAC is adopted in GENI, user certs may be cached on the 
     head node as part of authorization process. They, however, 
     constitute public information and do not require confidentiality 
     protection.
}}}