
OMIS Working Group Meeting 1, October 9, 2007

Please send any comments, additions or corrections to omis-wg@geni.net. These notes include references added after the meeting to make it easier for people to follow up on meeting topics. Bullets (•) mark identified speakers.

Presentation materials from this and other working group meetings (including audio) are available at the GENI Web site:

http://www.geni.net/geni-engineering-conference/geni-engineering-conference.html

The initial goal of the working group at this meeting was to discuss high-level operations, management, integration, and security requirements for GENI. Volunteers led discussions in different areas. The first OMIS group deliverable is a short document describing required high-level operations functions, particularly those needed to interface with the narrow waist.

There is a funded position for an OMIS system engineer in GENI. Anyone interested in this work, or in being a working group chair, should get in touch with Heidi. Mike Patton currently serves as an interim system engineer volunteer.

Several GENI solicitations for risk reduction efforts will be published on www.geni.net each year. The first should be issued in December, with proposals due in February. (See the GPO Solicitation presentation for details at http://www.geni.net/geni-engineering-conference/geni-engineering-conference.html.) Operations is one of three main goals for this first solicitation, so OMIS members are encouraged to submit proposals.

Critical Times to Remember: A prototype network should be running in a year (small scale network to start risk reduction). A Conceptual Design Review is scheduled for spring 2008. The CDR is based on documentation, not prototypes.

Q: What is the relationship between OMIS and other working groups? A: There is a lot of overlap between working groups. Security overlaps all working groups. Aaron Falk and working group chairs will help determine how to separate work that overlaps. For example, the Narrow Waist group has proposed a security architecture that includes PKIs, which implies that OMIS must determine how to put infrastructure, tools, and procedures in place to support certificates and the different ways in which they will be used. (OMIS might also provide design input on whether PKIs can be managed effectively for all types of proposed GENI substrates.) OMIS must design appropriate security policies and determine how to monitor the network, respond to security events, and escalate security problems in order to implement a level of security consistent with what the Narrow Waist architecture proposes. In particular, OMIS must consider how managing security for GENI slices might differ from managing security for existing operational networks.

Q: How will we discuss these overlapping areas? A: Joint working group meetings will probably happen soon. We also can use the mailing lists.

Q: How big is GENI? A: There are many possible measurements to answer this. Some that have been mentioned in GENI documents as design goals to date include:

  • Thousands of projects and researchers ("registered" GENI users)

  • Successfully supports one thousand simultaneous experiments

  • Includes many federated organizations and "legacy" Internet peering

  • Includes multiple substrates (optical, wireless, wired, other)

  • Facility has a core backbone with ~25 primary POPs and many more secondary POPs (50-100)

See GDD-07-45 The GENI Facility Design (http://www.geni.net/documents.html) for a starting-point document from the GENI Planning Group. (The GENI Planning Group developed GENI concepts in 2006 and early 2007.)

Q: How do we get more definition into what the scale is? Is there a working group working on this? A: The GPO expects that participants will first define what scale is achievable right away (what have we already done in similar projects?), and then what the steps are to get from there to the MREFC goals (examples in the previous question). This is the spiral development process that will be covered in Chip's talk later in the conference (see the conference web page referenced earlier). There is no single working group working on scale.

One of the OMIS members observed that we shouldn't limit ourselves by picking a number--we should make the best design possible to support the largest network possible.

  • Matt Davy (Indiana University) spoke about lessons learned from operations support of other research networks at the GRNOC (http://globalnoc.iu.edu/). GRNOC supports Internet2, NLR, PlanetLab nodes, 100x100 project, HOPI testbed, Internet2 Observatory, NLRView and other networks and infrastructures.

-Operations needs to know where the production layer ends and where the research layer starts. It is not always easy to separate what is infrastructure and what is experiment. The HOPI network was an example where the NOC would observe components or links to be "down," and needed to know whether they were down by intention of the experimenters, or because there was an operational problem that needed to be resolved. Communication between operations and experimenters was key.

-Users and site contacts also need to know what the infrastructure is and how/when it changes. For example, if the NOC updates the NetFlow software on routers, it may cause an anomaly in researchers' data. GRNOC uses an RSS feed, logs of past changes, and online calendars to distribute change information. Email was not effective.
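As a rough illustration of the HOPI lesson above, a NOC tool could compare each "down" observation against outages that experimenters have announced in advance. This is only a sketch: the `PlannedOutage` record, the `classify_down_event` function, and the component and experiment names are hypothetical and do not describe any GRNOC or GENI tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PlannedOutage:
    """Hypothetical record an experimenter files before taking a component down."""
    component: str
    start: datetime
    end: datetime
    experiment: str


def classify_down_event(component, observed_at, planned):
    """Return 'intentional' if an announced outage covers the event, else 'operational'."""
    for outage in planned:
        if outage.component == component and outage.start <= observed_at <= outage.end:
            return "intentional"
    return "operational"


# Assumed example data: one announced outage on a hypothetical link.
planned = [
    PlannedOutage("link-A", datetime(2007, 10, 9, 9, 0), datetime(2007, 10, 9, 12, 0), "exp-1"),
]
status_during = classify_down_event("link-A", datetime(2007, 10, 9, 10, 30), planned)
status_after = classify_down_event("link-A", datetime(2007, 10, 9, 13, 0), planned)
status_other = classify_down_event("link-B", datetime(2007, 10, 9, 10, 30), planned)
```

Anything classified as "operational" would go to the NOC for resolution. The sketch only works if experimenters actually announce their outages, which is the communication problem the notes describe.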

Q: Can the OMIS members characterize the size of networks we've had experience with (total nodes, users, experiments) to give us some idea of how far GENI goals are from current practice? A: Good idea. Heidi asked attendees to send answers to the list, and will ask the rest of the list for the same info.

Q: Will GENI characterize what the interface between experimenter and operations is? A: All agreed OMIS needs to be involved in this.

Q: If there was an interface from control plane to operations tools, would that be sufficient? Or are we trying to control allocation? A: Both.

Q: Once pieces of the network have been allocated to an experimenter, does operations even need to pay attention to it? A: Some discussion back and forth. Operations needs to pay attention to the slices and how they are allocated/isolated, but may not need to look inside slices. On the other hand, some experimenters might want their experiments to be monitored, since, for example, they aren't expecting to change anything below the application layer. Some experiments might define "service agreements" to define the demarcation and what they want monitored/not monitored.
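One way to picture such a "service agreement" is as a small record that names the demarcation layer for a slice, so that operations tools can tell which layers they are expected to watch. The following is a minimal sketch under stated assumptions: the `ServiceAgreement` class, the layer names, and the rule that operations monitors everything below the demarcation are all hypothetical, not something the working group agreed on.

```python
from dataclasses import dataclass

# Hypothetical layer names, ordered from lowest to highest.
LAYERS = ["physical", "link", "network", "transport", "application"]


@dataclass
class ServiceAgreement:
    slice_name: str
    demarc_layer: str  # lowest layer the experiment changes itself (assumed meaning)

    def operations_monitors(self, layer):
        """Assumed rule: operations monitors only layers strictly below the demarcation."""
        return LAYERS.index(layer) < LAYERS.index(self.demarc_layer)


# An experiment that changes nothing below the application layer.
agreement = ServiceAgreement("slice-1", demarc_layer="application")
monitored = [layer for layer in LAYERS if agreement.operations_monitors(layer)]
```

For this example agreement, `monitored` holds every layer except the application layer. A real agreement would likely need per-component detail as well.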

Q: What about virtualization (slices)? Do we have any past experience with how successful network virtualization has been, and how expensive it is? Why wouldn't you just have more equipment and a simpler design? A: Although NLR was cited as one example where some virtualization had made it to an operational network, full virtualization was too expensive. No other examples cited--will need to follow up on this issue.

A: Data virtualization is easier than control plane virtualization. A: Defining the "demarc" between experiment and infrastructure may be more important to operations than how successful we are at virtualization.

A: It might be that successful virtualization would make the demarc less important in GENI because the physical link state will not be affected by the virtual state if they are completely isolated. A: It may be that experimenters still need to affect the physical link as part of their experiment. A: What if someone wants to do an experiment about virtualization? Do we know what it means for a network to be fully virtualizable? Could you run all the code proposed for GENI in a slice, and then use that to learn how well it works before deciding to deploy it?

  • Jon-Paul Herron (Indiana University) spoke about data collection and monitoring from GRNOC experience, particularly NLRView. NLRView is a suite of servers and software on the NLR network to 1) provide operations test points, 2) provide measurement data about the network and 3) provide a shared research platform.

-Operations needs to collect data for users and for the NOC's own purposes. Some data is shared with users and some is there only for operations/debugging. Some is used for trend analysis and network engineering/planning.

--Data security and privacy are big concerns. Collected records have data about users that could be considered sensitive. Universities have been especially concerned about privacy. NetFlow data is an example of where this has been an issue. (Wireless location was another one mentioned later in the discussion.) Anonymizing data isn't a solution. If people generating traffic haven't "opted in," data collection is only one of the privacy problems.

--Data storage is important and expensive. Tools, collection methods, and archives change over time, and you have to support all that you've ever had in operations, or invest in converting data after changes. If access/privacy policies change over time, this makes it even more complicated.

Q: How far back do users ask for data? A: Some have asked for data from 2 years ago, but that was also as far back as we kept records, so you are likely to get requests that go back as far as you keep records.

Q: Will slicing make data collection, storage and access harder? A: Seems so. OMIS will have to follow up.

Here are some URLs that GRNOC provided later, for those who are interested in exploring details of some available data from GRNOC-managed networks:

Utilization data, "looking glass"/router proxy and performance test points for NLR: http://noc.nlr.net/nlr/network-status.html

"Research data" such as routing logs and raw utilization stats: http://noc.nlr.net/nlr/research_data.html

Similar information for other networks is also available on the GlobalNOC site. For example, Internet2 Network data is located at:

http://noc.net.internet2.edu/i2network/live-network-status.html

http://noc.net.internet2.edu/i2network/research-data.html

A summary of the NLRView infrastructure is also at: http://noc.nlr.net/nlr/maps_documentation/nlrview-documentation2.html

  • Kevin Miller (Duke University) spoke about security.

--There's overlap between (at least) OMIS, Services and End-user Opt-in working groups in security.

--If there are multiple authentication and access control services, how will they interoperate? Or will they? Some systems will not support PKIs directly (for example, storage systems), so some sort of translation will be needed.

--GENI documents thus far have a pretty aggressive security posture. Does that mean operations security is expected to be similarly aggressive?

--How do we handle incident management? Revoking access is difficult when you have multiple systems and substrates. For example, components on a wireless network may not be reachable when you need to update access controls.

Q: At what level should authorization happen in GENI? A: There was discussion, but no obvious conclusion.

  • Ricci (University of Utah) said the Narrow Waist group defines a slice as the right level, and that we should be able to quickly revoke GENI access for a slice.

Q: Was there a proposal for an authorization language or protocol? A: See GDD-06-23 "GENI Facility Security" on geni.net (http://www.geni.net/documents_nav.html). There's also a specific proposal about the RSpec, which needs comments from working groups, especially about the feasibility of implementing it. (See the RSpec slides presented during the Narrow Waist Working Group meeting: http://www.geni.net/docs/RSPec-GDC1.pdf.)

  • Stephen Farrell (Trinity College Dublin) also spoke about security.

--We will need multiple security mechanisms and authorization styles, not just PKIs. The multiple methods will have to cross-connect. It makes sense to have PKI as a main mechanism, particularly for users and hosts, but not as the only mechanism. Kerberos is an example alternate mechanism.

  • Gary Minden (University of Kansas) pointed out that the GRID community has been dealing with federated authorization and security for over 10 years, and we should see how they've done. He knows it is still an issue for them.

--Authorization model will need to include signing authority over to others. This is still an active research area.

--It is likely that many of the security functions are still active research areas, so the operations solutions will have to evolve over time, starting with the ones that we already know how to implement and interconnect. We should capture existing and evolving alternatives as part of planning.

Q: Are we talking about just securing GENI resources, or about security for users opting in to experiments? A: Both. GENI resources have distributed ownership, so there is overlap in required functions for both. Users would certainly like to be able to use a single identity for both functions.

Q: Schools that connect to GENI are going to want to be able to see what information is going into and out of a GENI experiment to the school's "production" infrastructure. They will need this for incident response, but may want it for other reasons as well (for example to prevent certain kinds of information from coming in or out). We don't want to add firewalls everywhere in GENI, but we do have to provide some capability for auditing information flows in order to be able to connect shared infrastructure pieces. How do we do this?

Q: What is the enumeration of the threat models? A: See GDD-06-23 "GENI Facility Security" on geni.net (http://www.geni.net/documents_nav.html).

--Security models generally tend towards either specifying privileges or specifying policy enforcement points (more like RADIUS). The Narrow Waist working group has been mostly leaning towards the first model. Ideally, GENI should avoid the situation where every node has to understand every privilege, which might be difficult in GENI given the scale and variability of nodes (e.g. wireless nodes that aren't on the net when policy updates happen). A RADIUS-style model, where nodes can ask for a decision from a database, might be easier to manage. OMIS should look into this.
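The difference between the two models can be sketched in a few lines. This is an illustration only, not a GENI design: the class names (`PrivilegeListNode`, `PolicyDecisionPoint`, `EnforcementNode`), the (user, slice) grant format, and the example names are all assumptions made for the sketch.

```python
class PrivilegeListNode:
    """Model 1: each node stores its own copy of the privileges it must honor."""

    def __init__(self, privileges):
        self.privileges = set(privileges)  # (user, slice) pairs held locally

    def allows(self, user, slice_name):
        return (user, slice_name) in self.privileges


class PolicyDecisionPoint:
    """Model 2: one central database answers authorization questions."""

    def __init__(self):
        self.grants = set()

    def grant(self, user, slice_name):
        self.grants.add((user, slice_name))

    def revoke_slice(self, slice_name):
        # Drop every grant that refers to the revoked slice.
        self.grants = {g for g in self.grants if g[1] != slice_name}

    def decide(self, user, slice_name):
        return (user, slice_name) in self.grants


class EnforcementNode:
    """A node that enforces decisions but asks the decision point for each one."""

    def __init__(self, pdp):
        self.pdp = pdp

    def allows(self, user, slice_name):
        return self.pdp.decide(user, slice_name)


# Model 1: a node that was offline during a policy update keeps a stale grant.
offline_node = PrivilegeListNode([("alice", "slice-1")])
stale_allowed = offline_node.allows("alice", "slice-1")

# Model 2: revoking the slice once takes effect for every node that asks.
pdp = PolicyDecisionPoint()
pdp.grant("alice", "slice-1")
node = EnforcementNode(pdp)
before = node.allows("alice", "slice-1")
pdp.revoke_slice("slice-1")
after = node.allows("alice", "slice-1")
```

The sketch ignores what a node should do when it cannot reach the decision point at all, which matters for the intermittently connected wireless nodes mentioned above.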

Q: Does it make sense to build security "light" first so we can get going quickly, or to build in more research prototype security from the beginning, facing the existing security problems head-on, but slowing down the build? A: The opinion in the room was divided on this. One person commented that if you use the spiral build model, the duration of each build/test cycle will be very short, because the whole risk reduction and planning phase must complete in 4 years. GENI will have to balance this tension.

--GENI documents have inconsistent levels of concern about security. How paranoid do we need to be? One measure is that we have to take at least the same amount of care that the IT managers at schools and institutions connecting to GENI do, because that's what's needed to convince them to hook up to GENI in the first place. If we are very careful about attacks, but not too careful about software bugs, then the results aren't likely to be what was intended. GENI needs to pick a consistent approach.

--Have to be conscious of the security impacts on usability to make appropriate trade-offs and vice versa. (For example, usability tests have shown that huge numbers of users will ignore all screen clues and happily type their passwords across cleartext connections when they've been told they shouldn't.)

--A lot of virtualization experiments put demands on the physical layer (specifically optical) for things that haven't been done before (e.g. slicing). The speaker is interested in following up with OMIS members on these issues.

--How will operations manage access to substrates?

--How will users opt-in to inter-layer experiments?

--How will operations give access to appropriate control interfaces for inter-layer experiments?

--What are operations concerns that are cross-layer themselves? For example, latency and BER can be changed if they are outside acceptable ranges for users. Latency may be more important than BER.

--Can we learn from experiences with GigaPOPs?

Q: Substrate could change a lot without the higher layers being impacted. Do all experiments have to worry about controlling substrate? A: No. There can be experiments where the slice doesn't need any cross-layer communication at all. Expect GENI to support multiple kinds of slices with varying cross-layer communications functions simultaneously. A: Some experiments might like to compare what happens across two slices with different cross-layer communications functions. A: A real-world example is latency, which is very important to some applications. Bit Error Rate is probably not so important in the real world at 10G and above because of Forward Error Correction. A: There might be cases where you don't have FEC in a clean-slate network. A: That's true, and GENI should be transparent enough to allow many different approaches at the physical layer. You would like architectures that let you choose the right layer of switching for what your data needs, and enable you to set up and control transmission at that layer. It's worth noting that economics vary significantly between layers, which might drive optimization to the lowest possible layer. A: Survivability is also a consideration that requires you to pay attention to the physical layer, for example to protect applications that function together from multiple sites.

Q: Might GigaPOPs eventually need similar capabilities, because they will be between GENI core and some endpoints? A: Yes.

  • Mike Patton (MAP Network Engineering) was scheduled to lead discussion on network engineering topics, but we ran out of time, so he had several discussions with interested participants later. Mike also took notes and helped with these minutes.

Thanks to Mike and all the other volunteers who led discussions and participated at the working group session. Now let's hear from the mailing list members who weren't able to attend. What are your concerns about GENI? What do you want OMIS to do? What operations needs should the early GENI solicitations provide support for? Email the mailing list and let us know.

Last modified on 04/28/09 16:29:59