OMIS Use Cases

At the second GENI Engineering Conference, Aaron Falk presented a Use Case for how a researcher would set up and operate an experiment. The main body didn't include much that was relevant to OMIS, but there was an additional "Mini Use Case" on a single slide that covered emergency shutdown. Both of these started us thinking about how various operational uses would look.

These are some GENI Use Cases covering aspects of the OMIS WG's charter. They describe the steps that happen for several Operations events. These Use Cases have a common set of Actors and Assumptions which you may want to review before reading the actual Use Cases.

The Use Cases written up so far:

There's a terminology problem in all of the use cases that I haven't quite figured out how to deal with: the term "ticket" is used to refer to two very different things. I hope the meaning can always be figured out from context. The tickets that the NOC manages are for reporting and tracking problems and other events. Since these use cases are all about dealing with various operational incidents, these kinds of tickets feature prominently. There are also tickets issued by components: cryptographic data structures that promise resources and can be redeemed to get the service described on the ticket. When there's potential for confusion, the kind the NOC deals with can be qualified as "trouble tickets", but the other kind doesn't have a similar adjective that can be applied.

If you have ideas for additional OMIS-related use cases, please feel free to write them up or send suggestions to the mailing list (you have to join in order to post). The ones written up so far are all O&M, so we could use a few I or S ones; for S, another option is to expand the existing ones (including Aaron's?) to include all the security steps.

Open questions

There are some open questions common to all of these Use Cases:

  • In the use cases we speak of notifying someone related to the slice in question. But when experiments are composed, another experiment might be using a service in the affected slice while otherwise not being connected to that slice. It would be nice if there were a way for any researcher to register an interest in events affecting a slice, so that researchers who rely on some other experiment for a service can be included in tickets for slices in that experiment.
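One way to picture the "register an interest" idea is a small registry that maps each slice to the researchers who asked to hear about it. This is only a sketch under stated assumptions: the class `SliceInterestRegistry`, its methods, and the slice and researcher names are all hypothetical and do not come from any GENI design document.

```python
from collections import defaultdict


class SliceInterestRegistry:
    """Hypothetical registry of who should be included on tickets for a slice."""

    def __init__(self):
        self._owners = {}                    # slice name -> owning researcher
        self._interested = defaultdict(set)  # slice name -> registered researchers

    def set_owner(self, slice_name, researcher):
        self._owners[slice_name] = researcher

    def register_interest(self, slice_name, researcher):
        # Any researcher may register, even if not otherwise tied to the slice.
        self._interested[slice_name].add(researcher)

    def recipients(self, slice_name):
        """Return the owner plus everyone who registered interest, sorted by name."""
        people = set(self._interested.get(slice_name, set()))
        owner = self._owners.get(slice_name)
        if owner is not None:
            people.add(owner)
        return sorted(people)
```

When a trouble ticket is opened for a slice, the NOC's tooling could call `recipients()` to decide who is copied on it, so that a researcher relying on a service in that slice is included without being a member of it.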

Comments from the Indiana University NOC group

Michael raised several issues in his latest message about OMIS Use Cases. We would like to add a bit to that discussion in two areas: ticket terminology, and notifications and operational data sharing.

Ticket Terminology

Michael raised a question of terminology confusion between “trouble tickets” and “resource tickets” if both are called “tickets”. We have a suggestion here. It seems to us that one way to eliminate this confusion is to talk about tickets and tokens. A component issues a token (or resource token [RT]) that can be redeemed for services. These tokens are directed toward a Clearinghouse (and perhaps later used in a conversation between the Clearinghouse and an experiment). A trouble ticket (TT) would be created by a NOC (Aggregate Ops or GENI Ops), and would describe a problem (current issue) or an event (future issue). A TT would (could) be shared with a Clearinghouse and experimenters, but tokens and tickets would be issued by different entities (components and Ops, respectively).
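The token/ticket split can be sketched as two unrelated record types with different issuers. This is a minimal illustration, assuming field names of our own choosing (`issuing_component`, `issuing_noc`, `shared_with`, and so on); none of them are taken from a GENI specification, and the `signature` field is only a placeholder for whatever cryptographic material a real token would carry.

```python
from dataclasses import dataclass, field
from enum import Enum


class TicketKind(Enum):
    PROBLEM = "problem"  # a current issue
    EVENT = "event"      # a future issue, such as scheduled maintenance


@dataclass(frozen=True)
class ResourceToken:
    """Issued by a component; redeemable for the resources it describes."""
    issuing_component: str
    resources: str
    signature: bytes  # placeholder only, no real cryptography here


@dataclass
class TroubleTicket:
    """Created by a NOC (Aggregate Ops or GENI Ops)."""
    issuing_noc: str
    kind: TicketKind
    summary: str
    shared_with: list = field(default_factory=list)

    def share(self, party):
        # Record each party (Clearinghouse, experimenter) at most once.
        if party not in self.shared_with:
            self.shared_with.append(party)
```

Keeping the two as separate types, rather than one "ticket" type with a flag, mirrors the point that they come from different entities and are never interchangeable.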

Notifications and operational data sharing

The pervasive issue of “notifications” and operational data sharing is a crucial area that deserves detailed discussion. An open question posed by Michael in all Use Cases relates to how interrelated experiments would notify each other about service issues. Notification is of basic importance to operations and cuts across all of the use cases mentioned. As Michael states, it is a difficult and problematic area. Among the questions it raises:

  • Should notifications come from a myriad of NOCs, or be channeled through some kind of normalizing system that coordinates and standardizes them?
  • Who does Ops (GENI or Aggregate) notify about outages (emergency or scheduled)?
  • How does an experiment, end user, or aggregate signal its desire to receive notices?
  • How do notification recipients signal what type of notifications they want to receive, and how they want to receive them?
  • How does an experiment or an aggregate signal its intention to send out notices reflecting its internal state?

These issues are already complex even for networks that are simpler and less federated, such as the NLR and Internet2 networks. We’ve already begun to see keen interest in developing a rich mix of push/pull notifications of customized granularity and audience. Some in the community want to see only a very small set of notifications for very specific things. Others want to see everything they can. Some want a simple system to “poll” for network status information. Others want notifications to be pushed out very aggressively through a number of channels. Given the diversity of audience for GENI-related notifications, this will only be more complicated (and more important) for GENI.

Here are some initial ideas based on our experience.

  1. Any notification feature must be automated as much as possible. This will have implications for (among other things) the data structure that describes experiments, aggregates and clearinghouses. This may seem obvious, but this network will be far too complex for much “operator intervention” at the notification level.
  2. For notification information to be exchanged there must be some process for information providers (the ones producing notifications) and information consumers (the ones wanting to see notifications) to opt in and authorize the exchange of information. This process must be strict enough to protect against information flooding or leaking, but simple enough that it doesn’t become a roadblock to research information sharing or inadvertently slow down operational troubleshooting and maintenance.
  3. Standards for notification formats could be very helpful. If notifications can be sent in a standardized format, the Aggregate owners and Experimenters can use the notifications programmatically, for example to update the aggregate's trouble ticketing system, or to change the configuration of an experiment or suspend it.
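The three ideas above can be sketched together: a machine-readable notification, an opt-in subscription that only delivers once authorized, and a consumer that acts on a notice programmatically. Everything here is an assumption made for illustration. The JSON layout, the field names, the `"emergency"`/`"scheduled"` kind strings, and the `handle` function are hypothetical, not a proposed GENI standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Notification:
    source: str      # the NOC sending the notice
    slice_name: str  # the slice the notice is about
    kind: str        # assumed values: "emergency" or "scheduled"
    summary: str

    def to_json(self):
        # A fixed key order keeps the wire format stable for consumers.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text):
        return Notification(**json.loads(text))


@dataclass
class Subscription:
    consumer: str
    slice_name: str
    kinds: frozenset
    authorized: bool = False  # set once the provider approves the opt-in

    def wants(self, notification):
        # Deliver only to authorized consumers, and only what they asked for.
        return (self.authorized
                and notification.slice_name == self.slice_name
                and notification.kind in self.kinds)


def handle(notification, suspended):
    """Example consumer action: suspend the experiment on an emergency notice."""
    if notification.kind == "emergency":
        suspended.add(notification.slice_name)
```

A provider would serialize with `to_json()`, check `wants()` for each subscription before sending, and a consumer would parse with `from_json()` and pass the result to something like `handle()`, with no operator in the loop.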
Last modified on 04/30/09 16:43:44