
Top 10 Ops List

Here is a list of operations issues that have been discussed in OMIS. If any of the issues listed here are on your own top 10 list, please increment their count column by 1. (See the wiki formatting page for reminders on how to edit the page after you've logged in.) If your top 10 issues aren't on this list, please add them, and start the count column at 1 for each new one. If anybody would like to manage the list or make the voting more automated, please contact hdempsey@geni.net.

Issue | On your Top 10 List?
Tracking changes to the network. What's the procedure for contributing and does the NOC need to know? What about reclaiming a contribution? | 2
Observing current network state | 1
Collecting network measurements | 1
Making network measurements available to users. Tools that access the archive need to evolve as formats, precision, and other details evolve. But access to historical data needs to be maintained as well. | 1
Archiving network measurements | 1
Defining and supporting appropriate privacy policies for collected data. Different users have different privacy concerns, and operations supports all users, so there can be many conflicts. Over time, the original experimenter may have moved on and other users come and go. | 2
Keeping user access policies up to date, especially for mobile users | 1
Resource trust and reputation management | 1
Separating the "production" infrastructure from the "experimental" infrastructure. On a research network, how do you track when a status change is part of an experiment and decide whether it is something that needs operations action? | 1
Notifying users of planned outages in a way that makes sure they know about any changes that will affect them (not by just sending mass emails) | 1
Reflecting underlying outages into experiments that want to see them vs. isolating experiments that want reliability from those outages. | 2
Defining and tracking demarcations between network infrastructure and users' infrastructure | 1
Defining, monitoring, and delivering on service level agreements | 1
Security monitoring and event management | 1
Backwards compatibility for tools (for example, accessing network performance data associated with software releases or hardware devices that are no longer active in the network) | 2
Making multiple security mechanisms and authorization styles work together, particularly for federated networks | 1
Communications with network administrators at user sites (in both directions). IT managers are very busy, and may not consider research network connections a high priority. | 1
Distinguishing component failure from outside network failure that prevents contacting a working component. | 1
Keeping stable routing (both internal and with external peers) as internal topologies change quickly with experiments coming and going. | 1
Last modified on 07/13/08 03:39:11