Version 1 (modified by adaadwil@indiana.edu, 7 years ago)

--

GMOC Procedure - GENI Clearinghouse or Portal Event

GMOC is responsible for receiving and responding to notices of portal or clearinghouse events on GENI. These notices range from critical active events to inquiries from GENI users. GMOC will need to work with the GPO (currently) and UKY (after the clearinghouse and portal transition to UKY operations), as well as with experimenters and experimenter support groups to resolve these events.

We break these notices down into two types: Active Events and User Inquiries.

  • Active Event: GMOC receives an email or phone call about a clearinghouse/portal event, or detects an event through active monitoring (portal or clearinghouse down, DDoS attack, scheduled maintenance, etc.). An event can be both a security event and a portal/clearinghouse event. If initial investigation indicates a possible security issue, follow the Security Event procedure. (GMOC question: Do we need to separate active events from scheduled events such as regular software updates? We expect that software updates will happen regularly.)
  • User Inquiry: GMOC receives an email or phone call asking about a problem a user is having with the portal or clearinghouse.

Initial Information Gathering

In all cases, the GMOC will create a ticket and record as much of the following information as possible:

  • Initial reporter’s contact information. NOTE: Verify information in the GMOC DB.
    • Name
    • Organization
    • Phone Number
    • Email Address
  • Type of Event
    • VM unreachable, slow to respond, can’t log in, etc.
  • When did this start?
  • Symptoms and Impact to GENI
  • Criticality of Issue
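The checklist above can be sketched as a small record type, which makes it easy to see which items are still missing from a ticket. This is only an illustration: the class names, field names, and the `missing_fields` helper are hypothetical and do not describe the actual GMOC ticketing system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Reporter:
    """Initial reporter's contact details (to be verified in the GMOC DB)."""
    name: str
    organization: str = ""
    phone: str = ""
    email: str = ""


@dataclass
class EventTicket:
    """Hypothetical ticket record; field names are illustrative only."""
    reporter: Reporter
    event_type: str                         # e.g. "VM unreachable", "can't log in"
    started_at: Optional[datetime] = None   # when the problem started, if known
    symptoms_and_impact: str = ""
    criticality: str = ""
    log: List[str] = field(default_factory=list)  # steps and communication

    def missing_fields(self) -> List[str]:
        """Return the names of checklist items that are still empty."""
        checks = {
            "reporter.organization": self.reporter.organization,
            "reporter.phone": self.reporter.phone,
            "reporter.email": self.reporter.email,
            "started_at": self.started_at,
            "symptoms_and_impact": self.symptoms_and_impact,
            "criticality": self.criticality,
        }
        return [name for name, value in checks.items() if not value]
```

An operator-facing tool could call `missing_fields()` before handoff to prompt for anything not yet gathered; the procedure itself only asks for as much of the information as possible, so an incomplete record is still valid.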

Note: A clearinghouse or portal active event is likely to impact a large number of GENI users, so this type of event should be treated as a GENI critical issue in most cases (see the separate security procedure for the special case of a GENI critical security issue). User inquiries are usually not critical issues. While GMOC operates 24x7, GENI and its other partners usually provide support only during normal business hours (critical security events are an exception). This means the procedure assumes no after-hours support or responses from other GENI members, even though some may reply on their own initiative.

Active Event

Once the initial information gathering is complete, follow the process below when responding to an Active Event. All steps and communication should be documented in the ticket.

  • Verification: No verification is needed if a member of the distributed operations team reports the initial event, because they should already have performed these steps. For all other reporters, follow these steps.
    • The GMOC should check whether the portal/clearinghouse host is active. This can be done independently (e.g. a ping to the appropriate address) or by checking information from the GENI Monitoring tool.
    • The GMOC should check that the portal/clearinghouse service is running by verifying that the front page displays via a web browser directed to a portal URL (e.g. https://portal.geni.net (no sign in required) or https://portal.geni.net/secure/home.php (requires sign in)).
    • The GMOC should log in to the portal with an active operator’s account to verify that Apache services are operating correctly.
  • GENI Team Handoff and Tracking: Record the results of the verification steps, and contact the appropriate parties needed to resolve the issue based upon initial information gathering. Note: the initiator should be cc’d on all email communication.
    • Hand off to the clearinghouse/portal support team by email to help@geni.net with a cc: to gpo-infra@geni.net.
    • Notify the experimenters and operators mailing lists of the event. (An active event is likely to impact the community.)
    • If no response is received within 2 hours for an event that occurs during business hours, escalate to Heidi Dempsey and cc: help@geni.net.
    • Track progress with the support team and update the ticket until resolution.
    • Follow up with the reporter and the support team to ensure the issue has been resolved completely.
      • Send notification of resolution to the community
      • Determine if an after action report is needed
      • Close ticket
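The 2-hour escalation rule in the handoff steps above can be sketched as a simple time check. The procedure does not define business hours or state when the 2-hour clock starts, so this sketch assumes Monday to Friday, 09:00 to 17:00 local time, and starts the clock at the handoff email; the function names and constants other than the 2-hour delay are hypothetical.

```python
from datetime import datetime, timedelta

ESCALATION_DELAY = timedelta(hours=2)  # from the procedure
BUSINESS_START_HOUR = 9                # assumption: not defined in the procedure
BUSINESS_END_HOUR = 17                 # assumption: not defined in the procedure


def in_business_hours(moment: datetime) -> bool:
    """True for Monday-Friday between the assumed start and end hours."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and BUSINESS_START_HOUR <= moment.hour < BUSINESS_END_HOUR


def should_escalate(event_at: datetime, handoff_at: datetime,
                    now: datetime, response_received: bool) -> bool:
    """Decide whether an unanswered handoff should be escalated.

    Escalation applies only to events that occurred during business
    hours and have had no response for at least ESCALATION_DELAY,
    measured from the handoff email (an assumption).
    """
    if response_received:
        return False
    if not in_business_hours(event_at):
        return False
    return now - handoff_at >= ESCALATION_DELAY
```

For example, an event at 10:00 on a weekday that is handed off at 10:05 and still unanswered at 12:05 would be due for escalation, while the same event on a Saturday would not be under these assumptions.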

User Inquiry

  • Verification: Follow these steps.
    • The GMOC should check whether the portal/clearinghouse host is active. This can be done independently (e.g. a ping to the appropriate address) or by checking information from the GENI Monitoring tool.
    • The GMOC should check that the portal/clearinghouse service is running by verifying that the front page displays via a web browser directed to a portal URL (e.g. https://portal.geni.net (no sign in required) or https://portal.geni.net/secure/home.php (requires sign in))
    • Ideally, the GMOC operator will log in to the portal with an active account to verify that Apache services are operating correctly.
  • GENI Team Handoff and Tracking: Record the results of the verification steps and contact the appropriate parties needed to resolve the issue based upon initial information gathering. Note: the initiator should be cc’d on all email communication.
    • If verification indicates no problems, then this is probably not an operations issue
      • Hand off to the clearinghouse/portal support team by email to help@geni.net, with a cc: to gpo-expt-support@geni.net.
      • Track progress with the support team and update the ticket until the support team confirms that this is not an operations issue.
      • Close ticket
    • If verification indicates any problem
      • Escalate to the clearinghouse/portal support team by email to help@geni.net with a cc: to gpo-infra@geni.net
      • Track progress with the support team and update the ticket until resolution. (The support team may immediately respond to the inquiry, or they may determine after escalation that this is an active event, in which case the GMOC should follow the Active Event Handoff and Tracking procedure from this point forward.)
      • Determine if an after action report is needed for the inquiry
      • Close ticket
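The User Inquiry verification and handoff steps above can be sketched as two probes plus a routing decision. This is a rough illustration, not GMOC tooling: the function names are hypothetical, the ping flags assume a Linux or macOS `ping`, and treating only an HTTP 200 response as "front page displays" is an assumption. The email addresses and the portal URL come from the procedure.

```python
import subprocess
import urllib.request
from typing import Optional

PORTAL_URL = "https://portal.geni.net"  # no sign in required (from the procedure)


def host_responds(host: str, timeout_s: int = 5) -> bool:
    """Send one ICMP echo request using the system ping command."""
    try:
        result = subprocess.run(["ping", "-c", "1", host],
                                capture_output=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return False  # ping missing, or no answer within the timeout
    return result.returncode == 0


def front_page_displays(url: str = PORTAL_URL, timeout_s: int = 10) -> bool:
    """Fetch the front page; only HTTP 200 counts as success (assumption)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False


def route_inquiry(host_ok: bool, page_ok: bool, initiator_email: str,
                  login_ok: Optional[bool] = None) -> dict:
    """Pick the handoff recipients from the verification results.

    login_ok is None when the optional login check was not performed.
    """
    problem = (not host_ok) or (not page_ok) or (login_ok is False)
    team_cc = "gpo-infra@geni.net" if problem else "gpo-expt-support@geni.net"
    return {
        "to": "help@geni.net",
        "cc": [team_cc, initiator_email],  # the initiator is cc'd on all email
        "operations_issue": problem,
    }
```

A caller would run the two probes, optionally record the manual login result, and pass all three to `route_inquiry`. The login check stays manual here because it needs an operator account.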

More information

  • Currently, at the GPO, the portal and clearinghouse run in a single virtual machine (nye.gpolab.bbn.com). This virtual machine has two DNS CNAMEs (aliases): portal.geni.net (portal) and ch.geni.net (clearinghouse).
  • When the clearinghouse and portal move to the University of Kentucky, the portal and clearinghouse will run on separate virtual machines hosted by Amazon Web Services (AWS). The DNS CNAMEs (aliases) will be repointed to the AWS virtual machines.
  • Clearinghouse and portal information, including source code, is available at these URLs:
    • http://groups.geni.net/geni/wiki/GeniClearinghouse
    • https://github.com/GENI-NSF
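Because the two aliases point to one host today and are expected to point to separate hosts after the move, an operator may want a quick way to see where each alias currently resolves. The sketch below uses Python's `socket.gethostbyname_ex`, which returns a `(hostname, aliaslist, ipaddrlist)` tuple. The function names are hypothetical, and whether the first element is the CNAME target depends on the local resolver, so treat the result as a hint rather than an authoritative DNS answer.

```python
import socket

ALIASES = ("portal.geni.net", "ch.geni.net")  # from the procedure


def canonical_host(alias, resolver=socket.gethostbyname_ex):
    """Return the primary host name the resolver reports for an alias."""
    canonical, _aliases, _addresses = resolver(alias)
    return canonical


def aliases_share_host(aliases=ALIASES, resolver=socket.gethostbyname_ex):
    """True if every alias resolves to the same primary host name."""
    hosts = {canonical_host(alias, resolver).lower() for alias in aliases}
    return len(hosts) == 1
```

The `resolver` parameter exists so the check can be exercised with a stub instead of live DNS. Under the single-VM setup described above, `aliases_share_host()` would be expected to return True, and False after the aliases are repointed to separate machines.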