Version 4 (modified by lnevers@bbn.com, 9 years ago)

OPS-001-B Adding Monitoring Sites

This procedure defines the steps to add a new site to the GENI Monitoring System.

A request to add a new site can be made by the GENI Rack Team or by an operations group. Adding a new site is part of creating and setting up a new rack at a site, for which GMOC has a master ticket.

Regardless of the source of the request, a ticket must be written to track the completion of this task. The ticket must copy the issue reporter and does not generate notifications to GENI users.

1. Issue Reported

GMOC gathers technical details for the addition request, including:

  • Requester Organization
  • Requester Name
  • Requester Email
  • New Site Name (i.e., the Aggregate Manager nickname)
  • New Site Aggregate Manager URN
  • New Site Aggregate Manager API URLs
  • New Site Data Store URL

Note: These details should already be part of the new site master ticket.
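As a sketch of this intake step, the details listed above can be held in a small record and checked for completeness before the ticket is created. The class `SiteAdditionRequest` and the helper `missing_fields` are hypothetical illustrations, not part of any GMOC tool.

```python
from dataclasses import dataclass, fields


@dataclass
class SiteAdditionRequest:
    """Hypothetical record of the details GMOC gathers for a site addition."""
    requester_organization: str = ""
    requester_name: str = ""
    requester_email: str = ""
    site_name: str = ""        # Aggregate Manager nickname, e.g. "utah-ig"
    am_urn: str = ""           # e.g. "urn:publicid:IDN+utah.geniracks.net+authority+cm"
    am_api_urls: tuple = ()    # one or more AM API URLs
    data_store_url: str = ""


def missing_fields(req: SiteAdditionRequest) -> list:
    """Return the names of the fields that are still empty."""
    return [f.name for f in fields(req) if not getattr(req, f.name)]
```

Anything returned by `missing_fields` is a detail GMOC would still need to request before the site can be added.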

1.1 GENI Event Type Prioritization

GMOC classifies the addition of new GENI sites to the GENI Monitoring System as Normal priority.

1.2 Create Ticket

The GMOC ticketing system is used to capture the information above. GMOC may follow up to request additional information as the site is added. Creating the ticket results in the requester receiving a ticket email.

Currently, the response steps involve a very short outage of the ops config data store (Apache is stopped while the new release is linked in), which causes a very minor disturbance, if it is noticeable at all. As a result, there is no need to schedule maintenance for that system. This assessment may be revised in the future.

2. Investigate and Identify Response

2.1 Investigate the Problem

  • N/A

2.2 Identify the Response

The ops-monitoring distributed system follows this architecture. Therefore, adding a site to monitoring primarily means adding it to the Ops Config data store.

The instructions to access the ops monitoring software repository are detailed here, and the instructions to get an account for the ops monitoring software repository are detailed here.

The party currently responsible for maintaining the Ops Config data store is the GPO. The GPO will execute the following steps to add a new site to the ops-monitoring system.

2.2.1 Add a section to the JSON file config/opsconfig.json.jwc in the repository

An entry for the new site needs to be added to the "aggregatestores" section of the config/opsconfig.json.jwc file in the repository.

As an example, here's such an entry for a typical production site (utah-ig here):

    {
       "urn": "urn:publicid:IDN+utah.geniracks.net+authority+cm",
       "amtype": "instageni",
       "href": "https://www.utah.geniracks.net:5001/info/aggregate/utah-ig"
    },

Here is an entry for a site that is not yet in production (vt-ig here). Because the portal does not yet know this site's AM API URLs and nickname, the entry lists them explicitly:

    {
       "urn": "urn:publicid:IDN+instageni.arc.vt.edu+authority+cm",
       "amtype": "instageni",
       "href": "https://www.instageni.arc.vt.edu:5001/info/aggregate/vt-ig",
       "amurl": "https://instageni.arc.vt.edu:12369/protogeni/xmlrpc/am",
       "amurl1": "https://instageni.arc.vt.edu:12369/protogeni/xmlrpc/am/1.0",
       "amurl2": "https://instageni.arc.vt.edu:12369/protogeni/xmlrpc/am/2.0",
       "amurl3": "https://instageni.arc.vt.edu:12369/protogeni/xmlrpc/am/3.0",
       "am_nickname": "vt-ig"
    },
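The two examples follow a visible pattern: `href` is built from the URN's authority host prefixed with `www.`, port 5001 and the site nickname, and a pre-production entry adds the AM API URLs on port 12369 plus `am_nickname`. The hypothetical helper below generates an entry from that pattern. The pattern is inferred from these two InstaGENI examples only and may not hold for other rack types, so compare the output with the real file before committing.

```python
def aggregatestore_entry(urn: str, nickname: str, amtype: str = "instageni",
                         in_production: bool = True) -> dict:
    """Build an "aggregatestores" entry (hypothetical helper).

    Assumes the URN has the form urn:publicid:IDN+<host>+authority+cm and
    that the URL layout matches the utah-ig and vt-ig examples.
    """
    parts = urn.split("+")
    if len(parts) != 4 or parts[0] != "urn:publicid:IDN":
        raise ValueError("unexpected URN format: " + urn)
    host = parts[1]
    entry = {
        "urn": urn,
        "amtype": amtype,
        "href": f"https://www.{host}:5001/info/aggregate/{nickname}",
    }
    if not in_production:
        # Sites not yet known to the portal also list AM API URLs and nickname.
        base = f"https://{host}:12369/protogeni/xmlrpc/am"
        entry["amurl"] = base
        for version in (1, 2, 3):
            entry[f"amurl{version}"] = f"{base}/{version}.0"
        entry["am_nickname"] = nickname
    return entry
```

For example, `json.dumps(aggregatestore_entry("urn:publicid:IDN+instageni.arc.vt.edu+authority+cm", "vt-ig", in_production=False), indent=3)` should reproduce the vt-ig entry above, ready to paste into the .jwc file.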

  • Edit and commit the file either on the develop branch or on a ticket branch. For the develop branch, run git checkout develop and use the editor of your choice.
  • At the top of the repository, run make to create config/opsconfig.json. Copy and paste the content into http://jsonlint.com/ and make sure the JSON file passes validation (it is very easy to omit a quote, colon, or comma, or to add one too many).
  • Once the change is committed, check the build results at http://lemongrass.gpolab.bbn.com:8080/ to make sure the commit did not break anything.
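As a local complement to pasting the file into jsonlint.com, a short script can parse the generated config/opsconfig.json and sanity-check the "aggregatestores" entries. This is a sketch under assumptions: that "aggregatestores" is a top-level list in the generated file, and that every entry needs at least "urn", "amtype" and "href", as in the examples above. The function names are hypothetical.

```python
import json

REQUIRED_KEYS = ("urn", "amtype", "href")  # assumed minimum, taken from the examples


def check_opsconfig(text: str) -> list:
    """Return a list of problems found in opsconfig JSON text (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        # The same class of error jsonlint reports: a missing quote, colon, comma...
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    stores = config.get("aggregatestores") if isinstance(config, dict) else None
    if not isinstance(stores, list):
        return ['no "aggregatestores" list found']
    problems, seen = [], set()
    for i, entry in enumerate(stores):
        if not isinstance(entry, dict):
            problems.append(f"entry {i} is not a JSON object")
            continue
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            problems.append(f"entry {i} is missing {', '.join(missing)}")
        urn = entry.get("urn")
        if urn is not None:
            if urn in seen:
                problems.append(f"entry {i} duplicates URN {urn}")
            seen.add(urn)
    return problems


def check_file(path: str = "config/opsconfig.json") -> list:
    """Read the file generated by `make` and check it."""
    with open(path, encoding="utf-8") as f:
        return check_opsconfig(f.read())
```

Calling `check_file()` from the top of the repository after running make returns an empty list when no problems are found. It also catches a site that was accidentally added twice, which a pure syntax check would not.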

2.2.2 Create a minor Ops Monitoring release

  • Check out the master branch of the repository: git checkout master
  • Merge the branch you used to commit the ops config changes. For example, if you used the develop branch: git merge develop --no-ff && git push
  • Create a new minor release. If the previous release was 2.0.n, create 2.0.n+1. In the top directory of the repository, run: cm/make_release v2.0.n+1
  • When asked to enter the tag description, use:
    Ops Monitoring version 2.0.n+1
     - Added site X to ops config json file.
    
  • When the release script has finished executing, it will have built two files: ops-monitoring/release/ops-monitoring.v2.0.n+1.tar.gz and ops-monitoring/release/ops-monitoring.gpo-only.v2.0.n+1.tar.gz.
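The version arithmetic, tag description and file names in this section follow a fixed pattern, so they can be derived rather than typed by hand. A hypothetical helper, assuming release tags of the form vMAJOR.MINOR.PATCH as in v2.0.n:

```python
import re


def next_release(previous: str, site: str) -> dict:
    """Derive the next release details from the previous tag (e.g. "v2.0.7")."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", previous)
    if match is None:
        raise ValueError("expected a tag like v2.0.7, got " + previous)
    major, minor, patch = (int(g) for g in match.groups())
    tag = f"v{major}.{minor}.{patch + 1}"  # 2.0.n -> 2.0.n+1
    return {
        "tag": tag,
        "command": f"cm/make_release {tag}",
        "tag_description": (f"Ops Monitoring version {tag[1:]}\n"
                            f" - Added site {site} to ops config json file."),
        "tarballs": [f"ops-monitoring/release/ops-monitoring.{tag}.tar.gz",
                     f"ops-monitoring/release/ops-monitoring.gpo-only.{tag}.tar.gz"],
    }
```

Because the patch number is parsed as an integer, v2.0.9 correctly becomes v2.0.10 rather than being compared as text.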

2.2.3 Install the new release on the Ops Config data store

Currently, the ops config data store is hosted at opsconfigdatastore.gpolab.bbn.com.

  • Copy the release files to opsconfigdatastore.gpolab.bbn.com: scp ops-monitoring/release/ops-monitoring.v2.0.n+1.tar.gz ops-monitoring/release/ops-monitoring.gpo-only.v2.0.n+1.tar.gz opsconfigdatastore.gpolab.bbn.com:
  • Log on to the ops config data store system: ssh opsconfigdatastore.gpolab.bbn.com
  • Untar the release: cd /usr/local; sudo tar xvzf ~/ops-monitoring.v2.0.n+1.tar.gz; sudo tar xvzf ~/ops-monitoring.gpo-only.v2.0.n+1.tar.gz
  • Modify the ownership: sudo chown -R root:root /usr/local/ops-monitoring-v2.0.n+1
  • Stop the apache server: sudo service apache2 stop
  • Change the soft link: sudo rm /usr/local/ops-monitoring && sudo ln -s /usr/local/ops-monitoring-v2.0.n+1 /usr/local/ops-monitoring
  • Restart apache: sudo service apache2 start
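The "Change the soft link" step removes the old link before creating the new one, so for a moment /usr/local/ops-monitoring does not exist. That is harmless here because Apache is stopped. As an alternative sketch, not the documented procedure, a link can be replaced atomically by creating it under a temporary name and renaming it over the old one. `os.replace` uses rename(2), which replaces the destination atomically on POSIX file systems. The function name is hypothetical, and on the data store it would have to run as root.

```python
import os


def swap_symlink(target: str, link: str) -> None:
    """Point `link` at `target`, replacing any existing link atomically."""
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)       # clean up a leftover link from an interrupted run
    os.symlink(target, tmp)  # create the new link under a temporary name
    os.replace(tmp, link)    # rename over the old link; rename does not follow it
```

For this procedure the call would be `swap_symlink("/usr/local/ops-monitoring-v2.0.n+1", "/usr/local/ops-monitoring")`, with the actual version substituted for n+1.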

2.2.4 Verify that the new site is monitored

Immediately after Apache is restarted on the ops config data store, the JSON response from the data store should include the new section:

    curl -k --cert <collector cert> https://opsconfigdatastore.gpolab.bbn.com/info/opsconfig/geni-prod

Note: The collector cert is one of the certificates issued by the GENI Clearing House to each of the parties involved in the Ops Monitoring project.
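The same check can be scripted. This sketch mirrors the curl flags with Python's standard library (-k skips server certificate verification, --cert presents the collector certificate). It assumes the collector certificate file also contains its private key, and that the response lists the new site under "aggregatestores" with a "urn" field, as in the configuration file. The function names are hypothetical.

```python
import json
import ssl
import urllib.request

OPSCONFIG_URL = "https://opsconfigdatastore.gpolab.bbn.com/info/opsconfig/geni-prod"


def fetch_opsconfig(cert_file: str, url: str = OPSCONFIG_URL) -> str:
    """Fetch the ops config JSON, like: curl -k --cert <collector cert> <url>."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # together with CERT_NONE, the equivalent
    ctx.verify_mode = ssl.CERT_NONE  # of curl -k (no server certificate check)
    ctx.load_cert_chain(cert_file)   # client certificate (and key) for --cert
    with urllib.request.urlopen(url, context=ctx) as resp:
        return resp.read().decode("utf-8")


def site_listed(response_text: str, urn: str) -> bool:
    """True if an "aggregatestores" entry in the response has the given URN."""
    config = json.loads(response_text)
    stores = config.get("aggregatestores", []) if isinstance(config, dict) else []
    return any(isinstance(e, dict) and e.get("urn") == urn for e in stores)
```

For vt-ig, the check would be `site_listed(fetch_opsconfig("collector.pem"), "urn:publicid:IDN+instageni.arc.vt.edu+authority+cm")`, where collector.pem stands for the actual collector certificate file.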

Within an hour or so of the new minor release being installed on the ops config data store, the GENI Monitoring System should list the new aggregate.

3. GMOC Response

The GMOC implements the actions outlined and updates the ticket to capture the actions taken.

3.1 Implement Response

The GMOC executes the steps outlined above. If GMOC finds the procedure lacking, it should take steps to get the procedure updated.

4. Resolution

After notification from the GPO, GMOC verifies that the aggregate appears in the GENI Monitoring System's list of monitored aggregates.

4.1 Document Resolution and Close Ticket

GMOC records in the ticket that the site was added to the monitoring system and closes the ticket. This should result in a notification back to the issue reporter.