Changes between Version 20 and Version 21 of GENIRacksAdministration


Timestamp:
10/03/13 13:19:13 (10 years ago)
Author:
lnevers@bbn.com
Comment:

--

  • GENIRacksAdministration

    v20 v21  
    44
    55This page describes GENI racks administrative tasks and duties associated with each GENI rack.  For each rack type, a site contact coordinates delivery, installation, configuration, and maintenance of the rack.  In this very important role, you can rely on GPO support.   Please contact us at [mailto:help@geni.net] for any questions.  The GPO also provides a real-time public IRC chat room where engineers are often available, `chat.freenode.net` channel `#geni`, for debugging any issues you may encounter.  See [wiki:HowTo/ConnectToGENIChatRoom] for details.
     6[[PageOutline]]
    67
    7 == Site Requirements and Rack Installation ==
     8= GENI Rack Administration =
    89
    9 The site contact works with the organization deploying the rack (RENCI, HP, or GPO) to define the rack (ExoGENI, InstaGENI, or Starter, respectively) and site requirements for their specific site networks.  The site requirements include:
    10  * Network Setup - Define how the rack will connect to the Internet and to the GENI backbones, e.g. regional connections, connection speed, VLANs to be used, etc.
    11  * Site Security Requirements - Determine engineering and procedures needed for rack connectivity, such as FlowVisor rules, IP filters, etc.
    12  * Address assignment for rack components - Define which addresses, subnet masks, and routes need to be configured for the rack components.
    13  * Power requirements - Define which PDU and related power equipment matches on-site power availability.
    14  * Administrative accounts - Set up site administrator accounts and other accounts that may be needed to manage access to the rack resources.  Sites can choose to operate without administrator accounts if they prefer that option, and administrative responsibility will be delegated to a GENI operations group.
    15  * Delivery logistics - Details for ''where'' the rack is to be delivered, ''who'' will accept the delivery, and ''when'' the delivery will take place.  Also covers identifying and planning for any physical restrictions for the rack delivery.
    16  * GENI Agreements - Sites need to read and agree to basic usage agreements for GENI racks.  All sites should read and understand the [wiki:GpoDoc#GENIRecommendedUsePolicy GENI Recommended Usage Policy] and the [wiki:GpoDoc#GENIAggregateProvidersAgreement GENI Aggregate Providers Agreement].  There may also be additional specific agreements for the ExoGENI and InstaGENI projects.
     10This page captures a site administrator's perspective on GENI rack installation, maintenance, and support.   Detailed questions and answers that other site admins have found helpful are on the '' '''[wiki:LuisaSandbox/GENIRacksAdmin/InstaGENIFAQ InstaGENI FAQ]''' '' and '' '''[wiki:LuisaSandbox/GENIRacksAdmin/ExoGENIFAQ ExoGENI FAQ]''' '' pages.
    1711
    18 In addition a site contact has continual administrative responsibilities that include:
    19  * managing user accounts for experimenters and for other operators.
    20  * managing updates for software and firmware, depending on the rack type.  (See sections below for specific rack type)
    21  * accessing compute and network resource consoles in the rack to support/manage experimenter resources or debug problems.
    22  * ensuring that security and usage procedures are followed.
     12== Before Your Rack Arrives ==
    2313
    24 = ExoGENI Administration =
     14Each GENI site should provide the following high-level support for a GENI rack:
    2515
    26 See the [wiki:GENIRacksHome/ExogeniRacks/Administration ExoGENI Administration] page for rack administration documentation.
    27 ----
     16 * Provide space, power, security (as with other site resources)
     17 * Provide at least 1Gbps !OpenFlow/SDN data path between the site rack and the upstream research I2/NLR network (10GE or 40GE is also possible, depending on the type of rack)
     18 * Provide an SDN path from the GENI rack to downstream campus subscribers who are interested in connecting to GENI  (SDN paths are typically Layer2 Ethernet VLANs)
     19 * Operate with up-to-date GENI-specified software (e.g. AM API, !OpenStack, Xen)
     20 * Provide no-cost access to rack resources for GENI authorized users at other campuses  (Access is controlled by experimenters' and operators' individual GENI credentials)
     21 * Provide points of contact for the GENI response team, as specified in the [http://groups.geni.net/geni/attachment/wiki/ComprehensiveSecurityPgm/Aggregate%20Provider%20Agreement%20v04.pdf Aggregate Provider Agreement]. The points of contact support debugging, software updates, and Emergency Stop requests for the rack.
     22
     23The Compute and !OpenFlow (FOAM) aggregates in the rack require a small amount of attention from an administrator on an ongoing basis, when there are no emergencies or outages.  The amount of time you spend on these administrative tasks mostly depends on your site policy.  To reduce the administrative workload, you may request to have the FOAM aggregate configured for auto approval of some or all !OpenFlow experiment requests. If not requested, FOAM will require manual administrative approval for all requests.  For more detail see the [http://groups.geni.net/geni/wiki/OpenFlow/FOAM/AdminIntro FOAM administration introduction] page.
     24
     25When monitoring reports a problem with your rack (e.g. an apparent power failure), we will email your site contact mailing list to ask for help with resolving the issue.  Occasionally the rack team may also ask for help from a site contact (e.g. confirming bad hardware).  The rack teams currently handle all software upgrades to the rack without requiring help from a site administrator.  Site contacts must notify the GMOC when there is a scheduled outage or a problem observed by the site.  For more information on how to report a problem, see the GMOC [http://gmoc.grnoc.iu.edu/gmoc/support/report-a-problem.html Report a Problem] page.  There are additional GMOC [http://gmoc.grnoc.iu.edu/gmoc/index/support.html support] pages where you can view the ''Operations Calendars'', ''Operations Bi-Weekly Reports'' and existing ''Trouble Tickets''.
    2826 
    29 = InstaGENI Administration =
     27== InstaGENI Rack Deployment ==
    3028
    31 See the [wiki:GENIRacksHome/InstageniRacks/Administration InstaGENI Administration] page for rack administration documentation.
     29The InstaGENI team sends an email with information about what's needed for the rack installation and warranty support to the main IT contact for a potential new rack site.  Site contacts fill out an InstaGENI [http://www.protogeni.net/wiki/instageni/checklist site questionnaire] as soon as possible after ordering their rack.  The questionnaire requests networking details and administrator account information that InstaGENI engineers need to pre-configure your rack. The team creates your initial administrator account, which provides access to all devices in the rack via SSH public keys, and allows you to create additional administrator accounts.  The GPO coordinates additional site configuration and network integration activities for the deployment of your rack. You will be asked to play a role or provide information as part of the four activities below:
    3230
    33 ----
     31  1. The GPO will contact you to determine how your rack will connect to the GENI core networks.  The GPO will help engineer your site's layer 2 data paths and the shared, exclusive and stitching VLAN options to configure for your network connections. For details see the [http://groups.geni.net/syseng/wiki/sydSandbox/Connectivity/endToEndConnectionsOF Meso-scale Connection Requirements] page. The GPO also adds a test point host to monitor your site's !OpenFlow access.  GMOC monitoring is set up to gather operational data about your site. There are no actions required of a site administrator to set up monitoring, but if you are interested in more details, see the [http://groups.geni.net/geni/wiki/TangoGeniMonitoring GENI Operations Monitoring] page. During this integration phase, your site will need to accept the [http://groups.geni.net/geni/attachment/wiki/ComprehensiveSecurityPgm/Aggregate%20Provider%20Agreement%20v04.pdf?format=raw Aggregate Provider Agreement].
    3432
    35 = Starter Racks Administration =
     33  2. The GPO will execute experimenter, administrative and monitoring tests on each InstaGENI site.  As part of these tests, you will be asked to add an initial administrator account for the GPO,  and you will be asked to remove the account upon test completion. Instructions for adding and removing InstaGENI Administrative accounts can be found [http://www.protogeni.net/ProtoGeni/wiki/RackAdminAccounts here].  The InstaGENI New Site Confirmation Test Plan can be found [http://groups.geni.net/geni/wiki/GENIRacksHome/InstageniRacks/SiteConfirmationTests here], and status for all InstaGENI Site Confirmation tests can be found [http://groups.geni.net/geni/wiki/GENIRacksHome/InstageniRacks/ConfirmationTestStatus here].
    3634
    37 This section provides a few examples of the administrative tasks on a Starter Rack.  Example administrative tasks for ExoGENI and InstaGENI racks are different, but will accomplish similar functions.
     35  3. As part of the pre-production activities, each site is asked to run tests to verify readiness for production. This testing opens your site to experimenters and verifies your ability to support them.  Site administrative contacts also join the response-team@geni.net mailing list at this stage. (This is a GENI-wide mailing list whose members respond to issues with GENI production resources). Finally, the site works with the GMOC to provide emergency contact information, and transitions to being actively supported by the GMOC.
    3836
    39 == Get Starter rack Accounts ==
     37  4. Your site moves to production status, following the [wiki:ProductionRelease Production Release] procedure.  This includes four steps:
     38           ''a)'' Site added to the [wiki:GeniClearinghouse GENI clearinghouse] and to the Utah [http://www.protogeni.net/wiki/ClearingHouseDesc ProtoGENI clearinghouse], [[BR]]
     39           ''b)'' Site added to the aggregates listed in the [https://portal.geni.net/ GENI Portal] and in the [http://trac.gpolab.bbn.com/gcf/wiki Omni] package, [[BR]]
     40           ''c)'' Site marked as a production resource in monitoring and [[BR]]
     41           ''d)'' Site officially announced and tracked as production.
    4042
    41 '''Requesting an account'''
     43   ''Note:'' For a list of sites currently in production, see the [wiki:GENIProduction GENI Production Resources] page.
    4244
    43 Site operators should contact [mailto:gpo-infra@geni.net] to request sudo-capable login accounts on the Starter rack hosts by providing:
    44  * Preferred username
    45  * Preferred fullname
    46  * SSH public key for remote login
    47  * Hashed password for sudo obtained by running:
    48 {{{
    49 openssl passwd -1
    50 }}}
    51  and typing a password twice.  The resulting string should be of the form: `$1$xxxxxxxx$xxxxxxxxxxxxxxxxxxxxxx`
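Before emailing the hashed password, it can help to confirm that the string has the expected `$1$...` shape. The following is a minimal sketch and not part of the GPO procedure: the function name `is_md5_crypt_hash` is hypothetical, and the pattern assumes the usual md5-crypt layout of a salt of up to 8 characters followed by a 22-character hash, both drawn from `./0-9A-Za-z`.

```shell
#!/bin/sh
# Hypothetical helper (not from the original page): succeeds when its
# argument looks like an md5-crypt string, i.e. $1$<salt>$<hash>.
# Assumption: salt is 1-8 characters, hash is 22 characters.
is_md5_crypt_hash() {
    printf '%s\n' "$1" | grep -Eq '^\$1\$[./0-9A-Za-z]{1,8}\$[./0-9A-Za-z]{22}$'
}

# Example with a made-up placeholder string (not a real password hash).
if is_md5_crypt_hash '$1$abcdefgh$ABCDEFGHIJKLMNOPQRSTUV'; then
    echo "looks like an md5-crypt hash"
else
    echo "unexpected format"
fi
```

The check only inspects the format; it cannot tell whether the hash corresponds to the password you intended.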
     45== InstaGENI Rack Maintenance ==
    5246
    53 ''' Policies for Unix account use '''
    54  * Remote account access is via public-key SSH only (no password-based login).
    55  * Do not run interactive sessions as root (don't use `sudo bash`, but instead run individual commands under sudo for logging).
    56  * Do not share account credentials.  We are happy to create individual accounts, or to give staffers who don't have logins access to our emergency account for outage debugging.
    57  * GPO staffers actively manage these systems using the puppet configuration management utility.  If you need to modify a system, please e-mail us at [mailto:gpo-infra@geni.net] to ensure that the desired change takes effect.
     47All rack maintenance activities are announced by the GMOC on the response-team@geni.net mail list. GMOC also provides an operations calendar for both scheduled and unscheduled maintenance activities [http://globalnoc.iu.edu/gmoc/index/support/gmoc-operations-calendars.html here].
    5848
    59 '''Accounts on non-Unix rack devices'''
     49Currently, the InstaGENI team centrally handles updating software on all racks. Updates take place in a maintenance window each Friday at 3pm (Pacific).
    6050
    61 Please contact [mailto:gpo-infra@geni.net] if you need login access to:
    62  * Control router or dataplane switch
    63  * IP KVM for remote console access
    64  * PDU for remote power control
     51== InstaGENI Rack Monitoring ==
    6552
    66 == Access Devices Consoles ==
    67 ''' Compute Resource consoles'''
    68  * The fold-out console in the rack can be used to view the consoles of any of the hosts in the rack.
    69  * The KVM hotkey for changing which device is displayed is `Ctrl Ctrl`.
    70 
    71 '''Network Devices Consoles'''
    72 The `monitor1` node in each rack can be used as a serial console for network devices located in that rack.
    73  * Log in to `monitor1` using the console
    74  * Use screen to access the desired serial device, e.g.:
    75 {{{
    76 screen /dev/ttyS0
    77 }}}
    78  * When done using screen, kill the session by pressing: `Ctrl-a K`
    79 
    80 == Monitoring Starter rack Health ==
    81 
    82 '''Service Health'''
    83 
    84 GPO uses Nagios as a front-end for alerting about rack problems. The following services are monitored in the Starter Racks:
    85  * Resource problems with CPU, swap, or disk space on each host.
    86  * IP connectivity failures from the rack server to commodity internet (Google) and to the GPO lab.
    87  * Excessive CPU usage and excessive uplink broadcast traffic on the experimental switch.
    88  * Problems with standard experimental use of the Eucalyptus aggregate.
    89 
    90 The current state of monitored hosts and services at a given city can be viewed at:
    91  * [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=cities-cha&style=detail Chattanooga Status]
    92  * [http://monitor.gpolab.bbn.com/nagios/cgi-bin/status.cgi?hostgroup=cities-cle&style=detail Cleveland Status]
    93 
    94 If you would like to be added to any of these notifications, please contact us at [mailto:gpo-infra@geni.net].
    95 
    96 '''Compute Resources Health'''
    97 Unix hosts report system health information via ganglia to the [http://monitor.gpolab.bbn.com/ganglia/ GPO Monitoring Server]:
    98  * [http://monitor.gpolab.bbn.com/ganglia/?c=Chattanooga Chattanooga hosts]
    99  * [http://monitor.gpolab.bbn.com/ganglia/?c=Cleveland Cleveland hosts]
    100 
    101 '''Network Devices Health'''
    102 Network devices are polled for system health via SNMP, and that data is also available at the [http://monitor.gpolab.bbn.com/ganglia/ GPO Monitoring Server]:
    103  * [http://monitor.gpolab.bbn.com/ganglia/?c=Chattanooga Chattanooga devices]
    104  * [http://monitor.gpolab.bbn.com/ganglia/?c=Cleveland Cleveland devices]
    105 
    106 If you need read-only SNMP access to the network devices in a Starter rack, please contact [mailto:gpo-infra@geni.net]
    107 
    108 == Perform an experiment in your Starter rack ==
    109 
    110 '''1.''' In this example, we specify 2 VM instances using the same image; it is also possible to specify 2 separate instances using different images:
    111 {{{
    112 $ euca-run-instances -k mykey -n 2 emi-05AC15E0
    113 RESERVATION     r-47F80755      agosain agosain-default
    114 INSTANCE        i-45E007BF      emi-05AC15E0    0.0.0.0 0.0.0.0 pending mykey   0               m1.small        2011-10-21T02:06:22.451Z   cha-euca        eki-8F5A137E    eri-CB4F1461
    115 INSTANCE        i-335C067F      emi-05AC15E0    0.0.0.0 0.0.0.0 pending mykey   1               m1.small        2011-10-21T02:06:22.453Z   cha-euca        eki-8F5A137E    eri-CB4F1461
    116 }}}
    117 
    118 '''2.''' Log in to the VMs. When connecting to your image you must use the private key from the Eucalyptus keypair you created above. The {{{-i}}} flag lets you specify the private key. Each image also has a specified username that you will use on instances. In the case of the Ubuntu 10.04 (Lucid) image, the username is "ubuntu". So the complete ssh command for this image is:
    119 {{{
    120 $ ssh -i mykey.priv ubuntu@192.1.243.56
    121 $ ssh -i mykey.priv ubuntu@192.1.243.53
    122 }}}
    123 
    124 '''3.''' Now that the VMs are running you can use an iperf client and server setup to exchange traffic between the two VMs. First, install the Iperf application on both VMs:
    125 {{{
    126 sudo apt-get install iperf
    127 }}}
    128 Then, start the iperf server:
    129 {{{
    130 ubuntu@ip-10-153-0-67:~$ iperf -s
    131 ------------------------------------------------------------
    132 Server listening on TCP port 5001
    133 TCP window size: 85.3 KByte (default)
    134 ------------------------------------------------------------
    135 [  4] local 10.153.0.67 port 5001 connected with 10.153.0.66 port 52930
    136 [ ID] Interval       Transfer     Bandwidth
    137 [  4]  0.0-30.0 sec  1.92 GBytes    549 Mbits/sec
    138 }}}
     53To access operational data gathered for your site see the [https://gmoc-db.grnoc.iu.edu/protected/ GMOC Live Database]. Note that you will need an OpenID, !InCommon, or GlobalNOC account to access your site data.
    13954
    14055
    141 '''4.''' Then, connect to the private IP address of the other VM and start the iperf client:
    142 {{{
    143 ubuntu@ip-10-153-0-66:~$ iperf -c 10.153.0.67 -t 30
    144 ------------------------------------------------------------
    145 Client connecting to 10.153.0.67, TCP port 5001
    146 TCP window size: 16.0 KByte (default)
    147 ------------------------------------------------------------
    148 [  3] local 10.153.0.66 port 52930 connected with 10.153.0.67 port 5001
    149 [ ID] Interval       Transfer     Bandwidth
    150 [  3]  0.0-30.0 sec  1.92 GBytes    549 Mbits/sec
    151 }}}
    152 '''5.''' Terminate your VM instances after you have completed your tests:
    153 {{{
    154 euca-terminate-instances i-38E807A1
    155 }}}
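As a sanity check on the iperf figures in the walkthrough above, the reported bandwidth can be recomputed from the transfer size and the interval. This is only a sketch: the function name `mbits_per_sec` is hypothetical, and it assumes iperf counts a GByte as 1024^3 bytes and an Mbit as 10^6 bits.

```shell
#!/bin/sh
# Hypothetical helper: convert an iperf "Transfer" value in GBytes and an
# interval in seconds into Mbits/sec.
# Assumption: 1 GByte = 1024^3 bytes, 1 Mbit = 10^6 bits.
mbits_per_sec() {
    awk -v gbytes="$1" -v seconds="$2" \
        'BEGIN { printf "%.1f\n", gbytes * 1024 * 1024 * 1024 * 8 / seconds / 1000000 }'
}

# Values from the sample run: 1.92 GBytes in 30 seconds.
mbits_per_sec 1.92 30
```

With the sample values this prints 549.8, which is close to the 549 Mbits/sec shown in the output; a small difference is expected if the displayed 1.92 GBytes is itself a rounded figure.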
     56== ExoGENI Rack Deployment ==
    15657
    157 == Install a VM image on your Starter rack ==
     58ExoGENI rack deployments
    15859
    159 The following procedure outlines an experimenter view into using the Starter racks Eucalyptus VMs as a resource for an experiment.
     60  1. An ExoGENI engineer will contact your site to collect information for configuring and installing your rack.  The racks are built at IBM, and pre-configured at the IBM integration center.  The ExoGENI rack team supports the initial rack installation remotely, once you've connected your new rack to power and external network cables.  The ExoGENI operations team verifies connectivity to the rack components and completes detailed configuration for all rack components (e.g. storage and switches).  They then configure ORCA and !OpenFlow software, along with cloud and GENI federation software. You may be asked to provide some support for these activities. 
    16061
    161 To request an account for a GENI Starter Rack send an email request to [mailto:gpo-infra@geni.net] including the following details:
    162  * Preferred username and full name.
    163  * SSH public key for remote login into rack resources.
    164  * Provide an MD5 hash of the password for sudo use. Generated by {{{openssl passwd -1}}}
     62  2.  The GPO and ExoGENI teams will coordinate to integrate your site into the GENI network. For details, see the [http://groups.geni.net/syseng/wiki/sydSandbox/Connectivity/endToEndConnectionsOF Meso-scale Connection Requirements] page. The GPO also adds a test point host to monitor your site's !OpenFlow access. During this integration phase, your site will need to accept the [http://groups.geni.net/geni/attachment/wiki/ComprehensiveSecurityPgm/Aggregate%20Provider%20Agreement%20v04.pdf?format=raw Aggregate Provider Agreement].
    16563
    166 1. Install Euca2ools (where???), which are command-line tools for interacting with the Eucalyptus open-source cloud-computing infrastructure.
    167 {{{
    168   $ sudo apt-get install euca2ools
    169 }}}
     64  3. The GPO will execute experimenter, administrative and monitoring tests on each ExoGENI site.  As part of this test, the GPO requests an administrator account from the ExoGENI team by sending email to exogeni-ops@renci.org to request LDAP credentials. Note that only the ExoGENI team can add accounts to the central LDAP master and that administrative privileges are granted via sudo, which is dependent upon LDAP group membership. For more information see the [https://wiki.exogeni.net/doku.php?id=public:operators:start Rack Operators] page.
    17065
    171 2. Install Euca credentials. These credentials can be downloaded as a package from your Eucalyptus web site. If you do not have an account you can request one at ????  Once the account is verified and approved, go to the "Credentials" tab. In the "Credentials ZIP-file" section, click on the "Download Credentials" button. Locate the downloaded zip file (the location depends on your OS and web browser) and move it to a working directory.
     66  4. As part of the pre-production activities, each site is asked to run tests to verify readiness for production. This testing opens your site to experimenters and verifies your ability to support them.  Site administrative contacts also join the response-team@geni.net mailing list at this stage. (This is a GENI-wide mailing list whose members respond to issues with GENI production resources). The site also works with the GMOC to provide emergency contact information, and transitions to being actively supported by the GMOC. Additionally, site contacts should register for the geni-orca-users@googlegroups.com mailing list.
    17267
    173 3. Unpack the credential and source the environment:
    174 {{{
    175   $ mkdir ~/euca
    176   $ mv ~/Downloads/euca2-myaccount-x509.zip ~/euca
    177   $ cd ~/euca
    178   $ unzip euca2-myaccount-x509.zip
    179   $ . eucarc
    180 }}}
     68  5.  Your site moves to production status, following the [wiki:ProductionRelease Production Release] procedure.  This includes four steps:
     69           ''a)'' Site added to the [wiki:GeniClearinghouse GENI clearinghouse] and to the Utah [http://www.protogeni.net/wiki/ClearingHouseDesc ProtoGENI clearinghouse], [[BR]]
     70           ''b)'' Site added to the aggregates listed in the [https://portal.geni.net/ GENI Portal] and in the [http://trac.gpolab.bbn.com/gcf/wiki Omni] package, [[BR]]
     71           ''c)'' Site marked as a production resource in monitoring and [[BR]]
     72           ''d)'' Site officially announced and tracked as production.
    18173
    182 4. Add firewall rules to your euca instance. In the example below, SSH and ping are allowed:
    183 {{{
    184   $ euca-authorize -P tcp -p 22 -s 0.0.0.0/0 default
    185   $ euca-authorize -P icmp -t -1:-1 -s 0.0.0.0/0 default
    186 }}}
     74   ''Note:'' For a list of sites currently in production see the [wiki:GENIProduction GENI Production Resources] page.
    18775
    188 5. Generate a key pair to connect to the euca instance:
    189 {{{
    190   $ euca-add-keypair mykey > mykey.priv
    191   $ chmod 600 mykey.priv
    192 }}}
    19376
    194 6. Show available images, start a euca instance with your newly generated keys:
    195 {{{
    196   $ euca-describe-images   # show list of available images
    197   IMAGE emi-48AA122D  ubuntu-9.04/ubuntu.9-04.x86-64.img.manifest.xml   chaos   available  public  x86_64       machine
    198   IMAGE emi-62E51726  ubuntu-10.04/lucid-server-cloudimg-amd64.img.manifest.xml tmitchel  available  public  x86_64 machine             
    199   $ euca-run-instances -k mykey emi-62E51726
    200 }}}
     77== ExoGENI Rack Maintenance ==
    20178
    202 7. Set a public address for the euca VM created above by requesting that an address be allocated and then assigning it to the specific euca instance:
    203 {{{
    204   $ euca-allocate-address    # will show address that is allocated to you
    205   ADDRESS       192.1.243.55
    206   $ euca-associate-address -i i-38E807A1 192.1.243.55 
    207 }}}
     79All rack maintenance activities are announced by the ExoGENI team on the geni-orca-users@googlegroups.com mail list.  GMOC also provides an operations calendar for both scheduled and unscheduled maintenance activities [http://globalnoc.iu.edu/gmoc/index/support/gmoc-operations-calendars.html here].
    20880
    209 8. You may now connect to the Euca VM:
    210 {{{
    211   $ ssh -i mykey.priv ubuntu@192.1.243.55
    212 }}}
    21381
    214 Your Euca instance may now be used to run an experiment.
     82== ExoGENI Rack Monitoring ==
    21583
     84To access operational data gathered for your site see the [https://gmoc-db.grnoc.iu.edu/protected/ GMOC Live Database]. Note that you will need an !OpenID, !InCommon, or GlobalNOC account to access your site data.
     85
     86ExoGENI racks also include a Nagios installation, which uses the Check_MK plugin to retrieve data, and WATO, Check_MK's Web Administration Tool.  Each rack supports the WATO web interface for access to rack statistics. For example, Nagios information for the BBN rack can be accessed at ''!https://bbn-hn.exogeni.net/rack_bbn/check_mk/''. Links for your site vary based on the site name; replace ''bbn'' in the URL with your site name to access the monitoring data. For example, if your site is FIU, the URL is ''!https://fiu-hn.exogeni.net/rack_fiu/check_mk/''.
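The URL substitution described above can be sketched as a small shell function. The name `check_mk_url` is hypothetical, and the sketch assumes that every site follows the same pattern as the BBN and FIU examples, with the site name in lowercase.

```shell
#!/bin/sh
# Hypothetical helper: build the Check_MK URL for an ExoGENI rack from its
# site name, following the bbn/fiu pattern shown on this page.
# Assumption: the site name appears in lowercase in both places in the URL.
check_mk_url() {
    site=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'https://%s-hn.exogeni.net/rack_%s/check_mk/\n' "$site" "$site"
}

# Example: the FIU rack mentioned above.
check_mk_url FIU
```

If your rack uses a different host naming scheme, the resulting URL will not resolve; confirm the actual address with the ExoGENI team.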
    21687
    21788