
GRAM Offline Installation Guide

Introduction

This document describes the procedures and context for installing GRAM software. The following aspects are covered individually:

  • Configuration Overview
  • Hardware Requirements
  • Software Requirements
  • Network configuration
  • OpenStack Installation and Configuration
  • GRAM Installation and Configuration

Hardware Requirements

The minimum requirements are:

  • 1 Control Server
  • 1 Compute Server (Can be more)
  • 1 Switch with at least (number of servers)*3 ports [For non-dataplane traffic]
  • 1 OpenFlow Switch with at least (number of servers) ports [For dataplane traffic]
  • Each server should have at least 4 Ethernet ports

Software Requirements

Ports

The following ports are used by GRAM components. Verify that these ports are not already in use; if any are, change the configuration of the corresponding GRAM component below to use a different port. A quick check for ports already in use is shown after this list.

  • Controller node
    • 8000: GRAM Clearinghouse (Unless you are using a different clearinghouse). See this section to change this port.
    • 8001: GRAM Aggregate Manager. See this section to change this port.
    • 9000: VMOC Default Controller
    • 7001: VMOC Management. See this section to change this port.
    • 6633: VMOC
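
A quick way to confirm that none of these ports are already taken is to list the listening TCP sockets on the controller node before installation. This is only a sanity check; the command assumes a stock Ubuntu 12.04 install with net-tools present:

sudo netstat -tlnp | egrep ':(8000|8001|9000|7001|6633) '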

OpenStack Requirements

  • This guide was written for Ubuntu 12.04
  • All dependencies will be downloaded from the Ubuntu repository.

Image Requirements

  • Currently, nova images must meet the following requirements for GRAM (a quick check is sketched after this list):
    1. Must have the following packages installed:
      • cloud-utils
      • openssh-server
      • bash
      • apt
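
A minimal way to verify these packages inside a candidate image (assuming it is Debian/Ubuntu based, so dpkg is available) is to boot or chroot into it and check the package status:

dpkg -l cloud-utils openssh-server bash apt | grep '^ii'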

Configuration Overview

OpenStack and GRAM present software layers on top of rack hardware. Rack nodes are expected to fall into two categories:

  • Controller Node: The central management and coordination point for OpenStack and GRAM operations and services. There is one of these per rack.
  • Compute Node: The resource from which VMs and network connections are sliced and allocated on request. There are many of these per rack.

OpenStack and GRAM require establishing four distinct networks among the different nodes of the rack:

  • Control Network: The network over which OpenStack and GRAM commands flow between control and compute nodes. This network is NOT OpenFlow controlled and has internal IP addresses for all nodes.
  • Data Network: The allocated network and associated interfaces between created VMs representing the requested compute/network resource topology. This network IS OpenFlow controlled.
  • External Network: The network connecting the controller node to the external internet. The compute nodes may or may not also have externally visible addresses on this network, for convenience.
  • Management Network: Enables SSH entry into and between created VMs. This network is NOT OpenFlow controlled and has internal IP addresses for all nodes.

The mapping of the networks to interfaces is arbitrary and can be changed by the installer. For this document we assume the following convention:

  • eth0: Control network
  • eth1 : Data network
  • eth2 : External network
  • eth3 : Management network

The Controller node will have four interfaces, one for each of the above networks. The Compute nodes will have three (Control, Data, and Management), with the fourth (External) optional.
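
Before assigning roles, it can help to confirm that each node actually exposes the expected number of network interfaces (four on the Controller, at least three on Compute nodes). A minimal check; interface names may differ from the eth0-eth3 convention above:

ip -o link show | awk -F': ' '{print $2}'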

More details on the network configuration are provided in wiki:"ArchitectureDescription".

Network Configuration

The first step of OpenStack/Gram configuration is establishing the networks described above.

We need to define a range of VLANs for the data network (say, 1000-2000) and separate VLANs for the external, control, and management networks (say, 5, 6, and 7) on the management switch. The external and control network ports should be configured untagged, and the management port should be configured tagged.

The Control, External and Management networks are connected between the rack management switch and ethernet interfaces on the Controller or Compute nodes.

The Data network is connected between the rack OpenFlow switch and an ethernet interface on the Control and Compute nodes.

[Figure: GRAMSwitchDiag.jpg (attachment not available)]

OpenFlow Switch for the Data Network

The ports on the OpenFlow switch to which data network interfaces have been connected need to be configured to trunk the VLANs of the data network. How this is done varies from switch to switch, but typical commands look something like:

conf t
  vlan <vlanid>
    tagged <ports>
    exit
  exit
write memory

On the OpenFlow switch, for each VLAN used in the data network (1000-2000), set the controller to point to the VMOC running on the control node. The command will vary from switch to switch but this is typical:

conf t
  vlan <vlanid>
    openflow controller tcp:<controller_addr>:6633 enable
    openflow fail-secure on
    exit
  exit
write memory

For the Dell Force10 switch, the following lines set up the VLAN trunking for the data network and point the default OpenFlow controller at the VMOC.

!
interface Vlan 1001 of-instance 1
 no ip address
 tagged TenGigabitEthernet 0/0-2
 no shutdown
!
.........
!
openflow of-instance 1
 controller 1 128.89.72.112  tcp
 flow-map l2 enable
 flow-map l3 enable
 interface-type vlan
 multiple-fwd-table enable
 no shutdown
!

The above snippet assumes that the controller node, running VMOC, is at 128.89.72.112.

For a sample configuration file for the Dell Force10, see attachment:force10-running

Management Switch

The ports on the management switch to which management network interfaces have been connected need to be configured to trunk the VLAN of the management network. How this is done varies from switch to switch, but typical commands look something like:

conf t
  int gi0/1/3
  switchport mode trunk
  switchport trunk native vlan 1
  switchport trunk allowed vlan add 7
  no shutdown
  end
write memory

Here is a config file for a Dell PowerConnect 7048: attachment:powerconnect-running. We use VLANs 200, 300, and 2500 for the control plane, management plane, and external network, respectively.

GRAM and OpenStack Installation and Configuration

GRAM provides a custom installation script that installs and configures OpenStack (the Grizzly release) specifically for GRAM's requirements, as well as GRAM itself.

  1. Install a fresh Ubuntu 12.04 image on the control node and the N compute nodes
  • From among the rack nodes, select one to be the control node and the others to be compute nodes. The control node should have at least 4 NICs and each compute node should have at least 3 NICs.
  • Install Ubuntu 12.04 on each selected node; the server edition is preferred.
  • Create a 'gram' user with sudo/admin privileges.
  • If there are additional admin accounts, you must manually install omni for each of these accounts.

1a. Set up a local repository with the packages and dependencies

  • unpack gram_pkgs.tar.gz
    cd /home/gram/
    tar -zxvf gram_pkgs.tar.gz
    
  • add gram_pkgs to the list of repositories:

Edit "/etc/apt/sources.list" add the line

deb file:/home/gram/pkgs ./ 

and comment out all other sources.
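
One non-interactive way to do this, assuming the packages were unpacked to /home/gram/pkgs as in the sources.list line above, is to back up sources.list, comment out the existing entries with sed, and append the local repository line:

    sudo cp /etc/apt/sources.list /etc/apt/sources.list.orig
    sudo sed -i 's/^deb/# deb/' /etc/apt/sources.list
    echo "deb file:/home/gram/pkgs ./" | sudo tee -a /etc/apt/sources.list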

  • run the command
    sudo apt-get update 
    
  1. Install mysql on the control node
sudo apt-get install mysql-server python-mysqldb
  • You will be prompted for the password of the MySQL admin user. Type it in (twice) and remember it: it will be needed in the config.json file as the value of mysql_password.
  1. Install OpenStack and GRAM on the control and compute nodes
  • Get the Debian packages gram_control.deb and gram_compute.deb. These are not currently available on an apt server but can be obtained by request from gram-dev@bbn.com.
sudo apt-get install -y ubuntu-cloud-keyring --force-yes
sudo apt-get install gdebi-core
  • Install the gram package (use gram_control.deb or gram_compute.deb depending on the machine type being installed):
    sudo gdebi gram_<control/compute>.deb
    
  • Edit /etc/gram/config.json. NOTE: This is the most critical step of the process. This file specifies your passwords and network configuration so that OpenStack can be configured properly. [See the section "Configuring config.json" below for details on the variables in that file.]
  • Run the GRAM installation script (with control or compute depending on the machine type being installed):
    sudo /etc/gram/install_gram.sh <control/compute>
    
  • Configure the OS and network. You will lose network connectivity during this step, so it is recommended that the following command be run directly on the machine console or inside the Linux 'screen' program.
    sudo /tmp/install/install_operating_system_[control/compute].sh
    
  • Configure everything else. Use a root shell
    /tmp/install/install_[control/compute].sh
    

This last command will do a number of things:

  • Install all required apt dependencies
  • Populate the OpenStack configuration files based on values set in config.json
  • Start all OpenStack services
  • Start all GRAM services

If something goes wrong (you will see errors in the output stream), the scripts being run are in /tmp/install (install_compute.sh or install_control.sh). You can usually run the failing commands by hand to get things working, or at least to see where things went wrong (often a problem in the configuration file).
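
One way to see exactly which command is failing is to re-run the relevant script with shell tracing enabled and capture the output (shown here for the control node; substitute install_compute.sh on compute nodes):

sudo bash -x /tmp/install/install_control.sh 2>&1 | tee /tmp/install_control.log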

  • Set up the namespace only on the control node. Use a root shell.
  1. Check that sudo ip netns shows two entries; the qrouter-* entry is the important one.
  2. If the qdhcp-* namespace is not there, run sudo service quantum-dhcp-agent restart
  3. If you still cannot get two entries, try restarting all the quantum services:
  • sudo service quantum-server restart
  • sudo service quantum-plugin-openvswitch-agent restart
  • sudo service quantum-dhcp-agent restart
  • sudo service quantum-l3-agent restart

And then in the root shell, type

export PYTHONPATH=$PYTHONPATH:/opt/gcf/src:/home/gram/gram/src:/home/gram/gram/src/gram/am/gram
python /home/gram/gram/src/gram/am/gram/set_namespace.py
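
After set_namespace.py completes, re-checking the namespaces should show the qrouter-* entry (and, once DHCP is active, a qdhcp-* entry) described in step 1:

sudo ip netns list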
  1. Edit /etc/hosts - Not clear that this is necessary anymore.

Each control/compute node must be associated with the external ip address. It should look similar to:

127.0.0.1       localhost
128.89.72.112   bbn-cam-ctrl-1
128.89.72.113   bbn-cam-cmpe-1
128.89.72.114   bbn-cam-cmpe-2
  1. Install OS images (only on the control node)

At this point, OS images must be placed in OpenStack Glance (the image repository service) to support creation of virtual machines.

The choice of images is installation-specific, but these commands are provided as a reasonable example of a first image: a 64-bit Ubuntu 12.04 server in qcow2 format (http://cloud-images.ubuntu.com/releases/precise/release/ubuntu-12.04-server-cloudimg-amd64-disk1.img).

wget http://cloud-images.ubuntu.com/releases/precise/release/ubuntu-12.04-server-cloudimg-amd64-disk1.img
glance image-create --name "ubuntu-12.04" --is-public=true \
--disk-format=qcow2 --container-format=bare < \
ubuntu-12.04-server-cloudimg-amd64-disk1.img
#Make sure your default_OS_image in /etc/gram/config.json is set to 
# the name of an existing image

Another image, a 64-bit Fedora 19 in qcow2 format (http://download.fedoraproject.org/pub/fedora/linux/releases/19/Images/x86_64/Fedora-x86_64-19-20130627-sda.qcow2)

wget http://download.fedoraproject.org/pub/fedora/linux/releases/19/Images/x86_64/Fedora-x86_64-19-20130627-sda.qcow2
glance image-create --name "fedora-19" --is-public=true \
--disk-format=qcow2 --container-format=bare < \
Fedora-x86_64-19-20130627-sda.qcow2

Another image, a 64-bit CentOS 6.5 in qcow2 format (http://repos.fedorapeople.org/repos/openstack/guest-images/centos-6.5-20140117.0.x86_64.qcow2)

wget http://repos.fedorapeople.org/repos/openstack/guest-images/centos-6.5-20140117.0.x86_64.qcow2 
glance image-create --name "centos-6.5" --is-public=true \
--disk-format=qcow2 --container-format=bare < \
centos-6.5-20140117.0.x86_64.qcow2 

Note: In the event these links no longer work, copies of the images have been put on an internal projects directory in the GPO infrastructure.
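
After loading images, it is worth confirming that Glance registered them and that an image name matches the default_OS_image value in /etc/gram/config.json:

glance image-list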

  1. Edit gcf_config

If using the GENI Portal as the clearinghouse:

If using the local gcf clearinghouse, set up gcf_config: in ~/.gcf/gcf_config, change host to the fully qualified domain name of the control host in both the clearinghouse portion and the aggregate manager portion (two places), e.g.,

host=boscontroller.gram.gpolab.bbn.com

Change the base_name to reflect the service token (the same service token used in config.json). Use the FQDN of the control node for the token.

base_name=geni//boscontroller.gram.gpolab.bbn.com//gcf

Generate new credentials:

cd /opt/gcf/src
./gen-certs.py --exp -u <username>
./gen-certs.py --exp -u <username> --notAll

This has to be done twice: the first run creates certificates for the aggregate manager and the clearinghouse; the second creates the user certificates based on those certificates.

Generate a public/private key pair:

ssh-keygen -t rsa -C "gram@bbn.com"

Modify ~/.gcf/omni_config to reflect the service token used in config.json: (Currently using FQDN as token)

authority=geni:boscontroller.gram.gpolab.bbn.com:gcf

Set the IP addresses of the ch and sa entries to the external IP address of the controller:

ch = https://128.89.91.170:8000
sa = https://128.89.91.170:8000
or
ch = https://boscontroller.gram.gpolab.bbn.com:8000
sa = https://boscontroller.gram.gpolab.bbn.com:8000
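
With gcf_config and omni_config updated, a quick way to check that omni can reach the clearinghouse and the GRAM aggregate manager is to request a user credential and query the AM. This assumes the example hostname above and the default ports 8000 and 8001 from the Ports section; adjust to your installation:

cd /opt/gcf/src
./omni.py getusercred
./omni.py getversion -a https://boscontroller.gram.gpolab.bbn.com:8001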

Configuring config.json

The config.json file (in /etc/gram) is a JSON file that is parsed by GRAM code at configure/install time as well as at run time.

JSON is a format for expressing dictionaries of name/value pairs where the values can be constants, lists or dictionaries. There are no comments, per se, in JSON, but the file as provided has some 'dummy' entries (e.g. "000001") against which comments can be added.
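
Because a typo in this file will cause the install scripts to fail in non-obvious ways, it is worth syntax-checking the JSON before running them. A minimal check using the system Python:

python -m json.tool /etc/gram/config.json > /dev/null && echo "config.json parses OK"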

The following is a list of all the configuration variables that can be set in the config.json JSON file. For some, defaults are provided in the code but it is advised that the values of these parameters be explicitly set.

parameter definition
default_VM_flavor Name of the default VM flavor (if not provided in request RSpec), e.g. 'm1.small'
default_OS_image Name of default VM image (if not provided in request RSpec), e.g. 'ubuntu-12.04'
default_OS_type Name of OS of default VM image, e.g. 'Linux'
default_OS_version Version of OS of default VM image, e.g. '12'
external_interface name of the nic connected to the external network (internet) e.g. eth0. GRAM configures this interface with a static IP address to be specified by the user
external_address IP address of the interface connected to the external network
external_netmask netmask associated with the above IP address
control_interface name of the nic that is to be on the control plane
control_address IP address of control address. This should be a private address
data_interface name of the nic that is to be on the data plane
data_address IP address of the data interface
internal_vlans Set of VLAN tags for internal links and networks, not for stitching, this must match the OpenFlow switch configuration
management_interface name of the nic that is to be on the management plane
management_address IP address of the management interface
management_network_name Quantum will create a network with this name to provide an interface to the VMs through the controller
management_network_cidr The cidr of the quantum management network. It is recommended that this address space be different from the addresses used on the physical interfaces (control, management, data interfaces) of the control and compute nodes
management_network_vlan The vlan used on the management switch to connect the management interfaces of the compute/control nodes.
mysql_user The name of the mysql_user for OpenStack operations
mysql_password The password of the mysql_user for OpenStack operations (the MySQL admin password entered during the MySQL installation step above).
rabbit_password The password for RabbitMQ interface OpenStack operations
nova_password The password of the nova mysql database for the nova user
glance_password The password of the glance mysql database for the glance user
keystone_password The password of the keystone mysql database for the keystone user
quantum_password The password of the quantum mysql database for the quantum user
os_tenant_name The name of the OpenStack admin tenant (e.g. admin)
os_username The name of the OpenStack admin user (e.g. admin)
os_password The password of the OpenStack admin user
os_auth_url The URL for accessing OpenStack authorization services
os_region_name The name of the OpenStack region namespace (default = RegionOne)
os_no_cache Whether to enable/disable caching (default = 1)
service_token The unique token for identifying this rack, shared by all control and compute nodes of the rack in the same OpenStack instance (ie. the name of the rack, suggest FQDN of host)
service_endpoint The URL by which OpenStack services are identified within keystone
public_gateway_ip The address of the default gateway on the external network interface
public_subnet_cidr the range of address from which quantum may assign addresses on the external network
public_subnet_start_ip the first address of the public addresses available on the external network
public_subnet_end_ip the last address of the public addresses available on the external network
metadata_port The port on which OpenStack shares meta-data (default 8775)
backup_directory The directory in which the GRAM install process places original versions of config files in case of the need to roll-back to a previous state.
allocation_expiration_minutes Time at which allocations expire (in minutes), default=10
lease_expiration_minutes Time at which provisioned resources expire (in minutes), default = 7 days
gram_snapshot_directory Directory of GRAM snapshots, default '/etc/gram/snapshots'
recover_from_snapshot Whether GRAM should, on initialization, reinitialize from a particular snapshot (default = None or "" meaning no file provided)
recover_from_most_recent_snapshot Whether GRAM should, on initialization, reinitialize from the most recent snapshot (default = True)
snapshot_maintain_limit Number of most recent snapshots maintained by GRAM (default = 10)
subnet_numfile File where gram stores the subnet number for last allocated subnet, default = '/etc/gram/GRAM-next-subnet.txt'. Note: This is temporary until we have namespaces working.
port_table_file File where GRAM stores the SSH proxy port state table, default = '/etc/gram/gram-ssh-port-table.txt'
port_table_lock_file File where SSH port table lock state is stored, default = '/etc/gram/gram-ssh-port-table.lock'
ssh_proxy_exe Location of GRAM SSH proxy utility, which enables GRAM to create and delete proxies for each user requested, default = '/usr/local/bin/gram_ssh_proxy'
ssh_proxy_start_port Start of SSH proxy ports, default = 3000
ssh_proxy_end_port End of SSH proxy ports, default = 3999
vmoc_interface_port Port on which to communicate to VMOC interface manager, default = 7001
vmoc_slice_autoregister Should GRAM automatically register slices with VMOC? Default = True
vmoc_set_vlan_on_untagged_packet_out Should VMOC set VLAN on untagged outgoing packet, default = False
vmoc_set_vlan_on_untagged_flow_mod Should VMOC set VLAN on untagged outgoing flowmod, default = True
vmoc_accept_clear_all_flows_on_startup Should VMOC clear all flows on startup, default = True
control_host_address The IP address of the controller node's control interface (used to set /etc/hosts on the compute nodes)
mgmt_ns DO NOT set this field; it will be set during installation and is the name of the namespace containing the Quantum management network. This namespace can be used to access the VMs using their management addresses
disk_image_metadata This provides a dictionary mapping names of images (as registered in Glance) with tags for 'os' (operating system of image), 'version' (version of OS of image) and 'description' (human readable description of image) e.g.
   {
       "ubuntu-2nic": {
           "os": "Linux",
           "version": "12.0",
           "description": "Ubuntu image with 2 NICs configured"
       },
       "cirros-2nic-x86_64": {
           "os": "Linux",
           "version": "12.0",
           "description": "Cirros image with 2 NICs configured"
       }
   }
control_host The name or IP address of the control node host
compute_hosts The names/addresses of the compute node hosts, e.g.
{
   "boscompute1": "10.10.8.101",
   "boscompute2": "10.10.8.102",
   "boscompute4": "10.10.8.104"
}
host_file_entries The names/addresses of machines to be included in /etc/hosts, e.g.
{
   "boscontrol": "128.89.72.112",
   "boscompute1": "128.89.72.113",
   "boscompute2": "128.89.72.114"
}
stitching_info Information necessary for the Stitching Infrastructure
aggregate_id The URN of this AM
aggregate_url The URL of this AM
edge_points A list of dictionaries for which:
local_switch URN of local switch mandatory
port URN port on local switch leading to remote switch mandatory
remote_switch URN of remote switch mandatory
vlans VLAN tags configured on this port mandatory
traffic_engineering_metric configurable metric for traffic engineering optional, default value = 10 (no units)
capacity Capacity of the link between endpoints optional, default value = 1000000000 (bytes/sec)
interface_mtu MTU of interface optional, default value = 900 (bytes)
maximum_reservable_capacity Maximum reservable capacity between endpoints optional, default value = 1000000000 (bytes/sec)
minimum_reservable_capacity Minimum reservable capacity between endpoints optional, default value = 1000000 (bytes/sec)
granularity Increments for reservations optional, default value = 1000000 (bytes/sec)

Installing Operations Monitoring

Monitoring can be installed after testing the initial installation of GRAM. Most supporting infrastructure was installed by the steps above. Some steps, however, still need to be done by hand and the instructions can be found here: Installing Monitoring on GRAM

Testing GRAM installation

This simple rspec can be used to test the gram installation - attachment:2n-1l.rspec

# Restart gram-am and clearinghouse
sudo service gram-am restart
sudo service gram-ch restart

# check omni/gcf config
cd /opt/gcf/src
./omni.py getusercred

# allocate and provision a slice
# I created an rspec in /home/gram called 2n-1l.rspec
./omni.py -V 3 -a http://130.127.39.170:5001 allocate a1 ~/2n-1l.rspec
./omni.py -V 3 -a http://130.127.39.170:5001 provision a1 ~/2n-1l.rspec

# check that the VMs were created
nova list --all-tenants

# check that the VMs booted, using the VM IDs from the above command:
nova console-log <ID>

# look at the 192.x.x.x IP in the console log

# find the namespace for the management plane:
sudo ip netns list
     # look at each qrouter-... namespace for one that has both the external (130.x) and management (192.x) addresses
sudo ip netns exec qrouter-78c6d3af-8455-4c4a-9fd3-884f92c61125 ifconfig

# using this namespace, ssh into the VM:
sudo ip netns exec qrouter-78c6d3af-8455-4c4a-9fd3-884f92c61125 ssh -i ~/.ssh/id_rsa gramuser@192.168.10.4

# verify that the data plane is working by pinging across VMs on the 10.x.x.x addresses
# The above VM has 10.0.21.4 and the other VM I created has 10.0.21.3
ping 10.0.21.3

Turn off Password Authentication on the Control and Compute Nodes

  1. Generate an rsa ssh key pair on the control node for the gram user or use the one previously generated if it exists: i.e. ~gram/.ssh/id_rsa and ~gram/.ssh/id_rsa.pub

ssh-keygen -t rsa -C "gram@address"

  1. Generate a DSA SSH key pair on the control node for the gram user or use the one previously generated if it exists: i.e. ~gram/.ssh/id_dsa and ~gram/.ssh/id_dsa.pub. Some components only work well with DSA keys, so access from the control node to other resources on the rack should use the DSA key.

ssh-keygen -t dsa -C "gram@address"

  1. Copy the public key to the compute nodes, i.e. id_dsa.pub (alternatively, use ssh-copy-id as sketched after this list).
  2. On the control and compute nodes, cat id_rsa.pub >> ~/.ssh/authorized_keys
  3. As sudo, edit /etc/ssh/sshd_config and ensure that these entries are set this way:

RSAAuthentication yes
PubkeyAuthentication yes
PasswordAuthentication no

  1. Restart the ssh service: sudo service ssh restart.
  2. Verify by logging in using the key: ssh -i ~/.ssh/id_dsa gram@address
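
As an alternative to manually copying the public key and appending it to authorized_keys (steps 1 and 2 above), ssh-copy-id can install the key on each node. A sketch, to be run as the gram user before password authentication is disabled; adjust the key path and host to your installation:

ssh-copy-id -i ~/.ssh/id_dsa.pub gram@<compute-node-address>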

TODO

*Need to make link to /opt/gcf in compute nodes

*Make sure that your rabbitMQ IP in /etc/quantum/quantum.conf is set to the controller node: (broken sed in OpenVSwitch.py)

*Service token not set in keystone.conf

*add a step in the installation process that checks the status of the services before we start our installation scripts - check dependencies

*fix installation so that gcf_config has the proper entry for host in the aggregate and clearinghouse portions - also need to check where the port number is actually read from for the AM - as it is not gcf_config

DEBUGGING NOTES

  • If it gets stuck at provisioning, you may have lost connectivity with one or more compute nodes. Check that network-manager has been removed.
  • If IP addresses are not being assigned and the VMs stall on boot: delete the DHCP agent's port (192.168.10.2) with quantum port-delete and restart the quantum-* services.
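
For the second issue, note that quantum port-delete expects a port ID rather than an IP address. A hedged recovery sketch (192.168.10.2 is the example DHCP port address used above):

# find the port that owns the DHCP address
quantum port-list | grep 192.168.10.2
# delete it using the ID shown in the first column
quantum port-delete <port-id>
# restart the quantum services
for svc in quantum-server quantum-plugin-openvswitch-agent quantum-dhcp-agent quantum-l3-agent; do
    sudo service $svc restart
done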


More Debugging Notes