
Version 1 (modified by Jeanne Ohren, 5 years ago)

--

Useful Notes for Debugging Gram with a Grizzly OpenStack Installation

Getting the software

git clone git@superior.bbn.com:gram.git

To get a git account, one needs to provide a public key to Jeanne Ohren as the repository lives in the Minnesota office.

Debugging Gram

  1. Insert import pdb; pdb.set_trace() before a line of interest in the gram code.
  2. Set the PYTHONPATH variable to the value used in /etc/init/gram-am.conf.
  3. Run python gram-am.py and debug it as regular Python code.
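
As a minimal sketch of step 2, the helper below pulls a PYTHONPATH value out of the text of an upstart job file. It assumes the job sets the variable with an `env PYTHONPATH=...` stanza; the sample text and its paths are made up for illustration and are not the real contents of /etc/init/gram-am.conf.

```python
import re


def pythonpath_from_upstart(conf_text):
    """Return the value of the first `env PYTHONPATH=...` stanza, or None."""
    for line in conf_text.splitlines():
        match = re.match(r'\s*env\s+PYTHONPATH=(.*)$', line)
        if match:
            # Drop surrounding whitespace and optional quotes.
            return match.group(1).strip().strip('"\'')
    return None


# Illustrative job file text (hypothetical paths, not the real GRAM config).
SAMPLE_CONF = """
description "example job"
env PYTHONPATH=/opt/example/src:/opt/example/lib
exec python /opt/example/gram-am.py
"""

if __name__ == "__main__":
    print("export PYTHONPATH=%s" % pythonpath_from_upstart(SAMPLE_CONF))
```

The printed export line can be pasted into the shell before running python gram-am.py by hand.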

General Notes for Installation of the OS and Switches

  1. If this is a machine that has been running gram, it is probably best to save a copy of the /etc/gram/config.json file, the /etc/hosts file, and /etc/udev/rules.d/70-persistent-net.rules from the control and compute nodes. They give some reference for how the machine was configured.
  2. When checking connectivity between ports, remember to check how the ports on the switches are configured. If a port is trunked, packets are expected to be tagged with VLAN numbers, so generic (untagged) pings will not work.
    • From Jeanne: the ping usually didn't work for the data interfaces because the interface is tied to the OVS bridge. If the interface is temporarily removed from the OVS bridge on all nodes, then one can do a layer 3 ping. For example: sudo ovs-vsctl del-port br-eth1 eth1. Remember to add it back after testing with pings (sudo ovs-vsctl add-port br-eth1 eth1).
  3. When installing a fresh Ubuntu 12.04, it is best to configure the external interface manually, since automatic configuration puts an errant 127... localhost entry in the /etc/hosts file. If you do not configure it manually, check the /etc/hosts file and delete that extra 127 entry.

Getting the Smiley Faces on OpenStack Service

  1. sudo nova-manage service list (a :-) in the State column means the service is up; XXX means it is not)
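
To spot services without a smiley quickly, the listing can be filtered with a few lines of Python. This is a sketch that assumes the usual whitespace-separated columns (Binary, Host, Zone, Status, State, Updated_At); the sample text is illustrative, not captured from a real rack.

```python
def down_services(listing):
    """Return (binary, host) pairs whose State column is not ':-)'."""
    down = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything too short to parse.
        if len(fields) < 5 or fields[0] == "Binary":
            continue
        if fields[4] != ":-)":
            down.append((fields[0], fields[1]))
    return down


# Illustrative output of `sudo nova-manage service list` (made-up hosts).
SAMPLE_LISTING = """\
Binary           Host       Zone      Status     State Updated_At
nova-cert        control    internal  enabled    :-)   2014-01-01 12:00:00
nova-compute     compute1   nova      enabled    XXX   2014-01-01 11:00:00
"""

if __name__ == "__main__":
    for binary, host in down_services(SAMPLE_LISTING):
        print("%s on %s is not reporting in" % (binary, host))
```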

Looking at Logs

Places to look at logs
All of the gram processes should have logs in /var/log/upstart/
In particular, sudo tail -f /var/log/upstart/gram-am.log is usually useful to have running.

To look at the console log of a VM:

  1. source /etc/novarc
  2. nova list --all-tenants
  3. Run nova show "ID" with an ID from step 2 to see which VM server the VM was provisioned on, or run nova console-log "ID" to see the console log directly.
  4. On the VM server, the instance can be found in directory /var/lib/nova/instances/"ID", which should contain a console.log to examine.
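
The steps above can be sketched in Python by parsing the table that nova list prints and building the console.log path from step 4. The parser assumes the usual `|`-delimited table with an ID column; the instance ID and network in the sample are placeholders.

```python
def parse_table(text):
    """Turn a '|'-delimited CLI table into a list of dicts keyed by header."""
    rows, header = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def console_log_path(instance_id):
    """Path of the console log on the VM server, per step 4 above."""
    return "/var/lib/nova/instances/%s/console.log" % instance_id


# Illustrative `nova list --all-tenants` output (placeholder ID).
SAMPLE_NOVA_LIST = """\
+--------------------------------------+------+--------+------------------+
| ID                                   | Name | Status | Networks         |
+--------------------------------------+------+--------+------------------+
| 11111111-2222-3333-4444-555555555555 | vm1  | ACTIVE | net=192.168.10.3 |
+--------------------------------------+------+--------+------------------+
"""

if __name__ == "__main__":
    for row in parse_table(SAMPLE_NOVA_LIST):
        print(row["Name"], console_log_path(row["ID"]))
```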

Configuring Control Node Access via Password

  1. Edit /etc/ssh/sshd_config and comment out the PasswordAuthentication no line:
    # Change to no to disable tunnelled clear text passwords
    #PasswordAuthentication no
    
  2. sudo service ssh restart

Configuring Control Node Access via SSH Key

  1. Edit /etc/ssh/sshd_config and uncomment the PasswordAuthentication no line:
    # Change to no to disable tunnelled clear text passwords
    PasswordAuthentication no
    
  2. sudo service ssh restart
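
To check which of the two access modes a given sshd_config is in, a small parser is enough. This sketch relies on two sshd behaviours: the first occurrence of a keyword wins, and PasswordAuthentication defaults to yes when absent. It deliberately ignores Match blocks, which is an assumption that may not hold for every config.

```python
def password_auth_enabled(config_text):
    """Return True if sshd would accept passwords under this config text."""
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or commented-out setting
        parts = stripped.split(None, 1)
        if len(parts) == 2 and parts[0].lower() == "passwordauthentication":
            # The first uncommented occurrence decides.
            return parts[1].strip().lower() == "yes"
    return True  # keyword absent: sshd default is yes


PASSWORD_MODE = """\
# Change to no to disable tunnelled clear text passwords
#PasswordAuthentication no
"""

KEY_ONLY_MODE = """\
# Change to no to disable tunnelled clear text passwords
PasswordAuthentication no
"""

if __name__ == "__main__":
    print(password_auth_enabled(PASSWORD_MODE))
    print(password_auth_enabled(KEY_ONLY_MODE))
```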

Issues with Certificates

For the first user for whom a certificate is generated: in directory /opt/gcf/src, running ./gen-certs.py --exp -u user generates certificates not just for this user, but also for the AM and the Clearing House.

To generate certificates for one specific user only, in directory /opt/gcf/src run ./gen-certs.py --exp -u sdabidee --notAll.
This generates new certs just for this user.

Restarting the gram-ch is usually necessary after certs are created: sudo service gram-ch restart

And if certificates need to be displayed in readable form: openssl x509 -text -in ~/.gcf/sdabidee-cert.pem

Trusted certs can be found on the control node in /etc/gram/certs/trusted_roots.

Namespace Tips

  1. List namespaces - sudo ip netns list
  2. To figure out which namespace is the management namespace, check /etc/gram/config.json for the value of mgmt_ns, or go through the list from step 1 and run
    • sudo ip netns exec qrouter... ifconfig (the one listing a 192.168.10 interface is the management namespace)
  3. To log in to a VM, it is usually possible to go through the management namespace, using the VM's 192.168.10 address. This is usually done to verify that the user's keys were uploaded.
    • sudo ip netns exec qdhcp-9f8ae72c-b1f9-46a8-9263-234451e8324e ssh -i fakekey sdabidee@192.168.10.5
  4. To check for port forwarding in the management namespace:
    • Check /etc/gram/config.json for the value in mgmt_ns.
    • sudo ip netns exec qrouter-... iptables -L -t nat --line-numbers (port forwarding at the top)
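
Steps 1 and 2 can be combined in a short sketch: read mgmt_ns from the config and check whether that namespace actually shows up in the ip netns list output. It assumes /etc/gram/config.json is plain JSON with a top-level mgmt_ns key; the namespace names in the sample are placeholders.

```python
import json


def management_namespace(config_text, netns_listing):
    """Return (mgmt_ns, is_present) from config JSON and `ip netns list` text."""
    mgmt_ns = json.loads(config_text).get("mgmt_ns")
    # Keep only the first token of each line, in case extra columns follow.
    present = [line.split()[0] for line in netns_listing.splitlines() if line.strip()]
    return mgmt_ns, mgmt_ns in present


# Placeholder values for illustration only.
SAMPLE_CONFIG = '{"mgmt_ns": "qrouter-00000000-0000-0000-0000-000000000001"}'
SAMPLE_NETNS = """\
qdhcp-00000000-0000-0000-0000-000000000002
qrouter-00000000-0000-0000-0000-000000000001
"""

if __name__ == "__main__":
    name, found = management_namespace(SAMPLE_CONFIG, SAMPLE_NETNS)
    print("%s is %s" % (name, "present" if found else "MISSING"))
```

In practice the two inputs would come from reading /etc/gram/config.json and from the output of sudo ip netns list.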

Quantum Issues

  1. DHCP - After running quantum port-list, if you do not see 192.168.10.2 or if DHCP is otherwise broken, run quantum port-delete with the ID of the 192.168.10.2 port (if present) and then sudo service quantum-dhcp-agent restart.
    • Refer to Jeanne's tips added as an attachment below
  2. Sometimes the namespace for the management router is not instantiated. The namespace name does not seem to change between reboots, but the namespace does not always appear.
  3. Sometimes restarting quantum-l3-agent (sudo service quantum-l3-agent restart) will get it to appear.
    • If not, a more complete cleanup of the quantum installation followed by a reinstall is needed. This really means cleaning up everything quantum created: the external router, the public network, and the management network all need to be deleted. The easiest way to do it is to go into Horizon and delete all networks and routers.
    • Usually, one can then run sudo /etc/gram/install_gram.sh <control/compute> grizzly/folsom and redo the bottom part of /tmp/install/install_quantum.sh. Everything in that file after source /etc/novarc should be redone.
    • Warning: rerunning the install as above redoes the certificates, which means you need to recreate all user certs. When redoing certificates for users, make sure to use --notAll in the certificate generation command, and restart the gram-ch process afterwards. It might be safer to restart all of the gram services.
    • Refer to Cleaning Up to make sure all iptables and snapshot information has been cleared.
  4. OpenStack Documentation for networking - http://docs.openstack.org/grizzly/openstack-network/admin/content//under_the_hood_openvswitch.html

Cleaning Up GRAM and OpenStack Tips

Use the OpenStack Horizon Interface

  1. Point a browser to the control node's IP address followed by /horizon. For example: http://128.89.91.170/horizon
  2. Log in using the admin account and the admin password listed in the /etc/novarc file.

From here, it is very easy to delete or terminate all kinds of resources. It is sometimes easier to clean things up here than via nova command line steps.

  1. However, you can also log in as the tenant to get a better topology view for that tenant. (Minimal Use)

Cleaning up

The best way to clean up is to use omni, as this deletes resources in GRAM, OpenStack and the operating system. If you use Horizon instead, then you must:

  1. Clean up the latest GRAM snapshot in /etc/gram/snapshots/gram/
  2. Remove port forwarding rules: /etc/gram/GRAM-next-subnet.txt
  3. Clean up the iptables rules. First list the rules in the nat table:
    • sudo ip netns exec qrouter-.... iptables -L -t nat --line-numbers

Then delete the rules:

  • sudo ip netns exec qrouter-... iptables -t nat -D POSTROUTING <line number>
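
When several rules have to go, note that iptables renumbers a chain after every delete, so it is safest to delete from the highest line number down. The sketch below picks the rule numbers out of a `--line-numbers` listing for one chain and returns them in that order. The match is a plain substring test, so pass a string that is distinctive enough; the sample rules and addresses are illustrative.

```python
def rule_numbers(listing, chain, needle):
    """Return line numbers of rules in `chain` containing `needle`, highest first."""
    numbers = []
    current = None
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Chain":
            current = fields[1]  # e.g. "Chain POSTROUTING (policy ACCEPT)"
        elif current == chain and fields[0].isdigit() and needle in line:
            numbers.append(int(fields[0]))
    # Descending order keeps the remaining numbers valid after each delete.
    return sorted(numbers, reverse=True)


# Illustrative `iptables -L -t nat --line-numbers` output (made-up rules).
SAMPLE_NAT = """\
Chain PREROUTING (policy ACCEPT)
num  target     prot opt source               destination
1    DNAT       tcp  --  anywhere             anywhere             tcp dpt:3000 to:192.168.10.3:22

Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source               destination
1    SNAT       all  --  192.168.10.3         anywhere             to:10.0.0.1
2    SNAT       all  --  192.168.10.4         anywhere             to:10.0.0.1
3    SNAT       all  --  192.168.10.3         anywhere             to:10.0.0.2
"""

if __name__ == "__main__":
    for number in rule_numbers(SAMPLE_NAT, "POSTROUTING", "192.168.10.3"):
        print("iptables -t nat -D POSTROUTING %d" % number)
```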

Troubleshooting a VM

Sometimes we have to debug why a VM does not boot or is not accessible.

  1. Is it getting all of its IP addresses assigned by DHCP?
    • Check the console log. You should see lines like the ones below; otherwise you will see messages about requests timing out.
      ci-info: lo    : 1 127.0.0.1       255.0.0.0       .
      ci-info: eth1  : 0 .               .               fa:16:3e:c5:83:78
      ci-info: eth0  : 1 192.168.10.3    255.255.255.0   fa:16:3e:e3:ac:80
      ci-info: route-0: 0.0.0.0         192.168.10.1    0.0.0.0         eth0   UG
      ci-info: route-1: 192.168.10.0    0.0.0.0         255.255.255.0   eth0   U
      
    • if not, troubleshoot the DHCP path (see Quantum issues)
  2. Is it getting its metadata?
  3. Are the user keys getting configured correctly?
    • The best way to check this is to log in using the experimenter's ssh key. Double check that you are using the correct key and logging in as the right user.
    • On Ubuntu boxes, the console log should also end with the block below. Key loading is not reported as clearly on Fedora.
      -----BEGIN SSH HOST KEY KEYS-----
      ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBKNlnMoRjikutVNoaxozKfdmFj8hyewdfSIiLqmqWXehj7jFKBK8E/Ha5UWSoueg2xCjksPzdpDei4qJRbzCuy0= root@exp1-host1
      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUz4CG6+e3/A9PjtBD0jfpGGzuDEWOZ4SZ6a3LRDcnZ1bgOxzHgrR/IZ5Xt1cg9qqgnN1XVDe8Ps8qmjpHPHthLsafVChPmBXFUhmNLW1s1lFNJj5+tybUl9782mdciYGM6CTb0ZPvK6i5ncLxEF6TQWWlE1X12qZexHii88vhkEaJC9ehlItqLXBe2vYYSovRwI4W8u6aEM4NMe3wHNs6qWhhUjIyBS3zS45Kbs+lD6fU5AnMxrhSAOPGBqzEU40ppt63RxjLuK9TwW2kXusz52+KUD+YF7Omc7FOr6n84Ol8aHnFm5OwDi1qPRS1r3JcDchHd8tBb1XtrEO+NxKZ root@exp1-host1
      -----END SSH HOST KEY KEYS-----
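
For the DHCP check in item 1, the ci-info interface lines can be pulled out of a console log mechanically. This is a sketch that assumes the column layout shown in the sample above (device, up flag, address, mask, hardware address) and treats a lone "." as "no address"; other cloud-init versions may format these lines differently.

```python
import re

# Matches e.g. "ci-info: eth0  : 1 192.168.10.3    255.255.255.0   fa:16:..."
IFACE_LINE = re.compile(r'ci-info:\s+(\S+?)\s*:\s+([01])\s+(\S+)\s+(\S+)\s+(\S+)')


def interface_addresses(console_log):
    """Map each device in the ci-info lines to its IPv4 address, or None."""
    addresses = {}
    for line in console_log.splitlines():
        match = IFACE_LINE.search(line)
        if not match or match.group(1).startswith("route-"):
            continue  # not an interface line
        device, address = match.group(1), match.group(3)
        addresses[device] = None if address == "." else address
    return addresses


# The interface and route lines from the console log excerpt above.
SAMPLE_LOG = """\
ci-info: lo    : 1 127.0.0.1       255.0.0.0       .
ci-info: eth1  : 0 .               .               fa:16:3e:c5:83:78
ci-info: eth0  : 1 192.168.10.3    255.255.255.0   fa:16:3e:e3:ac:80
ci-info: route-0: 0.0.0.0         192.168.10.1    0.0.0.0         eth0   UG
ci-info: route-1: 192.168.10.0    0.0.0.0         255.255.255.0   eth0   U
"""

if __name__ == "__main__":
    for device, address in interface_addresses(SAMPLE_LOG).items():
        print(device, address or "no address")
```

The text to feed in can come from nova console-log "ID" or from the console.log file on the VM server.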
      

Attachments (1)
