Version 3 (modified by shuang@bbn.com, 11 years ago)

Materials and Guidance for leading this exercise: <OpenFlow Load Balancer ASSIGNMENT>

Exercise materials

Anything that the instructor might need, e.g.:

Guidance for leading the exercise

  • useful commands: You will find it helpful to know what is going on inside your OpenFlow controller and its associated switch when implementing these exercises.

This section contains a few tips that may help you out if you are using the Open vSwitch implementation provided with this tutorial. If you are using a hardware OpenFlow switch, your instructor can help you find equivalent commands.
The Open vSwitch installation provided by the RSpec included in this tutorial is located in /opt/openvswitch-1.6.1-F15. You will find Open vSwitch commands in /opt/openvswitch-1.6.1-F15/bin and /opt/openvswitch-1.6.1-F15/sbin. Some of these commands may be helpful to you. If you add these paths to your shell’s $PATH, you will be able to access their manual pages with man. Note that $PATH does not affect sudo, so you will still have to give sudo the absolute path of each command; the absolute path is omitted from the following examples for clarity and formatting.

  • 2.1 ovs-vsctl
    Open vSwitch switches are primarily configured using the ovs-vsctl command. For exploring, you may find the ovs-vsctl show command useful, as it dumps the status of all virtual switches on the local Open vSwitch instance. Once you have some information on the local switch configurations, ovs-vsctl provides a broad range of capabilities that you will likely find useful for expanding your network setup to more complex configurations for testing and verification. In particular, the subcommands add-br, add-port, and set-controller may be of interest.
  • 2.2 ovs-ofctl
    The switch host configured by the given RSpec listens for incoming OpenFlow connections on localhost port 6634. You can use this to query the switch state with the ovs-ofctl command. In particular, you may find the dump-tables and dump-flows subcommands useful. For example, sudo ovs-ofctl dump-flows tcp:127.0.0.1:6634 will output lines that look like this:
    cookie=0x4, duration=6112.717s, table=0, n_packets=1, n_bytes=74, idle_age=78,priority=5,tcp,
    nw_src=10.10.10.0/24 actions=CONTROLLER:65535
    
    This indicates that any TCP segment with source IP in the 10.10.10.0/24 subnet should be sent to the OpenFlow controller for processing, that it has been 78 seconds since such a segment was last seen, that one such segment has been seen so far, and the total number of bytes in packets matching this rule is 74. The other fields are perhaps interesting, but you will probably not need them for debugging. (Unless, of course, you choose to use multiple tables — an exercise in OpenFlow 1.1 functionality left to the reader.)
  • 2.3 Unix utilities
    You will want to use a variety of Unix utilities, in addition to the tools listed in ExerciseLayout, to test your controllers. The standard ping and /usr/sbin/arping tools are useful for debugging connectivity (but make sure your controller passes ICMP ECHO REQUEST and REPLY packets and ARP traffic, respectively!), and the command netstat -an will show all active network connections on a Unix host; the TCP connections of interest in this exercise will be at the top of the listing. The format of netstat output is out of the scope of this tutorial, but information is available online and in the manual pages.
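When debugging, it can be handy to pull individual counters out of the dump-flows output described above. The following is a minimal Ruby sketch, not part of the tutorial materials. The helper name parse_flow_line is hypothetical, and the sketch assumes each flow is printed on one line as comma-separated key=value pairs followed by " actions=...", using the underscore field names (n_packets, n_bytes, idle_age) that ovs-ofctl emits.

```ruby
# Parse one line of `ovs-ofctl dump-flows` output into a Hash.
# Assumption: fields are comma-separated key=value pairs, and the
# action list follows the literal " actions=".
def parse_flow_line(line)
  match_part, actions = line.strip.split(/\s+actions=/, 2)
  fields = {}
  match_part.split(/,\s*/).each do |token|
    key, value = token.split("=", 2)
    # Bare tokens such as "tcp" carry no value; record them as flags.
    fields[key] = value.nil? ? true : value
  end
  fields["actions"] = actions
  fields
end

sample = "cookie=0x4, duration=6112.717s, table=0, n_packets=1, " \
         "n_bytes=74, idle_age=78,priority=5,tcp,nw_src=10.10.10.0/24 " \
         "actions=CONTROLLER:65535"
flow = parse_flow_line(sample)
puts "packets=#{flow['n_packets']} bytes=#{flow['n_bytes']} idle=#{flow['idle_age']}s"
```

All values are kept as strings, so convert them (for example with Integer) before doing arithmetic.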

Solutions

  • 3.3 Load Balancing
    Load balancing in computer networking is the division of network traffic between two or more network devices or paths, typically to achieve higher total throughput than any single path provides, to ensure a specific maximum latency or minimum bandwidth for some or all flows, or for similar purposes. For this exercise, you will design a load-balancing OpenFlow controller capable of collecting external data and using it to divide traffic between dissimilar network paths so as to achieve full bandwidth utilization with minimal queuing delays.
    An interesting property of removing the controller from an OpenFlow device and placing it in an external system of arbitrary computing power and storage capability is that decision-making for network flows based on external state becomes reasonable. Traditional routing and switching devices make flow decisions based largely on local data (or perhaps data from adjacent network devices), but an OpenFlow controller can collect data from servers, network devices, or any other convenient source, and use this data to direct incoming flows.
    For the purpose of this exercise, data collection will be limited to the bandwidth and queue occupancy of two emulated network links.

Linux netem
Use the tc command to enable and configure delay and bandwidth constraints on the outgoing interfaces for traffic traveling from the OpenFlow switch to the Aggregator node. To configure a path with 20 Mbps bandwidth and a 20 ms delay on eth2, you would issue the command:

sudo tc qdisc add dev eth2 root handle 1:0 netem delay 20ms
sudo tc qdisc add dev eth2 parent 1:0 tbf rate 20mbit buffer 20000 limit 16000

See the tc and tc-tbf manual pages for more information on configuring tc token bucket filters as in the second command line. To reconfigure existing links, use the tc qdisc change command instead of tc qdisc add.
The outgoing links in the provided lb.rspec are numbered 192.168.4.1 and 192.168.5.1 for left and right, respectively.

Balancing the Load
An example OpenFlow controller that arbitrarily assigns incoming TCP connections to alternating paths can be found at load-balancer.rb.
The goal of your OpenFlow controller will be to achieve full bandwidth utilization with minimal queuing delays of the two links between the OpenFlow switch and the Aggregator host. In order to accomplish this, your OpenFlow switch will intelligently divide TCP flows between the two paths. The intelligence for this decision will come from bandwidth and queuing status reports from the two traffic shaping nodes representing the alternate paths.
When the network is lightly loaded, flows may be directed toward either path, as neither path exhibits queuing delays and both paths are largely unloaded. As network load increases, however, your controller should direct flows toward the least loaded fork in the path, as defined by occupied bandwidth for links that are not yet near capacity and queue depth for links that are near capacity.
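One possible reading of this policy is sketched below in Ruby. It is only an illustration under stated assumptions, not the reference solution: the PathStats struct, its field names, and the 90% "near capacity" threshold are all hypothetical choices made for the example.

```ruby
# Hypothetical snapshot of one path's state; the field names are
# illustrative and not taken from the assignment code.
PathStats = Struct.new(:name, :rate_bps, :capacity_bps, :queue_pkts)

# Assumed threshold: a link counts as "near capacity" above 90% utilization.
NEAR_CAPACITY = 0.9

def near_capacity?(path)
  path.rate_bps >= NEAR_CAPACITY * path.capacity_bps
end

# Pick the less loaded path for a new flow.
def choose_path(a, b)
  a_full = near_capacity?(a)
  b_full = near_capacity?(b)
  if a_full && b_full
    # Both links are near capacity: compare queue depth.
    a.queue_pkts <= b.queue_pkts ? a : b
  elsif a_full != b_full
    # Exactly one link is near capacity: prefer the one with headroom.
    a_full ? b : a
  else
    # Neither link is near capacity: compare the fraction of capacity in use.
    (a.rate_bps.to_f / a.capacity_bps) <= (b.rate_bps.to_f / b.capacity_bps) ? a : b
  end
end

left  = PathStats.new(:left, 19_500_000, 20_000_000, 12)
right = PathStats.new(:right, 8_000_000, 10_000_000, 0)
puts choose_path(left, right).name
```

Comparing utilization as a fraction of capacity, rather than raw bytes per second, keeps the comparison meaningful when the two links have different configured rates.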
Because TCP traffic is bursty and unpredictable, your controller will not be able to perfectly balance the flows between these links. However, as more TCP flows are combined on the links, their combined congestion control behaviors will allow you to utilize the links to near capacity, with queuing delays that are roughly balanced. Your controller need not re-balance flows that have previously been assigned, but you may do so if you like.
The binding of OpenFlow port numbers to logical topology links can be found in the file /tmp/portmap on the switch node when the provided RSpec boots. It consists of three lines, each containing one logical link name (left, right, and outside) and an integer indicating the port on which the corresponding link is connected. You may use this information in your controller configuration if it is helpful.
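A controller could load this mapping at startup with something like the Ruby sketch below. The exact file syntax is an assumption here (one "name port" pair per line, separated by whitespace), the helper name parse_portmap is hypothetical, and the port numbers in the sample are made up for illustration.

```ruby
# Parse the text of /tmp/portmap into a Hash such as { "left" => 1 }.
# Assumption: each line reads "<link name> <port number>".
def parse_portmap(text)
  text.each_line.with_object({}) do |line, ports|
    name, port = line.split
    next if name.nil? || port.nil?   # skip blank or malformed lines
    ports[name] = Integer(port)
  end
end

# Made-up sample content; on the switch node one would instead call
# parse_portmap(File.read("/tmp/portmap")).
sample = "left 1\nright 2\noutside 3\n"
ports = parse_portmap(sample)
puts ports.inspect
```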
You will find an example OpenFlow controller that arbitrarily assigns incoming TCP connections to alternating paths in the file load-balancer.rb. This simple controller can be used as a starting point for your controller if you desire. Examining its behavior may also prove instructive; you should see that its effectiveness at achieving the assignment goals falls off as the imbalance between balanced link capacities or delays grows.

Gathering Information
The information you will use to inform your OpenFlow controller about the state of the two load-balanced paths will be gathered from the traffic shaping hosts. This information can be parsed out of the file /proc/net/dev, which contains a line for each interface on the machine, as well as the tc -p qdisc show command, which displays the number of packets in the token bucket queue. As TCP connections take some time to converge on a stable bandwidth utilization, you may want to collect these statistics once every few seconds, and smooth the values you receive over the intervening time periods.
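The smoothing step can be approximated with an exponentially weighted moving average over per-interval byte rates. The Ruby sketch below is one way to do it under stated assumptions: the class name Ewma, the helper byte_rate, the weight of 0.5, and the 2-second sampling interval are illustrative choices, not values taken from the assignment.

```ruby
# Exponentially weighted moving average of a series of samples.
class Ewma
  attr_reader :value

  # alpha close to 1 tracks new samples quickly; close to 0 smooths heavily.
  def initialize(alpha)
    raise ArgumentError, "alpha must be in (0, 1]" unless alpha > 0 && alpha <= 1
    @alpha = alpha
    @value = nil
  end

  def update(sample)
    @value = @value.nil? ? sample.to_f : @alpha * sample + (1 - @alpha) * @value
  end
end

# Convert two readings of a cumulative byte counter into bytes per second.
def byte_rate(prev_bytes, cur_bytes, seconds)
  (cur_bytes - prev_bytes) / seconds.to_f
end

# Made-up counter readings, assumed to be taken every 2 seconds.
counters = [0, 50_000, 150_000, 150_000]
smoothed = Ewma.new(0.5)
counters.each_cons(2) { |prev, cur| smoothed.update(byte_rate(prev, cur, 2)) }
puts smoothed.value
```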
You may find the file /tmp/ifmap on the traffic shaping nodes useful. It is created at system startup, and identifies the inside- and outside-facing interfaces with lines such as:

inside eth2
outside eth1

The first word on the line is the “direction” of the interface — toward the inside or outside of the network diagram. The second is the interface name as found in /proc/net/dev.
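Putting the two files together, a monitoring script might look up the interface for a direction in /tmp/ifmap and then read that interface's byte counters from /proc/net/dev. The Ruby sketch below shows the parsing only. The helper names are hypothetical, the sample counter values are made up, and the column positions assume the usual /proc/net/dev layout of eight receive counters followed by eight transmit counters.

```ruby
# Parse /tmp/ifmap text into { "inside" => "eth2", "outside" => "eth1" }.
def parse_ifmap(text)
  text.each_line.with_object({}) do |line, map|
    direction, iface = line.split
    map[direction] = iface if direction && iface
  end
end

# Parse /proc/net/dev text into { "eth1" => { rx_bytes: ..., tx_bytes: ... } }.
def parse_proc_net_dev(text)
  text.each_line.with_object({}) do |line, stats|
    name, counters = line.split(":", 2)
    next if counters.nil?            # the two header lines contain no colon
    values = counters.split.map { |v| Integer(v) }
    next if values.length < 16
    stats[name.strip] = { rx_bytes: values[0], tx_bytes: values[8] }
  end
end

# Made-up sample data in the shape of the real files.
ifmap_text = "inside eth2\noutside eth1\n"
dev_text = <<~TEXT
  Inter-|   Receive                                                |  Transmit
   face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
      lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
    eth1: 5302533    4000    0    0    0     0          0         0   123456     900    0    0    0     0       0          0
    eth2:  223344    1500    0    0    0     0          0         0  5300000    3990    0    0    0     0       0          0
TEXT

ifmap = parse_ifmap(ifmap_text)
stats = parse_proc_net_dev(dev_text)
puts stats[ifmap["outside"]][:rx_bytes]
```

On a traffic shaping node the two strings would come from File.read("/tmp/ifmap") and File.read("/proc/net/dev").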
You are free to communicate these network statistics from the traffic shaping nodes to your OpenFlow controller in any fashion you like. You may want to use a web service, or transfer the data via an external daemon and query a statistics file from the controller. Keep in mind that flow creation decisions need to be made rather quickly, to prevent retransmissions on the connecting host.
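One way to keep flow setup fast is to decouple collection from decision-making: a collector refreshes a small cache every few seconds, and the packet-in handler only reads the cache. The Ruby sketch below illustrates the idea with a hypothetical StatsCache class; the 5-second staleness limit is an assumption, and the transport that fills the cache is left out on purpose.

```ruby
# Holds the most recent report per path and refuses to return stale data.
class StatsCache
  Entry = Struct.new(:value, :stamp)

  # max_age: seconds after which a stored report is considered stale.
  def initialize(max_age)
    @max_age = max_age
    @entries = {}
  end

  # Called by whatever collects the statistics (web fetch, file read, ...).
  def store(path, value, now = Time.now)
    @entries[path] = Entry.new(value, now)
  end

  # Called from the flow-setup path; returns nil when no fresh report exists.
  def fetch(path, now = Time.now)
    entry = @entries[path]
    return nil if entry.nil? || now - entry.stamp > @max_age
    entry.value
  end
end

cache = StatsCache.new(5)
cache.store(:left, { bytes: 5302, queue: 20 }, Time.at(0))
puts cache.fetch(:left, Time.at(3)).inspect
puts cache.fetch(:left, Time.at(10)).inspect
```

When fetch returns nil the controller needs a fallback, such as alternating between paths, so that a missing report never delays the new flow.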

Questions
To help users fetch the amount of traffic and the queue depth (measured in number of packets) on both the left and right nodes, we provide a script that can be downloaded and run on both nodes.
You can download the script from netinfo.py (if you have already downloaded it, skip this step). Then do the following to start monitoring network usage:

    • 1. Install the Twisted Web package for Python on both the left and right nodes:
      sudo yum install python-twisted-web
      
    • 2. Upload netinfo.py to the left and right nodes, then change the listening IP address in netinfo.py to the public IP address of the respective node. That is, replace 0.0.0.0 in the following line of netinfo.py with the public IP address of the left or right node:
      reactor.listenTCP(8000, factory, interface = '0.0.0.0')
      
    • 3. Apply the qdisc to interface eth1 on both the left and right nodes by executing the following (you may later change the parameters with tc qdisc change):
      sudo /sbin/tc qdisc add dev eth1 root handle 1:0 netem delay 20ms
      sudo /sbin/tc qdisc add dev eth1 parent 1:0 tbf rate 20mbit buffer 20000 limit 16000
      
    • 4. Run the script:
      python netinfo.py
      
    • 5. Verify that it is working by opening a web browser and entering the following URL (replacing 155.98.36.69 with the public IP address of your left or right node):
      http://155.98.36.69:8000/qinfo/0
      
      For more information about netinfo.py, please look at the comments in the file.
      Question: Implement your load-balancer.rb, run it on the switch node, and display the number of bytes and the queue length on both the left and right nodes whenever a decision is made.
      Solution: The source code for load-balancer.rb can be found at lb-solution.rb.
      Steps to verify that it works:
    • 1. Download lb-solution.rb, upload it to the switch node, open the source code, and change $leftip and $rightip to the public IP addresses of the left and right nodes.
    • 2. Configure both the left and right nodes with the following (you can change the parameters, but the qdisc has to be on dev eth1):
      sudo /sbin/tc qdisc add dev eth1 root handle 1:0 netem delay 20ms
      sudo /sbin/tc qdisc add dev eth1 parent 1:0 tbf rate 20mbit buffer 20000 limit 16000
      
    • 3. On both the left and right nodes, run netinfo.py. On the switch node, run sudo /opt/trema-trema-8e97343/trema run lb-solution.rb to start the OpenFlow controller.
    • 4. Run multiple TCP flows from the outside node to the inside node. On inside, run:
      /usr/local/etc/emulab/emulab-iperf -s
      
      On outside, run the following multiple times, waiting about 6 seconds between runs:
      /usr/local/etc/emulab/emulab-iperf -c 10.10.10.2 -t 100 &
      
      This gives netinfo.py enough time to collect network usage statistics from both the left and right nodes, so that the load balancer can make the right decision.
    • 5. Pay attention to the output from the load balancer every time a new TCP flow is generated from outside. Sample output should be similar to the following:
      left:  5302.5338056252 bytes, Q Depth: 20.5240502532964 packets
      right: 14193.5212452065 bytes, Q Depth: 27.3912943665466 packets
      so this new flow goes left. Total number of flows on left: 1
      left:  30864.5415104438 bytes, Q Depth: 5.35704134958587 packets
      right: 499.390132742089 bytes, Q Depth: 0.963745492987305 packets
      so this new flow goes right. Total number of flows on right: 1
      left:  34797.5504056022 bytes, Q Depth: 4.51942007138619 packets
      right: 119634.69213493 bytes, Q Depth: 9.33169180628916 packets
      so this new flow goes left. Total number of flows on left: 2
      
      Generally speaking, the decision is made as follows: when the available queue length seen on one node is drastically greater than on the other, the new flow goes to that node/path; otherwise, the new flow goes to whichever node/path has seen fewer bytes.
      Experimenters can further change the tc token bucket filter parameters to test their load balancer.
  • Simplified Load Balancer:
    The above question requires the user to run a web server on both the left and right nodes to report token bucket buffer statistics. At the same time, the OpenFlow controller that the experimenter implements must fetch the web page and parse its content to get those statistics, which may be too complicated.

An alternative way to accomplish this is by querying Flow statistics directly from the OpenFlow switch.

Process: Upon the arrival of a new TCP flow, the OpenFlow controller should send a FlowStatsRequest message to the OpenFlow switch. The OpenFlow switch will reply with statistics about all flows in its flow table. This flow statistics message is delivered to the stats_reply function of the OpenFlow controller that the user implements on the switch node. Based on the statistics, experimenters can apply their own policy for choosing a path in different situations. The FlowStatsReply message has the following format:

FlowStatsReply.new(
  :length => 96,
  :table_id => 0,
  :match => Match.new,
  :duration_sec => 10,
  :duration_nsec => 106000000,
  :priority => 0,
  :idle_timeout => 0,
  :hard_timeout => 0,
  :cookie => 0xabcd,
  :packet_count => 1,
  :byte_count => 1,
  :actions => [ ActionOutput.new ]
)

For more information about FlowStatsRequest and FlowStatsReply, please refer to http://rubydoc.info/github/trema/trema/master/Trema/FlowStatsRequest and http://rubydoc.info/github/trema/trema/master/Trema/FlowStatsReply.
The difference between this load balancer and the one introduced in the previous section is that this one only reports the accumulated statistics of each flow over time, while the previous one fetches real-time traffic information from both paths.

We have already implemented a sample load balancer that chooses a path based on the accumulated number of bytes sent through the left and right paths, so that a new flow goes to the path with fewer bytes sent.
Experimenters can download the sample Load Balancer HERE.
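The selection rule of the sample can be sketched independently of Trema as follows. This is not the downloadable sample itself: FlowStat is a hypothetical stand-in for the fields of a FlowStatsReply, and out_port is assumed to have been extracted from the reply's actions beforehand. The port numbers and byte counts are made up.

```ruby
# Hypothetical stand-in for the parts of a FlowStatsReply used here.
# out_port is assumed to come from the flow's output action.
FlowStat = Struct.new(:byte_count, :out_port)

# Sum the accumulated bytes of all flows that leave through each port.
def bytes_per_port(stats)
  totals = Hash.new(0)
  stats.each { |s| totals[s.out_port] += s.byte_count }
  totals
end

# Choose the candidate port that has carried the fewest bytes so far.
# Ports with no flows count as zero; ties go to the first candidate.
def pick_port_by_bytes(stats, candidate_ports)
  totals = bytes_per_port(stats)
  candidate_ports.min_by { |port| totals[port] }
end

stats = [
  FlowStat.new(5_000, 1),
  FlowStat.new(3_000, 1),
  FlowStat.new(6_000, 2)
]
puts pick_port_by_bytes(stats, [1, 2])
```

Because the counters are cumulative, a path that carried heavy traffic in the past keeps looking loaded even after those flows have ended.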

Question: Try modifying the downloaded load balancer so that it chooses a path based on the average per-flow throughput observed on each path.
Note: Since Trema does not yet support multi-threaded operation, this simple implementation runs in one thread. As a result, users will experience some delay in fetching the flow statistics (i.e., stats_reply will not be called right after a FlowStatsRequest message has been sent in the packet_in handler).
Solution: The source code for load-balancer-simple-tp.rb can be found at load-balancer-simple-tp.rb.
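The per-flow throughput calculation might be sketched as below. This is an illustration, not the contents of load-balancer-simple-tp.rb: FlowSample is a hypothetical stand-in for FlowStatsReply fields, out_port is assumed to be derived from the flow's actions, and the policy of preferring the path with the higher average (or any path with no flows yet) is one assumed interpretation of the question.

```ruby
# Hypothetical stand-in for the FlowStatsReply fields used here.
FlowSample = Struct.new(:byte_count, :duration_sec, :duration_nsec, :out_port)

# Bytes per second for one flow over its whole lifetime.
def flow_throughput(flow)
  seconds = flow.duration_sec + flow.duration_nsec / 1e9
  seconds > 0 ? flow.byte_count / seconds : 0.0
end

# Average the per-flow throughput of the flows on each output port.
def average_throughput_per_port(flows)
  sums = Hash.new(0.0)
  counts = Hash.new(0)
  flows.each do |f|
    sums[f.out_port] += flow_throughput(f)
    counts[f.out_port] += 1
  end
  averages = {}
  sums.each { |port, total| averages[port] = total / counts[port] }
  averages
end

# Assumed policy: an unused port wins outright; otherwise pick the port
# whose flows currently achieve the highest average throughput.
def pick_port_by_throughput(flows, candidate_ports)
  averages = average_throughput_per_port(flows)
  idle = candidate_ports.find { |port| !averages.key?(port) }
  return idle unless idle.nil?
  candidate_ports.max_by { |port| averages[port] }
end

flows = [
  FlowSample.new(1_000_000, 10, 0, 1),
  FlowSample.new(500_000, 10, 0, 1),
  FlowSample.new(2_000_000, 10, 0, 2)
]
puts pick_port_by_throughput(flows, [1, 2])
```

The averages cover each flow's entire lifetime, so they react slowly to recent changes in load.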