Changes between Version 2 and Version 3 of GENIEducation/SampleAssignments/OpenFlowLoadBalancerTutorial/ExerciseLayout/Execute


Timestamp:
06/20/13 14:17:14 (11 years ago)
Author:
shuang@bbn.com
Comment:

--

  • GENIEducation/SampleAssignments/OpenFlowLoadBalancerTutorial/ExerciseLayout/Execute

    v2 v3  
    4040
    4141= Exercises =
    42  - '''Load Balancing''' -- Files to download: [http://www.gpolab.bbn.com/experiment-support/OpenFlowExampleExperiment/ExoGENI/load-balancer.rb load-balancer.rb] [[BR]]
    43  Load balancing in computer networking is the division of network traffic between two or more network devices or paths, typically for the purpose of achieving higher total throughput than either one path, ensuring a specific maximum latency or minimum bandwidth to some or all flows, or similar purposes. For this exercise, you will design a load-balancing OpenFlow controller capable of collecting external data and using it to divide traffic between dissimilar network paths so as to achieve full bandwidth utilization with minimal queuing delays. [[BR]]
     42 - '''Load Balancing''' -- Files to download: [http://www.gpolab.bbn.com/experiment-support/OpenFlowExampleExperiment/load-balancer.rb load-balancer.rb] [[BR]]
     43 We will implement a Load Balancer OpenFlow Controller on node `Switch` using Trema. Load balancing in computer networking is the division of network traffic between two or more network devices or paths, typically to achieve higher total throughput than any single path can provide, to guarantee a specific maximum latency or minimum bandwidth to some or all flows, or for similar purposes. For this exercise, you will design a load-balancing OpenFlow controller capable of collecting flow status data from OpenFlow switches and using it to divide traffic between dissimilar network paths so as to achieve full bandwidth utilization. [[BR]]
    4444 An interesting property of removing the controller from an OpenFlow device and placing it in an external system of arbitrary computing power and storage capability is that decision-making for network flows based on external state becomes reasonable. Traditional routing and switching devices make flow decisions based largely on local data (or perhaps data from adjacent network devices), but an OpenFlow controller can collect data from servers, network devices, or any other convenient source, and use this data to direct incoming flows. [[BR]]
    45  For the purpose of this exercise, data collection will be limited to the bandwidth and queue occupancy of two emulated network links.
     45 For the purpose of this exercise, data collection will be limited to the flow statistics reported by Open vSwitch.
    4646
    4747  '''Experimental Setup [[BR]]'''
     
    5151    - '''Inside and Outside Nodes''': These nodes can be any ExoGENI Virtual Nodes.
    5252    - '''Switch:''' The role of the Open vSwitch node may be played either by a software Open vSwitch installation on an ExoGENI Virtual Node, or by the OpenFlow switches available in GENI; consult your instructor.
    53     - '''Traffic Shaping Nodes (Left and Right)''': These are Linux hosts with two network interfaces. You can configure netem on the two traffic shaping nodes to have differing characteristics; the specific values don’t matter, as long as they are reasonable. (No slower than a few hundred kbps, no faster than a few tens of Mbps with 0-100 ms of delay would be a good guideline.) Use several different bandwidth combinations as you test your load balancer.
    54     - '''Aggregator''': This node is a Linux host running Open vSwitch with a switch controller that will cause TCP connections to “follow” the decisions made by your OpenFlow controller on the Switch node.
     53    - '''Traffic Shaping Nodes (Left and Right)''': These are Linux hosts with two network interfaces. You can configure netem on the two traffic shaping nodes to have differing characteristics; the specific values don’t matter, as long as they are reasonable. Use several different delay/loss combinations as you test your load balancer.
     54    - '''Aggregator''': This node is a Linux host running Open vSwitch with a switch controller that will cause TCP connections to “follow” the decisions made by your OpenFlow controller on the Switch node. Leave this node alone; you only need to implement the OpenFlow controller on node `Switch`.
    5555
    5656  '''Linux netem'''[[BR]]
    57   Use the ''tc'' command to enable and configure delay and bandwidth constraints on the outgoing interfaces for traffic traveling from the OpenFlow switch to the Aggregator node. To configure a path with 20 Mbps bandwidth and a 20 ms delay on eth2, you would issue the command:
     57  Use the ''tc'' command to enable and configure delay and loss rate constraints on the outgoing interfaces for traffic traveling from the OpenFlow switch to the Aggregator node. To configure a path with a 20 ms delay and 10% loss rate on eth2, you would issue the command:
    5858{{{
    59 sudo tc qdisc add dev eth2 root handle 1:0 netem delay 20ms
    60 sudo tc qdisc add dev eth2 parent 1:0 tbf rate 20mbit buffer 20000 limit 16000
     59sudo tc qdisc add dev eth2 root handle 1:0 netem delay 20ms loss 10%
    6160}}}
    62   See the ''tc'' and ''tc-tbf'' manual pages for more information on configuring tc token bucket filters as in the second command line. Use the ''tc qdisc change'' command to reconfigure existing links, instead of ''tc qdisc add''. [[BR]]
    63   The outgoing links in the provided lb.rspec are numbered 192.168.4.1 and 192.168.5.1 for left and right, respectively.
     61  Use the ''tc qdisc change'' command to reconfigure existing links, instead of ''tc qdisc add''. [[BR]]
     62  The outgoing links in the provided RSpec are addressed 192.168.4.1 and 192.168.5.1 for left and right, respectively.
    6463
    6564  '''Balancing the Load''' [[BR]]
    6665  An example OpenFlow controller that arbitrarily assigns incoming TCP connections to alternating paths can be found at [http://www.gpolab.bbn.com/experiment-support/OpenFlowExampleExperiment/load-balancer.rb load-balancer.rb] (if you have already downloaded it, ignore this). [[BR]]
    67   The goal of your OpenFlow controller will be to achieve full bandwidth utilization with minimal queuing delays of the two links between the OpenFlow switch and the Aggregator host. In order to accomplish this, your OpenFlow switch will intelligently divide TCP flows between the two paths. The intelligence for this decision will come from bandwidth and queuing status reports from the two traffic shaping nodes representing the alternate paths. [[BR]]
    68   When the network is lightly loaded, flows may be directed toward either path, as neither path exhibits queuing delays and both paths are largely unloaded. As network load increases, however, your controller should direct flows toward the least loaded fork in the path, as defined by occupied bandwidth for links that are not yet near capacity and queue depth for links that are near capacity. [[BR]]
    69   Because TCP traffic is bursty and unpredictable, your controller will not be able to perfectly balance the flows between these links. However, as more TCP flows are combined on the links, their combined congestion control behaviors will allow you to utilize the links to near capacity, with queuing delays that are roughly balanced. Your controller need not re-balance flows that have previously been assigned, but you may do so if you like. [[BR]]
     66  The goal of your OpenFlow controller will be to achieve full bandwidth utilization of the two links between the OpenFlow switch and the Aggregator host. In order to accomplish this, your OpenFlow switch will intelligently divide TCP flows between the two paths. The intelligence for this decision will come from the flow statistics reports from the Open vSwitch on node `Switch`. [[BR]]
    7067  The binding of OpenFlow port numbers to logical topology links can be found in the file /tmp/portmap on the switch node when the provided RSpec boots. It consists of three lines, each containing one logical link name (left, right, and outside) and an integer indicating the port on which the corresponding link is connected. You may use this information in your controller configuration if it is helpful. [[BR]]
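  For example, your controller could read this mapping into a hash at startup. The following is a minimal sketch, assuming /tmp/portmap contains exactly the three `name port` lines described above; the helper name ''load_portmap'' is our own, not part of the provided files:
{{{
# Sketch: read /tmp/portmap (lines such as "left 1", "right 2", "outside 3")
# into a hash like {"left" => 1, "right" => 2, "outside" => 3}.
def load_portmap(path = '/tmp/portmap')
  portmap = {}
  File.readlines(path).each do |line|
    name, port = line.split
    portmap[name] = port.to_i if name && port
  end
  portmap
end

# Possible use inside your controller:
#   ports = load_portmap
#   @left_port, @right_port, @outside_port = ports['left'], ports['right'], ports['outside']
}}}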
    7168  You will find an example OpenFlow controller that arbitrarily assigns incoming TCP connections to alternating paths in the file [http://www.gpolab.bbn.com/experiment-support/OpenFlowExampleExperiment/load-balancer.rb load-balancer.rb]. This simple controller can be used as a starting point for your controller if you desire. Examining its behavior may also prove instructive; you should see that its effectiveness at achieving the assignment goals falls off as the imbalance between the link capacities or delays grows.
    7269
    7370  '''Gathering Information''' [[BR]]
    74   The information you will use to inform your OpenFlow controller about the state of the two load-balanced paths will be gathered from the traffic shaping hosts. This information can be parsed out of the file /proc/net/dev, which contains a line for each interface on the machine, as well as the tc -p qdisc show command, which displays the number of packets in the token bucket queue. As TCP connections take some time to converge on a stable bandwidth utilization, you may want to collect these statistics once every few seconds, and smooth the values you receive over the intervening time periods. [[BR]]
    75   You may find the file /tmp/ifmap on the traffic shaping nodes useful. It is created at system startup, and identifies the inside- and outside-facing interfaces with lines such as:
     71  The information you will use to inform your OpenFlow controller about the state of the two load-balanced paths is gathered by sending an OpenFlow `FlowStatsRequest` message from the controller to the OpenFlow switch and collecting the `FlowStatsReply` messages the switch sends back. For more information about !FlowStatsRequest and !FlowStatsReply, please refer to http://rubydoc.info/github/trema/trema/master/Trema/FlowStatsRequest and http://rubydoc.info/github/trema/trema/master/Trema/FlowStatsReply. [[BR]]
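  As a concrete starting point, here is a minimal sketch (not the assignment solution) of a Trema controller that periodically sends a `FlowStatsRequest` and tallies bytes per path in `stats_reply`. The 10-second polling interval, the placeholder port numbers, and the classification of flows by ingress port are illustrative assumptions; adapt them to the way your flows are actually installed.
{{{
# Sketch only; run with: trema run stats-probe.rb
class StatsProbe < Controller
  periodic_timer_event :request_stats, 10   # poll the switch every 10 seconds

  def start
    @datapath_id = nil
    # Placeholder port numbers; in practice read them from /tmp/portmap.
    @left_port, @right_port = 1, 2
  end

  def switch_ready(datapath_id)
    @datapath_id = datapath_id
  end

  def request_stats
    return if @datapath_id.nil?
    # Ask the switch for statistics about every flow in its table.
    send_message @datapath_id, FlowStatsRequest.new(:match => Match.new)
  end

  def stats_reply(datapath_id, message)
    left_bytes = right_bytes = 0
    message.stats.each do |flow|
      # Illustrative classification: count the bytes of flows that entered on
      # the left or right port (e.g. traffic returning from each path).
      case flow.match.in_port
      when @left_port  then left_bytes  += flow.byte_count
      when @right_port then right_bytes += flow.byte_count
      end
    end
    info "left path: #{left_bytes} bytes, right path: #{right_bytes} bytes"
  end
end
}}}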
     72
     73  '''Question 1: Implement your load-balancer.rb, run it on ''switch'', display the total number of bytes sent along the left and right paths when a new TCP flow arrives, and forward the new flow to the path with the smaller number of bytes transferred.'''[[BR]]
     74  '''A sample output is as follows: [[BR]]'''
    7675{{{
    77 inside eth2
    78 outside eth1
    79 }}}
    80   The first word on the line is the “direction” of the interface — toward the inside or outside of the network diagram. The second is the interface name as found in ''/proc/net/dev''. [[BR]]
    81   You are free to communicate these network statistics from the traffic shaping nodes to your OpenFlow controller in any fashion you like. You may want to use a web service, or transfer the data via an external daemon and query a statistics file from the controller. Keep in mind that flow creation decisions need to be made rather quickly, to prevent retransmissions on the connecting host. [[BR]]
    82 
    83   '''Questions'''[[BR]]
    84   To help the user fetch information about the amount of traffic as well as the queue depth (measured in number of packets) on both the left and right nodes, we provide a script that can be downloaded and run on both nodes. [[BR]]
    85   You can download the script from [http://www.gpolab.bbn.com/experiment-support/OpenFlowExampleExperiment/netinfo.py netinfo.py] (If you have already downloaded it, ignore this). Then do the following to start monitoring network usage: [[BR]]
    86  -
    87   - 1. install Twisted Web package for Python on both left and right node:
    88 {{{
    89 sudo yum install python-twisted-web
    90 }}}
    91   - 2. upload netinfo.py onto left and right node, then change the listening IP address in netinfo.py to the public IP address of left and right node respectively. i.e., replacing the following 0.0.0.0 in your netinfo.py file to the public IP address of the left/right node.
    92 {{{
    93 reactor.listenTCP(8000, factory, interface = '0.0.0.0')
    94 }}}
    95   - 3. apply qdisc on interface eth1 of both left and right node by executing (you may further change the parameters by using ''tc qdisc change''):
    96 {{{
    97 sudo /sbin/tc qdisc add dev eth1 root handle 1:0 netem delay 20ms
    98 sudo /sbin/tc qdisc add dev eth1 parent 1:0 tbf rate 20mbit buffer 20000 limit 16000
    99 }}}
    100   - 4. run the script by:
    101 {{{
    102 python netinfo.py
    103 }}}
    104   - 5. verify it is working by opening a web browser and typing the following URL ('''replacing 155.98.36.69 with your left or right node's public IP address'''):
    105 {{{
    106 http://155.98.36.69:8000/qinfo/0
    107 }}}
    108   For more information about netinfo.py, please look at the comments in the file. [[BR]]
    109 
    110   To help you get started, the following is the ruby class that helps collecting the monitoring results:
    111 {{{
    112 class DataCollector
    113     @@Weight = 0.2
    114 
    115     attr_reader :port
    116 
    117     def initialize(host, port)
    118         @host = host
    119         @uri = 'http://' + host + ':8000/qinfo/'
    120         @last = 0
    121         @ewmabytes = 0
    122         @ewmapkts = 0
    123         @lock = Mutex.new
    124         @port = port
    125     end
    126 
    127     def run
    128         starttime = Time.now.to_f
    129         while true
    130             data = Net::HTTP.get(URI(@uri + @last.to_s))
    131             data.each_line do |line|
    132                 ts, bytes, qlen = line.chomp.split(' ').map { |x| x.to_i }
    133                 @lock.synchronize do
    134                     if ts <= @last
    135                         next
    136                     elsif @last == 0
    137                         @ewmabytes = bytes
    138                         @ewmapkts = qlen
    139                     else
    140                         # Just assume we haven't missed too many entries
    141                         @ewmabytes = bytes * @@Weight + @ewmabytes * (1 - @@Weight)
    142                         @ewmapkts = qlen * @@Weight + @ewmapkts * (1 - @@Weight)
    143                     end
    144                     @last = ts
    145                 end
    146             end
    147             sleep 5
    148         end
    149     end
    150 
    151     def averages
    152         a = nil
    153         @lock.synchronize do
    154             a = [@ewmabytes, @ewmapkts]
    155         end
    156         return a
    157     end
    158 end
    159 }}}
    160   In the above code, function ''averages'' will return the weighted average number of bytes seen in the corresponding node, as well as the weighted average queue depth in terms of number of packets seen in the corresponding node. [[BR]]
    161   Here is some example code that makes use of this class:
    162 {{{
    163         @collectors = [DataCollector.new($leftip, @leftport),
    164                        DataCollector.new($rightip, @rightport)]
    165         @collectors.each do |collector|
    166             Thread.new(collector) do |c|
    167                 c.run
    168             end
    169         end
    170 
    171         left_monitor_value = @collectors[0].averages
    172         right_monitor_value = @collectors[1].averages
    173         #left_monitor_value[0] shows the average number of bytes seen on the left node
    174         #left_monitor_value[1] shows the average number of queued packets on the left node
    175 }}}
    176  
    177   '''Question: Implement your load-balancer.rb, run it on ''switch'', and display the number of bytes and queue length on both left and right node when a new TCP flow comes and path decision is made'''[[BR]]
    178   '''A sample output should be as follows: [[BR]]'''
    179 {{{
    180 left:  5302.5338056252 bytes, Q Depth: 20.5240502532964 packets
    181 right: 14193.5212452065 bytes, Q Depth: 27.3912943665466 packets
    182 so this new flow goes left. Total number of flows on left: 1
     76[stats_reply]-------------------------------------------
     77left path: packets 3890, bytes 5879268
     78right path: packets 7831, bytes 11852922
     79since there are more bytes going to the right path, let's go *LEFT* for this flow
    18380}}}
    18481  You can use iperf to generate TCP flows from the ''outside'' node to the ''inside'' node:
     
    18683  - On ''inside'', run:
    18784{{{
    188 /usr/local/etc/emulab/emulab-iperf -s
     85iperf -s
    18986}}}
    190   - On ''outside'' run the following multiple times, with about 6 seconds interval between each run:
     87  - On ''outside'', run the following command multiple times, leaving several seconds between runs:
    19188{{{
    192 /usr/local/etc/emulab/emulab-iperf -c 10.10.10.2 -t 100 &
     89iperf -c 10.10.10.2 -t 600 &
    19390}}}
    194     This will give the ''netinfo.py'' enough time to collect network usage statistics from both left and right node so that the load-balancer can make the right decision.
     91  The above command starts a new TCP flow from ''outside'' to ''inside'' that runs for 600 seconds.
    19592
    196   '''If you really do not know where to start, you can find a semi-complete load-balancer.rb [http://www.gpolab.bbn.com/experiment-support/OpenFlowExampleExperiment/load-balancer-hint.rb HERE], you only need to complete the ''next_path'' function that prints out the statistics of each path and returns the path choice''' [[BR]]
    197   '''Hints: Want to get the complete load-balancer.rb? ask your instructor or visit here (you need a password to get it), or send an email (the solution code may be full of bugs, feel free to tweak it and report bugs/ask questions)''' [[BR]]
     93  '''Hints:'''
    19894  -
    199     - Remember that the TCP control loop is rather slow, on the order of several round trip times for the TCP connection. This means your load balancing control loop should be slow.
    200     - You may wish to review control theory, as well as TCP congestion control and avoidance principles.
    201     - Without rebalancing, “correcting” a severe imbalance may be difficult or impossible. For testing purposes, add flows to the path slowly and wait for things to stabilize.
    202     - Some thoughts on reducing the flow count when load balancing via OpenFlow can be found in [http://dl.acm.org/citation.cfm?id=1972438 Wang et al.] You are not required to implement these techniques, but may find them helpful.
    20395    - Remember that the default OpenFlow policy for your switch or Open vSwitch instance will likely be to send any packets that do not match a flow spec to the controller, so you will have to handle or discard these packets (see the sketch after these hints).
    204     - You will want your load balancer to communicate with the traffic shaping nodes via their administrative IP address, available in the slice manifest.
    205     - If packet processing on the OpenFlow controller blocks for communication with the traffic shaping nodes, TCP performance may suffer. Use require ’threads’, Thread, and Mutex to fetch load information in a separate thread.
    206     - The OpenFlow debugging hints from Section 3.1 remain relevant for this exercise.
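  To illustrate the hint above about packets that do not match any flow entry and are therefore sent to the controller, the following is a minimal sketch of a `packet_in` handler that installs a flow entry toward a chosen output port and forwards the triggering packet. ''pick_output_port'' is a hypothetical placeholder for your own path-selection logic, and depending on your Trema version the output action may be `SendOutPort` or `ActionOutput`.
{{{
# Sketch only, not the provided load-balancer.rb.
class PathChooser < Controller
  def packet_in(datapath_id, message)
    out_port = pick_output_port(message)   # hypothetical: your path-selection logic
    action = SendOutPort.new(out_port)     # older Trema: ActionOutput.new(:port => out_port)
    # Install a flow entry so later packets of this flow are switched directly.
    send_flow_mod_add(
      datapath_id,
      :match   => ExactMatch.from(message),
      :actions => action
    )
    # Do not drop the packet that triggered this event; forward it along the same path.
    send_packet_out(
      datapath_id,
      :packet_in => message,
      :actions   => action
    )
  end

  private

  # Hypothetical placeholder: return the OpenFlow port number of the chosen
  # path (for example, the left or right port read from /tmp/portmap).
  def pick_output_port(message)
    1
  end
end
}}}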
    20796
    208  - '''Simplified Load Balancer: ''' [[BR]]
    209  The above question requires the user to implement a web server on both left node and right node to report the querying results about token bucket buffer statistics. At the same time, the openflow controller that the experimenter implemented should pull the web page, parse the content to get those statistics, which seems to be too complicated. [[BR]]
    210 
    211  An alternative way to accomplish this is by querying Flow statistics directly from the OpenFlow switch. [[BR]]
    212  
    21397 '''Process: ''' Upon the arrival of a new TCP flow, the OpenFlow controller should send out a `FlowStatsRequest` message to the OpenFlow switch. The OpenFlow switch will reply with statistics information about all flows in its flow table.
    21498 This flow statistics message will be handled by the `stats_reply` function in the OpenFlow controller implemented by the user on node `Switch`. Based on these statistics, experimenters can apply their own policy for choosing a path in different situations.
     
    230114)
    231115}}}
    232  For more information about !FlowStatsRequest and !FlowStatsReply, please refer to http://rubydoc.info/github/trema/trema/master/Trema/FlowStatsRequest and http://rubydoc.info/github/trema/trema/master/Trema/FlowStatsReply. [[BR]]
    233  The difference between this Load Balancer and the Load Balancer introduced in the previous section is, this Load Balancer only reports the cumulated statistics of each flow over-time while the previous Load Balancer fetches the real-time network traffic information from both paths.
    234116 
    235  We have already implemented a sample Load Balancer that decides path based on the accumulated number of bytes sent through left and right path (such that the new flow will go to the one path with less number of bytes sent). [[BR]]
    236  Experimenter can download the sample Load Balancer [http://www.gpolab.bbn.com/experiment-support/OpenFlowExampleExperiment/load-balancer-simple.rb HERE].
     117 Note: there is some delay in fetching the flow statistics. The OpenFlow controller may not be able to receive any !FlowStatsReply message before two flows are established.
    237118 
    238  '''Question: Try modify the downloaded load balancer so that it decides path based on the average per-flow throughput observed on each path'''
     119 '''If you really do not know where to start, you can find the answer [http://www.gpolab.bbn.com/experiment-support/OpenFlowExampleExperiment/ExoGENI/load-balancer.rb HERE]''' [[BR]]
     120 
     121 '''Question 2: Modify your load balancer so that it chooses a path based on the average per-flow throughput observed on each path.'''
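  One possible way to compute an average per-flow throughput from the flow statistics is sketched below. It assumes each `FlowStatsReply` entry exposes `byte_count`, `duration_sec`, and `duration_nsec`, and that you have already grouped the entries by path (for example, with the same classification you used for Question 1); the helper name is our own.
{{{
# Sketch: average per-flow throughput (bytes per second) over an array of
# FlowStatsReply entries belonging to one path.
def average_per_flow_throughput(flows)
  return 0.0 if flows.empty?
  rates = flows.map do |f|
    seconds = f.duration_sec + f.duration_nsec / 1_000_000_000.0
    seconds > 0 ? f.byte_count / seconds : 0.0
  end
  rates.reduce(:+) / rates.size
end

# In stats_reply you could then compare the two paths and choose one
# according to your policy, e.g.:
#   left_avg  = average_per_flow_throughput(left_flows)
#   right_avg = average_per_flow_throughput(right_flows)
}}}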
    239122 
    240123 '''Note: ''' Since Trema does not yet support multi-threaded mode, this simple implementation runs in one thread. As a result, users will experience some delay in fetching the flow statistics (i.e., `stats_reply` will not be called immediately after a !FlowStatsRequest message has been sent in the `packet_in` handler).