Changes between Version 14 and Version 15 of NikySandbox/WebExample


Timestamp: 07/06/12 07:01:11
Author: nriga@bbn.com

   sudo /sbin/service httpd start
   }}}

=== Command Line Web Transfers ===

 * You can download the web page using this command:
   {{{
   [inki@Client1 ~]$ wget http://server
--2012-07-06 04:59:09--  http://server/
Resolving server... 10.10.1.1
Connecting to server|10.10.1.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 548 [text/html]
Saving to: “index.html.1”

100%[======================================>] 548         --.-K/s   in 0s     

2012-07-06 04:59:09 (120 MB/s) - “index.html.1” saved [548/548]
   }}}
   wget names the file `index.html.1` here because an `index.html` from an earlier download was already in the directory.
   '''Note:''' In the above command we used `http://server` instead of `http://pc484.emulab.net`. This way we contact the web server over the private connection we created, instead of over the server's public interface. The private connections are the ones drawn as lines between hosts in Flack.

 * The above command only downloads the `index.html` file from the web server. As we will see later, a web page may consist of multiple HTML files and other objects such as pictures and videos. To force wget to download all of a page's dependencies, use the `-m` (mirror, i.e. recursive download) and `-p` (page requisites such as inlined images) options:
   {{{
   [inki@Client1 ~]$ wget -m -p http://server
   }}}
   This creates a directory named after the host (`server`). To see its contents, run:
   {{{
   [inki@Client1 ~]$ ls server/
home.html  index.html  links.html  media  top.html
   }}}

== Build your own Server ==
At a high level, a web server listens for connections on a socket bound to a specific port on the host machine. Clients connect to this socket and use a simple text-based protocol to retrieve files from the server. For example, you might try the following command on `Client1`:

{{{
% telnet server 80
GET /index.html HTTP/1.0
}}}
Press Enter twice after the `GET` line; the empty line marks the end of the request. The server then returns, on your command line, the HTML of the "front page" of the web server running on the `Server` host.
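
The following minimal C sketch shows the exact bytes that the telnet session sends. The function name `send_get_request` is hypothetical. It writes to any file descriptor, so in the lab you would pass it a TCP socket already connected to `server` on port 80. HTTP lines end in CRLF (`\r\n`), and the final empty line corresponds to pressing Enter the second time.

{{{#!c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send the request typed into telnet above.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_get_request(int fd, const char *path)
{
    char req[512];

    /* A space would split the request line into extra fields. */
    if (path[0] != '/' || strchr(path, ' ') != NULL)
        return -1;

    /* Each HTTP line ends in CRLF; the empty line ends the request. */
    int n = snprintf(req, sizeof req, "GET %s HTTP/1.0\r\n\r\n", path);
    if (n < 0 || (size_t)n >= sizeof req)
        return -1;                      /* path too long for the buffer */

    size_t off = 0;
    while (off < (size_t)n) {           /* write() may send fewer bytes */
        ssize_t w = write(fd, req + off, (size_t)n - off);
        if (w < 0)
            return -1;
        off += (size_t)w;
    }
    return (ssize_t)off;
}
}}}

After sending, the client simply reads from the same socket until the server closes it, which is what HTTP/1.0 servers do after each response.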

A key point to keep in mind when building your web server is that the server translates relative filenames (such as `index.html`) into absolute filenames in the local filesystem. For example, you might keep all the files for your server in `~10abc/cs339/server/files/`, which we call the ''document root''. When your server gets a request for `index.html` (the default page when no file is specified), it prepends the document root to the requested file name. It then checks whether the file exists and whether its permissions are set properly (typically the file has to be world readable). If the file does not exist, the server returns a file not found error (`404 Not Found`). If the file is present but its permissions are not set properly, the server returns a permission denied error (`403 Forbidden`). Otherwise, it returns an HTTP OK message (`200 OK`) along with the contents of the file.
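
The lookup just described might be sketched in C as follows. The function name `status_for_request` is hypothetical. Rejecting any path that contains `..` (so a request cannot climb out of the document root) is an extra safety check we added; the text above does not require it, and it also rejects harmless names such as `a..b.html`. "World readable" is tested with the `S_IROTH` bit of the file mode that `stat()` reports.

{{{#!c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: prepend the document root to the request path
 * (the full path is left in `path`) and pick the HTTP status code:
 *   200 OK         - regular file that is world readable
 *   403 Forbidden  - file exists but is not world readable, or the
 *                    path contains ".." (our extra safety check)
 *   404 Not Found  - no such regular file */
int status_for_request(const char *docroot, const char *uri,
                       char *path, size_t pathlen)
{
    struct stat st;

    /* Our addition: never let a request climb out of the document root. */
    if (strstr(uri, "..") != NULL)
        return 403;

    int n = snprintf(path, pathlen, "%s%s", docroot, uri);
    if (n < 0 || (size_t)n >= pathlen)
        return 404;                     /* too long to be one of our files */

    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return 404;
    if (!(st.st_mode & S_IROTH))        /* "other" (world) read permission */
        return 403;
    return 200;
}
}}}

For example, with the document root `/var/www/html`, a request for `/index.html` is checked at `/var/www/html/index.html`.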

In our setup we are using the [http://httpd.apache.org/ Apache web server]. On a host running Fedora 10, Apache's default document root is `/var/www/html`. To see the files it serves:
  * Log in to the `Server` host
  * Run
    {{{
    [inki@server ~]$ ls /var/www/html/*
    }}}

Note also that since `index.html` is the default file, web servers typically translate `GET /` to `GET /index.html`. That way `index.html` is assumed to be the filename when no explicit filename is present. This is also why the two URLs http://www.cs.williams.edu and http://www.cs.williams.edu/index.html return equivalent results.
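
A minimal sketch of that translation in C. The function name is hypothetical. Appending `index.html` to any path ending in `/` (not just to `/` itself) is our assumption, similar in spirit to Apache's `DirectoryIndex` setting.

{{{#!c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append the default file name to any request path
 * that names a directory (ends in '/'), so "GET /" is served as
 * "GET /index.html". Returns 0 on success, -1 if `out` is too small. */
int apply_default_file(const char *uri, char *out, size_t outlen)
{
    size_t len = strlen(uri);
    const char *suffix = (len > 0 && uri[len - 1] == '/') ? "index.html" : "";
    int n = snprintf(out, outlen, "%s%s", uri, suffix);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
}}}

Requests that already name a file, such as `/top.html`, pass through unchanged.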

When you type a URL into a web browser, the browser requests that file and the server returns its contents. If the file is of type text/html and HTTP/1.0 is being used, the browser parses the HTML for embedded objects (such as images). It then opens separate connections to the web server to retrieve them. For example, if a web page contains 4 images, the browser makes a total of five separate connections: one for the HTML and one for each of the four image files.
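
The count in that example can be reproduced with a deliberately naive C sketch that counts `src="` attributes in a page. The function names are hypothetical. A real browser uses a full HTML parser, whereas this scan is case-sensitive, only handles double-quoted values, and would also count attributes such as `data-src`.

{{{#!c
#include <string.h>

/* Hypothetical helper: count embedded objects, approximated here as
 * occurrences of src=" (e.g. <img src="logo.png">). */
size_t count_embedded(const char *html)
{
    size_t count = 0;
    const char *p = html;
    while ((p = strstr(p, "src=\"")) != NULL) {
        count++;
        p += 5;                 /* skip past the 5 characters of src=" */
    }
    return count;
}

/* Under HTTP/1.0: one connection for the page plus one per object. */
size_t http10_connections(const char *html)
{
    return 1 + count_embedded(html);
}
}}}

For a page with four `<img src="...">` tags, `http10_connections` returns 5, matching the example above.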

With HTTP/1.0, a separate TCP connection is used for each requested file, so these short connections typically end before they ever get out of the slow start phase. HTTP/1.1 attempts to address this limitation. With HTTP/1.1, the server keeps connections to clients open, allowing "persistent" connections and pipelining of client requests. That is, after the result of a single request is returned (e.g., `index.html`), the server should by default leave the connection open for some period of time. The client can then reuse that connection for subsequent requests. One key issue is deciding how long to keep the connection open. This timeout needs to be configured in the server. Ideally it should be dynamic, based on the number of other active connections the server is currently supporting. If the server is idle, it can afford to leave the connection open for a relatively long time. If the server is busy servicing several clients at once, it may not be able to afford an idle connection sitting around (consuming kernel/thread resources) for very long. You should develop a simple heuristic to determine this timeout in your server.
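
One possible heuristic, sketched in C, shrinks the timeout linearly as the number of active connections grows. The constants (15 seconds when idle, 1 second at 100 or more active connections) are illustrative guesses, not values from Apache or the assignment. The function name is hypothetical.

{{{#!c
/* Hypothetical keep-alive heuristic. The constants are illustrative
 * guesses, not values taken from Apache or from the assignment. */
#define MAX_TIMEOUT_SEC   15   /* timeout when the server is idle */
#define MIN_TIMEOUT_SEC    1   /* timeout when the server is busy */
#define BUSY_CONNECTIONS 100   /* at or above this, use the minimum */

/* Returns how many seconds an idle persistent connection may stay
 * open, shrinking linearly as the number of active connections grows. */
int keepalive_timeout(int active_connections)
{
    if (active_connections <= 0)
        return MAX_TIMEOUT_SEC;
    if (active_connections >= BUSY_CONNECTIONS)
        return MIN_TIMEOUT_SEC;
    return MAX_TIMEOUT_SEC -
           (MAX_TIMEOUT_SEC - MIN_TIMEOUT_SEC) * active_connections
               / BUSY_CONNECTIONS;
}
}}}

With these constants, 50 active connections give 15 - 14 * 50 / 100 = 8 seconds (integer division). A natural place to use the value is the timeout argument of `select()` while waiting for the next request on a persistent connection.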

For this assignment, you need to support enough of the HTTP/1.0 and HTTP/1.1 protocols to allow an existing web browser (Firefox) to connect to your web server and retrieve the contents of the Williams CS front page from your server. (Of course, this requires that you copy the appropriate files to your server's document root.) Note that you DO NOT have to support script parsing (PHP, JavaScript), and you do not have to support HTTP POST requests. You should support images, and you should return appropriate HTTP error messages as needed.
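
Supporting images and returning error messages both depend on sending a correct status line and headers. The sketch below assumes a small hard-coded table of file extensions. Both helper names and the table are our own choices. `Content-Length` matters especially for HTTP/1.1: on a persistent connection it is how the client knows where one response ends and the next begins.

{{{#!c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: choose a Content-Type from the file extension.
 * The table is deliberately small; extend it for the files you serve. */
const char *content_type(const char *path)
{
    const char *dot = strrchr(path, '.');
    if (dot == NULL)
        return "application/octet-stream";
    if (strcmp(dot, ".html") == 0 || strcmp(dot, ".htm") == 0)
        return "text/html";
    if (strcmp(dot, ".png") == 0)
        return "image/png";
    if (strcmp(dot, ".jpg") == 0 || strcmp(dot, ".jpeg") == 0)
        return "image/jpeg";
    if (strcmp(dot, ".gif") == 0)
        return "image/gif";
    if (strcmp(dot, ".css") == 0)
        return "text/css";
    return "application/octet-stream";  /* unknown: generic binary data */
}

/* Hypothetical helper: write the status line and headers that precede
 * the body. Unknown status codes are reported as 500. Returns the
 * header length, or -1 if `buf` is too small. */
int format_response_header(char *buf, size_t len, const char *version,
                           int status, const char *type, long body_len)
{
    const char *reason;
    switch (status) {
    case 200: reason = "OK";          break;
    case 400: reason = "Bad Request"; break;
    case 403: reason = "Forbidden";   break;
    case 404: reason = "Not Found";   break;
    default:  status = 500; reason = "Internal Server Error"; break;
    }
    int n = snprintf(buf, len,
                     "%s %d %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %ld\r\n"
                     "\r\n",
                     version, status, reason, type, body_len);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
}}}

A 404 reply, for instance, could send this header followed by a short HTML error page, with `Content-Length` set to that page's size.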

At a high level, your web server will be structured something like the following:

{{{
Listen for connections
Forever loop:
    Accept new connection from incoming client
    Parse HTTP request
    Ensure well-formed request (return error otherwise)
    Determine if target file exists and if permissions are set properly (return error otherwise)
    Transmit contents of file to client (by performing reads on the file and writes on the socket)
    Close the connection (if HTTP/1.0)
}}}
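
The "Parse HTTP request" and "Ensure well-formed request" steps can start with the request line. This is a minimal sketch, assuming only `GET` needs to be accepted (POST is not required) and that the URI must start with `/`. The function name is hypothetical, and so are the chosen status codes: 400 Bad Request for a malformed line, 505 HTTP Version Not Supported for anything other than HTTP/1.0 or HTTP/1.1, and 501 Not Implemented for other methods.

{{{#!c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for a request line such as
 * "GET /index.html HTTP/1.1\r\n". On success fills method, uri and
 * version and returns 0; otherwise returns an HTTP error status. */
int parse_request_line(const char *line, char method[8],
                       char uri[256], char version[16])
{
    char extra;
    /* %Ns reads one whitespace-separated token of at most N chars;
     * a fourth token means the line is malformed. */
    int fields = sscanf(line, "%7s %255s %15s %c",
                        method, uri, version, &extra);
    if (fields != 3 || uri[0] != '/')
        return 400;                       /* Bad Request */
    if (strcmp(version, "HTTP/1.0") != 0 && strcmp(version, "HTTP/1.1") != 0)
        return 505;                       /* HTTP Version Not Supported */
    if (strcmp(method, "GET") != 0)
        return 501;                       /* Not Implemented (e.g. POST) */
    return 0;
}
}}}

If the function returns a non-zero status, the server can reply with that status and close the connection instead of looking up a file.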

You have three main choices for how to structure your web server around the simple loop above:

1) A multi-threaded approach spawns a new thread for each incoming connection. That is, once the server accepts a connection, it spawns a thread to parse the request, transmit the file, etc.
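
Here is a minimal sketch of thread-per-connection using POSIX threads (pthreads). `handle_connection` and `spawn_connection_thread` are hypothetical names. The handler is a stub that writes a fixed response and closes the descriptor; in your server it would run the parse, check and transmit steps of the loop. Detaching the thread lets its resources be reclaimed automatically when it exits, so the accept loop never has to join it.

{{{#!c
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-connection handler (a stub). A real server would
 * read and parse the request here, then send the requested file. */
static void *handle_connection(void *arg)
{
    int fd = (int)(intptr_t)arg;
    const char *resp = "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok";
    ssize_t ignored = write(fd, resp, strlen(resp));
    (void)ignored;                     /* a full server would check this */
    close(fd);                         /* HTTP/1.0: one request per connection */
    return NULL;
}

/* Spawn a detached thread for a descriptor returned by accept().
 * Returns 0 on success, or the pthread error code. */
int spawn_connection_thread(int client_fd)
{
    pthread_t tid;
    int err = pthread_create(&tid, NULL, handle_connection,
                             (void *)(intptr_t)client_fd);
    if (err != 0)
        return err;
    return pthread_detach(tid);        /* thread cleans up after itself */
}
}}}

In the accept loop you might call `spawn_connection_thread(fd)` for each `fd` returned by `accept(listen_fd, NULL, NULL)`, after checking that `accept()` did not return -1 (`listen_fd` is a hypothetical listening socket). On older glibc versions you may need to compile with `-pthread`.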

2) A multi-process approach maintains a worker pool of active processes to which the main server hands off requests. This approach is attractive largely because of its portability: it does not assume that a given threads package is present across multiple hardware/software platforms. Its drawback is higher context-switch overhead relative to a multi-threaded approach.

3) An event-driven architecture keeps a list of active connections and loops over them, performing a little bit of work on behalf of each connection. For example, there might be a loop that first checks whether any new connections are pending on the server (performing appropriate bookkeeping if so). It then loops over all existing client connections and sends a "block" of file data to each (e.g., 4096 or 8192 bytes, matching the disk block size). The primary advantage of this event-driven architecture is that it avoids the synchronization issues associated with a multi-threaded model (though synchronization effects should be limited in your simple web server). It also avoids the performance overhead of context switching among a number of threads.
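
One pass of such an event loop might look like the following C sketch. It uses `select()` to find connections that can accept more data and sends at most one 4096-byte block to each. The `struct conn` layout and `event_loop_pass` are hypothetical. The response is held in memory rather than read from disk, and checking the listening socket for new connections is left out to keep it short. A real server would also mark its sockets non-blocking (`O_NONBLOCK`) so that one slow client cannot stall the loop.

{{{#!c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   /* bytes sent to each connection per pass */

/* Hypothetical per-connection state for the event loop. */
struct conn {
    int fd;             /* client socket (any writable fd works here) */
    const char *data;   /* response bytes; a real server reads the file */
    size_t len;         /* total bytes to send */
    size_t off;         /* bytes sent so far */
};

/* One pass of the loop: block in select() until some unfinished
 * connection is writable, then send at most BLOCK_SIZE bytes to each
 * writable one. Returns how many connections still have data to send,
 * or -1 if select() fails. */
int event_loop_pass(struct conn *conns, int n)
{
    fd_set wfds;
    int maxfd = -1;

    FD_ZERO(&wfds);
    for (int i = 0; i < n; i++) {
        if (conns[i].off < conns[i].len) {
            FD_SET(conns[i].fd, &wfds);
            if (conns[i].fd > maxfd)
                maxfd = conns[i].fd;
        }
    }
    if (maxfd < 0)
        return 0;                         /* nothing left to send */
    if (select(maxfd + 1, NULL, &wfds, NULL, NULL) < 0)
        return -1;

    int remaining = 0;
    for (int i = 0; i < n; i++) {
        struct conn *c = &conns[i];
        if (c->off < c->len && FD_ISSET(c->fd, &wfds)) {
            size_t chunk = c->len - c->off;
            if (chunk > BLOCK_SIZE)
                chunk = BLOCK_SIZE;
            ssize_t w = write(c->fd, c->data + c->off, chunk);
            if (w > 0)
                c->off += (size_t)w;
            else if (w < 0)
                c->off = c->len;          /* write error: give up on client */
        }
        if (c->off < c->len)
            remaining++;
    }
    return remaining;
}
}}}

Calling it repeatedly, for example `while (event_loop_pass(conns, n) > 0) ;`, interleaves the clients block by block until everything is sent or `select()` fails.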

You may use C or C++ to build your web server, but you must develop it on Linux (although the code should run on any Unix system). In C/C++, you will want to become familiar with how the following system calls interact: socket(), bind(), listen(), accept(), select(), connect(). We outline a number of resources below with additional information on these system calls. A good book is also available on this topic (there is a reference copy of it in the lab).
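
On the server side, `socket()`, `bind()` and `listen()` fit together roughly as in this sketch. `socket()` creates the endpoint, `bind()` attaches it to a port, and `listen()` lets the kernel queue incoming connections for `accept()`. The helper name, the backlog of 16, and the use of `SO_REUSEADDR` are our choices. `connect()` is the matching client-side call, which is what telnet and wget do.

{{{#!c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on `port` on all
 * interfaces. Returns the listening descriptor, or -1 on error.
 * Clients are then picked up with accept(fd, NULL, NULL). */
int open_listen_socket(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow quick restarts without waiting for old connections in
     * TIME_WAIT to expire. */
    int one = 1;
    (void)setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {          /* 16 = queue of pending connections */
        close(fd);
        return -1;
    }
    return fd;
}
}}}

Binding to port 80 (below 1024) normally requires root privileges, so while testing you may prefer a port such as 8080 and browse to `http://server:8080/`.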

In this experiment, you'll be changing the characteristics of the link and measuring how they affect UDT and TCP performance.