Changes between Version 8 and Version 9 of GENIExperimenter/Tutorials/HadoopInASlice/ExecuteExperiment


Timestamp: 01/07/14 21:26:58
Author: pruth@renci.org

== 1. Login to Hadoop Master ==

{{{
#!html

== 2. Check the status/properties of the VMs. ==

=== A. Observe the properties of the network interfaces ===

{{{
# /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr fa:16:3e:72:ad:a6
     
          collisions:0 txqueuelen:0
          RX bytes:4010954 (3.8 MiB)  TX bytes:4010954 (3.8 MiB)
}}}

=== B. Observe the contents of the NEuca user data file ===

This file includes a script that installs and executes the script you configured for the VM.
{{{
# neuca-user-data
[global]
     
        fi
        eval "/bin/sh -c \"chmod +x /tmp/master.sh; /tmp/master.sh\""
}}}


=== C. Observe the contents of the script that was installed and executed on the VM ===
{{{
# cat /tmp/master.sh
#!/bin/bash
     
  /home/hadoop/hadoop-euca-init.sh 172.16.1.1 -master
  echo "Done starting daemons" >> /home/hadoop/log
}}}


=== D. Test for connectivity between the VMs ===

{{{
# ping hadoop-worker-0
PING hadoop-worker-0 (172.16.1.10) 56(84) bytes of data.
     
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.468/0.607/0.852/0.174 ms
}}}

== 3. Check the status of the Hadoop filesystem. ==

=== A. Query for the status of the filesystem and its associated workers ===

{{{
# hadoop dfsadmin -report
Configured Capacity: 54958481408 (51.18 GB)
     
DFS Remaining%: 88.58%
Last contact: Sat Jan 04 21:49:33 UTC 2014
}}}



=== B. Test the filesystem with a small file ===


==== a. Create a small test file ====
{{{
# echo "Hello GENI World" > hello.txt
}}}

==== b. Push the file into the Hadoop filesystem ====
{{{
# hadoop fs -put hello.txt hello.txt
}}}

==== c. Check for the file's existence ====
{{{
# hadoop fs -ls
Found 1 items
-rw-r--r--   3 root supergroup         12 2014-01-04 21:59 /user/root/hello.txt
}}}

==== d. Check the contents of the file ====
{{{
# hadoop fs -cat hello.txt
Hello GENI World
}}}

     
Test the true power of the Hadoop filesystem by creating and sorting a large random dataset. It may be useful and interesting to log in to the master and/or worker VMs and use tools like `top`, `iotop`, and `iftop` to observe the resource utilization on each of the VMs during the sort test.
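The observation suggested above can also be done without extra packages. The following is a hypothetical monitoring loop (not part of the tutorial) that samples the load average and `eth0` byte counters from `/proc` on a VM while the sort job runs; the `SAMPLES` and `INTERVAL` values are illustrative.

```shell
# Hypothetical monitoring loop: sample load and eth0 counters while a job runs.
# Assumes a Linux VM; the interface name eth0 matches the ifconfig output above.
SAMPLES=3      # increase for a real run
INTERVAL=1     # seconds between samples
for i in $(seq 1 "$SAMPLES"); do
    # 1-, 5-, and 15-minute load averages
    echo "load: $(cut -d' ' -f1-3 /proc/loadavg)"
    # received/transmitted byte counters for eth0 (prints nothing if absent)
    awk '/eth0:/ {gsub(":"," "); printf "eth0 rx=%s tx=%s\n", $2, $10}' /proc/net/dev
    sleep "$INTERVAL"
done
```

Tools like `iotop` and `iftop` give richer interactive views of the same activity, but reading `/proc` works on any of the VMs without installing anything.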

=== A. Create a 1 GB random data set ===

After the data is created, use the `ls` functionality to confirm the data exists. Note that the data is composed of several files in a directory.

{{{
#  hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar teragen 10000000 random.data.1G
Generating 10000000 using 2 maps with step of 5000000
     
14/01/05 18:48:28 INFO mapred.JobClient:     Map input bytes=10000000
14/01/05 18:48:28 INFO mapred.JobClient:     Map output records=10000000
}}}

=== B. Sort the dataset ===

On your own, you can use the `cat` and/or `get` functionality to look at the random and sorted files to confirm their size and that the sort actually worked.

{{{
# hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar terasort random.data.1G sorted.data.1G
14/01/05 18:50:49 INFO terasort.TeraSort: starting
     
14/01/05 18:52:48 INFO mapred.JobClient:     Reduce input records=10000000
14/01/05 18:52:48 INFO terasort.TeraSort: done
}}}

== 5. Advanced Example ==
     
Re-do the tutorial with a different number of workers, amount of bandwidth, and/or worker instance types. Warning: be courteous to other users and do not take all the resources.

A. Time the performance of runs with different resources.
B. Observe the largest file size you can create with different settings.