Changes between Version 3 and Version 4 of GENIExperimenter/Tutorials/HadoopInASlice/ExecuteExperiment


Timestamp:
01/07/14 20:07:15 (10 years ago)
Author:
pruth@renci.org
Comment:

--

  • GENIExperimenter/Tutorials/HadoopInASlice/ExecuteExperiment

    v3 v4  
    2828       <td >
    2929         <ol>
    30            <li>Open a new terminal window. Type the login command for <i>Utah InstaGENI</i> into that terminal window.  You have now logged into your VM.</li>
    31            <li><FONT COLOR="black">Repeat the previous step for <i>GPO InstaGENI</i> in a second terminal window.</font></li>
    32 
    33 
    34 <table id="Table_03" border="0" cellpadding="5" cellspacing="0">
    35         <tr>
    36                 <td>
    37                         <img src="http://groups.geni.net/geni/attachment/wiki/GENIExperimenter/Tutorials/Graphics/Symbols-Tips-icon.png?format=raw" width="50" height="50" alt="Tip">
    38                </td>
    39                <td>
    40                     To find the login information again, go to the Slice page and press the <b>Details</b> button in the appropriate row of the slice table.
    41         </tr>
    42 </table>
    43 <table id="Table_01" border="0" cellpadding="5" cellspacing="0">
    44         <tr>
    45                 <td>
    46                         <img src="http://groups.geni.net/geni/attachment/wiki/GENIExperimenter/Tutorials/Graphics/warning-icon-hi.png?format=raw" width="50" height="50" alt="Warning">
    47                </td>
    48                <td>
    49                     Use this icon to identify warnings or important notes.
    50                </td>
    51         </tr>
    52 </table>
    53 
    54 
    55            <li>(Optional) If your neighbor added you to their slice, log in to your neighbor's slice. You will find your login information on the slice page for your neighbor's slice.</li>
    56           </ol>
    57        </td>
    58         <td>
    59         <img border="0" src="http://groups.geni.net/geni/attachment/wiki/GENIExperimenter/Tutorials/GREESC13/PortalSimpleLayer2Example/Graphics/log_in_v1.png?format=raw" alt="Login information for a VM"  height="200" title="Login information for a VM" />
    60 <br />
    61          <b>Figure 9-1</b> <i>The </i>Details<i> page at </i>Utah InstaGENI<i>.</i>
    62        </td>
    63     </tr>
     30           <li>Log in (ssh) to the hadoop-master using the key you associated with the
     31GENI Portal and the IP address displayed by Flack. The ssh application you use will
     32depend on the configuration of the laptop/desktop that you are using.</li>
     33           <li>Check the status/properties of the VMs.</li>
     34
     35<ol type="a">
     36<li> Observe the properties of the network interfaces </li>
     37
     38<pre><code>
     39# /sbin/ifconfig
     40eth0      Link encap:Ethernet  HWaddr fa:16:3e:72:ad:a6 
     41          inet addr:10.103.0.20  Bcast:10.103.0.255  Mask:255.255.255.0
     42          inet6 addr: fe80::f816:3eff:fe72:ada6/64 Scope:Link
     43          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
     44          RX packets:1982 errors:0 dropped:0 overruns:0 frame:0
     45          TX packets:1246 errors:0 dropped:0 overruns:0 carrier:0
     46          collisions:0 txqueuelen:1000
     47          RX bytes:301066 (294.0 KiB)  TX bytes:140433 (137.1 KiB)
     48          Interrupt:11 Base address:0x2000
     49
     50eth1      Link encap:Ethernet  HWaddr fe:16:3e:00:6d:af 
     51          inet addr:172.16.1.1  Bcast:172.16.1.255  Mask:255.255.255.0
     52          inet6 addr: fe80::fc16:3eff:fe00:6daf/64 Scope:Link
     53          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
     54          RX packets:21704 errors:0 dropped:0 overruns:0 frame:0
     55          TX packets:4562 errors:0 dropped:0 overruns:0 carrier:0
     56          collisions:0 txqueuelen:1000
     57          RX bytes:3100262 (2.9 MiB)  TX bytes:824572 (805.2 KiB)
     58
     59lo        Link encap:Local Loopback 
     60          inet addr:127.0.0.1  Mask:255.0.0.0
     61          inet6 addr: ::1/128 Scope:Host
     62          UP LOOPBACK RUNNING  MTU:16436  Metric:1
     63          RX packets:19394 errors:0 dropped:0 overruns:0 frame:0
     64          TX packets:19394 errors:0 dropped:0 overruns:0 carrier:0
     65          collisions:0 txqueuelen:0
     66          RX bytes:4010954 (3.8 MiB)  TX bytes:4010954 (3.8 MiB)
     67</code></pre>
     68
     69<li> Observe the contents of the NEuca user data file.  This file includes a script that will install and execute the script that you configured for the VM. </li>
     70<pre><code>
     71# neuca-user-data
     72[global]
     73actor_id=67C4EFB4-7CBF-48C9-8195-934FF81434DC
     74slice_id=39672f6e-610a-4d86-8810-30e02d20cc99
     75reservation_id=55676541-5221-483d-bb60-429de025f275
     76unit_id=902709a4-32f2-41fc-b85c-b4791c779580
     77;router= Not Specified
     78;iscsi_initiator_iqn= Not Specified
     79slice_name=urn:publicid:IDN+ch.geni.net:ADAMANT+slice+pruth-winter-camp
     80unit_url=http://geni-orca.renci.org/owl/8210b4d7-4afc-4838-801f-c20a8f1f75ae#hadoop-master
     81host_name=hadoop-master
     82[interfaces]
     83fe163e006daf=up:ipv4:172.16.1.1/24
     84[storage]
     85[routes]
     86[scripts]
     87bootscript=#!/bin/bash
     88        # Automatically generated boot script
     89        # wget or curl must be installed on the image
     90        mkdir -p /tmp
     91        cd /tmp
     92        if [ -x `which wget 2>/dev/null` ]; then
     93          wget -q -O `basename http://geni-images.renci.org/images/GENIWinterCamp/master.sh` http://geni-images.renci.org/images/GENIWinterCamp/master.sh
     94        else if [ -x `which curl 2>/dev/null` ]; then
     95          curl http://geni-images.renci.org/images/GENIWinterCamp/master.sh > `basename http://geni-images.renci.org/images/GENIWinterCamp/master.sh`
     96        fi
     97        fi
     98        eval "/bin/sh -c \"chmod +x /tmp/master.sh; /tmp/master.sh\""
     99</code></pre>
     100
     101
     102<li> Observe the contents of the script that was installed and executed on the VM. </li>
     103<pre><code>
     104# cat /tmp/master.sh
     105#!/bin/bash
     106
     107 echo "Hello from neuca script" > /home/hadoop/log
     108 MY_HOSTNAME=hadoop-master
     109 hostname $MY_HOSTNAME
     110 echo 172.16.1.1  hadoop-master  >> /etc/hosts
     111  echo 172.16.1.10 hadoop-worker-0 >> /etc/hosts
     112  echo 172.16.1.11 hadoop-worker-1 >> /etc/hosts
     113  echo 172.16.1.12 hadoop-worker-2 >> /etc/hosts
     114  echo 172.16.1.13 hadoop-worker-3 >> /etc/hosts
     115  echo 172.16.1.14 hadoop-worker-4 >> /etc/hosts
     116  echo 172.16.1.15 hadoop-worker-5 >> /etc/hosts
     117  echo 172.16.1.16 hadoop-worker-6 >> /etc/hosts
     118  echo 172.16.1.17 hadoop-worker-7 >> /etc/hosts
     119  echo 172.16.1.18 hadoop-worker-8 >> /etc/hosts
     120  echo 172.16.1.19 hadoop-worker-9 >> /etc/hosts
     121  echo 172.16.1.20 hadoop-worker-10 >> /etc/hosts
     122  echo 172.16.1.21 hadoop-worker-11 >> /etc/hosts
     123  echo 172.16.1.22 hadoop-worker-12 >> /etc/hosts
     124  echo 172.16.1.23 hadoop-worker-13 >> /etc/hosts
     125  echo 172.16.1.24 hadoop-worker-14 >> /etc/hosts
     126  echo 172.16.1.25 hadoop-worker-15 >> /etc/hosts
     127  while true; do
     128      PING=`ping -c 1 172.16.1.1 > /dev/null 2>&1`
     129      if [ "$?" = "0" ]; then
     130          break
     131      fi
     132      sleep 5
     133  done
     134  echo '/home/hadoop/hadoop-euca-init.sh 172.16.1.1 -master' >> /home/hadoop/log
     135  /home/hadoop/hadoop-euca-init.sh 172.16.1.1 -master
     136  echo "Done starting daemons" >> /home/hadoop/log
     137</code></pre>
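The sixteen `echo` lines in the script above follow a fixed pattern: worker N is given the address 172.16.1.(10+N). As a minimal sketch, assuming that pattern holds, the same entries could be produced by a loop. The function name `print_worker_hosts` is hypothetical and not part of `master.sh`. The sketch only prints the entries and does not append them to `/etc/hosts`.

```shell
#!/bin/sh
# Hypothetical helper (not part of master.sh): print /etc/hosts entries
# for the master and for workers 0..COUNT-1, following the pattern seen
# in the script above (worker N -> 172.16.1.(10+N)).
print_worker_hosts() {
    count="$1"
    i=0
    echo "172.16.1.1 hadoop-master"
    while [ "$i" -lt "$count" ]; do
        echo "172.16.1.$((10 + i)) hadoop-worker-$i"
        i=$((i + 1))
    done
}

# Print the master entry plus entries for workers 0 through 15.
print_worker_hosts 16
```

To apply the entries on a VM, the output could be appended with `print_worker_hosts 16 >> /etc/hosts`, which requires root privileges.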
     138
     139
     140<li>Test for connectivity between the VMs.</li>
     141<pre><code>
     142# ping hadoop-worker-0
     143PING hadoop-worker-0 (172.16.1.10) 56(84) bytes of data.
     14464 bytes from hadoop-worker-0 (172.16.1.10): icmp_req=1 ttl=64 time=0.747 ms
     14564 bytes from hadoop-worker-0 (172.16.1.10): icmp_req=2 ttl=64 time=0.459 ms
     14664 bytes from hadoop-worker-0 (172.16.1.10): icmp_req=3 ttl=64 time=0.411 ms
     147^C
     148--- hadoop-worker-0 ping statistics ---
     1493 packets transmitted, 3 received, 0% packet loss, time 1998ms
     150rtt min/avg/max/mdev = 0.411/0.539/0.747/0.148 ms
     151# ping hadoop-worker-1
     152PING hadoop-worker-1 (172.16.1.11) 56(84) bytes of data.
     15364 bytes from hadoop-worker-1 (172.16.1.11): icmp_req=1 ttl=64 time=0.852 ms
     15464 bytes from hadoop-worker-1 (172.16.1.11): icmp_req=2 ttl=64 time=0.468 ms
     15564 bytes from hadoop-worker-1 (172.16.1.11): icmp_req=3 ttl=64 time=0.502 ms
     156^C
     157--- hadoop-worker-1 ping statistics ---
     1583 packets transmitted, 3 received, 0% packet loss, time 1999ms
     159rtt min/avg/max/mdev = 0.468/0.607/0.852/0.174 ms
     160</code></pre>
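With more than a couple of workers, pinging each host by hand becomes tedious. The following is a sketch of one way to check several hosts in one pass. The function `check_hosts` and the `PROBE` variable are hypothetical names introduced here. The default probe assumes an iputils-style `ping` that accepts `-c` (packet count) and `-W` (reply timeout in seconds).

```shell
#!/bin/sh
# Hypothetical helper: run a probe command against each host and report
# which hosts answer. PROBE defaults to one ping with a 2-second timeout
# (assumes an iputils-style ping; adjust for other systems).
check_hosts() {
    failed=0
    for host in "$@"; do
        if ${PROBE:-ping -c 1 -W 2} "$host" > /dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host UNREACHABLE"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every host answered.
    [ "$failed" -eq 0 ]
}
```

On the master this might be called as `check_hosts hadoop-worker-0 hadoop-worker-1`, using the host names that the boot script added to `/etc/hosts`.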
     161
     162
     163<li> Check the status of the Hadoop filesystem</li>
     164<pre><code>
     165# hadoop dfsadmin -report
     166Configured Capacity: 54958481408 (51.18 GB)
     167Present Capacity: 48681934878 (45.34 GB)
     168DFS Remaining: 48681885696 (45.34 GB)
     169DFS Used: 49182 (48.03 KB)
     170DFS Used%: 0%
     171Under replicated blocks: 1
     172Blocks with corrupt replicas: 0
     173Missing blocks: 0
     174
     175-------------------------------------------------
     176Datanodes available: 2 (2 total, 0 dead)
     177
     178Name: 172.16.1.11:50010
     179Rack: /default/rack0
     180Decommission Status : Normal
     181Configured Capacity: 27479240704 (25.59 GB)
     182DFS Used: 24591 (24.01 KB)
     183Non DFS Used: 3137957873 (2.92 GB)
     184DFS Remaining: 24341258240(22.67 GB)
     185DFS Used%: 0%
     186DFS Remaining%: 88.58%
     187Last contact: Sat Jan 04 21:49:32 UTC 2014
     188
     189
     190Name: 172.16.1.10:50010
     191Rack: /default/rack0
     192Decommission Status : Normal
     193Configured Capacity: 27479240704 (25.59 GB)
     194DFS Used: 24591 (24.01 KB)
     195Non DFS Used: 3138588657 (2.92 GB)
     196DFS Remaining: 24340627456(22.67 GB)
     197DFS Used%: 0%
     198DFS Remaining%: 88.58%
     199Last contact: Sat Jan 04 21:49:33 UTC 2014
     200</code></pre>
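The line to look for in the report is `Datanodes available`, which should match the number of workers in the slice. As a small sketch, the count can be extracted from the report text. The function name `live_datanodes` is hypothetical, and the pattern assumes the report layout shown above (Hadoop 0.20.2); other versions may format this line differently.

```shell
#!/bin/sh
# Hypothetical helper: read `hadoop dfsadmin -report` output on stdin and
# print the number of available datanodes. Prints nothing if the
# "Datanodes available:" line is not present.
live_datanodes() {
    sed -n 's/^Datanodes available: \([0-9][0-9]*\).*/\1/p'
}
```

A typical call would be `hadoop dfsadmin -report | live_datanodes`.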
     201
     202
     203<li> Test the filesystem with a small file </li>
     204
     205<ol type="a">
     206<li> Create a small test file </li>
     207<pre><code>
     208# echo Hello GENI World > hello.txt
     209</code></pre>
     210
     211<li> Push the file into the Hadoop filesystem</li>
     212<pre><code>
     213# hadoop fs -put hello.txt hello.txt
     214</code></pre>
     215
     216<li>  Check for the file's existence </li>
     217<pre><code>
     218# hadoop fs -ls
     219Found 1 items
     220-rw-r--r--   3 root supergroup         12 2014-01-04 21:59 /user/root/hello.txt
     221</code></pre>
     222
     223<li> Check the contents of the file </li>
     224<pre><code>
     225# hadoop fs -cat hello.txt
     226Hello GENI World
     227</code></pre>
     228
     229</ol>
     230
     231
     232<li> Test the true power of the Hadoop filesystem by creating and sorting a large random dataset.   It may be useful/interesting to log in to the master and/or worker VMs and use tools like <code>top</code>, <code>iotop</code>, and <code>iftop</code> to observe the resource utilization on each of the VMs during the sort test.  </li>
     233
     234
     235<ol type="a">
     236<li> Create a 1 GB random data set.  After the data is created, use the <code>ls</code> functionality to confirm the data exists.  Note that the data is composed of several files in a directory. </li>
     237<pre><code>
     238#  hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar teragen 10000000 random.data.1G
     239Generating 10000000 using 2 maps with step of 5000000
     24014/01/05 18:47:58 INFO mapred.JobClient: Running job: job_201401051828_0003
     24114/01/05 18:47:59 INFO mapred.JobClient:  map 0% reduce 0%
     24214/01/05 18:48:14 INFO mapred.JobClient:  map 35% reduce 0%
     24314/01/05 18:48:17 INFO mapred.JobClient:  map 57% reduce 0%
     24414/01/05 18:48:20 INFO mapred.JobClient:  map 80% reduce 0%
     24514/01/05 18:48:26 INFO mapred.JobClient:  map 100% reduce 0%
     24614/01/05 18:48:28 INFO mapred.JobClient: Job complete: job_201401051828_0003
     24714/01/05 18:48:28 INFO mapred.JobClient: Counters: 6
     24814/01/05 18:48:28 INFO mapred.JobClient:   Job Counters
     24914/01/05 18:48:28 INFO mapred.JobClient:     Launched map tasks=2
     25014/01/05 18:48:28 INFO mapred.JobClient:   FileSystemCounters
     25114/01/05 18:48:28 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1000000000
     25214/01/05 18:48:28 INFO mapred.JobClient:   Map-Reduce Framework
     25314/01/05 18:48:28 INFO mapred.JobClient:     Map input records=10000000
     25414/01/05 18:48:28 INFO mapred.JobClient:     Spilled Records=0
     25514/01/05 18:48:28 INFO mapred.JobClient:     Map input bytes=10000000
     25614/01/05 18:48:28 INFO mapred.JobClient:     Map output records=10000000
     257</code></pre>
     258
     259<li> Sort the dataset.  On your own, you can use the <code>cat</code> and/or <code>get</code> functionality to look at the random and sorted files to confirm their size and that the sort actually worked.
     260</li>
     261<pre><code>
     262# hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar terasort random.data.1G sorted.data.1G
     26314/01/05 18:50:49 INFO terasort.TeraSort: starting
     26414/01/05 18:50:49 INFO mapred.FileInputFormat: Total input paths to process : 2
     26514/01/05 18:50:50 INFO util.NativeCodeLoader: Loaded the native-hadoop library
     26614/01/05 18:50:50 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
     26714/01/05 18:50:50 INFO compress.CodecPool: Got brand-new compressor
     268Making 1 from 100000 records
     269Step size is 100000.0
     27014/01/05 18:50:50 INFO mapred.JobClient: Running job: job_201401051828_0004
     27114/01/05 18:50:51 INFO mapred.JobClient:  map 0% reduce 0%
     27214/01/05 18:51:05 INFO mapred.JobClient:  map 6% reduce 0%
     27314/01/05 18:51:08 INFO mapred.JobClient:  map 20% reduce 0%
     27414/01/05 18:51:11 INFO mapred.JobClient:  map 33% reduce 0%
     27514/01/05 18:51:14 INFO mapred.JobClient:  map 37% reduce 0%
     27614/01/05 18:51:29 INFO mapred.JobClient:  map 55% reduce 0%
     27714/01/05 18:51:32 INFO mapred.JobClient:  map 65% reduce 6%
     27814/01/05 18:51:35 INFO mapred.JobClient:  map 71% reduce 6%
     27914/01/05 18:51:38 INFO mapred.JobClient:  map 72% reduce 8%
     28014/01/05 18:51:44 INFO mapred.JobClient:  map 74% reduce 8%
     28114/01/05 18:51:47 INFO mapred.JobClient:  map 74% reduce 10%
     28214/01/05 18:51:50 INFO mapred.JobClient:  map 87% reduce 12%
     28314/01/05 18:51:53 INFO mapred.JobClient:  map 92% reduce 12%
     28414/01/05 18:51:56 INFO mapred.JobClient:  map 93% reduce 12%
     28514/01/05 18:52:02 INFO mapred.JobClient:  map 100% reduce 14%
     28614/01/05 18:52:05 INFO mapred.JobClient:  map 100% reduce 22%
     28714/01/05 18:52:08 INFO mapred.JobClient:  map 100% reduce 29%
     28814/01/05 18:52:14 INFO mapred.JobClient:  map 100% reduce 33%
     28914/01/05 18:52:23 INFO mapred.JobClient:  map 100% reduce 67%
     29014/01/05 18:52:26 INFO mapred.JobClient:  map 100% reduce 70%
     29114/01/05 18:52:29 INFO mapred.JobClient:  map 100% reduce 75%
     29214/01/05 18:52:32 INFO mapred.JobClient:  map 100% reduce 80%
     29314/01/05 18:52:35 INFO mapred.JobClient:  map 100% reduce 85%
     29414/01/05 18:52:38 INFO mapred.JobClient:  map 100% reduce 90%
     29514/01/05 18:52:46 INFO mapred.JobClient:  map 100% reduce 100%
     29614/01/05 18:52:48 INFO mapred.JobClient: Job complete: job_201401051828_0004
     29714/01/05 18:52:48 INFO mapred.JobClient: Counters: 18
     29814/01/05 18:52:48 INFO mapred.JobClient:   Job Counters
     29914/01/05 18:52:48 INFO mapred.JobClient:     Launched reduce tasks=1
     30014/01/05 18:52:48 INFO mapred.JobClient:     Launched map tasks=16
     30114/01/05 18:52:48 INFO mapred.JobClient:     Data-local map tasks=16
     30214/01/05 18:52:48 INFO mapred.JobClient:   FileSystemCounters
     30314/01/05 18:52:48 INFO mapred.JobClient:     FILE_BYTES_READ=2382257412
     30414/01/05 18:52:48 INFO mapred.JobClient:     HDFS_BYTES_READ=1000057358
     30514/01/05 18:52:48 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3402255956
     30614/01/05 18:52:48 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1000000000
     30714/01/05 18:52:48 INFO mapred.JobClient:   Map-Reduce Framework
     30814/01/05 18:52:48 INFO mapred.JobClient:     Reduce input groups=10000000
     30914/01/05 18:52:48 INFO mapred.JobClient:     Combine output records=0
     31014/01/05 18:52:48 INFO mapred.JobClient:     Map input records=10000000
     31114/01/05 18:52:48 INFO mapred.JobClient:     Reduce shuffle bytes=951549012
     31214/01/05 18:52:48 INFO mapred.JobClient:     Reduce output records=10000000
     31314/01/05 18:52:48 INFO mapred.JobClient:     Spilled Records=33355441
     31414/01/05 18:52:48 INFO mapred.JobClient:     Map output bytes=1000000000
     31514/01/05 18:52:48 INFO mapred.JobClient:     Map input bytes=1000000000
     31614/01/05 18:52:48 INFO mapred.JobClient:     Combine input records=0
     31714/01/05 18:52:48 INFO mapred.JobClient:     Map output records=10000000
     31814/01/05 18:52:48 INFO mapred.JobClient:     Reduce input records=10000000
     31914/01/05 18:52:48 INFO terasort.TeraSort: done
     320</code></pre>
     321
     322</ol>
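The `teragen` argument is a row count, not a size in bytes. In the run above, 10000000 rows produced `HDFS_BYTES_WRITTEN=1000000000`, which is 100 bytes per row. As a sketch, assuming that row size, the row count for a target size can be computed as follows. The function name `teragen_rows` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: number of teragen rows needed for a target size in
# bytes, assuming 100-byte rows as the job counters above indicate.
teragen_rows() {
    echo $(( $1 / 100 ))
}

# Rows for a 1 GB (10^9 byte) data set; prints 10000000.
teragen_rows 1000000000
```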
     323
     324<li>Redo the tutorial with a different number of workers, amount of bandwidth, and/or worker instance types.  Warning:  Be courteous to other users and do not take all the resources.  </li>
     325
     326<ol type="a">
     327<li> Time the performance of runs with different resources </li>
     328<li> Observe the largest file you can create with different settings. </li>
     329</ol>
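To compare runs with different resources, it helps to record the wall-clock time of each job in the same way. The following is a minimal sketch of a timing wrapper. The function name `time_run` is hypothetical, and it assumes a `date` that supports `+%s` (seconds since the epoch), as GNU and BSD `date` do.

```shell
#!/bin/sh
# Hypothetical helper: run a command, print its wall-clock duration in
# whole seconds on stderr, and return the command's own exit status.
time_run() {
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) s" >&2
    return "$status"
}
```

For example, the sort step could be timed with `time_run hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar terasort random.data.1G sorted.data.1G`. The shell's built-in `time` keyword is an alternative when sub-second resolution is wanted.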
     330
     331
     332</ol>
     333
     334
     335
     336</ol>
     337</td>
     338</tr>
     339
     340
    64341 </table>
    65342}}}