| 1 | = [wiki:GENIExperimenter/Tutorials/HadoopInASlice Hadoop in a Slice] = |
| 2 | == Part II: Execute Experiment: Login to the nodes and execute the Hadoop experiment == |
| 3 | {{{ |
| 4 | #!html |
| 5 | |
| 6 | |
| 7 | |
| 8 | <div style="text-align:center; width:495px; margin-left:auto; margin-right:auto;"> |
| 9 | <img id="Image-Maps_5201305222028436" src="http://groups.geni.net/geni/attachment/wiki/GENIExperimenter/Tutorials/Graphics/Execute.jpg?format=raw" usemap="#Image-Maps_5201305222028436" border="0" width="495" height="138" alt="" /> |
| 10 | <map id="_Image-Maps_5201305222028436" name="Image-Maps_5201305222028436"> |
| 11 | <area shape="rect" coords="18,18,135,110" href="http://groups.geni.net/geni/wiki/GENIExperimenter/Tutorials/HadoopInASlice/ObtainResources" alt="" title="" /> |
| 12 | <area shape="rect" coords="180,18,297,111" href="http://groups.geni.net/geni/wiki/GENIExperimenter/Tutorials/HadoopInASlice/ExecuteExperiment" alt="" title="" /> |
| 13 | <area shape="rect" coords="344,17,460,110" href="http://groups.geni.net/geni/wiki/GENIExperimenter/Tutorials/HadoopInASlice/TeardownExperiment" alt="" title="" /> |
| 14 | <area shape="rect" coords="493,136,495,138" href="http://www.image-maps.com/index.php?aff=mapped_users_5201305222028436" alt="Image Map" title="Image Map" /> |
| 15 | </map> |
| 17 | |
| 18 | </div> |
| 19 | }}} |
| 20 | = Instructions = |
| 21 | |
| 22 | Now that you have reserved your resources, you are ready to login to the slice and run some Hadoop examples. |
| 23 | |
| 24 | == 1. Login to Hadoop Master == |
| 25 | |
| 26 | {{{ |
| 27 | #!html |
| 28 | |
| 29 | |
| 30 | <table border="0"> |
| 31 | <tr> |
| 32 | |
| 33 | <td > |
| 34 | <ol type="A"> |
<li>Login (ssh) to the hadoop-master using the credentials associated with your
GENI Portal account and the IP address displayed by Flack. The ssh client you use
will depend on the configuration of your laptop/desktop.</li>
</ol>
| 39 | </td></tr> |
| 40 | </table> |
| 41 | }}} |
| 42 | |
| 43 | == 2. Check the status/properties of the VMs. == |
| 44 | |
| 45 | === A. Observe the properties of the network interfaces === |
| 46 | |
| 47 | {{{ |
| 48 | # /sbin/ifconfig |
| 49 | eth0 Link encap:Ethernet HWaddr fa:16:3e:72:ad:a6 |
| 50 | inet addr:10.103.0.20 Bcast:10.103.0.255 Mask:255.255.255.0 |
| 51 | inet6 addr: fe80::f816:3eff:fe72:ada6/64 Scope:Link |
| 52 | UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 |
| 53 | RX packets:1982 errors:0 dropped:0 overruns:0 frame:0 |
| 54 | TX packets:1246 errors:0 dropped:0 overruns:0 carrier:0 |
| 55 | collisions:0 txqueuelen:1000 |
| 56 | RX bytes:301066 (294.0 KiB) TX bytes:140433 (137.1 KiB) |
| 57 | Interrupt:11 Base address:0x2000 |
| 58 | |
| 59 | eth1 Link encap:Ethernet HWaddr fe:16:3e:00:6d:af |
| 60 | inet addr:172.16.1.1 Bcast:172.16.1.255 Mask:255.255.255.0 |
| 61 | inet6 addr: fe80::fc16:3eff:fe00:6daf/64 Scope:Link |
| 62 | UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 |
| 63 | RX packets:21704 errors:0 dropped:0 overruns:0 frame:0 |
| 64 | TX packets:4562 errors:0 dropped:0 overruns:0 carrier:0 |
| 65 | collisions:0 txqueuelen:1000 |
| 66 | RX bytes:3100262 (2.9 MiB) TX bytes:824572 (805.2 KiB) |
| 67 | |
| 68 | lo Link encap:Local Loopback |
| 69 | inet addr:127.0.0.1 Mask:255.0.0.0 |
| 70 | inet6 addr: ::1/128 Scope:Host |
| 71 | UP LOOPBACK RUNNING MTU:16436 Metric:1 |
| 72 | RX packets:19394 errors:0 dropped:0 overruns:0 frame:0 |
| 73 | TX packets:19394 errors:0 dropped:0 overruns:0 carrier:0 |
| 74 | collisions:0 txqueuelen:0 |
| 75 | RX bytes:4010954 (3.8 MiB) TX bytes:4010954 (3.8 MiB) |
| 76 | }}} |
| 77 | |
| 78 | |
=== B. Observe the contents of the NEuca user data file ===

This file includes a script that will install and execute the script that you configured for the VM.
| 80 | {{{ |
| 81 | # neuca-user-data |
| 82 | [global] |
| 83 | actor_id=67C4EFB4-7CBF-48C9-8195-934FF81434DC |
| 84 | slice_id=39672f6e-610a-4d86-8810-30e02d20cc99 |
| 85 | reservation_id=55676541-5221-483d-bb60-429de025f275 |
| 86 | unit_id=902709a4-32f2-41fc-b85c-b4791c779580 |
| 87 | ;router= Not Specified |
| 88 | ;iscsi_initiator_iqn= Not Specified |
| 89 | slice_name=urn:publicid:IDN+ch.geni.net:ADAMANT+slice+pruth-winter-camp |
| 90 | unit_url=http://geni-orca.renci.org/owl/8210b4d7-4afc-4838-801f-c20a8f1f75ae#hadoop-master |
| 91 | host_name=hadoop-master |
| 92 | [interfaces] |
| 93 | fe163e006daf=up:ipv4:172.16.1.1/24 |
| 94 | [storage] |
| 95 | [routes] |
| 96 | [scripts] |
| 97 | bootscript=#!/bin/bash |
| 98 | # Automatically generated boot script |
| 99 | # wget or curl must be installed on the image |
| 100 | mkdir -p /tmp |
| 101 | cd /tmp |
| 102 | if [ -x `which wget 2>/dev/null` ]; then |
| 103 | wget -q -O `basename http://geni-images.renci.org/images/GENIWinterCamp/master.sh` http://geni-images.renci.org/images/GENIWinterCamp/master.sh |
| 104 | else if [ -x `which curl 2>/dev/null` ]; then |
| 105 | curl http://geni-images.renci.org/images/GENIWinterCamp/master.sh > `basename http://geni-images.renci.org/images/GENIWinterCamp/master.sh` |
| 106 | fi |
| 107 | fi |
| 108 | eval "/bin/sh -c \"chmod +x /tmp/master.sh; /tmp/master.sh\"" |
| 109 | }}} |
| 110 | |
| 111 | |
=== C. Observe the contents of the script that was installed and executed on the VM. ===
| 113 | {{{ |
| 114 | # cat /tmp/master.sh |
| 115 | #!/bin/bash |
| 116 | |
| 117 | echo "Hello from neuca script" > /home/hadoop/log |
| 118 | MY_HOSTNAME=hadoop-master |
| 119 | hostname $MY_HOSTNAME |
| 120 | echo 172.16.1.1 hadoop-master >> /etc/hosts |
| 121 | echo 172.16.1.10 hadoop-worker-0 >> /etc/hosts |
| 122 | echo 172.16.1.11 hadoop-worker-1 >> /etc/hosts |
| 123 | echo 172.16.1.12 hadoop-worker-2 >> /etc/hosts |
| 124 | echo 172.16.1.13 hadoop-worker-3 >> /etc/hosts |
| 125 | echo 172.16.1.14 hadoop-worker-4 >> /etc/hosts |
| 126 | echo 172.16.1.15 hadoop-worker-5 >> /etc/hosts |
| 127 | echo 172.16.1.16 hadoop-worker-6 >> /etc/hosts |
| 128 | echo 172.16.1.17 hadoop-worker-7 >> /etc/hosts |
| 129 | echo 172.16.1.18 hadoop-worker-8 >> /etc/hosts |
| 130 | echo 172.16.1.19 hadoop-worker-9 >> /etc/hosts |
| 131 | echo 172.16.1.20 hadoop-worker-10 >> /etc/hosts |
| 132 | echo 172.16.1.21 hadoop-worker-11 >> /etc/hosts |
| 133 | echo 172.16.1.22 hadoop-worker-12 >> /etc/hosts |
| 134 | echo 172.16.1.23 hadoop-worker-13 >> /etc/hosts |
| 135 | echo 172.16.1.24 hadoop-worker-14 >> /etc/hosts |
| 136 | echo 172.16.1.25 hadoop-worker-15 >> /etc/hosts |
| 137 | while true; do |
| 138 | PING=`ping -c 1 172.16.1.1 > /dev/null 2>&1` |
| 139 | if [ "$?" = "0" ]; then |
| 140 | break |
| 141 | fi |
| 142 | sleep 5 |
| 143 | done |
| 144 | echo '/home/hadoop/hadoop-euca-init.sh 172.16.1.1 -master' >> /home/hadoop/log |
| 145 | /home/hadoop/hadoop-euca-init.sh 172.16.1.1 -master |
| 146 | echo "Done starting daemons" >> /home/hadoop/log |
| 147 | }}} |
| 148 | |
| 149 | |
| 150 | === D. Test for connectivity between the VMs. === |
| 151 | |
| 152 | {{{ |
| 153 | # ping hadoop-worker-0 |
| 154 | PING hadoop-worker-0 (172.16.1.10) 56(84) bytes of data. |
| 155 | 64 bytes from hadoop-worker-0 (172.16.1.10): icmp_req=1 ttl=64 time=0.747 ms |
| 156 | 64 bytes from hadoop-worker-0 (172.16.1.10): icmp_req=2 ttl=64 time=0.459 ms |
| 157 | 64 bytes from hadoop-worker-0 (172.16.1.10): icmp_req=3 ttl=64 time=0.411 ms |
| 158 | ^C |
| 159 | --- hadoop-worker-0 ping statistics --- |
| 160 | 3 packets transmitted, 3 received, 0% packet loss, time 1998ms |
| 161 | rtt min/avg/max/mdev = 0.411/0.539/0.747/0.148 ms |
| 162 | # ping hadoop-worker-1 |
| 163 | PING hadoop-worker-1 (172.16.1.11) 56(84) bytes of data. |
| 164 | 64 bytes from hadoop-worker-1 (172.16.1.11): icmp_req=1 ttl=64 time=0.852 ms |
| 165 | 64 bytes from hadoop-worker-1 (172.16.1.11): icmp_req=2 ttl=64 time=0.468 ms |
| 166 | 64 bytes from hadoop-worker-1 (172.16.1.11): icmp_req=3 ttl=64 time=0.502 ms |
| 167 | ^C |
| 168 | --- hadoop-worker-1 ping statistics --- |
| 169 | 3 packets transmitted, 3 received, 0% packet loss, time 1999ms |
| 170 | rtt min/avg/max/mdev = 0.468/0.607/0.852/0.174 ms |
| 171 | }}} |
| 172 | |
| 173 | == 3. Check the status of the Hadoop filesystem. == |
| 174 | |
| 175 | === A. Query for the status of the filesystem and its associated workers. === |
| 176 | |
| 177 | {{{ |
| 178 | # hadoop dfsadmin -report |
| 179 | Configured Capacity: 54958481408 (51.18 GB) |
| 180 | Present Capacity: 48681934878 (45.34 GB) |
| 181 | DFS Remaining: 48681885696 (45.34 GB) |
| 182 | DFS Used: 49182 (48.03 KB) |
| 183 | DFS Used%: 0% |
| 184 | Under replicated blocks: 1 |
| 185 | Blocks with corrupt replicas: 0 |
| 186 | Missing blocks: 0 |
| 187 | |
| 188 | ------------------------------------------------- |
| 189 | Datanodes available: 2 (2 total, 0 dead) |
| 190 | |
| 191 | Name: 172.16.1.11:50010 |
| 192 | Rack: /default/rack0 |
| 193 | Decommission Status : Normal |
| 194 | Configured Capacity: 27479240704 (25.59 GB) |
| 195 | DFS Used: 24591 (24.01 KB) |
| 196 | Non DFS Used: 3137957873 (2.92 GB) |
| 197 | DFS Remaining: 24341258240(22.67 GB) |
| 198 | DFS Used%: 0% |
| 199 | DFS Remaining%: 88.58% |
| 200 | Last contact: Sat Jan 04 21:49:32 UTC 2014 |
| 201 | |
| 202 | |
| 203 | Name: 172.16.1.10:50010 |
| 204 | Rack: /default/rack0 |
| 205 | Decommission Status : Normal |
| 206 | Configured Capacity: 27479240704 (25.59 GB) |
| 207 | DFS Used: 24591 (24.01 KB) |
| 208 | Non DFS Used: 3138588657 (2.92 GB) |
| 209 | DFS Remaining: 24340627456(22.67 GB) |
| 210 | DFS Used%: 0% |
| 211 | DFS Remaining%: 88.58% |
| 212 | Last contact: Sat Jan 04 21:49:33 UTC 2014 |
| 213 | }}} |
| 214 | |
| 215 | |
| 216 | |
| 217 | == 4. Test the filesystem with a small file == |
| 218 | |
| 219 | |
| 220 | === A. Create a small test file === |
| 221 | {{{ |
| 222 | # echo Hello GENI World > hello.txt |
| 223 | }}} |
| 224 | |
| 225 | === B. Push the file into the Hadoop filesystem === |
| 226 | {{{ |
| 227 | # hadoop fs -put hello.txt hello.txt |
| 228 | }}} |
| 229 | |
| 230 | === C. Check for the file's existence === |
| 231 | {{{ |
| 232 | # hadoop fs -ls |
| 233 | Found 1 items |
| 234 | -rw-r--r-- 3 root supergroup 12 2014-01-04 21:59 /user/root/hello.txt |
| 235 | }}} |
| 236 | |
| 237 | === D. Check the contents of the file === |
| 238 | {{{ |
| 239 | # hadoop fs -cat hello.txt |
| 240 | Hello GENI World |
| 241 | }}} |
| 242 | |
== 5. Run the Hadoop Sort Testcase ==
| 244 | |
Test the true power of the Hadoop filesystem by creating and sorting a large random dataset. While the sort runs, it can be useful and interesting to log in to the master and/or worker VMs and use tools like top, iotop, and iftop to observe the resource utilization on each VM. Note: on these VMs, iotop and iftop must be run as root.
| 246 | |
| 247 | === A. Create a 1 GB random data set. === |
| 248 | |
After the data is created, use the ls functionality to confirm that the data exists. Note that the data is composed of several files in a directory.
| 250 | |
| 251 | {{{ |
| 252 | # hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar teragen 10000000 random.data.1G |
| 253 | Generating 10000000 using 2 maps with step of 5000000 |
| 254 | 14/01/05 18:47:58 INFO mapred.JobClient: Running job: job_201401051828_0003 |
| 255 | 14/01/05 18:47:59 INFO mapred.JobClient: map 0% reduce 0% |
| 256 | 14/01/05 18:48:14 INFO mapred.JobClient: map 35% reduce 0% |
| 257 | 14/01/05 18:48:17 INFO mapred.JobClient: map 57% reduce 0% |
| 258 | 14/01/05 18:48:20 INFO mapred.JobClient: map 80% reduce 0% |
| 259 | 14/01/05 18:48:26 INFO mapred.JobClient: map 100% reduce 0% |
| 260 | 14/01/05 18:48:28 INFO mapred.JobClient: Job complete: job_201401051828_0003 |
| 261 | 14/01/05 18:48:28 INFO mapred.JobClient: Counters: 6 |
| 262 | 14/01/05 18:48:28 INFO mapred.JobClient: Job Counters |
| 263 | 14/01/05 18:48:28 INFO mapred.JobClient: Launched map tasks=2 |
| 264 | 14/01/05 18:48:28 INFO mapred.JobClient: FileSystemCounters |
| 265 | 14/01/05 18:48:28 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1000000000 |
| 266 | 14/01/05 18:48:28 INFO mapred.JobClient: Map-Reduce Framework |
| 267 | 14/01/05 18:48:28 INFO mapred.JobClient: Map input records=10000000 |
| 268 | 14/01/05 18:48:28 INFO mapred.JobClient: Spilled Records=0 |
| 269 | 14/01/05 18:48:28 INFO mapred.JobClient: Map input bytes=10000000 |
| 270 | 14/01/05 18:48:28 INFO mapred.JobClient: Map output records=10000000 |
| 271 | }}} |
| 272 | |
| 273 | === B. Sort the dataset. === |
| 274 | |
| 275 | {{{ |
| 276 | # hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar terasort random.data.1G sorted.data.1G |
| 277 | 14/01/05 18:50:49 INFO terasort.TeraSort: starting |
| 278 | 14/01/05 18:50:49 INFO mapred.FileInputFormat: Total input paths to process : 2 |
| 279 | 14/01/05 18:50:50 INFO util.NativeCodeLoader: Loaded the native-hadoop library |
| 280 | 14/01/05 18:50:50 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library |
| 281 | 14/01/05 18:50:50 INFO compress.CodecPool: Got brand-new compressor |
| 282 | Making 1 from 100000 records |
| 283 | Step size is 100000.0 |
| 284 | 14/01/05 18:50:50 INFO mapred.JobClient: Running job: job_201401051828_0004 |
| 285 | 14/01/05 18:50:51 INFO mapred.JobClient: map 0% reduce 0% |
| 286 | 14/01/05 18:51:05 INFO mapred.JobClient: map 6% reduce 0% |
| 287 | 14/01/05 18:51:08 INFO mapred.JobClient: map 20% reduce 0% |
| 288 | 14/01/05 18:51:11 INFO mapred.JobClient: map 33% reduce 0% |
| 289 | 14/01/05 18:51:14 INFO mapred.JobClient: map 37% reduce 0% |
| 290 | 14/01/05 18:51:29 INFO mapred.JobClient: map 55% reduce 0% |
| 291 | 14/01/05 18:51:32 INFO mapred.JobClient: map 65% reduce 6% |
| 292 | 14/01/05 18:51:35 INFO mapred.JobClient: map 71% reduce 6% |
| 293 | 14/01/05 18:51:38 INFO mapred.JobClient: map 72% reduce 8% |
| 294 | 14/01/05 18:51:44 INFO mapred.JobClient: map 74% reduce 8% |
| 295 | 14/01/05 18:51:47 INFO mapred.JobClient: map 74% reduce 10% |
| 296 | 14/01/05 18:51:50 INFO mapred.JobClient: map 87% reduce 12% |
| 297 | 14/01/05 18:51:53 INFO mapred.JobClient: map 92% reduce 12% |
| 298 | 14/01/05 18:51:56 INFO mapred.JobClient: map 93% reduce 12% |
| 299 | 14/01/05 18:52:02 INFO mapred.JobClient: map 100% reduce 14% |
| 300 | 14/01/05 18:52:05 INFO mapred.JobClient: map 100% reduce 22% |
| 301 | 14/01/05 18:52:08 INFO mapred.JobClient: map 100% reduce 29% |
| 302 | 14/01/05 18:52:14 INFO mapred.JobClient: map 100% reduce 33% |
| 303 | 14/01/05 18:52:23 INFO mapred.JobClient: map 100% reduce 67% |
| 304 | 14/01/05 18:52:26 INFO mapred.JobClient: map 100% reduce 70% |
| 305 | 14/01/05 18:52:29 INFO mapred.JobClient: map 100% reduce 75% |
| 306 | 14/01/05 18:52:32 INFO mapred.JobClient: map 100% reduce 80% |
| 307 | 14/01/05 18:52:35 INFO mapred.JobClient: map 100% reduce 85% |
| 308 | 14/01/05 18:52:38 INFO mapred.JobClient: map 100% reduce 90% |
| 309 | 14/01/05 18:52:46 INFO mapred.JobClient: map 100% reduce 100% |
| 310 | 14/01/05 18:52:48 INFO mapred.JobClient: Job complete: job_201401051828_0004 |
| 311 | 14/01/05 18:52:48 INFO mapred.JobClient: Counters: 18 |
| 312 | 14/01/05 18:52:48 INFO mapred.JobClient: Job Counters |
| 313 | 14/01/05 18:52:48 INFO mapred.JobClient: Launched reduce tasks=1 |
| 314 | 14/01/05 18:52:48 INFO mapred.JobClient: Launched map tasks=16 |
| 315 | 14/01/05 18:52:48 INFO mapred.JobClient: Data-local map tasks=16 |
| 316 | 14/01/05 18:52:48 INFO mapred.JobClient: FileSystemCounters |
| 317 | 14/01/05 18:52:48 INFO mapred.JobClient: FILE_BYTES_READ=2382257412 |
| 318 | 14/01/05 18:52:48 INFO mapred.JobClient: HDFS_BYTES_READ=1000057358 |
| 319 | 14/01/05 18:52:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3402255956 |
| 320 | 14/01/05 18:52:48 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1000000000 |
| 321 | 14/01/05 18:52:48 INFO mapred.JobClient: Map-Reduce Framework |
| 322 | 14/01/05 18:52:48 INFO mapred.JobClient: Reduce input groups=10000000 |
| 323 | 14/01/05 18:52:48 INFO mapred.JobClient: Combine output records=0 |
| 324 | 14/01/05 18:52:48 INFO mapred.JobClient: Map input records=10000000 |
| 325 | 14/01/05 18:52:48 INFO mapred.JobClient: Reduce shuffle bytes=951549012 |
| 326 | 14/01/05 18:52:48 INFO mapred.JobClient: Reduce output records=10000000 |
| 327 | 14/01/05 18:52:48 INFO mapred.JobClient: Spilled Records=33355441 |
| 328 | 14/01/05 18:52:48 INFO mapred.JobClient: Map output bytes=1000000000 |
| 329 | 14/01/05 18:52:48 INFO mapred.JobClient: Map input bytes=1000000000 |
| 330 | 14/01/05 18:52:48 INFO mapred.JobClient: Combine input records=0 |
| 331 | 14/01/05 18:52:48 INFO mapred.JobClient: Map output records=10000000 |
| 332 | 14/01/05 18:52:48 INFO mapred.JobClient: Reduce input records=10000000 |
| 333 | 14/01/05 18:52:48 INFO terasort.TeraSort: done |
| 334 | }}} |
| 335 | |
=== C. Look at the output. ===
| 337 | |
You can use Hadoop's cat and/or get functionality to look at the random and sorted files, confirming their size and that the sort actually worked.
| 339 | |
| 340 | Try some or all of these commands. Does the output make sense to you? |
| 341 | |
| 342 | {{{ |
| 343 | hadoop fs -ls random.data.1G |
| 344 | hadoop fs -ls sorted.data.1G |
| 345 | hadoop fs -cat random.data.1G/part-00000 | less |
| 346 | hadoop fs -cat sorted.data.1G/part-00000 | less |
| 347 | }}} |
| 348 | |
== 6. Advanced Example ==
| 350 | |
Re-do the tutorial with a different number of workers, amount of bandwidth, and/or worker instance types. Warning: be courteous to other users and do not reserve more resources than you need.
| 352 | |
| 353 | === A. Time the performance of runs with different resources. === |
=== B. Observe the largest file you can create with different resources. ===
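One simple way to compare configurations is to bracket each run with epoch timestamps. The sleep below is a placeholder standing in for the actual terasort command:

```shell
# Measure elapsed wall-clock time for a run.
START=$(date +%s)
sleep 1   # placeholder for: hadoop jar ... terasort random.data.1G sorted.data.1G
END=$(date +%s)
echo "elapsed: $((END - START)) s"
```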
| 355 | |
| 356 | |
| 357 | ---- |
| 358 | = [wiki:GENIExperimenter/Tutorials/HadoopInASlice Introduction] = |
| 359 | = [wiki:GENIExperimenter/Tutorials/HadoopInASlice/TeardownExperiment Next: Teardown Experiment] = |