<li>Open a new terminal window. Type the login command for <i>Utah InstaGENI</i> into that terminal window. You have now logged into your VM.</li>
<li>Repeat the previous step for <i>GPO InstaGENI</i> in a second terminal window.</li>
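The login command given by the Portal is an <code>ssh</code> command. The user name, host, and port below are placeholders for illustration only; always copy the exact command shown on the Portal page:
<pre><code>
$ ssh -p 30010 yourGENIusername@pc1.instageni.example.net
</code></pre>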


<table id="Table_03" border="0" cellpadding="5" cellspacing="0">
  <tr>
    <td>
      <img src="http://groups.geni.net/geni/attachment/wiki/GENIExperimenter/Tutorials/Graphics/Symbols-Tips-icon.png?format=raw" width="50" height="50" alt="Tip">
    </td>
    <td>
      To find the login information again, go to the Slice page and press the <b>Details</b> button in the appropriate row of the slice table.
    </td>
  </tr>
</table>
<table id="Table_01" border="0" cellpadding="5" cellspacing="0">
  <tr>
    <td>
      <img src="http://groups.geni.net/geni/attachment/wiki/GENIExperimenter/Tutorials/Graphics/warning-icon-hi.png?format=raw" width="50" height="50" alt="Warning">
    </td>
    <td>
      Use this icon to identify warnings or important notes.
    </td>
  </tr>
</table>


<li>(Optional) If your neighbor added you to their slice, log in to one of the VMs in that slice. You will find the login information on the slice page for your neighbor's slice.</li>
</ol>
</td>
<td>
  <img border="0" src="http://groups.geni.net/geni/attachment/wiki/GENIExperimenter/Tutorials/GREESC13/PortalSimpleLayer2Example/Graphics/log_in_v1.png?format=raw" alt="Login information for a VM" height="200" title="Login information for a VM" />
  <br />
  <b>Figure 9-1</b> <i>The Details page at Utah InstaGENI.</i>
</td>
</tr>
<li>Log in (ssh) to the hadoop-master as yourself, using the key you associated with the
GENI Portal and the IP address displayed by Flack. The ssh application you use will
depend on the configuration of the laptop/desktop you are using; an example is shown below.</li>
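On a Linux or Mac machine, the command typically looks like the following. The key path, user name, and IP address here are placeholders; use your own key file and the address shown by Flack:
<pre><code>
$ ssh -i ~/.ssh/your_geni_key yourGENIusername@192.0.2.10
</code></pre>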
<li>Check the status/properties of the VMs.</li>

<ol type="a">
<li> Observe the properties of the network interfaces </li>

<pre><code>
# /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr fa:16:3e:72:ad:a6
          inet addr:10.103.0.20  Bcast:10.103.0.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe72:ada6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1982 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1246 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:301066 (294.0 KiB)  TX bytes:140433 (137.1 KiB)
          Interrupt:11 Base address:0x2000

eth1      Link encap:Ethernet  HWaddr fe:16:3e:00:6d:af
          inet addr:172.16.1.1  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::fc16:3eff:fe00:6daf/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21704 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4562 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3100262 (2.9 MiB)  TX bytes:824572 (805.2 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:19394 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19394 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4010954 (3.8 MiB)  TX bytes:4010954 (3.8 MiB)
</code></pre>

<li> Observe the contents of the NEuca user data file. This file includes a boot script that downloads and runs the post-boot script you configured for the VM. </li>
<pre><code>
# neuca-user-data
[global]
actor_id=67C4EFB4-7CBF-48C9-8195-934FF81434DC
slice_id=39672f6e-610a-4d86-8810-30e02d20cc99
reservation_id=55676541-5221-483d-bb60-429de025f275
unit_id=902709a4-32f2-41fc-b85c-b4791c779580
;router= Not Specified
;iscsi_initiator_iqn= Not Specified
slice_name=urn:publicid:IDN+ch.geni.net:ADAMANT+slice+pruth-winter-camp
unit_url=http://geni-orca.renci.org/owl/8210b4d7-4afc-4838-801f-c20a8f1f75ae#hadoop-master
host_name=hadoop-master
[interfaces]
fe163e006daf=up:ipv4:172.16.1.1/24
[storage]
[routes]
[scripts]
bootscript=#!/bin/bash
# Automatically generated boot script
# wget or curl must be installed on the image
mkdir -p /tmp
cd /tmp
if [ -x `which wget 2>/dev/null` ]; then
     wget -q -O `basename http://geni-images.renci.org/images/GENIWinterCamp/master.sh` http://geni-images.renci.org/images/GENIWinterCamp/master.sh
else if [ -x `which curl 2>/dev/null` ]; then
     curl http://geni-images.renci.org/images/GENIWinterCamp/master.sh > `basename http://geni-images.renci.org/images/GENIWinterCamp/master.sh`
fi
fi
eval "/bin/sh -c \"chmod +x /tmp/master.sh; /tmp/master.sh\""
</code></pre>


<li> Observe the contents of the script that was installed and executed on the VM. </li>
<pre><code>
# cat /tmp/master.sh
#!/bin/bash

echo "Hello from neuca script" > /home/hadoop/log
MY_HOSTNAME=hadoop-master
hostname $MY_HOSTNAME
echo 172.16.1.1 hadoop-master >> /etc/hosts
echo 172.16.1.10 hadoop-worker-0 >> /etc/hosts
echo 172.16.1.11 hadoop-worker-1 >> /etc/hosts
echo 172.16.1.12 hadoop-worker-2 >> /etc/hosts
echo 172.16.1.13 hadoop-worker-3 >> /etc/hosts
echo 172.16.1.14 hadoop-worker-4 >> /etc/hosts
echo 172.16.1.15 hadoop-worker-5 >> /etc/hosts
echo 172.16.1.16 hadoop-worker-6 >> /etc/hosts
echo 172.16.1.17 hadoop-worker-7 >> /etc/hosts
echo 172.16.1.18 hadoop-worker-8 >> /etc/hosts
echo 172.16.1.19 hadoop-worker-9 >> /etc/hosts
echo 172.16.1.20 hadoop-worker-10 >> /etc/hosts
echo 172.16.1.21 hadoop-worker-11 >> /etc/hosts
echo 172.16.1.22 hadoop-worker-12 >> /etc/hosts
echo 172.16.1.23 hadoop-worker-13 >> /etc/hosts
echo 172.16.1.24 hadoop-worker-14 >> /etc/hosts
echo 172.16.1.25 hadoop-worker-15 >> /etc/hosts
while true; do
    PING=`ping -c 1 172.16.1.1 > /dev/null 2>&1`
    if [ "$?" = "0" ]; then
        break
    fi
    sleep 5
done
echo '/home/hadoop/hadoop-euca-init.sh 172.16.1.1 -master' >> /home/hadoop/log
/home/hadoop/hadoop-euca-init.sh 172.16.1.1 -master
echo "Done starting daemons" >> /home/hadoop/log
</code></pre>


<li>Test for connectivity between the VMs.</li>
<pre><code>
# ping hadoop-worker-0
PING hadoop-worker-0 (172.16.1.10) 56(84) bytes of data.
64 bytes from hadoop-worker-0 (172.16.1.10): icmp_req=1 ttl=64 time=0.747 ms
64 bytes from hadoop-worker-0 (172.16.1.10): icmp_req=2 ttl=64 time=0.459 ms
64 bytes from hadoop-worker-0 (172.16.1.10): icmp_req=3 ttl=64 time=0.411 ms
^C
--- hadoop-worker-0 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.411/0.539/0.747/0.148 ms
# ping hadoop-worker-1
PING hadoop-worker-1 (172.16.1.11) 56(84) bytes of data.
64 bytes from hadoop-worker-1 (172.16.1.11): icmp_req=1 ttl=64 time=0.852 ms
64 bytes from hadoop-worker-1 (172.16.1.11): icmp_req=2 ttl=64 time=0.468 ms
64 bytes from hadoop-worker-1 (172.16.1.11): icmp_req=3 ttl=64 time=0.502 ms
^C
--- hadoop-worker-1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.468/0.607/0.852/0.174 ms
</code></pre>


<li> Check the status of the Hadoop filesystem</li>
<pre><code>
# hadoop dfsadmin -report
Configured Capacity: 54958481408 (51.18 GB)
Present Capacity: 48681934878 (45.34 GB)
DFS Remaining: 48681885696 (45.34 GB)
DFS Used: 49182 (48.03 KB)
DFS Used%: 0%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 172.16.1.11:50010
Rack: /default/rack0
Decommission Status : Normal
Configured Capacity: 27479240704 (25.59 GB)
DFS Used: 24591 (24.01 KB)
Non DFS Used: 3137957873 (2.92 GB)
DFS Remaining: 24341258240(22.67 GB)
DFS Used%: 0%
DFS Remaining%: 88.58%
Last contact: Sat Jan 04 21:49:32 UTC 2014


Name: 172.16.1.10:50010
Rack: /default/rack0
Decommission Status : Normal
Configured Capacity: 27479240704 (25.59 GB)
DFS Used: 24591 (24.01 KB)
Non DFS Used: 3138588657 (2.92 GB)
DFS Remaining: 24340627456(22.67 GB)
DFS Used%: 0%
DFS Remaining%: 88.58%
Last contact: Sat Jan 04 21:49:33 UTC 2014
</code></pre>


<li> Test the filesystem with a small file </li>

<ol type="a">
<li> Create a small test file </li>
<pre><code>
# echo "Hello GENI World" > hello.txt
</code></pre>

<li> Push the file into the Hadoop filesystem</li>
<pre><code>
# hadoop fs -put hello.txt hello.txt
</code></pre>

<li> Check for the file's existence </li>
<pre><code>
# hadoop fs -ls
Found 1 items
-rw-r--r--   3 root supergroup         12 2014-01-04 21:59 /user/root/hello.txt
</code></pre>

<li> Check the contents of the file </li>
<pre><code>
# hadoop fs -cat hello.txt
Hello GENI World
</code></pre>

</ol>


<li> Test the true power of the Hadoop filesystem by creating and sorting a large random dataset. It may be useful and interesting to log in to the master and/or worker VMs and use tools like <code>top</code>, <code>iotop</code>, and <code>iftop</code> to observe the resource utilization on each of the VMs during the sort test; an example is shown below. </li>
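For example, while a sort is running you can watch a VM with commands like the following. This sketch assumes the data-plane interface is <code>eth1</code> (as in the <code>ifconfig</code> output above) and that <code>iotop</code> and <code>iftop</code> are installed on the image:
<pre><code>
# top            (per-process CPU and memory use)
# iotop          (per-process disk I/O)
# iftop -i eth1  (traffic on the Hadoop data-plane interface)
</code></pre>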


<ol type="a">
<li> Create a 1 GB random dataset. After the data is created, use the <code>ls</code> functionality to confirm the data exists. Note that the data is composed of several files in a directory. </li>
<pre><code>
# hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar teragen 10000000 random.data.1G
Generating 10000000 using 2 maps with step of 5000000
14/01/05 18:47:58 INFO mapred.JobClient: Running job: job_201401051828_0003
14/01/05 18:47:59 INFO mapred.JobClient:  map 0% reduce 0%
14/01/05 18:48:14 INFO mapred.JobClient:  map 35% reduce 0%
14/01/05 18:48:17 INFO mapred.JobClient:  map 57% reduce 0%
14/01/05 18:48:20 INFO mapred.JobClient:  map 80% reduce 0%
14/01/05 18:48:26 INFO mapred.JobClient:  map 100% reduce 0%
14/01/05 18:48:28 INFO mapred.JobClient: Job complete: job_201401051828_0003
14/01/05 18:48:28 INFO mapred.JobClient: Counters: 6
14/01/05 18:48:28 INFO mapred.JobClient:   Job Counters
14/01/05 18:48:28 INFO mapred.JobClient:     Launched map tasks=2
14/01/05 18:48:28 INFO mapred.JobClient:   FileSystemCounters
14/01/05 18:48:28 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1000000000
14/01/05 18:48:28 INFO mapred.JobClient:   Map-Reduce Framework
14/01/05 18:48:28 INFO mapred.JobClient:     Map input records=10000000
14/01/05 18:48:28 INFO mapred.JobClient:     Spilled Records=0
14/01/05 18:48:28 INFO mapred.JobClient:     Map input bytes=10000000
14/01/05 18:48:28 INFO mapred.JobClient:     Map output records=10000000
</code></pre>

<li> Sort the dataset. On your own, you can use the <code>cat</code> and/or <code>get</code> functionality to look at the random and sorted files to confirm their size and that the sort actually worked; an example follows the sort output below.
</li>
<pre><code>
# hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar terasort random.data.1G sorted.data.1G
14/01/05 18:50:49 INFO terasort.TeraSort: starting
14/01/05 18:50:49 INFO mapred.FileInputFormat: Total input paths to process : 2
14/01/05 18:50:50 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/01/05 18:50:50 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/01/05 18:50:50 INFO compress.CodecPool: Got brand-new compressor
Making 1 from 100000 records
Step size is 100000.0
14/01/05 18:50:50 INFO mapred.JobClient: Running job: job_201401051828_0004
14/01/05 18:50:51 INFO mapred.JobClient:  map 0% reduce 0%
14/01/05 18:51:05 INFO mapred.JobClient:  map 6% reduce 0%
14/01/05 18:51:08 INFO mapred.JobClient:  map 20% reduce 0%
14/01/05 18:51:11 INFO mapred.JobClient:  map 33% reduce 0%
14/01/05 18:51:14 INFO mapred.JobClient:  map 37% reduce 0%
14/01/05 18:51:29 INFO mapred.JobClient:  map 55% reduce 0%
14/01/05 18:51:32 INFO mapred.JobClient:  map 65% reduce 6%
14/01/05 18:51:35 INFO mapred.JobClient:  map 71% reduce 6%
14/01/05 18:51:38 INFO mapred.JobClient:  map 72% reduce 8%
14/01/05 18:51:44 INFO mapred.JobClient:  map 74% reduce 8%
14/01/05 18:51:47 INFO mapred.JobClient:  map 74% reduce 10%
14/01/05 18:51:50 INFO mapred.JobClient:  map 87% reduce 12%
14/01/05 18:51:53 INFO mapred.JobClient:  map 92% reduce 12%
14/01/05 18:51:56 INFO mapred.JobClient:  map 93% reduce 12%
14/01/05 18:52:02 INFO mapred.JobClient:  map 100% reduce 14%
14/01/05 18:52:05 INFO mapred.JobClient:  map 100% reduce 22%
14/01/05 18:52:08 INFO mapred.JobClient:  map 100% reduce 29%
14/01/05 18:52:14 INFO mapred.JobClient:  map 100% reduce 33%
14/01/05 18:52:23 INFO mapred.JobClient:  map 100% reduce 67%
14/01/05 18:52:26 INFO mapred.JobClient:  map 100% reduce 70%
14/01/05 18:52:29 INFO mapred.JobClient:  map 100% reduce 75%
14/01/05 18:52:32 INFO mapred.JobClient:  map 100% reduce 80%
14/01/05 18:52:35 INFO mapred.JobClient:  map 100% reduce 85%
14/01/05 18:52:38 INFO mapred.JobClient:  map 100% reduce 90%
14/01/05 18:52:46 INFO mapred.JobClient:  map 100% reduce 100%
14/01/05 18:52:48 INFO mapred.JobClient: Job complete: job_201401051828_0004
14/01/05 18:52:48 INFO mapred.JobClient: Counters: 18
14/01/05 18:52:48 INFO mapred.JobClient:   Job Counters
14/01/05 18:52:48 INFO mapred.JobClient:     Launched reduce tasks=1
14/01/05 18:52:48 INFO mapred.JobClient:     Launched map tasks=16
14/01/05 18:52:48 INFO mapred.JobClient:     Data-local map tasks=16
14/01/05 18:52:48 INFO mapred.JobClient:   FileSystemCounters
14/01/05 18:52:48 INFO mapred.JobClient:     FILE_BYTES_READ=2382257412
14/01/05 18:52:48 INFO mapred.JobClient:     HDFS_BYTES_READ=1000057358
14/01/05 18:52:48 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3402255956
14/01/05 18:52:48 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1000000000
14/01/05 18:52:48 INFO mapred.JobClient:   Map-Reduce Framework
14/01/05 18:52:48 INFO mapred.JobClient:     Reduce input groups=10000000
14/01/05 18:52:48 INFO mapred.JobClient:     Combine output records=0
14/01/05 18:52:48 INFO mapred.JobClient:     Map input records=10000000
14/01/05 18:52:48 INFO mapred.JobClient:     Reduce shuffle bytes=951549012
14/01/05 18:52:48 INFO mapred.JobClient:     Reduce output records=10000000
14/01/05 18:52:48 INFO mapred.JobClient:     Spilled Records=33355441
14/01/05 18:52:48 INFO mapred.JobClient:     Map output bytes=1000000000
14/01/05 18:52:48 INFO mapred.JobClient:     Map input bytes=1000000000
14/01/05 18:52:48 INFO mapred.JobClient:     Combine input records=0
14/01/05 18:52:48 INFO mapred.JobClient:     Map output records=10000000
14/01/05 18:52:48 INFO mapred.JobClient:     Reduce input records=10000000
14/01/05 18:52:48 INFO terasort.TeraSort: done
</code></pre>
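For example, to confirm the sorted output exists and peek at it, you can use commands like these. The <code>part-00000</code> name is the usual single-reduce output file and may differ on your run:
<pre><code>
# hadoop fs -ls sorted.data.1G
# hadoop fs -cat sorted.data.1G/part-00000 | head -c 300
# hadoop fs -get sorted.data.1G /tmp/sorted.data.1G
</code></pre>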

</ol>

<li>Redo the tutorial with a different number of workers, amount of bandwidth, and/or worker instance types. Warning: be courteous to other users and do not take all the resources. </li>

<ol type="a">
<li> Time the performance of runs with different resources (see the example below). </li>
<li> Observe the largest file you can create with different settings. </li>
</ol>
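A simple way to time a run is the shell's <code>time</code> builtin, for example (same jar path as above). Note that Hadoop will not overwrite an existing output directory, so give each run a new output name:
<pre><code>
# time hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar terasort random.data.1G sorted.data.1G.run2
</code></pre>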


</ol>



</ol>
</td>
</tr>

