Changes between Version 5 and Version 6 of GENIExperimenter/Tutorials/jacks/HadoopInASlice/ExecuteExperiment
Timestamp: 09/17/15 08:13:15
GENIExperimenter/Tutorials/jacks/HadoopInASlice/ExecuteExperiment
{{{
...
start-yarn.sh
}}}
 a. Check the status of the Hadoop filesystem:
{{{
# hdfs dfsadmin -report
Configured Capacity: 54824083456 (51.06 GB)
Present Capacity: 48522035200 (45.19 GB)
DFS Remaining: 48521986048 (45.19 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 172.16.1.10:50010 (worker-0)
Hostname: worker-0
Decommission Status : Normal
Configured Capacity: 27412041728 (25.53 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3151020032 (2.93 GB)
DFS Remaining: 24260997120 (22.59 GB)
DFS Used%: 0.00%
DFS Remaining%: 88.50%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Sep 17 12:04:32 UTC 2015


Name: 172.16.1.11:50010 (worker-1)
Hostname: worker-1
Decommission Status : Normal
Configured Capacity: 27412041728 (25.53 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3151028224 (2.93 GB)
DFS Remaining: 24260988928 (22.59 GB)
DFS Used%: 0.00%
DFS Remaining%: 88.50%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Sep 17 12:04:32 UTC 2015
}}}

== 2. Run the experiment ==

=== 2.1 Test the Hadoop cluster with a small file ===
 a. Create a small test file:
{{{
# echo Hello GENI World > /tmp/hello.txt
}}}
 a. Push the file into the Hadoop filesystem:
{{{
# hdfs dfs -put /tmp/hello.txt /hello.txt
}}}
 a. Check for the file's existence:
{{{
# hdfs dfs -ls /
Found 1 items
-rw-r--r--   2 hadoop supergroup         17 2015-09-17 12:09 /hello.txt
}}}
 a. Check the contents of the file:
{{{
# hdfs dfs -cat /hello.txt
Hello GENI World
}}}
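The four steps above can also be run as one short sequence, which is handy if you want to repeat the check later (for example, after changing the number of workers). This is only a convenience sketch using the same example paths as above; it assumes the `hdfs` command is on your PATH as set up earlier, uses `-f` so the put overwrites an existing copy of the file, and ends with `hdfs dfs -rm` simply to clean up the test file:
{{{
#!/bin/bash
# Round-trip test of HDFS: create a local file, copy it into HDFS,
# list it, read it back, and remove it again.
echo "Hello GENI World" > /tmp/hello.txt
hdfs dfs -put -f /tmp/hello.txt /hello.txt
hdfs dfs -ls /
hdfs dfs -cat /hello.txt
hdfs dfs -rm /hello.txt
}}}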
=== 2.2 Run the Hadoop Sort Testcase ===

Test the true power of the Hadoop filesystem by creating and sorting a large random dataset. It may be useful/interesting to log in to the master and/or worker VMs and use tools like top, iotop, and iftop to observe the resource utilization on each of the VMs during the sort test. Note: on these VMs iotop and iftop must be run as root.

 a. Create a 1 GB random data set:
{{{
# hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar teragen 10000000 /input
14/01/05 18:47:58 INFO mapred.JobClient: Running job: job_201401051828_0003
14/01/05 18:47:59 INFO mapred.JobClient:  map 0% reduce 0%
...
14/01/05 18:48:28 INFO mapred.JobClient: Map output records=10000000
}}}
 After the data is created, use the ls functionality to confirm the data exists. Note that the data is composed of several files in a directory.
 a. Sort the dataset:
{{{
# hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar terasort /input /output
14/01/05 18:50:49 INFO terasort.TeraSort: starting
14/01/05 18:50:49 INFO mapred.FileInputFormat: Total input paths to process : 2
...
14/01/05 18:52:48 INFO terasort.TeraSort: done
}}}
 a. Look at the output: you can use Hadoop's cat and/or get functionality to look at the random and sorted files to confirm their size and that the sort actually worked. Try some or all of these commands. Does the output make sense to you?
{{{
hdfs dfs -ls /input
hdfs dfs -ls /output
...
}}}

== 3. Advanced Example ==

Re-do the tutorial with a different number of workers, amount of bandwidth, and/or worker instance types. Warning: be courteous to other users and do not use too many of the resources.
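If you re-do the sort, it helps to confirm the previous run and clear its data out of HDFS first. (For sizing the new run: teragen writes 100-byte rows, so the 10,000,000 rows used above come to roughly 1 GB; a larger row count gives a proportionally larger dataset.) A minimal sketch, assuming the same /input and /output paths and examples jar as above; `teravalidate` is another example program in that jar, and /tera-report is just an arbitrary location for its report:
{{{
# Confirm that /output is globally sorted; a report is written under /tera-report
hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar teravalidate /output /tera-report
hdfs dfs -cat /tera-report/*
# Remove the old data so teragen and terasort can be re-run with new parameters
hdfs dfs -rm -r /input /output /tera-report
}}}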