Changes between Version 5 and Version 6 of GENIExperimenter/Tutorials/jacks/HadoopInASlice/ExecuteExperiment


Timestamp: 09/17/15 08:13:15
Author: nriga@bbn.com

   start-yarn.sh
}}}
 a. Check the status of the Hadoop filesystem and its associated workers:
 {{{
# hdfs dfsadmin -report
Configured Capacity: 54824083456 (51.06 GB)
Present Capacity: 48522035200 (45.19 GB)
DFS Remaining: 48521986048 (45.19 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 172.16.1.10:50010 (worker-0)
Hostname: worker-0
Decommission Status : Normal
Configured Capacity: 27412041728 (25.53 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3151020032 (2.93 GB)
DFS Remaining: 24260997120 (22.59 GB)
DFS Used%: 0.00%
DFS Remaining%: 88.50%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Sep 17 12:04:32 UTC 2015


Name: 172.16.1.11:50010 (worker-1)
Hostname: worker-1
Decommission Status : Normal
Configured Capacity: 27412041728 (25.53 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3151028224 (2.93 GB)
DFS Remaining: 24260988928 (22.59 GB)
DFS Used%: 0.00%
DFS Remaining%: 88.50%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Sep 17 12:04:32 UTC 2015
}}}
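 If the report lists fewer live datanodes than you expect, check that the Hadoop daemons are actually running on each VM. A minimal sketch, assuming the master can reach the workers by the hostnames shown in the report (worker-0, worker-1) and that the JDK's jps tool is on the PATH:
 {{{
# jps               # on the master: expect NameNode and ResourceManager
# ssh worker-0 jps  # on a worker: expect DataNode and NodeManager
}}}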

== 2. Run the experiment ==

=== 2.1 Test the Hadoop cluster with a small file ===
 a. Create a small test file:
 {{{
# echo Hello GENI World > /tmp/hello.txt
}}}
 a. Push the file into the Hadoop filesystem:
 {{{
# hdfs dfs -put /tmp/hello.txt /hello.txt
}}}
 a. Check for the file's existence:
 {{{
# hdfs dfs -ls /
Found 1 items
-rw-r--r--   2 hadoop supergroup         17 2015-09-17 12:09 /hello.txt
}}}
 a. Check the contents of the file:
 {{{
# hdfs dfs -cat /hello.txt
Hello GENI World
}}}
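 As an optional extra check, pull the file back out of HDFS and compare it with the original. A small sketch (the /tmp/hello-copy.txt name is arbitrary):
 {{{
# hdfs dfs -get /hello.txt /tmp/hello-copy.txt
# diff /tmp/hello.txt /tmp/hello-copy.txt && echo round-trip OK
}}}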

=== 2.2 Run the Hadoop Sort Testcase ===

 Test the true power of the Hadoop filesystem by creating and sorting a large random dataset. It may be useful and interesting to log in to the master and/or worker VMs and use tools like top, iotop, and iftop to observe the resource utilization on each of the VMs during the sort test; an example follows below. Note: on these VMs, iotop and iftop must be run as root.

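 For example, on a worker VM (a sketch; the data-plane interface name eth1 is an assumption, so substitute whatever interface your VMs use):
 {{{
# top            # overall CPU and memory usage per process
# iotop -o       # only processes currently doing disk I/O (run as root)
# iftop -i eth1  # per-connection network traffic (run as root)
}}}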
 a. Create a 1 GB random data set:
 {{{
# hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar teragen 10000000 /input
14/01/05 18:47:58 INFO mapred.JobClient: Running job: job_201401051828_0003
14/01/05 18:47:59 INFO mapred.JobClient:  map 0% reduce 0%
...
14/01/05 18:48:28 INFO mapred.JobClient:     Map output records=10000000
}}}
 After the data is created, use the ls functionality to confirm the data exists (an example follows below).  Note that the data is composed of several files in a directory.
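 For example (the /input path matches the teragen command above; expect a _SUCCESS marker plus one part-m-* file per map task):
 {{{
# hdfs dfs -ls /input
# hdfs dfs -du -h /input   # total should be about 1 GB: 10,000,000 rows x 100 bytes
}}}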
 a. Sort the dataset:
 {{{
# hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar terasort /input /output
14/01/05 18:50:49 INFO terasort.TeraSort: starting
14/01/05 18:50:49 INFO mapred.FileInputFormat: Total input paths to process : 2
...
14/01/05 18:52:48 INFO terasort.TeraSort: done
}}}
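 While teragen or terasort is running, you can also watch job progress from a second terminal on the master using the stock YARN CLI; a quick sketch:
 {{{
# yarn application -list   # running applications and their progress
# yarn node -list          # NodeManagers currently registered with the ResourceManager
}}}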
 a. Look at the output: you can use Hadoop's cat and/or get functionality to inspect the random and sorted files to confirm their size and that the sort actually worked.
 Try some or all of these commands.  Does the output make sense to you?
 {{{
hdfs dfs -ls /input
hdfs dfs -ls /output
...
}}}
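 To check the result mechanically rather than by eye, the same examples jar includes a teravalidate job that reports any out-of-order keys (a sketch; the /validate output path is an arbitrary choice):
 {{{
# hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar teravalidate /output /validate
# hdfs dfs -cat /validate/part-r-00000   # any misordered keys would be reported here
}}}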

== 3. Advanced Example ==

 Re-do the tutorial with a different number of workers, a different amount of bandwidth, and/or different worker instance types.  Warning: be courteous to other users and do not use more resources than you need.