Changes between Version 9 and Version 10 of GENIExperimenter/Tutorials/HadoopInASlice/ExecuteExperiment
Timestamp: 01/07/14 21:30:48
=== B. Observe the contents of the NEuca user data file. This file includes a script that will install and execute the script that you configured for the VM. ===
{{{
# neuca-user-data
…
}}}

=== C. Observe the contents of the script that was installed and executed on the VM. ===
{{{
# cat /tmp/master.sh
…
}}}

=== D. Test for connectivity between the VMs. ===
{{{
…
}}}

== 3. Check the status of the Hadoop filesystem. ==

=== A. Query for the status of the filesystem and its associated workers. ===
{{{
…
}}}

== 4. Test the filesystem with a small file ==

=== A. Create a small test file ===
{{{
# echo hello > hello.txt
}}}

=== B. Push the file into the Hadoop filesystem ===
{{{
# hadoop fs -put hello.txt hello.txt
}}}

=== C. Check for the file's existence ===
{{{
# hadoop fs -ls
…
}}}

=== D. Check the contents of the file ===
{{{
# hadoop fs -cat hello.txt
…
}}}

Test the true power of the Hadoop filesystem by creating and sorting a large random dataset.
It may be useful and interesting to log in to the master and/or worker VMs and use tools like {{{top}}}, {{{iotop}}}, and {{{iftop}}} to observe the resource utilization on each of the VMs during the sort test.

=== A. Create a 1 GB random data set. ===

After the data is created, use the {{{ls}}} functionality to confirm the data exists. Note that the data is composed of several files in a directory.

{{{
# hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar teragen 10000000 random.data.1G
…
}}}

=== B. Sort the datasets. ===

On your own, you can use Hadoop's {{{cat}}} and/or {{{get}}} functionality to look at the random and sorted files to confirm their size and that the sort actually worked.

{{{
…
}}}

Re-do the tutorial with a different number of workers, amount of bandwidth, and/or worker instance types. Warning: be courteous to other users and do not take all of the resources.

=== A. Time the performance of runs with different resources ===
=== B. Observe the largest file size you can create with different settings ===
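To time the sort runs, one option is a small wrapper that records wall-clock seconds around any command. This is a minimal sketch, not part of the tutorial: {{{time_cmd}}} is an illustrative helper name, and the commented-out {{{terasort}}} line shows how the tutorial's own invocation would be substituted for the placeholder.

```shell
#!/bin/sh
# Minimal timing wrapper (illustrative, not from the tutorial):
# prints elapsed wall-clock seconds for any command.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null   # discard the command's stdout so only the time is captured
    end=$(date +%s)
    echo $(( end - start ))
}

# On the slice you would time the sort itself, e.g.:
#   time_cmd hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar \
#       terasort random.data.1G sorted.data.1G
# Here a placeholder command demonstrates the wrapper:
secs=$(time_cmd sleep 1)
echo "elapsed: ${secs}s"
```

Repeating the same timed run with different worker counts or bandwidths gives directly comparable numbers.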
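When sizing larger datasets, note that {{{teragen}}} writes fixed 100-byte rows, so the row-count argument is simply the target size in bytes divided by 100 (the tutorial's 10000000 rows correspond to 1 GB). A quick sketch of the arithmetic; {{{target_gb}}} is an illustrative variable, not a tutorial parameter:

```shell
#!/bin/sh
# teragen emits fixed 100-byte rows, so rows = target_bytes / 100.
# target_gb is an illustrative parameter for this sketch.
target_gb=5
rows=$(( target_gb * 1000 * 1000 * 1000 / 100 ))
echo "teragen rows for ${target_gb} GB: $rows"
# The corresponding run would then be, e.g.:
#   hadoop jar /usr/local/hadoop-0.20.2/hadoop-0.20.2-examples.jar \
#       teragen $rows random.data.${target_gb}G
```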