Changes between Initial Version and Version 1 of JBSsandbox/PlasticSlices


Timestamp:
03/18/14 21:14:21 (10 years ago)
Author:
Josh Smift
Comment:

--

  • JBSsandbox/PlasticSlices

    v1 v1  
     1[[PageOutline]]
     2
     3= Plastic Slices sandbox page =
     4
     5Random notes for Plastic Slices stuff.
     6
     7A lot of things that used to be here are now on my general [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page]. What's left should in theory be pretty specific to Plastic Slices.
     8
     9= Ending and starting a run =
     10
     11This is how I end one Plastic Slices run, and start the next. These commands use techniques from my [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page], so before doing all this, I should double-check that this copy of those techniques is still accurate.
     12
     13== Ending ==
     14
     15Set the list of slices:
     16
     17{{{
     18slices=$(echo ps{103..110})
     19}}}
     20
     21Fetch my user and slice credentials:
     22
     23{{{
     24(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
     25}}}
     26
     27Deactivate Zoing (so it won't launch another set of experiments at the top of the hour):
     28
     29{{{
     30logins cat
     31shmux -c "zoing deactivate" $logins
     32}}}
     33
     34Wait for the current run to finish, which typically happens at 56 minutes past the hour.
     35
     36Check that all sources (the "-a" nodes) are shut down:
     37
     38{{{
     39logins grep -- -a
     40shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
     41}}}
     42
     43Reset everything, and make sure that everything is shut down:
     44
     45{{{
     46logins cat
     47shmux -c "zoing reset" $logins
     48shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
     49}}}
     50
     51[#Fetchlogs Fetch logs] one last time, and upload them to the webserver.
     52
     53Delete all of the slivers, to start the next run with a clean slate:
     54
     55{{{
     56declare -A rspecs
     57for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
     58for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
     59for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am deletesliver $slicename & done ; sleep 30s ; done
     60}}}
     61
     62Confirm that everything's gone:
     63
     64{{{
     65for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am sliverstatus $slicename |& egrep -q -i '(code 12|code 2)' || echo "unexpected sliver in $slicename at $am" & done ; sleep 5s ; done | grep unexpected | grep -v omni
     66}}}
     67
     68Update the wiki page for this run with any final details (e.g. when the run ended).
     69
     70== Starting ==
     71
     72Update ~/slices/plastic-slices/config/slices.json with any changes for this run.
     73
     74Generate a new pairmap:
     75
     76{{{
     77cd ~/slices/plastic-slices
     78python ~/tarvek/generate-pairmap.py ./config/slices.json ./config/pairmap.json
     79}}}
     80
     81Review and edit that if necessary.
     82
     83Generate the rest of the configuration:
     84
     85{{{
     86python ~/tarvek/generate-experiment-config.py ./config/slices.json ./config/pairmap.json ./wiki-source.txt
     87svn rm $(svn st | grep ^! | awk '{ print $2; }')
     88svn add $(svn st | grep '^?' | awk '{ print $2; }')
     89}}}
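
The two svn commands above pick files by the status character in the first column of `svn st` output ("!" for missing, "?" for unversioned). Here is a rough sketch of that filtering, run against sample status text that I invented for illustration, so it doesn't need a working copy:

```shell
# Sample "svn st" output, invented for illustration.
status='!       config/old-slice.json
?       config/new-slice.json
M       wiki-source.txt
?       zoing/zoingrc-new'

# Anchor on the first column, then print the path in the second field.
missing=$(printf '%s\n' "$status" | grep '^!' | awk '{ print $2; }')
unversioned=$(printf '%s\n' "$status" | grep '^?' | awk '{ print $2; }')

echo "would svn rm:  $missing"
echo "would svn add: $unversioned"
```

Quoting and anchoring the pattern keeps the shell from treating "?" as a glob and avoids matching a "?" elsewhere in the line. Paths containing spaces would still confuse the awk step.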
     90
     91Review to make sure that things look right, then commit that to Subversion.
     92
     93Set the list of slices:
     94
     95{{{
     96slices=$(echo ps{103..110})
     97}}}
     98
     99Fetch my user and slice credentials:
     100
     101{{{
     102(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
     103}}}
     104
     105Create the slivers:
     106
     107{{{
     108declare -A rspecs
     109for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
     110for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
     111for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am createsliver $slicename $rspec & done ; sleep 5m ; done
     112}}}
     113
     114Confirm that all of the slivers' expiration dates are as expected, and [wiki:JBSsandbox/SliceNotes#Forlotsofslivers renew anything that isn't] using my general slice notes.
     115
     116Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Getlogininfo get login info].
     117
     118Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Loginstuff do other login-related stuff].
     119
     120Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Pingteststuff test connectivity]. Trying "the fast way" from one node in each slice is probably good enough, but "the reliable way" will work too if you're not in a hurry.
     121
     122Copy in Zoing stuff:
     123
     124{{{
     125shmux -c 'mkdir -p bin' $logins
     126for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/zoing/zoing bin/zoing ; done
     127for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a ~/slices/plastic-slices/zoing/zoingrc-$login $login:.zoingrc && echo $login ; done & done
     128}}}
     129
     130Copy in traffic-shaping stuff:
     131
     132{{{
     133for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/tc-shape-eth1-ten-mbps tc-shape-eth1-ten-mbps ; done
     134shmux -c 'sudo chown root:root tc-shape-eth1-ten-mbps' $logins
     135shmux -c 'sudo mv tc-shape-eth1-ten-mbps /etc/init.d/tc-shape-eth1-ten-mbps' $logins
     136shmux -c 'sudo ln -s ../init.d/tc-shape-eth1-ten-mbps /etc/rc2.d/S99tc-shape-eth1-ten-mbps' $logins
     137shmux -c 'sudo service tc-shape-eth1-ten-mbps start' $logins
     138}}}
     139
     140Fire up Zoing:
     141
     142{{{
     143shmux -c "zoing activate" $logins
     144}}}
     145
     146Create a directory for logs, and copy other files into it:
     147
     148{{{
     149subdir=<a subdirectory>
     150
     151mkdir -p ~/tmp/plastic-slices/$subdir/logs
     152
     153cp ~/slices/plastic-slices/config/*json ~/tmp/plastic-slices/$subdir
     154rscpc ~/slices/plastic-slices/hosts/ ~/tmp/plastic-slices/$subdir/00hosts
     155rscpc ~/slices/plastic-slices/logins/ ~/tmp/plastic-slices/$subdir/00logins
     156rscpc ~/slices/plastic-slices/ssh_config/ ~/tmp/plastic-slices/$subdir/00ssh_config
     157}}}
     158
     159Create the wiki page.
     160
     161Send mail to gpo-tech letting folks know.
     162
     163= To do =
     164
     165Here are some random things I've jotted down that I'd like to do:
     166
     167 * Add a way to positively confirm that slivers *don't* exist
     168 * Add a way to show more concise sliver status -- not four+ lines per sliver
     169 * Add a way to supply a parameter to test against, like "this date" for expiry
     170 * Add a way to save all omni output in files, so I can look up what happened if something goes wrong
     171 * Maybe use vxargs to parallelize omni for some things? Sliver deletion takes freakin' forever. Or just a loop, do ten slices in parallel, although this won't help for single big slices. Maybe parallelizing across one slice would be better, so it hits all the aggregates once, then again, etc.
     172
     173Some of those would end up on my [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page], but they affect Plastic Slices the most (because of its scale), so they're here for now. Or I might add them to Tarvek; we'll see.
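
Two of the ideas above (saving all output in files, and running a batch of slices in parallel) could be sketched roughly like this. It's only an illustration of the loop shape: `slow_task` is a hypothetical stand-in for an omni call, and the batch size and log directory are arbitrary choices, not anything that exists yet.

```shell
# Hypothetical stand-in for a slow omni call; it just sleeps and prints.
slow_task() {
  sleep 1
  echo "finished $1"
}

logdir=$(mktemp -d)
batch_size=4
running=0

for slicename in ps103 ps104 ps105 ps106 ps107 ps108 ps109 ps110 ; do
  # Each task writes stdout and stderr to its own file for later review.
  slow_task "$slicename" > "$logdir/$slicename.log" 2>&1 &
  running=$((running + 1))
  if [ "$running" -ge "$batch_size" ] ; then
    wait        # let the whole batch finish before starting the next one
    running=0
  fi
done
wait            # catch any tasks left in a final partial batch

logcount=$(ls -1 "$logdir" | wc -l | tr -d ' ')
echo "wrote $logcount log files to $logdir"
```

As the list says, batching by slice wouldn't help with a single big slice; for that, the inner loop would need to be over aggregates instead.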
     174
     175= Fetch logs =
     176
     177I run all this stuff on anubis.
     178
     179Pull the logs into a subdirectory of my temp log processing directory:
     180
     181{{{
     182subdir=<a subdirectory>
     183
     184mkdir -p ~/tmp/plastic-slices/$subdir/logs
     185
     186logins cat
     187shmux -c "sed -i -e '/nanosleep failed:/d' zoing-logs/zoing*log" $logins
     188for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a $login:zoing-logs/ ~/tmp/plastic-slices/$subdir/logs/$login && echo $login ; done & done
     189}}}
     190
     191Remove the last day's PNG files and the "all" PNG files, so that they get regenerated:
     192
     193{{{
     194lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')
     195rm ~/tmp/plastic-slices/$subdir/pngs/*/*/*all*png ~/tmp/plastic-slices/$subdir/pngs/*/*/*daily-$lastday*png
     196}}}
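
The `lastday` line above works because `ls` sorts names, so `tail -1` picks the newest daily file and sed strips everything but the date. Here is a small sketch of that, using a throwaway directory; the file names and date format are assumptions based on the `zoing-daily-*.png` pattern, not copied from a real run.

```shell
# Throwaway directory with made-up names following the zoing-daily-<date>.png pattern.
pngdir=$(mktemp -d)
touch "$pngdir/zoing-all.png" \
      "$pngdir/zoing-daily-20140316.png" \
      "$pngdir/zoing-daily-20140317.png" \
      "$pngdir/zoing-daily-20140318.png"

# The last name in sorted order is the newest daily graph;
# the sed capture group keeps only the date portion of that name.
lastday=$(ls -1 "$pngdir" | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')
echo "$lastday"
```

This depends on `zoing-all.png` sorting before the `zoing-daily-*` files, and on the date format sorting chronologically.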
     197
     198Plot graphs:
     199
     200{{{
     201firstlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | head -1 | sed -e 's/zoing-\(.*\).log/\1/')
     202lastlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-\(.*\).log/\1/')
     203time python ~/tarvek/generate-graphs.py --progress --mainconfig=$HOME/tmp/plastic-slices/$subdir/slices.json --pairmap=$HOME/tmp/plastic-slices/$subdir/pairmap.json --rootdir=$HOME/tmp/plastic-slices/$subdir --starttime=$firstlog --endtime=$lastlog
     204}}}
     205
     206Push everything up to the webserver:
     207
     208{{{
     209rsync -av ~/tmp/plastic-slices/$subdir www.gpolab.bbn.com:/srv/www/plastic-slices/continuation
     210}}}
     211
     212== Checking in ==
     213
     214On my laptop, copy down the graphs:
     215
     216{{{
     217subdir=<a subdirectory>
     218
     219rscpd anubis:tmp/plastic-slices/$subdir/pngs ~/tmp/plastic-slices/$subdir
     220}}}
     221
     222Identify the last day we have graphs for:
     223
     224{{{
     225lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')
     226}}}
     227
     228Show the per-slice graphs of the most recent day:
     229
     230{{{
     231gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-daily-$lastday.png
     232}}}
     233
     234Show the per-host daily graphs for the most recent day:
     235
     236{{{
     237gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily-$lastday.png
     238}}}
     239
     240Show the per-slice graphs of the whole run:
     241
     242{{{
     243gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-all.png
     244}}}
     245
     246Show the per-host graphs of the whole run:
     247
     248{{{
     249gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-all.png
     250}}}
     251
     252Show the per-host daily graphs for all of the days:
     253
     254{{{
     255gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily*.png
     256}}}
     257
     258=== The old way ===
     259
     260This is how I used to check in, using grep to scan log files; nowadays I'm using the graphs.
     261
     262Get a quick summary of the current state of things (based on the last completed run; or change $timestamp to get a different run):
     263
     264{{{
     265timestamp=$(date -d "now - 1 hour" +%Y%m%d.%H)
     266
     267for subnet in {103..106}
     268do
     269  echo -e "--> plastic $subnet\n"
     270  for login in $(awk 'NR%2==1' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
     271  do
     272    echo -n "$login to "
     273    grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }'
     274    grep /sec logs/$login/zoing-$timestamp*.log || echo no results
     275    echo ""
     276  done
     277done
     278
     279for subnet in {107..110}
     280do
     281  echo -e "--> plastic $subnet\n"
     282  for login in $(awk 'NR%2==0' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
     283  do
     284    echo -n $(grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }')
     285    echo " to $login"
     286    egrep " 0.0-[^ ].+/sec" logs/$login/zoing-$timestamp*.log || echo no results
     287    echo ""
     288  done
     289done
     290}}}
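
The two loops above use `awk 'NR%2==1'` and `awk 'NR%2==0'` to pick every other line of a logins file. A quick sketch of what those select, using a made-up logins file; the host names and the alternating "-a"/"-b" ordering are assumptions for illustration:

```shell
# Made-up logins file: one login per line, alternating "-a" and "-b" hosts.
loginfile=$(mktemp)
printf '%s\n' site1-ps103-a site1-ps103-b site2-ps103-a site2-ps103-b > "$loginfile"

odd=$(awk 'NR%2==1' "$loginfile")    # lines 1, 3, ...
even=$(awk 'NR%2==0' "$loginfile")   # lines 2, 4, ...

echo "odd lines:  $(echo $odd)"
echo "even lines: $(echo $even)"
```

This only picks the intended hosts if the logins file really does alternate in a fixed order, so it's worth checking the file before trusting the summary.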
     291
     292= Use NOX =
     293
     294Run NOX for plastic-101, with the learning switch ('switch') module and LAVI:
     295
     296{{{
     297subnet=101
     298port=33$subnet ; (cd /usr/bin && /usr/bin/nox_core --info=/home/jbs/nox/nox-${port}.info -i ptcp:$port switch lavi_switches jsonmessenger=tcpport=11$subnet,sslport=0)
     299}}}
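
Both port numbers in the command above are built by string concatenation with the subnet number, not arithmetic. A tiny sketch of that (the `jsonport` variable name is mine, just for illustration):

```shell
subnet=101
port=33$subnet        # the port given to "-i ptcp:"
jsonport=11$subnet    # the port given to jsonmessenger's tcpport
echo "$port $jsonport"
```

So changing `subnet` gives each NOX instance its own pair of ports, as long as the resulting numbers stay within the valid port range.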
     300
     301In another window, ask the plastic-101 NOX (via LAVI) what datapaths are connected:
     302
     303{{{
     304subnet=101 ; nox-console -n localhost -p 11$subnet getnodes
     305}}}