[[PageOutline]]

= Plastic Slices sandbox page =

Random notes for Plastic Slices stuff. A lot of things that used to be here are now on my general [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page]. What's left should in theory be pretty specific to Plastic Slices.

= Environment =

The tools we use to wrangle Plastic Slices have a variety of requirements for your environment on the system where you want to use them:

 * You should have an up-to-date copy of the syseng Subversion repository.
 * ~/rspecs should be a symlink to .../syseng/geni/share/rspecs.
 * ~/slices/plastic-slices should be a symlink to .../syseng/geni/share/experiment-setup/plastic-slices.
 * ~/bin/omni and ~/bin/readyToLogin should be copies of (or symlinks to) the current GCF release.
 * ~/bin/shmux should be a copy of the 'shmux' executable.
 * Your ~/.ssh/config file should include "!StrictHostKeyChecking no". (FIXME: It'd be better if this were in the ~/.ssh/config section for each host, instead of being a global setting.)
 * ~/.gcf should be your Omni/GCF directory, and you should not mind if cached user and slice credentials are stored there.
 * Your default project in your Omni config file should be 'gpo-infra'.
 * Your 'users' list in your Omni config file should include the gpo-infra users.
 * You should run the various commands all in one shell, because some of the later steps assume that you've run the commands in some of the previous steps. You can run some things in other windows if you know what you're doing, but if you're wrong, things won't work as you expect.
 * You should have the following shell functions or aliases (e.g. in your .bashrc):

{{{
alias shmux='shmux -Sall -m -B -M 20'

logins () { for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; $* ~/slices/*/logins/logins-$slicename.txt >| $loginfile ; done ; logins=$(for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; cat $loginfile ; done) ; }

somni () { slicename=$1 ; rspec=$2 ; am=$(grep AM: $rspec | sed -e 's/^AM: //') ; }
}}}

This list is intended to be complete, but if we've forgotten something, you may get an error when you try to use some of those tools. As a corollary, if you do get an error when you try to use one of these tools, check with someone else to see if it works for them, and look for ways in which your environment might be different (and if they're not on this list, add them).

= Ending and starting a run =

This is how I end one Plastic Slices run, and start the next. These commands use techniques from my [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page], so before doing all this, I should double-check that this copy of those techniques is still accurate.

== Ending ==

Make sure your copy of the syseng Subversion repository is up to date and that you don't have uncommitted changes there. Change into your .../syseng directory, and run

{{{
svn update
svn status
}}}

Set the list of slices:

{{{
slices=$(echo ps{103..110})
}}}

Fetch my user and slice credentials:

{{{
(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
}}}

Deactivate Zoing (so it won't launch another set of experiments at the top of the hour):

{{{
logins cat
shmux -c "zoing deactivate" $logins
}}}

Wait for the current run to finish, typically 56 minutes past the hour.
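If you'd rather not watch the clock, here's a minimal sketch that just sleeps until 56 minutes past the hour (assuming GNU date; adjust the 56 if your runs wrap up at a different offset):

{{{
# Sleep until hh:56, when the current round of experiments typically wraps up.
# The modulo handles the case where we're already past :56 this hour.
now=$(date +%s)
target=$(date -d "$(date +%H):56" +%s)
sleep $(( (target - now + 3600) % 3600 ))
}}}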
Check that all sources are shut down ("-a" nodes):

{{{
logins grep -- -a
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
}}}

Reset everything, and make sure that everything is shut down:

{{{
logins cat
shmux -c "zoing reset" $logins
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
}}}

[#Fetchlogs Fetch logs] one last time, and upload them to the webserver.

Delete all of the slivers, to start the next run with a clean slate:

{{{
declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done

for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am deletesliver $slicename & done ; sleep 30s ; done
}}}

Confirm that everything's gone:

{{{
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am sliverstatus $slicename |& egrep -q -i '(code 12|code 2)' || echo "unexpected sliver in $slicename at $am" & done ; sleep 5s ; done | grep unexpected | grep -v omni
}}}

Update the wiki page for this run with any final details (e.g. when the run ended).

== Starting ==

Make sure your copy of the syseng Subversion repository is up to date and that you don't have uncommitted changes there. Change into your .../syseng directory, and run

{{{
svn update
svn status
}}}

Update ~/slices/plastic-slices/config/slices.json with any changes for this run. Likely changes to think about include:

 * Adding or removing aggregates.
 * Changing which aggregates are in which slices.
 * Changing openflow_controller to point to your personal controller.
 * Changing rspec_template_root to point to the directory where you personally have the rspec templates.

Update ~/slices/plastic-slices/config/pairmap.json with any changes for this run. At this point, we're maintaining the file by hand, so that we can preserve specific pairs from run to run. The pairs we're preserving are:

|| source || destination || TCP slice || UDP slice ||
|| bbn-exogeni || max-instageni || ps103 || ps108 ||
|| clemson-instageni || wisconsin-instageni || ps105 || ps110 ||
|| fiu-exogeni || bbn-exogeni || ps104 || ps107 ||
|| fiu-exogeni || bbn-instageni || ps103 || ps108 ||
|| gatech-instageni || northwestern-instageni || ps106 || ps107 ||
|| kansas-instageni || northwestern-instageni || ps105 || ps108 ||
|| nyu-instageni || utahddc-instageni || ps106 || ps109 ||
|| sox-instageni || illinois-instageni || ps104 || ps109 ||
|| stanford-instageni || bbn-instageni || ps106 || ps109 ||

If you add a new aggregate, make sure not to break up those pairs. If for some reason you want to generate a new random pairmap, the Tarvek 00README file has docs for how to do that.

Generate the rest of the configuration:

{{{
cd ~/slices/plastic-slices
python ~/tarvek/generate-experiment-config.py ./config/slices.json ./config/pairmap.json ./wiki-source.txt
svn rm $(svn st | grep '^!' | awk '{ print $2; }')
svn add $(svn st | grep '^?' | awk '{ print $2; }')
}}}

Review to make sure that things look right, then commit that to Subversion.
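Part of that review can be mechanical: since slices.json and pairmap.json were just edited by hand, it's worth checking that they still parse before committing. A minimal sketch using Python's stdlib json.tool:

{{{
cd ~/slices/plastic-slices
for f in config/slices.json config/pairmap.json ; do python -m json.tool < $f > /dev/null && echo "$f parses" || echo "$f is BROKEN" ; done
}}}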
Set the list of slices:

{{{
slices=$(echo ps{103..110})
}}}

Fetch my user and slice credentials:

{{{
(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
}}}

Set up variables to create the slivers:

{{{
declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done | wc
}}}

The last two echo lines are a good place to sanity-check that things are as you expect: the first should list an rspec for every sliver you expect to create, and the second should show counts of them (the "words" column of wc is the total number of rspecs). There should be one line per slice, and probably a few hundred rspecs, but the exact number will depend on how many aggregates you have in each slice.

Actually create the slivers:

{{{
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am createsliver $slicename $rspec & done ; sleep 5m ; done
}}}

Some notes about that:

 * The combination of (a) the ampersand and (b) the sleep 5m at the end means that this (a) fires off a createsliver for every aggregate in the slice, running them all in parallel in the background, and (b) sleeps for five minutes between slices, to avoid swamping any aggregates with too many requests at once. That 5m seems to work well for not crashing FV and not overloading InstaGENI, but it could potentially be cranked down if both of those improve.
 * This doesn't capture output at all. We could potentially add something to stuff the output into one giant file, but it might be a little hard to sort out, since output is coming back from all of the slivers at once, all intermingled together. We could have each createsliver write an output file, but we'd need to be careful to name them and save them so that the output file from an aggregate in one slice wouldn't overwrite the output from the same aggregate in another slice. For now, we just check later to see what worked and what didn't, and try again by hand if it's not obvious why some things didn't work.

Renew the Utah slivers, which default to expiring in six hours:

{{{
declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done

renewdate="$(date +%Y-%m-%d -d 'now + 4 days') 23:00 UTC"
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am renewsliver $slicename "$renewdate" & done ; sleep 5s ; done
}}}

Set a reminder for yourself to renew those in four days. (Something in your calendar, a cron job, a mental note to watch your e-mail for expiration warnings the day before they expire, etc.)
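If you want that reminder to be mechanical, one minimal sketch uses at(1), assuming the machine delivers mail from at jobs somewhere you'll actually see it:

{{{
# at mails the job's output to you; schedule it the day before the four-day renewal comes due.
echo 'echo "Renew the Plastic Slices Utah slivers today"' | at 9am + 3 days
}}}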
Gather up expiration information for everything, and stuff it into a results file:

{{{
for slicename in $slices
do
  cd
  rm -rf ~/tmp/renewsliver/$slicename
  mkdir -p ~/tmp/renewsliver/$slicename
  cd ~/tmp/renewsliver/$slicename
  for rspec in ${rspecs[$slicename]} ; do outfile=$(echo $(basename $rspec) | sed -e 's/.rspec$//') ; somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am sliverstatus $slicename >& $outfile ; done
  cd ~/tmp/renewsliver/$slicename
  grep -h _expires * >> results.txt
  for i in * ; do grep _expires $i > /dev/null || echo "no 'expires' lines in $i" ; done >> results.txt
done
}}}

Set some variables to match the dates you expect things to expire on (these are just examples, and may need to be edited):

{{{
mm_dd="05-15"
mon_day="Apr 28"
}}}

Look for anomalies in the results files:

{{{
cd ~/tmp/renewsliver
for slicename in $slices ; do echo "==> $slicename" ; grep foam_expires $slicename/results.txt ; done | grep -v "$mm_dd"
for slicename in $slices ; do echo "==> $slicename" ; grep orca_expires $slicename/results.txt ; done | grep -v "$mon_day"
for slicename in $slices ; do echo "==> $slicename" ; grep pg_expires $slicename/results.txt ; done | grep -v "$mm_dd"
for slicename in $slices ; do echo "==> $slicename" ; grep "no 'expires' lines" $slicename/results.txt ; done
}}}

If you find anomalies, you'll probably need to go back to the original output files to figure out where they came from.

Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Getlogininfo get login info].

Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Loginstuff do other login-related stuff].

Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Pingteststuff test connectivity]. Trying "the fast way" from one node in each slice is probably good enough, but "the reliable way" will work too if you're not in a hurry.

Copy in Zoing stuff:

{{{
shmux -c 'mkdir -p bin' $logins

for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/zoing/zoing bin/zoing ; done

for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a ~/slices/plastic-slices/zoing/zoingrc-$login $login:.zoingrc && echo $login ; done & done
}}}

Copy in traffic-shaping stuff:

{{{
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/tc-shape-eth1-ten-mbps tc-shape-eth1-ten-mbps ; done

shmux -c 'sudo chown root:root tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo mv tc-shape-eth1-ten-mbps /etc/init.d/tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo ln -s ../init.d/tc-shape-eth1-ten-mbps /etc/rc2.d/S99tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo service tc-shape-eth1-ten-mbps start' $logins
}}}

Fire up Zoing:

{{{
shmux -c "zoing activate" $logins
}}}

Create a directory for logs, and copy other files into it:

{{{
subdir=

mkdir -p ~/tmp/plastic-slices/$subdir/logs
cp ~/slices/plastic-slices/config/*json ~/tmp/plastic-slices/$subdir
rscpc ~/slices/plastic-slices/hosts/ ~/tmp/plastic-slices/$subdir/00hosts
rscpc ~/slices/plastic-slices/logins/ ~/tmp/plastic-slices/$subdir/00logins
rscpc ~/slices/plastic-slices/ssh_config/ ~/tmp/plastic-slices/$subdir/00ssh_config
}}}

Create the wiki page.

Send mail to gpo-tech letting folks know.
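Before sending that mail, it can be worth double-checking that Zoing actually activated everywhere. A minimal sketch, reusing the status idiom from the "Ending" section (it assumes a node that failed to activate still shows the '-active -cron' style flags that those earlier checks grep for, so any node that prints a line here is worth a closer look):

{{{
logins cat
shmux -c "zoing status | grep -- '-active -cron' || true" $logins
}}}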
= To do =

Here are some random things I've jotted down that I'd like to do:

 * Add a way to positively confirm that slivers *don't* exist.
 * Add a way to show more concise sliver status -- not four+ lines per sliver.
 * Add a way to supply a parameter to test against, like "this date" for expiry.
 * Add a way to save all omni output in files, so I can look up what happened if something goes wrong.
 * Maybe use vxargs to parallelize omni for some things? Sliver deletion takes freakin' forever. Or just a loop, doing ten slices in parallel, although this won't help for single big slices. Maybe parallelizing across one slice would be better, so it hits all the aggregates once, then again, etc.

Some of those would end up on my [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page], but they affect Plastic Slices the most (because of its scale), so they're here for now. Or I might add them to Tarvek; we'll see.

= Fetch logs =

I run all this stuff on anubis.

Pull the logs into a subdirectory of my temp log processing directory:

{{{
subdir=

mkdir -p ~/tmp/plastic-slices/$subdir/logs

logins grep -- -a
shmux -c "sed -i -e '/nanosleep failed:/d' zoing-logs/zoing*log" $logins

logins cat
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a $login:zoing-logs/ ~/tmp/plastic-slices/$subdir/logs/$login && echo $login ; done & done
}}}

Remove the last day's PNG file and the "all" PNG file, to make sure we re-generate them:

{{{
lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')
rm ~/tmp/plastic-slices/$subdir/pngs/*/*/*all*png ~/tmp/plastic-slices/$subdir/pngs/*/*/*daily-$lastday*png
}}}

Plot graphs:

{{{
firstlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | head -1 | sed -e 's/zoing-\(.*\).log/\1/')
lastlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-\(.*\).log/\1/')
time python ~/tarvek/generate-graphs.py --progress --mainconfig=~/tmp/plastic-slices/$subdir/slices.json --pairmap=~/tmp/plastic-slices/$subdir/pairmap.json --rootdir=~/tmp/plastic-slices/$subdir --starttime=$firstlog --endtime=$lastlog
}}}

Push everything up to the webserver:

{{{
rsync -av ~/tmp/plastic-slices/$subdir www.gpolab.bbn.com:/srv/www/plastic-slices/continuation
}}}

== Checking in ==

On my laptop, copy down the graphs:

{{{
subdir=

rscpd anubis:tmp/plastic-slices/$subdir/pngs ~/tmp/plastic-slices/$subdir
}}}

Identify the last day we have graphs for:

{{{
lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')
}}}

Show the per-slice graphs for the most recent day:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-daily-$lastday.png
}}}

Show the per-host daily graphs for the most recent day:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily-$lastday.png
}}}

Show the per-slice graphs for the whole run:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-all.png
}}}

Show the per-host graphs for the whole run:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-all.png
}}}

Show the per-host daily graphs for all of the days:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily*.png
}}}
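Since it's easy to miss a hole in a wall of graphs, a quick check that every host produced a graph for the most recent day can be worth running first (a minimal sketch, assuming the pngs/hosts directory layout used above):

{{{
# Flag any "-b" host directory that's missing the latest daily graph.
for d in ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b ; do [ -e $d/zoing-daily-$lastday.png ] || echo "no $lastday graph for $(basename $d)" ; done
}}}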
=== The old way ===

This is how I used to check in, using grep to scan log files; nowadays I'm using the graphs.

Get a quick summary of the current state of things (based on the last completed run; or change $timestamp to get a different run):

{{{
timestamp=$(date -d "now - 1 hour" +%Y%m%d.%H)

for subnet in {103..106}
do
  echo -e "--> plastic $subnet\n"
  for login in $(awk 'NR%2==1' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
  do
    echo -n "$login to "
    grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }'
    grep /sec logs/$login/zoing-$timestamp*.log || echo no results
    echo ""
  done
done

for subnet in {107..110}
do
  echo -e "--> plastic $subnet\n"
  for login in $(awk 'NR%2==0' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
  do
    echo -n $(grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }')
    echo " to $login"
    egrep " 0.0-[^ ].+/sec" logs/$login/zoing-$timestamp*.log || echo no results
    echo ""
  done
done
}}}

= Use NOX =

Run NOX for plastic-101, with the learning switch ('switch') module and LAVI:

{{{
subnet=101
port=33$subnet ; (cd /usr/bin && /usr/bin/nox_core --info=/home/jbs/nox/nox-${port}.info -i ptcp:$port switch lavi_switches jsonmessenger=tcpport=11$subnet,sslport=0)
}}}

In another window, ask the plastic-101 NOX (via LAVI) what datapaths are connected:

{{{
subnet=101 ; nox-console -n localhost -p 11$subnet getnodes
}}}
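If nox-console comes back empty or can't connect, a first thing to check is whether the controller is actually listening on both ports (a minimal sketch; 33101 is the OpenFlow listener and 11101 the jsonmessenger port from the commands above):

{{{
# Both ports should show up as LISTEN sockets if NOX started cleanly.
subnet=101 ; netstat -lnt | egrep ":(33$subnet|11$subnet) "
}}}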