[[PageOutline]]

= Plastic Slices sandbox page =

Random notes for Plastic Slices stuff. A lot of things that used to be here are now on my general [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page]. What's left should in theory be pretty specific to Plastic Slices.

= Ending and starting a run =

This is how I end one Plastic Slices run and start the next. These commands use techniques from my [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page], so before doing all this, I should double-check that this copy of those techniques is still accurate.

== Ending ==

Set the list of slices:

{{{
slices=$(echo ps{103..110})
}}}

Fetch my user and slice credentials:

{{{
(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
}}}

Deactivate Zoing (so it won't launch another set of experiments at the top of the hour):

{{{
logins cat
shmux -c "zoing deactivate" $logins
}}}

Wait for the current run to finish, typically 56 minutes past the hour.

Check that all sources ("-a" nodes) are shut down:

{{{
logins grep -- -a
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
}}}

Reset everything, and make sure that everything is shut down:

{{{
logins cat
shmux -c "zoing reset" $logins
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
}}}

[#Fetchlogs Fetch logs] one last time, and upload them to the webserver.
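The "wait for the current run to finish" step above can be timed rather than eyeballed. Here's a minimal sketch, assuming runs really do end around :56 (which is only typical) and that GNU `date` is available; `secs_until_minute` is a hypothetical helper, not something from my slice notes. It prints how many seconds remain until a given minute past the hour, rolling over to the next hour if that minute has already passed.

```shell
#!/usr/bin/env bash
# secs_until_minute TARGET [MIN SEC]
# Hypothetical helper: print the number of seconds from now (or from the
# optional MIN:SEC, handy for testing) until TARGET minutes past the hour.
# If TARGET has already been reached this hour, count to TARGET next hour.
# Assumes GNU date for the unpadded %-M / %-S formats.
secs_until_minute() {
  local target=$1
  local min=${2:-$(date +%-M)}
  local sec=${3:-$(date +%-S)}
  # 10# forces base 10, so padded values like "08" aren't read as octal.
  local now=$(( 10#$min * 60 + 10#$sec ))
  local delta=$(( 10#$target * 60 - now ))
  if (( delta <= 0 )); then
    delta=$(( delta + 3600 ))
  fi
  echo "$delta"
}
```

Usage: `sleep "$(secs_until_minute 56)"`, and then run the "zoing status" check anyway, since :56 is only the usual finishing time.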
Delete all of the slivers, to start the next run with a clean slate:

{{{
declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am deletesliver $slicename & done ; sleep 30s ; done
}}}

Confirm that everything's gone:

{{{
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am sliverstatus $slicename |& egrep -q -i '(code 12|code 2)' || echo "unexpected sliver in $slicename at $am" & done ; sleep 5s ; done | grep unexpected | grep -v omni
}}}

Update the wiki page for this run with any final details (e.g. when the run ended).

== Starting ==

Update ~/slices/plastic-slices/config/slices.json with any changes for this run.

Generate a new pairmap:

{{{
cd ~/slices/plastic-slices
python ~/tarvek/generate-pairmap.py ./config/slices.json ./config/pairmap.json
}}}

Review and edit that if necessary.

Generate the rest of the configuration, and tell Subversion about any files that disappeared or appeared:

{{{
python ~/tarvek/generate-experiment-config.py ./config/slices.json ./config/pairmap.json ./wiki-source.txt
# "!" = missing from disk (schedule for deletion); "?" = unversioned (schedule for addition).
# Quoting the patterns anchors them to the status column and keeps the shell from globbing '?'.
svn rm $(svn st | grep '^!' | awk '{ print $2; }')
svn add $(svn st | grep '^?' | awk '{ print $2; }')
}}}

Review to make sure that things look right, then commit that to Subversion.
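The `awk '{ print $2; }'` part of the Subversion step above splits on whitespace, so it can mangle file names that contain spaces. A slightly sturdier sketch, assuming the usual `svn status` layout (seven one-character status columns plus a space, so the path starts at column 9): `svn_paths_with_status` is a hypothetical helper that reads `svn status` output on stdin and prints the paths whose first column matches a given status character.

```shell
#!/usr/bin/env bash
# svn_paths_with_status CHAR
# Hypothetical helper: read `svn status` output on stdin and print the path
# of every entry whose first status column is CHAR ('?' = unversioned,
# '!' = missing). Assumes the path starts at column 9, after the seven
# one-character status columns and a separating space.
svn_paths_with_status() {
  local want=$1
  awk -v want="$want" 'substr($0, 1, 1) == want { print substr($0, 9) }'
}
```

For example, `svn st | svn_paths_with_status '?' | xargs -r -d '\n' svn add` (the `-r` and `-d` options are GNU xargs extensions).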
Set the list of slices:

{{{
slices=$(echo ps{103..110})
}}}

Fetch my user and slice credentials:

{{{
(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
}}}

Create the slivers:

{{{
declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am createsliver $slicename $rspec & done ; sleep 5m ; done
}}}

Confirm that all of the slivers' expiration dates are as expected, and [wiki:JBSsandbox/SliceNotes#Forlotsofslivers renew anything that isn't] using my general slice notes.

Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Getlogininfo get login info].

Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Loginstuff do other login-related stuff].

Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Pingteststuff test connectivity]. Trying "the fast way" from one node in each slice is probably good enough, but "the reliable way" will work too if you're not in a hurry.
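For the "expiration dates are as expected" check, here's a minimal sketch of just the date comparison, assuming GNU `date` and that the expiration time has already been pulled out of the sliverstatus output somehow (how to extract it varies by aggregate and isn't shown). `expires_on_or_after` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# expires_on_or_after EXPIRES EXPECTED
# Hypothetical helper: succeed if EXPIRES is at or after EXPECTED; otherwise
# print a note to stderr and fail. Returns 2 if either value can't be parsed.
# Both arguments can be anything GNU `date -d` understands (read as UTC).
expires_on_or_after() {
  local expires=$1 expected=$2 exp_s want_s
  exp_s=$(date -u -d "$expires" +%s 2>/dev/null) || return 2
  want_s=$(date -u -d "$expected" +%s 2>/dev/null) || return 2
  if (( exp_s >= want_s )); then
    return 0
  fi
  echo "expires $expires, before expected $expected" >&2
  return 1
}
```

Taking the expected date as an argument is deliberate: it's the "supply a parameter to test against" idea from the to-do list, applied to the comparison alone.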
Copy in Zoing stuff:

{{{
shmux -c 'mkdir -p bin' $logins
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/zoing/zoing bin/zoing ; done
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a ~/slices/plastic-slices/zoing/zoingrc-$login $login:.zoingrc && echo $login ; done & done
}}}

Copy in traffic-shaping stuff:

{{{
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/tc-shape-eth1-ten-mbps tc-shape-eth1-ten-mbps ; done
shmux -c 'sudo chown root:root tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo mv tc-shape-eth1-ten-mbps /etc/init.d/tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo ln -s ../init.d/tc-shape-eth1-ten-mbps /etc/rc2.d/S99tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo service tc-shape-eth1-ten-mbps start' $logins
}}}

Fire up Zoing:

{{{
shmux -c "zoing activate" $logins
}}}

Create a directory for logs, and copy other files into it:

{{{
subdir=
mkdir -p ~/tmp/plastic-slices/$subdir/logs
cp ~/slices/plastic-slices/config/*json ~/tmp/plastic-slices/$subdir
rscpc ~/slices/plastic-slices/hosts/ ~/tmp/plastic-slices/$subdir/00hosts
rscpc ~/slices/plastic-slices/logins/ ~/tmp/plastic-slices/$subdir/00logins
rscpc ~/slices/plastic-slices/ssh_config/ ~/tmp/plastic-slices/$subdir/00ssh_config
}}}

Create the wiki page.

Send mail to gpo-tech letting folks know.
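Since `subdir=` is left blank to be filled in for each run, it's easy to forget, and `mkdir -p ~/tmp/plastic-slices/$subdir/logs` then quietly creates ~/tmp/plastic-slices/logs instead of a per-run directory. A minimal sketch of a guard, using a hypothetical `make_run_dir` helper that takes the run name and, optionally, a root directory (defaulting to ~/tmp/plastic-slices):

```shell
#!/usr/bin/env bash
# make_run_dir SUBDIR [ROOT]
# Hypothetical helper: create ROOT/SUBDIR/logs (ROOT defaults to
# ~/tmp/plastic-slices) and print ROOT/SUBDIR. Refuses to run with an empty
# SUBDIR, which would otherwise put the logs directly under ROOT.
make_run_dir() {
  local subdir=$1
  local root=${2:-$HOME/tmp/plastic-slices}
  if [[ -z $subdir ]]; then
    echo "make_run_dir: no run name given (is \$subdir set?)" >&2
    return 1
  fi
  mkdir -p "$root/$subdir/logs" || return 1
  echo "$root/$subdir"
}
```

Usage: `rundir=$(make_run_dir "$subdir") && cp ~/slices/plastic-slices/config/*json "$rundir"`. The same guard would apply to the Fetch logs and Checking in steps, which also start with an empty `subdir=`.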
= To do =

Here are some random things I've jotted down that I'd like to do:

 * Add a way to positively confirm that slivers *don't* exist
 * Add a way to show more concise sliver status, not four+ lines per sliver
 * Add a way to supply a parameter to test against, like "this date" for expiry
 * Add a way to save all omni output in files, so I can look up what happened if something goes wrong
 * Maybe use vxargs to parallelize omni for some things? Sliver deletion takes freakin' forever. Or just use a loop that does ten slices in parallel, although this won't help for single big slices. Maybe parallelizing across one slice would be better, so it hits all the aggregates once, then again, etc.

Some of these might end up on my [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page], but they affect Plastic Slices the most (because of its scale), so they're here for now. Or I might add them to Tarvek; we'll see.

= Fetch logs =

I run all this stuff on anubis.

Pull the logs into a subdirectory of my temp log processing directory:

{{{
subdir=
mkdir -p ~/tmp/plastic-slices/$subdir/logs
logins cat
# Strip the noisy "nanosleep failed:" lines from the logs on each node before pulling them down.
shmux -c "sed -i -e '/nanosleep failed:/d' zoing-logs/zoing*log" $logins
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a $login:zoing-logs/ ~/tmp/plastic-slices/$subdir/logs/$login && echo $login ; done & done
}}}

Remove the last day's PNG files and the whole-run ("all") PNG files, so that they get regenerated:

{{{
lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')
rm ~/tmp/plastic-slices/$subdir/pngs/*/*/*all*png ~/tmp/plastic-slices/$subdir/pngs/*/*/*daily-$lastday*png
}}}

Plot graphs:

{{{
firstlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | head -1 | sed -e 's/zoing-\(.*\).log/\1/')
lastlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-\(.*\).log/\1/')
# $HOME rather than ~ after "=": bash doesn't tilde-expand "--option=~/..." arguments.
time python ~/tarvek/generate-graphs.py --progress \
  --mainconfig=$HOME/tmp/plastic-slices/$subdir/slices.json \
  --pairmap=$HOME/tmp/plastic-slices/$subdir/pairmap.json \
  --rootdir=$HOME/tmp/plastic-slices/$subdir \
  --starttime=$firstlog --endtime=$lastlog
}}}

Push everything up to the webserver:

{{{
rsync -av ~/tmp/plastic-slices/$subdir www.gpolab.bbn.com:/srv/www/plastic-slices/continuation
}}}

== Checking in ==

On my laptop, copy down the graphs:

{{{
subdir=
rscpd anubis:tmp/plastic-slices/$subdir/pngs ~/tmp/plastic-slices/$subdir
}}}

Identify the last day we have graphs for:

{{{
lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')
}}}

Show the per-slice graphs for the most recent day:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-daily-$lastday.png
}}}

Show the per-host graphs for the most recent day:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily-$lastday.png
}}}

Show the per-slice graphs for the whole run:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-all.png
}}}

Show the per-host graphs for the whole run:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-all.png
}}}

Show the per-host daily graphs for all of the days:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily*.png
}}}

=== The old way ===

This is how I used to check in, using grep to scan log files; nowadays I'm using the graphs.
Get a quick summary of the current state of things (based on the last completed run; or change $timestamp to get a different run):

{{{
timestamp=$(date -d "now - 1 hour" +%Y%m%d.%H)
for subnet in {103..106}
do
  echo -e "--> plastic $subnet\n"
  for login in $(awk 'NR%2==1' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
  do
    echo -n "$login to "
    grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }'
    grep /sec logs/$login/zoing-$timestamp*.log || echo no results
    echo ""
  done
done
for subnet in {107..110}
do
  echo -e "--> plastic $subnet\n"
  for login in $(awk 'NR%2==0' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
  do
    echo -n $(grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }')
    echo " to $login"
    egrep " 0.0-[^ ].+/sec" logs/$login/zoing-$timestamp*.log || echo no results
    echo ""
  done
done
}}}

= Use NOX =

Run NOX for plastic-101, with the learning switch ('switch') module and LAVI:

{{{
subnet=101
port=33$subnet ; (cd /usr/bin && /usr/bin/nox_core --info=/home/jbs/nox/nox-${port}.info -i ptcp:$port switch lavi_switches jsonmessenger=tcpport=11$subnet,sslport=0)
}}}

In another window, ask the plastic-101 NOX (via LAVI) what datapaths are connected:

{{{
subnet=101 ; nox-console -n localhost -p 11$subnet getnodes
}}}
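The NOX commands above follow a simple port convention: OpenFlow on 33<subnet> and the LAVI JSON messenger on 11<subnet>. For running more than one controller, a small sketch that builds the same nox_core arguments for any subnet might help; `nox_args` is a hypothetical helper, and it only prints the arguments (one per line) rather than starting NOX.

```shell
#!/usr/bin/env bash
# nox_args SUBNET
# Hypothetical helper: print, one per line, the nox_core arguments used on
# this page for plastic-SUBNET: OpenFlow on port 33SUBNET, the learning
# switch and LAVI modules, and the JSON messenger on port 11SUBNET.
nox_args() {
  local subnet=$1
  local port=33$subnet
  local args=(
    "--info=/home/jbs/nox/nox-${port}.info"
    -i "ptcp:$port"
    switch
    lavi_switches
    "jsonmessenger=tcpport=11$subnet,sslport=0"
  )
  printf '%s\n' "${args[@]}"
}
```

Usage: `mapfile -t args < <(nox_args 102) ; (cd /usr/bin && /usr/bin/nox_core "${args[@]}")`, then query it with `nox-console -n localhost -p 11102 getnodes`.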