Plastic Slices sandbox page
Random notes for Plastic Slices stuff.
A lot of things that used to be here are now on my general "slice notes" sandbox page. What's left should in theory be pretty specific to Plastic Slices.
Ending and starting a run
This is how I end one Plastic Slices run, and start the next. These commands use techniques from my "slice notes" sandbox page, so before doing all this, I should double-check that this copy of those techniques is still accurate.
Ending
Set the list of slices:
slices=$(echo ps{103..110})
Fetch my user and slice credentials:
(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
Deactivate Zoing (so it won't launch another set of experiments at the top of the hour):
logins cat
shmux -c "zoing deactivate" $logins
Wait for the current run to finish; it typically wraps up by 56 minutes past the hour.
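If I don't want to watch the clock, a rough sketch like this (my own addition, not from my slice notes) just sleeps until a minute or so past when the run should be done:
while [ $(date +%-M) -lt 57 ] ; do sleep 60 ; done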
Check that all sources are shut down ("-a" nodes):
logins grep -- -a
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
Reset everything, and make sure that everything is shut down:
logins cat shmux -c "zoing reset" $logins shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
Fetch logs one last time (using the commands in the "Fetch logs" section below), and upload them to the webserver.
Delete all of the slivers, to start the next run with a clean slate:
declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am deletesliver $slicename & done ; sleep 30s ; done
Confirm that everything's gone:
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am sliverstatus $slicename |& egrep -q -i '(code 12|code 2)' || echo "unexpected sliver in $slicename at $am" & done ; sleep 5s ; done | grep unexpected | grep -v omni
Update the wiki page for this run with any final details (e.g. when the run ended).
Starting
Update ~/slices/plastic-slices/config/slices.json with any changes for this run.
Generate a new pairmap:
cd ~/slices/plastic-slices
python ~/tarvek/generate-pairmap.py ./config/slices.json ./config/pairmap.json
Review the generated pairmap, and edit it if necessary.
Generate the rest of the configuration:
python ~/tarvek/generate-experiment-config.py ./config/slices.json ./config/pairmap.json ./wiki-source.txt
svn rm $(svn st | grep ^! | awk '{ print $2; }')
svn add $(svn st | grep ? | awk '{ print $2; }')
Review to make sure that things look right, then commit that to Subversion.
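A typical review-and-commit sequence looks something like this (the commit message is just an example):
svn diff | less
svn status
svn commit -m "Regenerate Plastic Slices config for the new run"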
Set the list of slices:
slices=$(echo ps{103..110})
Fetch my user and slice credentials:
(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
Create the slivers:
declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am createsliver $slicename $rspec & done ; sleep 5m ; done
Confirm that all of the slivers' expiration dates are as expected, and renew anything that isn't, using my general slice notes.
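For reference, the renewal loop looks roughly like the createsliver loop above, but with renewsliver and a target date. This is only a sketch (my general slice notes have the authoritative version), and the expiration date here is just an example:
expiration="20140801 00:00"
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am renewsliver $slicename "$expiration" & done ; sleep 30s ; done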
Using my general slice notes, get login info.
Using my general slice notes, do other login-related stuff.
Using my general slice notes, test connectivity. Trying "the fast way" from one node in each slice is probably good enough, but "the reliable way" will work too if you're not in a hurry.
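The gist of the fast way is just a quick ping from one node in each slice. This is only a sketch (the real commands and the right targets are in my slice notes; the target below is a placeholder):
for slicename in $slices ; do login=$(head -1 ~/tmp/logins-$slicename.txt) ; ssh $login "ping -c 3 <dataplane address of another node in the slice>" ; done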
Copy in Zoing stuff:
shmux -c 'mkdir -p bin' $logins
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/zoing/zoing bin/zoing ; done
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a ~/slices/plastic-slices/zoing/zoingrc-$login $login:.zoingrc && echo $login ; done & done
Copy in traffic-shaping stuff:
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/tc-shape-eth1-ten-mbps tc-shape-eth1-ten-mbps ; done
shmux -c 'sudo chown root:root tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo mv tc-shape-eth1-ten-mbps /etc/init.d/tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo ln -s ../init.d/tc-shape-eth1-ten-mbps /etc/rc2.d/S99tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo service tc-shape-eth1-ten-mbps start' $logins
Fire up Zoing:
shmux -c "zoing activate" $logins
Create a directory for logs, and copy other files into it:
subdir=<a subdirectory>
mkdir -p ~/tmp/plastic-slices/$subdir/logs
cp ~/slices/plastic-slices/config/*json ~/tmp/plastic-slices/$subdir
rscpc ~/slices/plastic-slices/hosts/ ~/tmp/plastic-slices/$subdir/00hosts
rscpc ~/slices/plastic-slices/logins/ ~/tmp/plastic-slices/$subdir/00logins
rscpc ~/slices/plastic-slices/ssh_config/ ~/tmp/plastic-slices/$subdir/00ssh_config
Create the wiki page.
Send mail to gpo-tech letting folks know.
To do
Here are some random things I've jotted down that I'd like to do:
- Add a way to positively confirm that slivers *don't* exist (a rough sketch is after this list)
- Add a way to show more concise sliver status -- not four+ lines per sliver
- Add a way to supply a parameter to test against, like "this date" for expiry
- Add a way to save all omni output in files, so I can look up what happened if something goes wrong
- Maybe use vxargs to parallelize omni for some things? Sliver deletion takes freakin' forever. Or just a loop, doing ten slices in parallel, although this won't help for single big slices. Maybe parallelizing across one slice would be better, so it hits all the aggregates once, then again, etc.
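For the first item, one possible approach (untested) is to flip the check in the "Confirm that everything's gone" step above, printing a positive confirmation for each sliver that's absent instead of staying silent:
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am sliverstatus $slicename |& egrep -q -i '(code 12|code 2)' && echo "no sliver in $slicename at $am" & done ; sleep 5s ; done | grep 'no sliver'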
Some of those would end up on my "slice notes" sandbox page, but they affect Plastic Slices the most (because of its scale), so they're here for now. Or I might add them to Tarvek; we'll see.
Fetch logs
I run all this stuff on anubis.
Pull them into a subdirectory of my temp log processing directory:
subdir=<a subdirectory>
mkdir -p ~/tmp/plastic-slices/$subdir/logs
logins grep -- -a
shmux -c "sed -i -e '/nanosleep failed:/d' zoing-logs/zoing*log" $logins
logins cat
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a $login:zoing-logs/ ~/tmp/plastic-slices/$subdir/logs/$login && echo $login ; done & done
Remove the last day's PNG files and the "all" PNG files, to make sure we re-generate them:
lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')
rm ~/tmp/plastic-slices/$subdir/pngs/*/*/*all*png ~/tmp/plastic-slices/$subdir/pngs/*/*/*daily-$lastday*png
Plot graphs:
firstlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | head -1 | sed -e 's/zoing-\(.*\).log/\1/')
lastlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-\(.*\).log/\1/')
time python ~/tarvek/generate-graphs.py --progress --mainconfig=~/tmp/plastic-slices/$subdir/slices.json --pairmap=~/tmp/plastic-slices/$subdir/pairmap.json --rootdir=~/tmp/plastic-slices/$subdir --starttime=$firstlog --endtime=$lastlog
Push everything up to the webserver:
rsync -av ~/tmp/plastic-slices/$subdir www.gpolab.bbn.com:/srv/www/plastic-slices/continuation
Checking in
On my laptop, copy down the graphs:
subdir=<a directory>
rscpd anubis:tmp/plastic-slices/$subdir/pngs ~/tmp/plastic-slices/$subdir
Identify the last day we have graphs for:
lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')
Show the per-slice graphs of the most recent day:
gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-daily-$lastday.png
Show the per-host daily graphs for the most recent day:
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily-$lastday.png
Show the per-slice graphs of the whole run:
gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-all.png
Show the per-host graphs of the whole run:
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-all.png
Show the per-host daily graphs for all of the days:
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily*.png
The old way
This is how I used to check in, using grep to scan log files; nowadays I'm using the graphs.
Get a quick summary of the current state of things (based on the last completed run; or change $timestamp to get a different run):
timestamp=$(date -d "now - 1 hour" +%Y%m%d.%H)

for subnet in {103..106}
do
  echo -e "--> plastic $subnet\n"
  for login in $(awk 'NR%2==1' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
  do
    echo -n "$login to "
    grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }'
    grep /sec logs/$login/zoing-$timestamp*.log || echo no results
    echo ""
  done
done

for subnet in {107..110}
do
  echo -e "--> plastic $subnet\n"
  for login in $(awk 'NR%2==0' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
  do
    echo -n $(grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }')
    echo " to $login"
    egrep " 0.0-[^ ].+/sec" logs/$login/zoing-$timestamp*.log || echo no results
    echo ""
  done
done
Use NOX
Run NOX for plastic-101, with the learning switch ('switch') module and LAVI:
subnet=101
port=33$subnet ; (cd /usr/bin && /usr/bin/nox_core --info=/home/jbs/nox/nox-${port}.info -i ptcp:$port switch lavi_switches jsonmessenger=tcpport=11$subnet,sslport=0)
In another window, ask the plastic-101 NOX (via LAVI) what datapaths are connected:
subnet=101 ; nox-console -n localhost -p 11$subnet getnodes