Version 3 (modified by Josh Smift, 8 years ago)

Plastic Slices sandbox page

Random notes for Plastic Slices stuff.

A lot of things that used to be here are now on my general "slice notes" sandbox page. What's left should in theory be pretty specific to Plastic Slices.

Ending and starting a run

This is how I end one Plastic Slices run, and start the next. These commands use techniques from my "slice notes" sandbox page, so before doing all this, I should double-check that this copy of those techniques is still accurate.

Ending

Set the list of slices:

slices=$(echo ps{103..110})
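
Brace expansion like ps{103..110} is a bash feature, and its endpoints can't come from variables. If the range ever needs to be computed, a small helper along these lines would build the same space-separated list. This is only a sketch; slice_range is a made-up name, not something from the "slice notes" page.

```shell
# Hypothetical helper: print PREFIX followed by each number FIRST..LAST,
# space-separated, like $(echo ps{103..110}) does for fixed endpoints.
# The ( ) body runs in a subshell, so its variables don't leak out.
slice_range() (
  prefix=$1 i=$2 last=$3 out=""
  while [ "$i" -le "$last" ]; do
    out="${out:+$out }$prefix$i"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)

first=103
last=110
slices=$(slice_range ps "$first" "$last")
echo "$slices"
```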

Fetch my user and slice credentials:

(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)

Deactivate Zoing (so it won't launch another set of experiments at the top of the hour):

logins cat
shmux -c "zoing deactivate" $logins

Wait for the current run to finish, typically 56 minutes past the hour.

Check that all sources are shut down ("-a" nodes):

logins grep -- -a
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
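
The status check relies on two small idioms: grep -v hides the line that an idle node is expected to print, so only surprises show up, and || true turns grep's "nothing selected" exit status of 1 back into success, presumably so that a clean node isn't reported as a failed command. A minimal sketch on stand-in input (the "busy" line is invented for illustration; only the idle flags come from the command above):

```shell
# Same filter as in the shmux command, wrapped in a function for the demo.
hide_idle() {
  grep -v -- '-active -cron -running -processes' || true
}

idle='-active -cron -running -processes'
busy='+active +cron +running +processes'   # invented example of a non-idle line

printf '%s\n' "$idle" | hide_idle   # prints nothing, exits 0
printf '%s\n' "$busy" | hide_idle   # prints the unexpected line
```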

Reset everything, and make sure that everything is shut down:

logins cat
shmux -c "zoing reset" $logins
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins

Fetch logs one last time, and upload them to the webserver.

Delete all of the slivers, to start the next run with a clean slate:

declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am deletesliver $slicename & done ; sleep 30s ; done
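
If a slice has no request rspecs, the ls in the loop above prints an error and leaves that slice's entry empty, so the deletion loop skips the slice without further comment. A sketch of a lookup that makes the empty case easy to test for; rspecs_for and the scratch directory layout are made up for the demo, standing in for ~/rspecs/request:

```shell
# Hypothetical helper: list the *.rspec files for one slice, one per line.
# $1 = directory holding one subdirectory per slice, $2 = slice name
rspecs_for() {
  for f in "$1/$2"/*.rspec; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    printf '%s\n' "$f"
  done
}

# Demo in a scratch directory instead of ~/rspecs/request.
root=$(mktemp -d)
mkdir -p "$root/ps103" "$root/ps104"
touch "$root/ps103/site-a.rspec" "$root/ps103/site-b.rspec"

for slicename in ps103 ps104; do
  found=$(rspecs_for "$root" "$slicename")
  if [ -n "$found" ]; then
    echo "$slicename: $(printf '%s\n' "$found" | wc -l | tr -d ' ') rspec(s)"
  else
    echo "$slicename: no rspecs found" >&2
  fi
done
```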

Confirm that everything's gone:

for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am sliverstatus $slicename |& egrep -q -i '(code 12|code 2)' || echo "unexpected sliver in $slicename at $am" & done ; sleep 5s ; done | grep unexpected | grep -v omni

Update the wiki page for this run with any final details (e.g. when the run ended).

Starting

Update ~/slices/plastic-slices/config/slices.json with any changes for this run.

Update ~/slices/plastic-slices/config/pairmap.json with any changes for this run. Either edit the file by hand, or generate a new random one:

cd ~/slices/plastic-slices
python ~/tarvek/generate-pairmap.py ./config/slices.json ./config/pairmap.json

Review and edit that if necessary.

Generate the rest of the configuration:

python ~/tarvek/generate-experiment-config.py ./config/slices.json ./config/pairmap.json ./wiki-source.txt
svn rm $(svn st | grep '^!' | awk '{ print $2; }')
svn add $(svn st | grep '^?' | awk '{ print $2; }')
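
svn st puts the status character in the first column, so the selection is safest when it looks only there. A sketch on made-up status lines (the file names are invented), using awk to match the first field exactly; like the commands above, it assumes paths without spaces:

```shell
# Invented sample of "svn st" output: unversioned (?), missing (!), modified (M).
sample='?       config/new-file.json
!       config/removed-file.json
M       config/what?.json'

# Match the status column only, so the "?" inside a file name is ignored.
unversioned=$(printf '%s\n' "$sample" | awk '$1 == "?" { print $2 }')
missing=$(printf '%s\n' "$sample" | awk '$1 == "!" { print $2 }')

echo "would svn add: $unversioned"
echo "would svn rm:  $missing"
```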

Review to make sure that things look right, then commit that to Subversion.

Set the list of slices:

slices=$(echo ps{103..110})

Fetch my user and slice credentials:

(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)

Create the slivers:

declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am createsliver $slicename $rspec & done ; sleep 5m ; done

Confirm that all of the slivers' expiration dates are as expected, and use my general slice notes to renew any that aren't.

Using my general slice notes, get login info.

Using my general slice notes, do other login-related stuff.

Using my general slice notes, test connectivity. Trying "the fast way" from one node in each slice is probably good enough, but "the reliable way" will work too if you're not in a hurry.

Copy in Zoing stuff:

shmux -c 'mkdir -p bin' $logins
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/zoing/zoing bin/zoing ; done
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a ~/slices/plastic-slices/zoing/zoingrc-$login $login:.zoingrc && echo $login ; done & done

Copy in traffic-shaping stuff:

for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/tc-shape-eth1-ten-mbps tc-shape-eth1-ten-mbps ; done
shmux -c 'sudo chown root:root tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo mv tc-shape-eth1-ten-mbps /etc/init.d/tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo ln -s ../init.d/tc-shape-eth1-ten-mbps /etc/rc2.d/S99tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo service tc-shape-eth1-ten-mbps start' $logins

Fire up Zoing:

shmux -c "zoing activate" $logins

Create a directory for logs, and copy other files into it:

subdir=<a subdirectory>

mkdir -p ~/tmp/plastic-slices/$subdir/logs

cp ~/slices/plastic-slices/config/*json ~/tmp/plastic-slices/$subdir 
rscpc ~/slices/plastic-slices/hosts/ ~/tmp/plastic-slices/$subdir/00hosts
rscpc ~/slices/plastic-slices/logins/ ~/tmp/plastic-slices/$subdir/00logins
rscpc ~/slices/plastic-slices/ssh_config/ ~/tmp/plastic-slices/$subdir/00ssh_config

Create the wiki page.

Send mail to gpo-tech letting folks know.

To do

Here are some random things I've jotted down that I'd like to do:

  • Add a way to positively confirm that slivers *don't* exist
  • Add a way to show more concise sliver status -- not four+ lines per sliver
  • Add a way to supply a parameter to test against, like "this date" for expiry
  • Add a way to save all omni output in files, so I can look up what happened if something goes wrong
  • Maybe use vxargs to parallelize omni for some things? Sliver deletion takes freakin' forever. Or just use a loop that does ten slices in parallel, although this won't help for single big slices. Maybe parallelizing within one slice would be better, so it hits all the aggregates once, then again, etc.

Some of those would end up on my "slice notes" sandbox page, but they affect Plastic Slices the most (because of its scale), so they're here for now. Or I might add them to Tarvek, we'll see.
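
Two of the items above (saving command output in files, and running slices in parallel) can be prototyped with plain shell. A rough sketch, not a finished tool: logrun is a hypothetical wrapper, the log directory is a scratch directory, and echo stands in for the real omni call.

```shell
logdir=$(mktemp -d)

# Hypothetical wrapper: run a command, save stdout+stderr to $logdir/LABEL.log,
# print a one-line summary, and pass the command's exit status through.
# $1 = label for the log file; remaining arguments = the command to run
logrun() (
  label=$1
  shift
  status=0
  "$@" >"$logdir/$label.log" 2>&1 || status=$?
  printf '%s: exit %s, log in %s\n' "$label" "$status" "$logdir/$label.log"
  exit "$status"
)

# One background job per slice, then wait for all of them to finish.
for slicename in ps103 ps104 ps105; do
  logrun "$slicename" echo "stand-in for an omni call on $slicename" &
done
wait
```

Each job writes to its own file, so output from the background jobs doesn't interleave, and the summary lines show at a glance which labels exited non-zero.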

Fetch logs

I run all this stuff on anubis.

Pull them into a subdirectory of my temp log processing directory:

subdir=<a subdirectory>

mkdir -p ~/tmp/plastic-slices/$subdir/logs

logins grep -- -a
shmux -c "sed -i -e '/nanosleep failed:/d' zoing-logs/zoing*log" $logins
logins cat
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a $login:zoing-logs/ ~/tmp/plastic-slices/$subdir/logs/$login && echo $login ; done & done

Remove the last day's PNG files and the "all" PNG files, to make sure they get regenerated:

lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')
rm ~/tmp/plastic-slices/$subdir/pngs/*/*/*all*png ~/tmp/plastic-slices/$subdir/pngs/*/*/*daily-$lastday*png
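
The lastday pipeline depends on ls sorting the daily files by date, and on zoing-all.png sorting before them so that tail -1 picks a daily file. A quick check in a scratch directory with invented file names (the YYYYMMDD date format is an assumption here):

```shell
demo=$(mktemp -d)
touch "$demo/zoing-all.png" \
      "$demo/zoing-daily-20120101.png" \
      "$demo/zoing-daily-20120102.png"

# Same pipeline as above, pointed at the scratch directory; the dot before
# "png" is escaped here so that it only matches a literal dot.
lastday=$(ls -1 "$demo" | tail -1 | sed -e 's/zoing-daily-\(.*\)\.png/\1/')
echo "$lastday"   # → 20120102
```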

Plot graphs:

firstlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | head -1 | sed -e 's/zoing-\(.*\).log/\1/')
lastlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-\(.*\).log/\1/')
time python ~/tarvek/generate-graphs.py --progress --mainconfig=$HOME/tmp/plastic-slices/$subdir/slices.json --pairmap=$HOME/tmp/plastic-slices/$subdir/pairmap.json --rootdir=$HOME/tmp/plastic-slices/$subdir --starttime=$firstlog --endtime=$lastlog

Push everything up to the webserver:

rsync -av ~/tmp/plastic-slices/$subdir www.gpolab.bbn.com:/srv/www/plastic-slices/continuation

Checking in

On my laptop, copy down the graphs:

subdir=<a subdirectory>

rscpd anubis:tmp/plastic-slices/$subdir/pngs ~/tmp/plastic-slices/$subdir

Identify the last day we have graphs for:

lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\).png/\1/')

Show the per-slice graphs of the most recent day:

gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-daily-$lastday.png

Show the per-host daily graphs for the most recent day:

gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily-$lastday.png

Show the per-slice graphs of the whole run:

gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-all.png

Show the per-host graphs of the whole run:

gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-all.png

Show the per-host daily graphs for all of the days:

gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily*.png

The old way

This is how I used to check in, using grep to scan log files; nowadays I'm using the graphs.

Get a quick summary of the current state of things (based on the last completed run; or change $timestamp to get a different run):

timestamp=$(date -d "now - 1 hour" +%Y%m%d.%H)

for subnet in {103..106}
do
  echo -e "--> plastic $subnet\n"
  for login in $(awk 'NR%2==1' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
  do
    echo -n "$login to "
    grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }'
    grep /sec logs/$login/zoing-$timestamp*.log || echo no results
    echo ""
  done
done

for subnet in {107..110}
do
  echo -e "--> plastic $subnet\n"
  for login in $(awk 'NR%2==0' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
  do
    echo -n $(grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }')
    echo " to $login"
    egrep " 0.0-[^ ].+/sec" logs/$login/zoing-$timestamp*.log || echo no results
    echo ""
  done
done
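
The two loops differ in which lines of the logins file they read: NR%2==1 keeps the odd-numbered lines and NR%2==0 keeps the even-numbered ones. A small illustration on invented login names (the real files' contents and ordering aren't shown on this page):

```shell
# Invented stand-in for one logins-psNNN.txt file, one login per line.
logins='first-login
second-login
third-login
fourth-login'

odd=$(printf '%s\n' "$logins" | awk 'NR%2==1')    # lines 1 and 3
even=$(printf '%s\n' "$logins" | awk 'NR%2==0')   # lines 2 and 4

echo "odd lines:";  echo "$odd"
echo "even lines:"; echo "$even"
```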

Use NOX

Run NOX for plastic-101, with the learning switch ('switch') module and LAVI:

subnet=101
port=33$subnet ; (cd /usr/bin && /usr/bin/nox_core --info=/home/jbs/nox/nox-${port}.info -i ptcp:$port switch lavi_switches jsonmessenger=tcpport=11$subnet,sslport=0)

In another window, ask the plastic-101 NOX (via LAVI) what datapaths are connected:

subnet=101 ; nox-console -n localhost -p 11$subnet getnodes
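
Both port numbers are built by pasting the subnet onto a two-digit prefix (33 for the ptcp listener, 11 for jsonmessenger), which only yields a valid TCP port while the subnet is exactly three digits. A sketch of a guard around that convention; ports_for is a hypothetical helper, not part of NOX:

```shell
# Hypothetical helper: print "<ptcp port> <jsonmessenger port>" for a subnet.
ports_for() {
  case $1 in
    [0-9][0-9][0-9]) echo "33$1 11$1" ;;
    *) echo "subnet must be exactly three digits: $1" >&2; return 1 ;;
  esac
}

ports_for 101                       # → 33101 11101
ports_for 1010 || echo "rejected"   # 331010 would not be a valid port
```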