[[PageOutline]]

= Plastic Slices sandbox page =

Random notes for Plastic Slices stuff.

A lot of things that used to be here are now on my general [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page]. What's left should in theory be pretty specific to Plastic Slices.

= Ending and starting a run =

This is how I end one Plastic Slices run and start the next. These commands use techniques from my [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page], so before doing all this, I should double-check that this copy of those techniques is still accurate.

== Ending ==

Set the list of slices:

{{{
slices=$(echo ps{103..110})
}}}

Fetch my user and slice credentials:

{{{
(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
}}}

Deactivate Zoing (so it won't launch another set of experiments at the top of the hour):

{{{
logins cat
shmux -c "zoing deactivate" $logins
}}}

Wait for the current run to finish, which is typically at 56 minutes past the hour.

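If I'd rather not watch the clock, the wait can be computed. This is a minimal sketch with a hypothetical helper, `seconds_until_minute` (not part of Zoing or my slice notes). It assumes the run really does end at :56, which is only the typical case, so the zoing status check that follows is still needed.

{{{
# Hypothetical helper: print the number of seconds from MIN:SEC (the current
# minute and second) until TARGET minutes past the hour, wrapping to the next
# hour if TARGET has already passed.
seconds_until_minute() {
  local target=$1 min=$2 sec=$3
  # 10# forces base 10: date +%M prints e.g. "08", which bash would
  # otherwise reject as an invalid octal number.
  echo $(( (10#$target * 60 - (10#$min * 60 + 10#$sec) + 3600) % 3600 ))
}

# Usage (commented out so that merely defining the helper doesn't block):
# sleep "$(seconds_until_minute 56 "$(date +%M)" "$(date +%S)")"
}}}
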
Check that all sources are shut down ("-a" nodes):

{{{
logins grep -- -a
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
}}}

Reset everything, and make sure that everything is shut down:

{{{
logins cat
shmux -c "zoing reset" $logins
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
}}}

[#Fetchlogs Fetch logs] one last time, and upload them to the webserver.

Delete all of the slivers, to start the next run with a clean slate:

{{{
declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am deletesliver $slicename & done ; sleep 30s ; done
}}}

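When a slice has no request RSpecs, the `ls -1 .../*.rspec` in the loop above complains on stderr, because bash passes the unmatched glob to ls literally. A glob-only alternative avoids both ls and the complaint. It's a sketch: `collect_rspecs` is a hypothetical helper name, and like the original it assumes RSpec paths contain no spaces, since the later loops word-split `${rspecs[$slicename]}`.

{{{
# Hypothetical helper: print each .rspec file in directory $1, one per line.
# The body runs in a subshell ( ... ) so that nullglob (an unmatched glob
# expands to nothing instead of to itself) doesn't leak into the calling shell.
collect_rspecs() (
  shopt -s nullglob
  for f in "$1"/*.rspec; do
    printf '%s\n' "$f"
  done
)

# Drop-in replacement for the second line of the block above:
# for slicename in $slices ; do rspecs[$slicename]=$(collect_rspecs ~/rspecs/request/$slicename) ; done
}}}

Glob results come back sorted, like `ls -1` output, so the order in which slivers are handled shouldn't change.
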
Confirm that everything's gone:

{{{
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am sliverstatus $slicename |& egrep -q -i '(code 12|code 2)' || echo "unexpected sliver in $slicename at $am" & done ; sleep 5s ; done | grep unexpected | grep -v omni
}}}

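The `egrep -q -i '(code 12|code 2)'` check above treats any sliverstatus output mentioning those codes as "no sliver here", but `code 2` also matches e.g. `code 20`. A slightly tighter test, sketched as a hypothetical `sliver_gone` function that reads omni output on stdin, anchors the numbers with `\b` (supported by GNU grep). It only narrows the existing pattern; it doesn't reflect any knowledge of what the codes mean.

{{{
# Hypothetical helper: succeed (exit 0) if the omni sliverstatus output on
# stdin mentions code 2 or code 12 as a whole number, i.e. the same
# "no sliver here" signals as the loop above, without matching "code 20" etc.
sliver_gone() {
  grep -E -q -i '\bcode (2|12)\b'
}

# Possible use inside the loop above, in place of the egrep:
# ... sliverstatus $slicename |& sliver_gone || echo "unexpected sliver in $slicename at $am" & ...
}}}
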
Update the wiki page for this run with any final details (e.g. when the run ended).

== Starting ==

Update ~/slices/plastic-slices/config/slices.json with any changes for this run.

Generate a new pairmap:

{{{
cd ~/slices/plastic-slices
python ~/tarvek/generate-pairmap.py ./config/slices.json ./config/pairmap.json
}}}

Review and edit that if necessary.

Generate the rest of the configuration, and tell Subversion about any files that were removed or added:

{{{
python ~/tarvek/generate-experiment-config.py ./config/slices.json ./config/pairmap.json ./wiki-source.txt
# Quote the patterns: an unquoted ? is a shell glob, and ^ limits the match to the status column.
svn rm $(svn st | grep '^!' | awk '{ print $2; }')
svn add $(svn st | grep '^?' | awk '{ print $2; }')
}}}

Review to make sure that things look right, then commit that to Subversion.

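The `awk '{ print $2; }'` in the svn commands above keeps only the first word of each path, so a generated file with a space in its name would be mangled. A more careful version is sketched below as a hypothetical `svn_status_paths` filter. It assumes, as in the usual `svn status` output, that the status character is in the first column and is followed by blanks and then the path.

{{{
# Hypothetical filter: read `svn status` output on stdin and print the path of
# every line whose first column is the status character given as $1.
svn_status_paths() {
  awk -v want="$1" 'substr($0, 1, 1) == want { sub(/^.[ \t]+/, ""); print }'
}

# Possible use (GNU xargs: -d '\n' splits on newlines only, -r skips empty input):
# svn st | svn_status_paths '!' | xargs -r -d '\n' svn rm
# svn st | svn_status_paths '?' | xargs -r -d '\n' svn add
}}}
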
Set the list of slices:

{{{
slices=$(echo ps{103..110})
}}}

Fetch my user and slice credentials:

{{{
(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
}}}

Create the slivers:

{{{
declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am createsliver $slicename $rspec & done ; sleep 5m ; done
}}}

Confirm that all of the slivers' expiration dates are as expected, and use my general slice notes to [wiki:JBSsandbox/SliceNotes#Forlotsofslivers renew anything that isn't].

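For checking expiration dates against a specific date (one of the to-do items below), a comparison like the following could help. It's a sketch assuming the expiration time has already been pulled out of the sliverstatus output as a string that GNU `date -d` can parse; `expires_before` and `$run_end` are hypothetical names, and extracting that string from omni's output isn't shown.

{{{
# Hypothetical helper: succeed if the expiration time $1 is earlier than the
# target time $2. Both are converted to seconds since the epoch with GNU date;
# returns 2 if either string can't be parsed.
expires_before() {
  local expiry target
  expiry=$(date -u -d "$1" +%s) || return 2
  target=$(date -u -d "$2" +%s) || return 2
  (( expiry < target ))
}

# Example: flag a sliver that would expire before the planned end of the run.
# expires_before "$expiry_string" "$run_end" && echo "renew $slicename at $am"
}}}
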
Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Getlogininfo get login info].

Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Loginstuff do other login-related stuff].

Using my general slice notes, [wiki:JBSsandbox/SliceNotes#Pingteststuff test connectivity]. Trying "the fast way" from one node in each slice is probably good enough, but "the reliable way" will work too if you're not in a hurry.

Copy in Zoing stuff:

{{{
shmux -c 'mkdir -p bin' $logins
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/zoing/zoing bin/zoing ; done
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a ~/slices/plastic-slices/zoing/zoingrc-$login $login:.zoingrc && echo $login ; done & done ; wait
}}}

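The last loop above expects a per-login `zoingrc-$login` file for every login in each slice's login file. A quick pre-flight check, sketched here as a hypothetical `missing_zoingrc` function, lists any logins that don't have one, so a missing file shows up before the copy rather than as one rsync error among many.

{{{
# Hypothetical helper: print each login listed in file $2 that has no matching
# zoingrc-<login> file in directory $1.
missing_zoingrc() {
  local dir=$1 loginfile=$2 login
  # Split the login file on whitespace, the same way the rsync loop above does.
  for login in $(cat "$loginfile"); do
    [ -e "$dir/zoingrc-$login" ] || printf '%s\n' "$login"
  done
}

# Possible use before the copy:
# for slicename in $slices ; do missing_zoingrc ~/slices/plastic-slices/zoing ~/tmp/logins-$slicename.txt ; done
}}}
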
Copy in traffic-shaping stuff:

{{{
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; export PSSH_ERRDIR=~/tmp/prsync-errors/$slicename ; prsync -h $loginfile -a ~/slices/plastic-slices/tc-shape-eth1-ten-mbps tc-shape-eth1-ten-mbps ; done
shmux -c 'sudo chown root:root tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo mv tc-shape-eth1-ten-mbps /etc/init.d/tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo ln -s ../init.d/tc-shape-eth1-ten-mbps /etc/rc2.d/S99tc-shape-eth1-ten-mbps' $logins
shmux -c 'sudo service tc-shape-eth1-ten-mbps start' $logins
}}}

Fire up Zoing:

{{{
shmux -c "zoing activate" $logins
}}}

Create a directory for logs, and copy other files into it:

{{{
subdir=<a subdirectory>

mkdir -p ~/tmp/plastic-slices/$subdir/logs

cp ~/slices/plastic-slices/config/*json ~/tmp/plastic-slices/$subdir
rscpc ~/slices/plastic-slices/hosts/ ~/tmp/plastic-slices/$subdir/00hosts
rscpc ~/slices/plastic-slices/logins/ ~/tmp/plastic-slices/$subdir/00logins
rscpc ~/slices/plastic-slices/ssh_config/ ~/tmp/plastic-slices/$subdir/00ssh_config
}}}

Create the wiki page.

Send mail to gpo-tech letting folks know.

= To do =

Here are some random things I've jotted down that I'd like to do:

 * Add a way to positively confirm that slivers ''don't'' exist
 * Add a way to show more concise sliver status, rather than four or more lines per sliver
 * Add a way to supply a parameter to test against, like "this date" for expiry
 * Add a way to save all omni output in files, so I can look up what happened if something goes wrong
 * Maybe use vxargs to parallelize omni for some things? Sliver deletion takes freakin' forever. Or just use a loop that does ten slices in parallel, although that won't help for single big slices. Parallelizing across one slice might be better, so that it hits all the aggregates once, then again, etc.

Some of those would end up on my [wiki:JBSsandbox/SliceNotes "slice notes" sandbox page], but they affect Plastic Slices the most (because of its scale), so they're here for now. Or I might add them to Tarvek; we'll see.

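The last two to-do items (saving omni output in files, and parallelizing with some limit) could be combined in a small wrapper. This is a minimal sketch, not part of Tarvek or omni: `run_logged_bg`, `max_jobs`, and `running` are hypothetical names, it needs bash 4.3 or later for `wait -n`, and it assumes the wrapper starts all of the shell's background jobs, since it counts them itself.

{{{
# Hypothetical wrapper: run a command in the background with its stdout and
# stderr saved to a log file, never letting more than $max_jobs of these run
# at once. Needs bash 4.3+ for "wait -n".
max_jobs=10
running=0

run_logged_bg() {
  local logfile=$1
  shift
  if (( running >= max_jobs )); then
    # Block until any one background job exits. A failed job shouldn't stop
    # the loop (its log file has the details), hence the "|| true".
    wait -n || true
    running=$((running - 1))
  fi
  "$@" > "$logfile" 2>&1 &
  running=$((running + 1))
}

# Possible use with the sliver-deletion loop, one log per slice and aggregate
# (${am//[^A-Za-z0-9]/_} turns the aggregate URL into a safe file name):
# ... ; run_logged_bg ~/tmp/omni-logs/$slicename-${am//[^A-Za-z0-9]/_}.log omni ... deletesliver $slicename ; ...
}}}

After the loop, a final `wait` (and `running=0`) picks up the stragglers before the next step.
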
= Fetch logs =

I run all this stuff on anubis.

Pull the logs into a subdirectory of my temp log processing directory:

{{{
subdir=<a subdirectory>

mkdir -p ~/tmp/plastic-slices/$subdir/logs

logins cat
# Drop the "nanosleep failed:" noise lines from the logs on the nodes before fetching them.
shmux -c "sed -i -e '/nanosleep failed:/d' zoing-logs/zoing*log" $logins
for slicename in $slices ; do loginfile=~/tmp/logins-$slicename.txt ; for login in $(cat $loginfile) ; do rsync -a $login:zoing-logs/ ~/tmp/plastic-slices/$subdir/logs/$login && echo $login ; done & done ; wait
}}}

Remove the last day's PNG files and the whole-run ("all") PNG files, so that they get regenerated:

{{{
lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\)\.png/\1/')
rm ~/tmp/plastic-slices/$subdir/pngs/*/*/*all*png ~/tmp/plastic-slices/$subdir/pngs/*/*/*daily-$lastday*png
}}}

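The sed call in the lastday line above, and the ones in the firstlog/lastlog lines of the plotting step below, strip a fixed prefix and suffix from a file name. Bash parameter expansion can do the same without sed. The hypothetical `name_stem` helper below is a sketch; the timestamp format inside Zoing's file names isn't spelled out on this page, so the helper doesn't depend on it.

{{{
# Hypothetical helper: print file name $3 with prefix $1 and suffix $2 removed.
# Any leading directories are dropped first. Quoting "$1" and "$2" inside the
# expansions makes them literal strings rather than glob patterns.
name_stem() {
  local name=${3##*/}              # strip any directory part
  name=${name#"$1"}                # strip the prefix, if present
  printf '%s\n' "${name%"$2"}"     # strip the suffix, if present
}

# Equivalent of the lastday line above:
# lastday=$(name_stem zoing-daily- .png "$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1)")
}}}
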
Plot graphs:

{{{
firstlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | head -1 | sed -e 's/zoing-\(.*\)\.log/\1/')
lastlog=$(ls -1 ~/tmp/plastic-slices/$subdir/logs/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-\(.*\)\.log/\1/')
# Use $HOME rather than ~ inside --option= arguments: bash doesn't tilde-expand there.
time python ~/tarvek/generate-graphs.py --progress --mainconfig=$HOME/tmp/plastic-slices/$subdir/slices.json --pairmap=$HOME/tmp/plastic-slices/$subdir/pairmap.json --rootdir=$HOME/tmp/plastic-slices/$subdir --starttime=$firstlog --endtime=$lastlog
}}}

Push everything up to the webserver:

{{{
rsync -av ~/tmp/plastic-slices/$subdir www.gpolab.bbn.com:/srv/www/plastic-slices/continuation
}}}

== Checking in ==

On my laptop, copy down the graphs:

{{{
subdir=<a subdirectory>

rscpd anubis:tmp/plastic-slices/$subdir/pngs ~/tmp/plastic-slices/$subdir
}}}

Identify the last day we have graphs for:

{{{
lastday=$(ls -1 ~/tmp/plastic-slices/$subdir/pngs/hosts/bbn-ig-ps104-b | tail -1 | sed -e 's/zoing-daily-\(.*\)\.png/\1/')
}}}

Show the per-slice graphs of the most recent day:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-daily-$lastday.png
}}}

Show the per-host daily graphs for the most recent day:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily-$lastday.png
}}}

Show the per-slice graphs of the whole run:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/slices/*/zoing-all.png
}}}

Show the per-host graphs of the whole run:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-all.png
}}}

Show the per-host daily graphs for all of the days:

{{{
gq ~/tmp/plastic-slices/$subdir/pngs/hosts/*-b/zoing-daily*.png
}}}

=== The old way ===

This is how I used to check in, using grep to scan log files; nowadays I'm using the graphs.

Get a quick summary of the current state of things, based on the last completed run (change $timestamp to look at a different run). The loops read `logs/$login/`, so run them from the directory that holds the fetched logs directory:

{{{
timestamp=$(date -d "now - 1 hour" +%Y%m%d.%H)

for subnet in {103..106}
do
  echo -e "--> plastic $subnet\n"
  for login in $(awk 'NR%2==1' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
  do
    echo -n "$login to "
    grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }'
    grep /sec logs/$login/zoing-$timestamp*.log || echo no results
    echo ""
  done
done

for subnet in {107..110}
do
  echo -e "--> plastic $subnet\n"
  for login in $(awk 'NR%2==0' ~/slices/plastic-slices/logins/logins-ps$subnet.txt)
  do
    echo -n $(grep "connected with" logs/$login/zoing-$timestamp*.log | awk '{ print $(NF-2); }')
    echo " to $login"
    egrep " 0.0-[^ ].+/sec" logs/$login/zoing-$timestamp*.log || echo no results
    echo ""
  done
done
}}}

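The old-way loops above rely on two details of iperf's output: on a `connected with` line, the third-from-last field is the peer's address (it comes just before `port N`), and the throughput lines contain a rate such as `Mbits/sec`. The hypothetical `iperf_summary` function below pulls both out of a log in a single awk pass. It's a sketch assuming iperf 2's usual line layout.

{{{
# Hypothetical helper: summarize iperf log text on stdin as
# "<peer address> <rate> <units>" per throughput line, assuming iperf 2 lines
# such as "[  3] local A port P connected with B port Q" and
# "[  3]  0.0-10.0 sec  11.9 MBytes  9.97 Mbits/sec".
iperf_summary() {
  awk '
    /connected with/ { peer = $(NF-2) }              # address just before "port Q"
    /\/sec/          { print peer, $(NF-1), $NF }    # e.g. "9.97 Mbits/sec"
  '
}

# Possible use, from the same directory as the loops above (cat rather than a
# redirect, since the glob may match more than one file):
# cat logs/$login/zoing-$timestamp*.log | iperf_summary
}}}
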
= Use NOX =

Run NOX for plastic-101, with the learning switch ('switch') module and LAVI:

{{{
subnet=101
port=33$subnet ; (cd /usr/bin && /usr/bin/nox_core --info=/home/jbs/nox/nox-${port}.info -i ptcp:$port switch lavi_switches jsonmessenger=tcpport=11$subnet,sslport=0)
}}}

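The NOX command above (and the nox-console one that follows) builds its port numbers by gluing a prefix onto the subnet number: the OpenFlow listener is on 33$subnet and the jsonmessenger (LAVI) port is 11$subnet, so plastic-101 uses 33101 and 11101. A tiny sketch, with `nox_ports` as a hypothetical name, prints the pair for a range of subnets and rejects subnets that would push the port past 65535, which happens as soon as the subnet number has four digits.

{{{
# Hypothetical helper: print the OpenFlow and jsonmessenger ports that the
# commands above derive from subnet $1 by string concatenation.
nox_ports() {
  local subnet=$1
  local of_port=33$subnet json_port=11$subnet
  if (( of_port > 65535 )); then
    echo "subnet $subnet gives an invalid port $of_port" >&2
    return 1
  fi
  echo "subnet $subnet: openflow ptcp:$of_port jsonmessenger $json_port"
}

for subnet in {101..103}; do
  nox_ports "$subnet"
done
}}}
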
In another window, ask the plastic-101 NOX (via LAVI) what datapaths are connected:

{{{
subnet=101 ; nox-console -n localhost -p 11$subnet getnodes
}}}