
== Ending ==

Make sure your copy of the syseng Subversion repository is up to date and that you don't have uncommitted changes there. Change into your .../syseng directory, and run

{{{
svn update
svn status
}}}

Set the list of slices:

{{{
slices=$(echo ps{103..110})
}}}
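
The brace expansion in `ps{103..110}` produces one word per slice name, and the `echo` joins them into a single space-separated string that the later loops iterate over:

```shell
# Brace expansion (bash) yields one word per slice name; echo joins
# them with single spaces into one string.
slices=$(echo ps{103..110})
echo $slices
# ps103 ps104 ps105 ps106 ps107 ps108 ps109 ps110
```

The variable is deliberately left unquoted in the `for slicename in $slices` loops so that it word-splits back into individual slice names.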

Fetch your user and slice credentials:

{{{
(cd ~/.gcf ; omni getusercred -o ; for slicename in $slices ; do omni getslicecred $slicename -o ; done)
}}}

Deactivate Zoing (so it won't launch another set of experiments at the top of the hour):

{{{
logins cat
shmux -c "zoing deactivate" $logins
}}}

Wait for the current run to finish; it typically ends at 56 minutes past the hour.
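
If you would rather block until the usual end time than watch the clock, a minimal sketch (assuming the run really does end by minute 56; the 30-second poll interval is arbitrary):

```shell
# Sleep until the wall clock reaches minute 56 of the current hour.
# Assumes the run ends at :56, as noted above; adjust if it doesn't.
while [ "$(date +%M)" -lt 56 ]; do
  sleep 30
done
```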

Check that all sources are shut down ("-a" nodes):

{{{
logins grep -- -a
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
}}}
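
The `grep -v` in the status check hides nodes whose `zoing status` line is exactly the all-stopped form, so any output flags a node that still has something running; the `|| true` keeps an empty result from looking like a failure to shmux. A standalone demonstration of that filter (the two status strings here are made up to mimic the real ones):

```shell
# Only lines differing from the all-stopped status survive the grep -v,
# and "|| true" turns grep's "no matches" exit status into success.
printf '%s\n' \
  '-active -cron -running -processes' \
  '+active -cron -running -processes' \
  | grep -v -- '-active -cron -running -processes' || true
# prints: +active -cron -running -processes
```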

Reset everything, and make sure that everything is shut down:

{{{
logins cat
shmux -c "zoing reset" $logins
shmux -c "zoing status | grep -v -- '-active -cron -running -processes' || true" $logins
}}}

[#Fetchlogs Fetch logs] one last time, and upload them to the webserver.

Delete all of the slivers, to start the next run with a clean slate:

{{{
declare -A rspecs
for slicename in $slices ; do rspecs[$slicename]=$(ls -1 ~/rspecs/request/$slicename/*.rspec) ; done
for slicename in $slices ; do echo ${rspecs[$slicename]} ; done
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am deletesliver $slicename & done ; sleep 30s ; done
}}}

Confirm that everything's gone:

{{{
for slicename in $slices ; do for rspec in ${rspecs[$slicename]} ; do somni $slicename $rspec ; omni --usercredfile=$HOME/.gcf/$USER-geni-usercred.xml --slicecredfile=$HOME/.gcf/$slicename-cred.xml -a $am sliverstatus $slicename |& egrep -q -i '(code 12|code 2)' || echo "unexpected sliver in $slicename at $am" & done ; sleep 5s ; done | grep unexpected | grep -v omni
}}}

Update the wiki page for this run with any final details (e.g. when the run ended).
From the wiki page for a run, browse to the directory with graphs and logs for that run. Look at:

 * The page with a graph of traffic to each destination host for the whole run, organized by aggregate. This is a good way to identify aggregates that aren't working well in any slice, due to some aggregate-wide problem (like a connectivity issue).

 * The page with a graph of traffic to each destination host for the whole run, organized by slice. This is a good way to identify slices that aren't working well at any aggregate, due to some slice-wide problem (like a controller issue).

Some notes about the graphs:

 * On the TCP graphs, large blocks of green with a flat top are good -- they show good throughput flowing. Jagged tops and gaps of white are a bad sign.
 * On the UDP graphs, large blocks of white are good -- they show zero packet loss. Any red is a bad sign; red *below* zero indicates a log file with no data, which may be a bad sign, or may be a known issue.
 * The graphs with many hosts on a single graph are pretty hard to read, but they can sometimes help you spot other things you might want to look at.

Don't forget to reload the page after pushing new graphs.

=== The old way ===

This is how I used to check on a run: downloading graphs to my laptop and viewing them with an image viewer ('gq' was an alias for one I liked).
| 469 | |