|| '''Step''' || '''State''' || '''Date completed''' || '''Tickets''' || '''Comments''' ||
|| 1 || [[Color(yellow,Completed)]] || || || needs retesting when 3 is retested ||
|| 2 || || || || needs retesting when 3 is retested ||
|| 3 || [[Color(yellow,Completed)]] || || || needs retesting once OpenFlow resources are available from InstaGENI AM ||
|| 4 || [[Color(orange,Blocked)]] || || instaticket:26 || blocked on resolution of MAC reporting issue ||
|| 5 || [[Color(orange,Blocked)]] || || || blocked on 3 ||
|| 6 || [[Color(orange,Blocked)]] || || || blocked on 3, availability of FOAM ||
|| 7 || [[Color(orange,Blocked)]] || || || blocked on 3 ||
|| 8 || [[Color(orange,Blocked)]] || || || blocked on 3 ||
* Per-host view of current state:
  * From [https://boss.utah.geniracks.net/nodecontrol_list.php3?showtype=dl360] in red dot mode, I can once again see that pc3 is allocated as phys1 to `pgeni-gpolab-bbn-com/ecgtest`.
  * I can see that pc5 is configured as an OpenVZ shared host, but I can't see how many experiments it is running.
* Per-experiment view of current state:
  * Browse to [https://boss.utah.geniracks.net/genislices.php] and find one slice running on the Component Manager:
{{{
ID    HRN                          Created              Expires
362   bbn-pgeni.ecgtest (ecgtest)  2012-05-17 08:12:37  2012-05-18 18:00:00
}}}
  * Click `(ecgtest)` to view the details of that experiment at [https://boss.utah.geniracks.net/showexp.php3?experiment=363#details].
  * This shows which nodes it is using, including that its VM has been placed on pc5:
{{{
Physical Node Mapping:
ID              Type         OS              Physical
--------------- ------------ --------------- ------------
phys1           dl360        FEDORA15-STD    pc3
virt1           pcvm         OPENVZ-STD      pcvm5-1 (pc5)
}}}
  * Here are some other interesting things:
{{{
IP Port allocation:
      Low         High
--------------- ------------
    30000        30255

SSHD Port allocation ('ssh -p portnum'):
ID                   Port   SSH command
--------------- ---------- ----------------------

Physical Lan/Link Mapping:
ID              Member          IP              MAC                  NodeID
--------------- --------------- --------------- -------------------- ---------
phys1-virt1-0   phys1:0         10.10.1.1       e8:39:35:b1:4e:8a    pc3
                                                1/1 <-> 1/34         procurve2
phys1-virt1-0   virt1:0         10.10.1.2                            pcvm5-1
}}}
  * That last entry is mysterious, because the experimenter's sliverstatus output contains:
{{{
{ 'attributes':
    { 'client_id': 'phys1:if0',
      'component_id': 'urn:publicid:IDN+utah.geniracks.net+interface+pc3:eth1',
      'mac_address': 'e83935b14e8a',
...
{ 'attributes':
    { 'client_id': 'virt1:if0',
      'component_id': 'urn:publicid:IDN+utah.geniracks.net+interface+pc5:eth1',
      'mac_address': '00000a0a0102',
}}}
  * So I think it should be possible for the admin interface to know that virtual MAC address too.
  * Huh, but also, the MAC address reported in sliverstatus is in fact wrong. Let me summarize:
{{{
MAC addrs reported for phys1:0 == 10.10.1.1
  E8:39:35:B1:4E:8A: from /sbin/ifconfig eth1 run on phys1 (authoritative)
  e83935b14e8a: from sliverstatus as experimenter (correct)
  e8:39:35:b1:4e:8a: from https://boss.utah.geniracks.net/showexp.php3?experiment=363#details (correct)

MAC addrs reported for virt1:0 == 10.10.1.2
  82:01:0A:0A:01:02: from /sbin/ifconfig mv1.1 run on virt1 (authoritative)
  00000a0a0102: from sliverstatus as experimenter (incorrect: first four digits are wrong)
  -: from https://boss.utah.geniracks.net/showexp.php3?experiment=363#details (not reported)
}}}
I opened [instaticket:26] for this issue.
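The "first four digits are wrong" pattern is suggestive: the trailing eight hex digits of the reported value encode the interface's IP address, and only the leading two octets differ from the real MAC. A quick sanity check of that reading (the derivation is my inference, not documented behavior):

{{{
# 10.10.1.2 in hex is 0a0a0102, which matches the tail of both the
# reported MAC (00:00:0a:0a:01:02) and the real one (82:01:0a:0a:01:02);
# only the first two octets disagree.
printf '%02x%02x%02x%02x\n' 10 10 1 2
# prints: 0a0a0102
}}}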
* Now, use the OpenVZ host itself to view activity:
  * As an admin, log in to pc5.utah.geniracks.net.
  * Poking around, I was led to a couple of prospective data sources:
    * Logs in `/var/emulab`
    * The `vzctl` RPM, containing a number of OpenVZ control commands
  * The latter easily gives a list of running VMs:
{{{
vhost1,[/var/emulab],05:00(1)$ sudo vzlist -a
      CTID      NPROC STATUS    IP_ADDR         HOSTNAME
         1         15 running   -               virt1.ecgtest.pgeni-gpolab-bbn-com.utah.geniracks.net
}}}
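  * That listing could also answer "how many experiments is pc5 running?" (the gap noted earlier); here I replay the captured `vzlist` output so the pipeline itself can be checked off-host:

{{{
# Count running containers: skip the header line, keep rows whose
# STATUS column is "running". On the host you would pipe the live
# output instead: sudo vzlist -a | awk 'NR > 1 && $3 == "running"' | wc -l
vzlist_output='      CTID      NPROC STATUS    IP_ADDR         HOSTNAME
         1         15 running   -               virt1.ecgtest.pgeni-gpolab-bbn-com.utah.geniracks.net'
echo "$vzlist_output" | awk 'NR > 1 && $3 == "running"' | wc -l
# prints: 1
}}}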
* I also see a command to figure out which container is running a given PID. Suppose I run top and am concerned about an sshd process chewing up all the system CPU:
{{{
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
51817 20001     20   0  116m 3780  872 R 94.4  0.0   0:05.74 sshd
}}}
* Since the user is numeric, I can assume this process is probably running in a container, so I find out which one:
| 426 | {{{ |
| 427 | vhost1,[/var/emulab],05:05(0)$ sudo vzpid 51766 |
| 428 | Pid CTID Name |
| 429 | 51766 1 sshd |
}}}
* and then look up the container info as above.
* The files in `/var/emulab` give details about how each experiment was created. In particular:
{{{
Information about experiment startup attributes:
/var/emulab/boot/tmcc.pcvm5-1/
/var/emulab/boot/tmcc.pcvm5-2/

Logs of experiment progress:
/var/emulab/logs/tbvnode-pcvm5-1.log
/var/emulab/logs/tbvnode-pcvm5-2.log
/var/emulab/logs/tmccproxy.pcvm5-1.log
/var/emulab/logs/tmccproxy.pcvm5-2.log
}}}
* These may be useful for running and terminated experiments ''if'' the context IDs are unique over time.
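* For post-hoc correlation, the container ID is at least recoverable from the log file names; a trivial sketch (path taken from the listing above):

{{{
# Extract the pcvm ID from a tbvnode log name; basename does not
# require the file to exist, so this also works against archived names.
logfile=/var/emulab/logs/tbvnode-pcvm5-1.log
basename "$logfile" .log | sed 's/^tbvnode-//'
# prints: pcvm5-1
}}}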

==== Side test: are experiment context IDs unique over time on an OpenVZ server? ====

* An rspec to create a single OpenVZ container:
{{{
jericho,[~],07:12(0)$ cat IG-MON-nodes-E.rspec
<?xml version="1.0" encoding="UTF-8"?>
<!-- This rspec will reserve one openvz node.  It should work on any
     Emulab which has nodes available and supports OpenVZ. -->
<rspec xmlns="http://www.geni.net/resources/rspec/3"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.geni.net/resources/rspec/3
                           http://www.geni.net/resources/rspec/3/request.xsd"
       type="request">

  <node client_id="virt1" exclusive="false">
    <sliver_type name="emulab-openvz" />
  </node>
</rspec>
}}}
* Use the existing slice `ecgtest2` to create a sliver:
{{{
jericho,[~],07:13(0)$ omni -a http://www.utah.geniracks.net/protogeni/xmlrpc/am \
    createsliver ecgtest2 IG-MON-nodes-E.rspec
INFO:omni:Loading config file /home/chaos/omni/omni_pgeni
INFO:omni:Using control framework pg
INFO:omni:Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 expires within 1 day on 2012-05-19 10:30:51 UTC
INFO:omni:Creating sliver(s) from rspec file IG-MON-nodes-E.rspec for slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2
INFO:omni:Asked http://www.utah.geniracks.net/protogeni/xmlrpc/am to reserve resources. Result:
INFO:omni:<?xml version="1.0" ?>
INFO:omni:<!-- Reserved resources for:
    Slice: ecgtest2
    At AM:
    URL: http://www.utah.geniracks.net/protogeni/xmlrpc/am
-->
INFO:omni:<rspec type="manifest" xmlns="http://www.geni.net/resources/rspec/3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.geni.net/resources/rspec/3 http://www.geni.net/resources/rspec/3/manifest.xsd">

  <node client_id="virt1" component_id="urn:publicid:IDN+utah.geniracks.net+node+pc5" component_manager_id="urn:publicid:IDN+utah.geniracks.net+authority+cm" exclusive="false" sliver_id="urn:publicid:IDN+utah.geniracks.net+sliver+384">
    <sliver_type name="emulab-openvz"/>
    <rs:vnode name="pcvm5-2" xmlns:rs="http://www.protogeni.net/resources/rspec/ext/emulab/1"/>
    <host name="virt1.ecgtest2.pgeni-gpolab-bbn-com.utah.geniracks.net"/>
    <services>
      <login authentication="ssh-keys" hostname="pc5.utah.geniracks.net" port="30266" username="chaos"/>
    </services>
  </node>
</rspec>
INFO:omni: ------------------------------------------------------------
INFO:omni: Completed createsliver:

Options as run:
    aggregate: http://www.utah.geniracks.net/protogeni/xmlrpc/am
    configfile: /home/chaos/omni/omni_pgeni
    framework: pg
    native: True

Args: createsliver ecgtest2 IG-MON-nodes-E.rspec

Result Summary: Slice urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+ecgtest2 expires within 1 day(s) on 2012-05-19 10:30:51 UTC
Reserved resources on http://www.utah.geniracks.net/protogeni/xmlrpc/am.
INFO:omni: ============================================================
}}}

Summary: the new sliver was assigned vnode `pcvm5-2`, an ID already seen on this host, so VM context IDs are reused over time and the per-ID files in `/var/emulab` cannot be assumed to belong to a single experiment.
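The assigned vnode name can be pulled out of each manifest mechanically, which makes the reuse check scriptable; a sketch against the `rs:vnode` element in the omni output above:

{{{
# Extract the vnode name from the manifest's rs:vnode element
manifest='<rs:vnode name="pcvm5-2" xmlns:rs="http://www.protogeni.net/resources/rspec/ext/emulab/1"/>'
echo "$manifest" | sed -n 's/.*rs:vnode name="\([^"]*\)".*/\1/p'
# prints: pcvm5-2
}}}

(For real manifests a proper XML parser would be safer than sed, but this suffices for eyeballing reuse across runs.)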

At this point, I was going to gather more information about logs, when the Utah rack became totally unavailable: I was no longer able to use my shell sessions to any machines in the rack, and got ping timeouts to boss.

After about 8 minutes, things became available again. I went looking for logs of my dataplane file copy activity to see whether the dataplane had been interrupted, at which point I found that sshd activity on the dataplane does not appear to be logged anywhere, either in `/var/log` within the container or on pc5 itself. That is not a rack requirement, but it seems non-ideal for experimenters. I opened [instaticket:27] to report it.