Changes between Version 27 and Version 28 of LAMP/Tutorial

09/22/10 02:27:55 (14 years ago)



  • LAMP/Tutorial

    v27 v28  
    428 Ah! This looks familiar. We can see our test and its parameters, and we also see a 1 week summary of the bandwidth for our test. We have two options of graphs, the Line Graph and Scatter Graph. Let's see both (1 Day).
    430 [[Image(bwctl.png)]]
    431 [[Image(psb_bwctl.png)]]
     428Ah! This looks familiar. We can see our test and its parameters, and we also see a 1 week summary of the bandwidth for our test. We have two options of graphs, the Line Graph and Scatter Graph. Let's see both (1 Day, line and scatter respectively).
     434That looks nice. Seems like ProtoGENI allocated a 100Mb link for our slice. Let's confirm this:
     437# ethtool eth24
     438Settings for eth24:
     439        Supported ports: [ TP MII ]
     440        Supported link modes:   10baseT/Half 10baseT/Full
     441                                100baseT/Half 100baseT/Full
     442        Supports auto-negotiation: Yes
     443        Advertised link modes:  Not reported
     444        Advertised auto-negotiation: No
     445        Speed: 100Mb/s
     449Yes, seems like we're measuring our link throughput pretty accurately. Let's move on to the one-way latency data.
     451==== Visualizing One-way Latency Data  ====
     453We go back to the ''Registered Services'' page and ''Query'' the PSB_OWAMP service on node1.
     457Oops! We have found another bug :). perfSONARBUOY seems to only be exporting the data for the one-way delay tests on the loopback interface. All these bugs should be fixed in RC2 and certainly by the final release (expected around October 10).
     459However, even running a one-way latency test manually shows a couple of problems in our slice.
     462# owping node2
     463Approximately 13.1 seconds until results available
     465--- owping statistics from [node1-link1]:59783 to [node2]:59781 ---
     467100 sent, 0 lost (0.000%), 0 duplicates
     468one-way delay min/median/max = 8.02/9.1/9.29 ms, (err=4.91 ms)
     469one-way jitter = 0.1 ms (P95-P50)
     471--- owping statistics from [node2]:45501 to [node1-link1]:33482 ---
     473one-way delay min/median/max = -7.88/-7.8/2.1 ms, (err=4.91 ms)
     474one-way jitter = 1.1 ms (P95-P50)
     477Ouch, 5ms error and max of 10ms? We have already seen through our Ping latency tests that the '''round-trip''' latency hovers around 2ms; these tests cannot be trusted! Analyzing our node a little bit we can find one of the culprits:
     480 ntpq -p
     481     remote           refid      st t when poll reach   delay   offset  jitter
     483*    2 u   16   64   37    0.153   -4.779   2.687
     484... (other servers that have not peered) ...
     487We are offset by 5ms from our only NTP synchronization source! This will greatly affect precision measurements on our slice. Many factors can contribute to errors in this type of network measurement; only extensive testing on different slices and hardware will show whether they're appropriate for this environment.
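As a rough sanity check on the owping numbers above, the two median one-way delays can be combined to separate the path delay from the relative clock offset. This is only a sketch: it assumes the path is symmetric, it uses the medians reported by owping (9.1 ms and -7.8 ms), and the function name `estimate_clock_offset` is made up for illustration.

```python
def estimate_clock_offset(fwd_ms, rev_ms):
    """Split two one-way delays into (round trip, relative clock offset).

    Assumes a symmetric path: each measured delay is the true one-way
    delay plus (forward) or minus (reverse) the offset between the clocks.
    """
    rtt_ms = fwd_ms + rev_ms             # the offset cancels out in the sum
    offset_ms = (fwd_ms - rev_ms) / 2.0  # the true delay cancels out in the difference
    return rtt_ms, offset_ms


# Median one-way delays from the owping run above (node1 -> node2, node2 -> node1).
rtt, offset = estimate_clock_offset(9.1, -7.8)
print(f"implied round trip:   {rtt:.2f} ms")
print(f"implied clock offset: {offset:.2f} ms")
```

Under that assumption the sum comes out to roughly 1.3 ms, in the same range as the round-trip latency seen with Ping, while the clocks on the two nodes appear to disagree by several milliseconds. That is what we would expect if the individual one-way numbers are dominated by clock error rather than by the network.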
     489Unfortunately, nothing to see here, let's move on to the Host Monitoring data.
     492==== Visualizing Host Monitoring Data  ====
     494We have saved the best for last (or maybe you like networking like we do). There are two ways of accessing the host monitoring data that we've collected on our nodes. One is by querying the SNMP MA, which exports the data in the perfSONAR format. The other is to go to the Ganglia Web interface on our ''host monitoring collector'' node (in this example it runs on the same node as the LAMP Portal). Let's first try the SNMP MA. We go to the now familiar ''Registered Services'' page (or click on the Host Monitoring link on the side bar, which takes us there), and ''Query'' the SNMP Service running on the lamp node.
     498We have been greeted with a large, red "proceed at your own risk" warning :). The corresponding interface on the pS-Performance Toolkit only queried the network utilization (bytes/sec) eventType on the SNMP MA. We are extending this interface to query all of the host monitoring metrics collected by Ganglia. This is still an early prototype, but should be functional (we are keen on receiving bug reports!). Let's pick a random metric, say the number of processes running on the CPU, and open its Flash Graph. Note that you can read a description of the eventType by resting the mouse on top of it.
     501The perfAdmin visualization tool above allowed us to verify that our data is indeed being exported by the SNMP MA using the perfSONAR schema and API. However, the Ganglia Web visualization tool shows all the host monitoring metrics collected, with a comprehensive and robust interface. Thus, for host monitoring specifically, we suggest this tool for visualizing the instrumentation on the slice. We can access Ganglia Web through the URL https://<collector node>/ganglia/.
     505On the front page we have a summary of our "cluster", in our case the whole slice. We can select a node from the dropdown box to see all the metrics we're collecting on each node. Let's select ''node2''.
     509We can see clear periodic spikes on CPU load and network traffic. These spikes most likely correspond to our scheduled Throughput tests. Note that the network traffic graph shows only 5MB/s, even though we were getting 90Mb/s