Thursday, 22 June 2017

Cyclic latency measurements in stress-ng V0.08.06

The latest release of stress-ng contains a mechanism to measure latencies via a cyclic latency test.  Essentially, this is a loop that repeatedly performs high-precision sleeps and measures the latency (the extra overhead) of each sleep compared with the expected time.  The loop runs with either the Round-Robin (rr) or the First-In-First-Out (fifo) real-time scheduling policy.
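
The idea behind the loop can be sketched in a few lines of Python. This is only an illustration, not the stress-ng implementation (which is written in C and runs under a real-time scheduling policy); the function name `measure_sleep_latencies` and the use of `time.sleep()` with `time.perf_counter_ns()` are assumptions made for the sketch.

```python
import time


def measure_sleep_latencies(sleep_ns, samples):
    """Sleep `samples` times for `sleep_ns` nanoseconds and return, for each
    iteration, how much longer the sleep took than requested (in ns)."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        time.sleep(sleep_ns / 1_000_000_000)    # request the sleep
        elapsed = time.perf_counter_ns() - start
        latencies.append(elapsed - sleep_ns)    # extra overhead beyond the request
    return latencies


if __name__ == "__main__":
    lat = measure_sleep_latencies(sleep_ns=20_000, samples=1_000)
    print(f"min {min(lat)} ns, max {max(lat)} ns")
```

Run from an ordinary (non real-time) Python process, the numbers will be much noisier than those reported by stress-ng, but the shape of the measurement is the same.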

The cyclic test can be configured with the sleep time (in nanoseconds), the scheduling type (rr or fifo), the scheduling priority (1 to 100) and the sleep method (explained later).
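
Real-time policies such as fifo normally need elevated privileges on Linux. As a rough sketch (not part of stress-ng), the following Python checks whether the current process is allowed to switch to SCHED_FIFO; the helper name `can_use_fifo` is hypothetical, and the priority is taken from the kernel's own maximum rather than from the stress-ng option range.

```python
import os


def can_use_fifo(priority=None):
    """Return True if this process may switch to SCHED_FIFO (Linux only).

    The original scheduling policy is restored before returning.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False                       # platform without the sched_* calls
    if priority is None:
        priority = os.sched_get_priority_max(os.SCHED_FIFO)
    old_policy = os.sched_getscheduler(0)  # 0 means "this process"
    old_param = os.sched_getparam(0)
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:                        # includes PermissionError
        return False
    os.sched_setscheduler(0, old_policy, old_param)
    return True


if __name__ == "__main__":
    print("SCHED_FIFO available:", can_use_fifo())
```

Without root (or an equivalent capability) this will usually print False.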

The first 10,000 latency measurements are used to compute various latency statistics:
  • mean latency (aka the 'average')
  • modal latency (the most 'popular' latency)
  • minimum latency
  • maximum latency
  • standard deviation
  • latency percentiles (25%, 50%, 75%, 90%, 95.40%, 99.0%, 99.5%, 99.9% and 99.99%)
  • latency distribution (enabled with the --cyclic-dist option)
The latency percentiles indicate the latency at or below which a given percentage of the samples fall.  For example, the 99% percentile for the 10,000 samples is the latency that 9,900 of the samples are equal to or below.
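
To make these statistics concrete, here is a minimal Python sketch of how they could be computed from a list of latency samples. It is an illustration only: the function names are made up, the percentile uses a simple "nearest rank" rule that matches the description above, and the choice of the population standard deviation is an assumption rather than something taken from the stress-ng source.

```python
import math
import statistics
from fractions import Fraction


def percentile(latencies, pct):
    """Smallest sample such that at least pct% of the samples are <= it."""
    ordered = sorted(latencies)
    # Exact arithmetic avoids float rounding for values such as 99.99.
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarise(latencies):
    """Return the basic statistics for a list of latencies."""
    return {
        "mean": statistics.mean(latencies),
        "mode": statistics.mode(latencies),
        "min": min(latencies),
        "max": max(latencies),
        "std.dev": statistics.pstdev(latencies),  # population form (assumption)
    }
```

With 10,000 samples, `percentile(samples, 99)` returns the 9,900th value of the sorted samples, so 9,900 samples are equal to or below it.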

The latency distribution is shown when the --cyclic-dist option is used; one has to specify the distribution interval in nanoseconds and up to the first 100 values in the distribution are output.
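
The distribution is essentially a histogram with fixed-width buckets. A small Python sketch of the bucketing, under the assumption that each sample is counted in the bucket whose label is the sample rounded down to a multiple of the interval, and that samples beyond the last bucket are simply dropped (the function name and the `max_buckets` parameter are illustrative):

```python
def latency_distribution(latencies_ns, interval_ns, max_buckets=100):
    """Count samples per bucket; return (bucket_start_ns, count) pairs
    from bucket 0 up to the last non-empty bucket."""
    counts = [0] * max_buckets
    for lat in latencies_ns:
        idx = lat // interval_ns
        if 0 <= idx < max_buckets:        # ignore samples outside the table
            counts[idx] += 1
    last = max((i for i, c in enumerate(counts) if c), default=-1)
    return [(i * interval_ns, counts[i]) for i in range(last + 1)]


if __name__ == "__main__":
    for start, count in latency_distribution([3050, 4880, 4881, 5191], 1000):
        print(f"{start:8d} {count:8d}")
```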

For an idle machine, one can invoke just the cyclic measurements with stress-ng as follows:

 sudo stress-ng --cyclic 1 --cyclic-policy fifo \
 --cyclic-prio 100 --cyclic-method clock_ns \
 --cyclic-sleep 20000 --cyclic-dist 1000 -t 5  
 stress-ng: info: [27594] dispatching hogs: 1 cyclic  
 stress-ng: info: [27595] stress-ng-cyclic: sched SCHED_FIFO: 20000 ns delay, 10000 samples  
 stress-ng: info: [27595] stress-ng-cyclic:  mean: 5242.86 ns, mode: 4880 ns  
 stress-ng: info: [27595] stress-ng-cyclic:  min: 3050 ns, max: 44818 ns, std.dev. 1142.92  
 stress-ng: info: [27595] stress-ng-cyclic: latency percentiles:  
 stress-ng: info: [27595] stress-ng-cyclic:  25.00%:    4881 us  
 stress-ng: info: [27595] stress-ng-cyclic:  50.00%:    5191 us  
 stress-ng: info: [27595] stress-ng-cyclic:  75.00%:    5261 us  
 stress-ng: info: [27595] stress-ng-cyclic:  90.00%:    5368 us  
 stress-ng: info: [27595] stress-ng-cyclic:  95.40%:    6857 us  
 stress-ng: info: [27595] stress-ng-cyclic:  99.00%:    8942 us  
 stress-ng: info: [27595] stress-ng-cyclic:  99.50%:    9821 us  
 stress-ng: info: [27595] stress-ng-cyclic:  99.90%:   22210 us  
 stress-ng: info: [27595] stress-ng-cyclic:  99.99%:   36074 us  
 stress-ng: info: [27595] stress-ng-cyclic: latency distribution (1000 us intervals):  
 stress-ng: info: [27595] stress-ng-cyclic: latency (us) frequency  
 stress-ng: info: [27595] stress-ng-cyclic:           0         0  
 stress-ng: info: [27595] stress-ng-cyclic:        1000         0  
 stress-ng: info: [27595] stress-ng-cyclic:        2000         0  
 stress-ng: info: [27595] stress-ng-cyclic:        3000        82  
 stress-ng: info: [27595] stress-ng-cyclic:        4000      3342  
 stress-ng: info: [27595] stress-ng-cyclic:        5000      5974  
 stress-ng: info: [27595] stress-ng-cyclic:        6000       197  
 stress-ng: info: [27595] stress-ng-cyclic:        7000       209  
 stress-ng: info: [27595] stress-ng-cyclic:        8000       100  
 stress-ng: info: [27595] stress-ng-cyclic:        9000        50  
 stress-ng: info: [27595] stress-ng-cyclic:       10000        10  
 stress-ng: info: [27595] stress-ng-cyclic:       11000         9  
 stress-ng: info: [27595] stress-ng-cyclic:       12000         2  
 stress-ng: info: [27595] stress-ng-cyclic:       13000         2  
 stress-ng: info: [27595] stress-ng-cyclic:       14000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       15000         9  
 stress-ng: info: [27595] stress-ng-cyclic:       16000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       17000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       18000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       19000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       20000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       21000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       22000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       23000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       24000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       25000         2  
 stress-ng: info: [27595] stress-ng-cyclic:       26000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       27000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       28000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       29000         2  
 stress-ng: info: [27595] stress-ng-cyclic:       30000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       31000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       32000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       33000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       34000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       35000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       36000         1  
 stress-ng: info: [27595] stress-ng-cyclic:       37000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       38000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       39000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       40000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       41000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       42000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       43000         0  
 stress-ng: info: [27595] stress-ng-cyclic:       44000         1  
 stress-ng: info: [27594] successful run completed in 5.00s  
   

Note that stress-ng needs to be invoked using sudo to enable the Real Time FIFO scheduling for the cyclic measurements.

The above example uses the following options:

  • --cyclic 1
    • starts one instance of the cyclic measurements (1 is always recommended)
  • --cyclic-policy fifo 
    • use the real time First-In-First-Out scheduling for the cyclic measurements
  • --cyclic-prio 100 
    • use the maximum scheduling priority  
  • --cyclic-method clock_ns
    • use the clock_nanosleep(2) system call to perform the high precision duration sleep
  • --cyclic-sleep 20000 
    • sleep for 20000 nanoseconds per cyclic iteration
  • --cyclic-dist 1000 
    • enable latency distribution statistics with an interval of 1000 nanoseconds between each data point.
  • -t 5
    • run for just 5 seconds
From the run above, we can see that 99.5% of latencies were no more than 9821 nanoseconds and most clustered around the 4880 nanosecond modal point (the percentile and distribution lines are labelled "us" in this output, but the values are nanoseconds, like the mean, minimum and maximum). The distribution data shows clustering around the 5000 nanosecond point, after which the samples tail off with a bit of a long tail.
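
When comparing several runs, it can be handy to pull the percentile figures out of the output programmatically. The following Python sketch does this with a regular expression; it assumes the line format shown in the output above (which may differ in other stress-ng versions), and the function name `parse_percentiles` is illustrative.

```python
import re

# Matches e.g. "99.50%:    9821" in a stress-ng-cyclic percentile line.
_PERCENTILE = re.compile(r"(\d+\.\d+)%:\s+(\d+)")


def parse_percentiles(output):
    """Map each percentile (as a string such as '99.50') to its latency."""
    return {pct: int(value) for pct, value in _PERCENTILE.findall(output)}


if __name__ == "__main__":
    sample = " stress-ng: info: [27595] stress-ng-cyclic:  99.50%:    9821 us\n"
    print(parse_percentiles(sample))
```

Parsing the output of two runs this way makes it easy to compare, say, the 99.9% figure of an idle machine against a loaded one.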

Now for the interesting part. Since stress-ng is packed with many different stressors, we can run these while performing the cyclic measurements. For example, we can tell stress-ng to run *all* the virtual memory related stress tests and see how this affects the latency distribution using the following:

 sudo stress-ng --cyclic 1 --cyclic-policy fifo \  
 --cyclic-prio 100 --cyclic-method clock_ns \  
 --cyclic-sleep 20000 --cyclic-dist 1000 \  
 --class vm --all 1 -t 60s  
   

The above runs all the stressors in the vm class at the same time (with just one instance of each stressor) for 60 seconds.

The --cyclic-method option specifies the delay mechanism used on each of the 10,000 cyclic iterations.  The default (and recommended) method is clock_ns, which uses the high precision delay.  The available cyclic delay methods are:
  • clock_ns (use the clock_nanosleep() sleep)
  • posix_ns (use the POSIX nanosleep() sleep)
  • itimer (use a high precision clock timer and pause to wait for a signal to measure latency)
  • poll (busy spin-wait on clock_gettime() to eat cycles for a delay)
All the delay mechanisms use the CLOCK_REALTIME system clock for timing.
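
To illustrate how the poll method differs from a sleep, here is a rough Python sketch of a busy spin-wait on the clock. It is not the stress-ng code; the function name `busy_wait_ns` is made up, and Python's `time.clock_gettime_ns()` stands in for the C clock_gettime() call.

```python
import time


def busy_wait_ns(delay_ns):
    """Spin on the clock until `delay_ns` has passed; return the overshoot in ns."""
    deadline = time.clock_gettime_ns(time.CLOCK_REALTIME) + delay_ns
    while True:
        now = time.clock_gettime_ns(time.CLOCK_REALTIME)
        if now >= deadline:
            return now - deadline     # how far past the deadline we noticed


if __name__ == "__main__":
    print("overshoot:", busy_wait_ns(20_000), "ns")
```

Unlike a sleep, the loop never gives up the CPU voluntarily, so it burns cycles for the whole delay.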

I hope this is plenty of cyclic measurement functionality to get some useful latency benchmarks against various kernel components when using some or a mix of the stress-ng stressors.  Let me know if I am missing some other cyclic measurement options and I can see if I can add them in.

Keep stressing and measuring those systems!

Friday, 26 May 2017

What is new in FWTS 17.05.00?

Version 17.05.00 of the Firmware Test Suite was released this week as part of the regular end-of-month release cadence. So what is new in this release?
  • Alex Hung has been busy bringing the SMBIOS tests in sync with the SMBIOS 3.1.1 standard
  • IBM provided some OPAL (OpenPower Abstraction Layer) Firmware tests:
    • Reserved memory DT validation tests
    • Power management DT validation tests
  • The first fwts snap was created
  • Over 40 bugs were fixed
As ever, we are grateful for all the community contributions to FWTS.  The full release details are available from the fwts-devel mailing list.

I expect that the next upcoming ACPICA release will be integrated into the 17.06.00 FWTS release next month.

Monday, 15 May 2017

Firmware Test Suite Text Based Front-End

The Firmware Test Suite (FWTS) has an easy to use text based front-end that is primarily used by the FWTS Live-CD image but it can also be used in the Ubuntu terminal.

To install and run the front-end use:

 sudo apt-get install fwts-frontend  
 sudo fwts-frontend-text  

..and one should see a menu of options:


In this demonstration, the "All Batch Tests" option has been selected:


Tests will be run one by one and a progress bar shows the progress of each test. Some tests run very quickly, others can take several minutes depending on the hardware configuration (such as number of processors).

Once the tests are all complete, the following dialogue box is displayed:


The test has saved several files into the directory /fwts/15052017/1748/ and, by selecting Yes, one can view the results log in a scroll-box:


After exiting this, the FWTS frontend dialog is displayed:


Press enter to exit (note that the Poweroff option is just for the fwts Live-CD image version of fwts-frontend).

The tool dumps various logs, for example, the above run generated:

 ls -alt /fwts/15052017/1748/  
 total 1388  
 drwxr-xr-x 5 root root  4096 May 15 18:09 ..  
 drwxr-xr-x 2 root root  4096 May 15 17:49 .  
 -rw-r--r-- 1 root root 358666 May 15 17:49 acpidump.log  
 -rw-r--r-- 1 root root  3808 May 15 17:49 cpuinfo.log  
 -rw-r--r-- 1 root root 22238 May 15 17:49 lspci.log  
 -rw-r--r-- 1 root root 19136 May 15 17:49 dmidecode.log  
 -rw-r--r-- 1 root root 79323 May 15 17:49 dmesg.log  
 -rw-r--r-- 1 root root  311 May 15 17:49 README.txt  
 -rw-r--r-- 1 root root 631370 May 15 17:49 results.html  
 -rw-r--r-- 1 root root 281371 May 15 17:49 results.log  

acpidump.log is a dump of the ACPI tables in a format compatible with the ACPICA acpidump tool.  The results.log file is a copy of the results generated by FWTS and results.html is an HTML formatted version of the log.

Monday, 8 May 2017

Simple job scripting in stress-ng 0.08.00

The latest release, stress-ng 0.08.00, contains a new job scripting feature. Jobs allow one to bundle up a set of stress options into a script rather than cram them all onto the command line.  One can now also run multiple invocations of a stressor with the latest version of stress-ng, and combined with job scripts this gives a powerful way of running more complex stress tests.

The job script commands are essentially the stress-ng long options without the need for the '--' option characters.  One option per line is allowed.

For example:

 $ stress-ng --verbose --tz --timeout 60 --cpu 1 --matrix 1 --icache 1

would become:

 $ cat example.job  
 verbose  
 tz  
 timeout 60  
 cpu 1  
 matrix 1  
 icache 1  

One can also add comments using the # character prefix.   By default the stressors will be run in parallel, but one can use the "run sequential" command in the job script to run the stressors sequentially.

The following script runs the mmap stressor multiple times using more memory on each run:

 $ cat mmap.job  
 run sequential # one job at a time  
 timeout 2m   # run for 2 minutes  
 verbose     # verbose output  
 #  
 # run 4 invocations and increase memory each time  
 #  
 mmap 1  
 mmap-bytes 25%  
 mmap 1  
 mmap-bytes 50%  
 mmap 1  
 mmap-bytes 75%  
 mmap 1  
 mmap-bytes 100%  
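
Because job commands are just long options without the leading '--', the mapping between the two forms is mechanical. The Python sketch below shows the idea by turning job-script text back into command-line arguments. It is an illustration rather than how stress-ng parses jobs: the function name `job_to_args` is made up, and "run" lines are simply skipped since they are job commands rather than options.

```python
import shlex


def job_to_args(job_text):
    """Translate simple job-script lines into stress-ng long options."""
    args = []
    for line in job_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        words = shlex.split(line)
        if words[0] == "run":
            # 'run sequential' / 'run parallel' are job commands, not options;
            # this sketch simply skips them.
            continue
        args.append("--" + words[0])           # option name gets its '--' back
        args.extend(words[1:])                 # followed by any value
    return args


if __name__ == "__main__":
    print(job_to_args("verbose\ntz\ntimeout 60\ncpu 1\n"))
```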

Some of the stress-ng stressors have various "methods" that allow one to modify the way the stressor behaves.  The following example shows how job scripts can be used to exercise a system using different stressor methods:

 $ cat /usr/share/stress-ng/example-jobs/matrix-methods.job   
 #  
 # hot-cpu class stressors:  
 #  various options have been commented out, one can remove the  
 #  proceeding comment to enable these options if required.  
 #  
 # run the following tests in parallel or sequentially  
 #  
 run sequential  
 # run parallel  
 #  
 # verbose  
 #  show all debug, warnings and normal information output.  
 #  
 verbose  
 #  
 # run each of the tests for 60 seconds  
 # stop stress test after N seconds. One can also specify the units  
 # of time in seconds, minutes, hours, days or years with the suf‐  
 # fix s, m, h, d or y.  
 #  
 timeout 1m  
 # tz  
 #  collect temperatures from the available thermal zones on the  
 #  machine (Linux only). Some devices may have one or more thermal  
 #  zones, where as others may have none.  
 tz  
 #  
 # matrix stressor with examples of all the methods allowed  
 #  
 #  start N workers that perform various matrix operations on float‐  
 #  ing point values. By default, this will exercise all the matrix  
 #  stress methods one by one. One can specify a specific matrix  
 #  stress method with the --matrix-method option.  
 #  
 #  
 # Method      Description  
 # all       iterate over all the below matrix stress methods  
 # add       add two N × N matrices  
 # copy       copy one N × N matrix to another  
 # div       divide an N × N matrix by a scalar  
 # hadamard     Hadamard product of two N × N matrices  
 # frobenius    Frobenius product of two N × N matrices  
 # mean       arithmetic mean of two N × N matrices  
 # mult       multiply an N × N matrix by a scalar  
 # prod       product of two N × N matrices  
 # sub       subtract one N × N matrix from another N × N matrix  
 # trans      transpose an N × N matrix  
 #  
 matrix 0  
 matrix-method all  
 matrix 0  
 matrix-method add  
 matrix 0  
 matrix-method copy  
 matrix 0  
 matrix-method div  
 matrix 0  
 matrix-method frobenius  
 matrix 0  
 matrix-method hadamard  
 matrix 0  
 matrix-method mean  
 matrix 0  
 matrix-method mult  
 matrix 0  
 matrix-method prod  
 matrix 0  
 matrix-method sub  
 matrix 0  
 matrix-method trans  

Various example job scripts can be found in /usr/share/stress-ng/example-jobs; one can use these as a base for writing more complex stress jobs.  The example jobs have all the options commented (using the text from the stress-ng manual) to make it easier to see how each stressor can be run.
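
Job scripts that repeat one stressor with different methods are regular enough to generate. As a small, hypothetical helper (not something shipped with stress-ng), the following Python builds the text of a job along the lines of the matrix example above, using the method names listed in that script:

```python
MATRIX_METHODS = ["all", "add", "copy", "div", "frobenius", "hadamard",
                  "mean", "mult", "prod", "sub", "trans"]


def matrix_methods_job(methods=MATRIX_METHODS, timeout="1m"):
    """Build job-script text that runs the matrix stressor once per method."""
    lines = ["run sequential", "verbose", f"timeout {timeout}", "tz"]
    for method in methods:
        lines.append("matrix 0")
        lines.append(f"matrix-method {method}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(matrix_methods_job(["add", "sub"], timeout="30s"), end="")
```

The resulting text can be written to a file and used like any other job script.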

Version 0.08.00 landed in Ubuntu 17.10 Artful Aardvark and is available as a snap. I've also got backports in ppa:colin-king/white for older releases of Ubuntu.

Thursday, 20 April 2017

Tracking CoverityScan issues on Linux-next

Over the past 6 months I've been running static analysis on linux-next with CoverityScan on a regular basis (to find new issues and fix some of them) as well as keeping a record of the defect count.


Since the beginning of September over 2000 defects have been eliminated by a host of upstream developers and the steady downward trend of outstanding issues is good to see.  A proportion of the outstanding defects are false positives or issues where the code is being overly zealous, for example, bounds checking where some conditions can never happen. Considering there are millions of lines of code, the defect rate is about average for such a large project.

I plan to keep the static analysis running long term and I'll try and post stats every 6 months or so to see how things are progressing.

Thursday, 5 January 2017

BCC: a powerful front end to extended Berkeley Packet Filters

The BPF Compiler Collection (BCC) is a toolkit for building kernel tracing tools that leverage the functionality provided by the Linux extended Berkeley Packet Filters (BPF).

BCC allows one to write BPF programs with front-ends in Python or Lua with kernel instrumentation written in C.  The instrumentation code is built into sandboxed eBPF byte code and is executed in the kernel.
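
To give a flavour of this split between a Python front-end and C instrumentation, here is a minimal sketch modelled on the "hello world" style of example from the BCC project. It assumes the `bcc` Python module is importable (which may not be the case when only the snap is installed), that it is run as root, and that a kprobe on the clone system call is available on the running kernel.

```python
# Kernel-side instrumentation, written in C and compiled to eBPF by BCC.
PROGRAM = r"""
int kprobe__sys_clone(void *ctx)
{
    bpf_trace_printk("Hello, World!\n");
    return 0;
}
"""


def run():
    """Load the program and print trace output until interrupted (needs root)."""
    # Imported here so this file can be loaded even where BCC is not installed.
    from bcc import BPF
    BPF(text=PROGRAM).trace_print()
```

Calling `run()` as root should print a line each time a new process is cloned, until interrupted with Ctrl-C.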

The BCC github project README file provides an excellent overview and description of BCC and the various available BCC tools.  Building BCC from scratch can be a bit time consuming. However, the good news is that the BCC tools are now available as a snap, so BCC can be quickly and easily installed using:

 sudo snap install --devmode bcc  

There are currently over 50 BCC tools in the snap, so let's have a quick look at a few:

cachetop allows one to view the top page cache hit/miss statistics. To run this use:

 sudo bcc.cachetop  



The funccount tool allows one to count the number of times specific functions get called.  For example, to see how many kernel functions with the name starting with "do_" get called per second one can use:

 sudo bcc.funccount "do_*" -i 1  


To see how to use all the options in this tool, use the -h option:

 sudo bcc.funccount -h  

I've found the funccount tool to be especially useful to check on kernel activity by checking on hits on specific function names.

The slabratetop tool is useful to see the active kernel SLAB/SLUB memory allocation rates:

 sudo bcc.slabratetop  


If you want to see which process is opening specific files, you can snoop on open system calls using the opensnoop tool:

 sudo bcc.opensnoop -T


Hopefully this will give you a taste of the useful tools that are available in BCC (I have barely scratched the surface in this article).  I recommend installing the snap and giving it a try.

As it stands, BCC provides a useful mechanism to develop BPF tracing tools and I look forward to regularly updating the BCC snap as more tools are added to BCC. Kudos to Brendan Gregg for BCC!