FIO (Flexible I/O Tester) Part8 – Interpret and understand the result/output

“fio” Flexible IO Tester post series from installing to a deep dive explanation and usage.

full course
  1. FIO (Flexible I/O Tester) Part1 – Installation and compiling if needed
  2. FIO (Flexible I/O Tester) Part2 – First run and defining job files
  3. FIO (Flexible I/O Tester) Part3 – Environment variables and keywords
  4. FIO (Flexible I/O Tester) Part4 – fixed, ranges or finer grained block sizes?
  5. FIO (Flexible I/O Tester) Part5 – Direct I/O or buffered (page cache) or raw performance?
  6. FIO (Flexible I/O Tester) Part6 – Sequential read and readahead
  7. FIO (Flexible I/O Tester) Part7 – Steady State of SSD,NVMe,PCIe Flash with TKperf
  8. FIO (Flexible I/O Tester) Part8 – Interpret and understand the result/output
  9. FIO (Flexible I/O Tester) Part9 – fio2gnuplot to visualize the output

The result/output of “fio” can be overwhelming because this decent tool does a lot for you. Your job is to feed “fio” with the right options and then interpret the result/output. This posting will help you to understand the result/output in detail. I know it’s difficult to read but I am limited by the WordPress design here a little bit and may improve it in the future.

The official documentation

The HOWTO provides some insights about the result/output of “fio”. I copy&paste some parts of the HOWTO and give you some more details or summarize other parts.

Output while running

PThread setup, but not started.
CThread created.
IThread initialized, waiting or generating necessary data.
pThread running pre-reading file(s).
RRunning, doing sequential reads.
rRunning, doing random reads.
WRunning, doing sequential writes.
wRunning, doing random writes.
Running, doing mixed sequential reads/writes.
mRunning, doing mixed random reads/writes.
FRunning, currently waiting for fsync()
fRunning, finishing up (writing IO logs, etc)
VRunning, doing verification of written data.
EThread exited, not reaped by main thread yet.
_Thread reaped, or
XThread reaped, exited with an error.
KThread reaped, exited due to signal.

Job overview output

This will give you an overview about the jobs and the option used. It’s useful to check the heading if you receive only the results of a run but not the command line call or job file.

Data direction output

All details for each data direction will be shown here. Most important numbers are:

  • io=
  • bw=
  • iops=
  • issued =
  • lat =

Details inside the box.

Group statistics

Disk statistics

Example: Interpret the result/output of SanDisk PX600-1000 and Samsung EVO 840

I ran the linux raw peak performance test:  job-fio-11.ini which is included in the “fio” sample linux files on Testverse.

This means a 4K random write test to show the peak 4K write IOPS.

I invoked the script:

Result for SanDisk PX600-1000:

Result for Samsung EVO 840:

Part 1: Job overview output

  • “g=0”
    • this job belongs to group 0 – Groups can be used to aggregate job results.
  •  “rw=rw”
    • the IO pattern is “random write”
  •  “bs=4K-4K/4K-4K/4K-4K”
    • the block size is 4K for all types (read,write,discard). This test is only scheduling  writes. The important part for this job is 4K-4K in the middle.
  •  “ioengine=libaio”
    • the used ioengine is libaio which means parallel writes will be scheduled. This is not a good test for the file system performance because it skips the page cache.
  •  “iodepth=32”
    • there will be up to “32 IO units” in flight against the device. So there will be a queue filled up to 32 outstanding IOs most of the time.
  • the version is “fio-2.2.9-26-g669e”
  • 4 threads will be started.

Part 2: Data direction output

The header is self explaining: Job name = fio-job-11 …

The detailed IO statistics for the job:

  • “write” (remember the job schedules 100% random writes)
  • “io=41119MB”
    • number of MB transfered
  • “bw=1370.5MB/s”
    • write data at a speed of 1370.5MB per second in average
  • “iops=350728”
    • is the average IO per second (4K in this case).
  • “runt=30013msec”
    • The job ran ~30 seconds
  • “slat”, “clat”,”lat” min,max,avg,stdev
    • slat means submission latency and presents the time it took to submit this IO to the kernel for processing.
    • clat  means completion latency and presents the time that passes between submission to the kernel and when the IO is complete, not including submission latency.
    • lat is the best metric which represents the whole latency an application would experience.  The avg slat+ avg clat = ~ avg lat.
    • Keep an eye if the numbers are usec or msec ..etc.! Compare PX600-1000 to EVO 840.
    • See Al Tobey blog for some more details.
  • “clat percentile” gives a detailed explanation how much IO in percentage completed in which time frame. In this case: 99% of the IO completed in <=1192 usec = 1,2 msec. This value is often used to ignore the few spikes when testing. The maximum clat has been 13505 which has been ~14x longer than the average of 344.
  • “bw”  min, max, per,avg, stdev
    • In this case the bandwidth  has been 345090 KB/s = ~337 MB/s
  • “lat” this is like the clat part.
    • In this case 91.01% of the IO completed between 500usec and >250usec. This is in line with the avg latency of 360.23usec. Only ~ 8,7% of the IO took between 2ms and >750usec. Both together is nearly 99,8% of all IO.
  •  “cpu”
    • this line is dedicated to the CPU usage of the running the job
      •  “usr=5.66%”
        • this is the percentage of CPU usage of the running job at user level
        • 100% would mean that one CPU core will be at 100% workload, depending if HT on/off
      •  “sys=12.09%”
        • this is the percentage of CPU usage of the running job at system/kernel level
      • “ctx=2783241”
        • The number of context switches this thread encountered while running
      • “majf=0” and “minf=32689”
  • “IO depths :
    • “32=116.7%… ”
      • this number showed that this job was able to have ~32 IO units in flight
      • I am not sure why it’s >100%
      • “submit: …..16=100.0%….”
        • shows how many IO were submitted in a single submit call. In this case it could be in the range of 8 to 16
        • This is in line with the script which used iodepth_batch values
      • “complete: …. 16=100.0%….”
        • same like submit but for completed calls.
      • “issued:….total=w=10526416/w=0/d=0″…
        • 10526416 write IO have been issued, no reads, no discards and none of them have been short or dropped

Part 3: Group statistics

    • “io=41119MB”
      • As in the job statistics the same amount of transfered MB here because its only one  job
    • “aggrb=1370.5MB/s”
      • aggregated bandwidth of all jobs/threads for group 0
    • “minb=1370.5MB/s maxb=1370.5MB/s”
      • The minimum and maximum bandwidth one thread saw. In this case is the minimum the same as the maximum.I don’t think this is correct! Will clarify this.
    • “mint=30008msec” and “maxt=30008msec”
      • Smallest and longest runtime of one of the jobs. The same because we ran only one job

Part 4: Disk statistics

  • “fioa: ios=71/767604”
    • 71 read IO and 767604 write IO on /dev/fioa
    • I am not sure why there are 71 read IO. I am pretty sure i didn’t run anything myself in the background. Who knows?
  • “merge=0/0” number of merges the IO from the IO scheduler
    • no merges here
  • “ticks=0/259476”
    • number of ticks we kept the drive busy. A sign that the device is saturated.
    • A tick is related to one jiffy. The next lines are only a approximation. read about Kernel Timer for more details.
    • In this example I checked the value for CONFIG_HZ

      • CONFIG_HZ is set to 250 which means 1 second / 250 = 0,004s = 4ms
      • 259476 ticks * 4ms = 1037s  ???
      • I believe this is the cumulated wait time all IO spend in the queue. If you increase only the iodepth the value increase linear.
      • 1037s / 32 (iodepth) = ~32,4s its a little bit more than the runtime of 30s
  • “io_queue=256816”
    • total time spend in the disk queue
  • “util=94.63%”
    • the utilization of the drive is 94.63 means the drive seems to be nearly saturated with this workload.

This should give you a good idea which parts of a result/output exist and some insights how to interpret them. And yes its a tough one.


Leave a Reply

Your email address will not be published.