“fio” (Flexible I/O Tester) post series, from installation to a deep-dive explanation and usage.
Full series:
- FIO (Flexible I/O Tester) Part1 – Installation and compiling if needed
- FIO (Flexible I/O Tester) Part2 – First run and defining job files
- FIO (Flexible I/O Tester) Part3 – Environment variables and keywords
- FIO (Flexible I/O Tester) Part4 – fixed, ranges or finer grained block sizes?
- FIO (Flexible I/O Tester) Part5 – Direct I/O or buffered (page cache) or raw performance?
- FIO (Flexible I/O Tester) Part6 – Sequential read and readahead
- FIO (Flexible I/O Tester) Part7 – Steady State of SSD,NVMe,PCIe Flash with TKperf
- FIO (Flexible I/O Tester) Part8 – Interpret and understand the result/output
- FIO (Flexible I/O Tester) Part9 – fio2gnuplot to visualize the output
The result/output of “fio” can be overwhelming, because this tool does a lot for you. Your job is to feed “fio” the right options and then interpret the result/output. This post will help you understand the output in detail. I know it is difficult to read, but I am somewhat limited by the WordPress design here and may improve it in the future.
The official documentation
The HOWTO provides some insight into the result/output of “fio”. I quote some parts of the HOWTO and add more details, or summarize other parts.
Output while running
fio spits out a lot of output. While running, fio will display the status of the jobs created. An example of that would be:

```
Threads: 1: [_r] [24.8% done] [ 13509/ 8334 kb/s] [eta 00h:01m:31s]
```

The characters inside the square brackets denote the current status of each thread. The possible values (in typical life cycle order) are:
Idle | Run | Description
---|---|---
P | | Thread setup, but not started.
C | | Thread created.
I | | Thread initialized, waiting or generating necessary data.
 | p | Thread running pre-reading file(s).
 | R | Running, doing sequential reads.
 | r | Running, doing random reads.
 | W | Running, doing sequential writes.
 | w | Running, doing random writes.
 | M | Running, doing mixed sequential reads/writes.
 | m | Running, doing mixed random reads/writes.
 | F | Running, currently waiting for fsync().
 | f | Running, finishing up (writing IO logs, etc.).
 | V | Running, doing verification of written data.
E | | Thread exited, not reaped by main thread yet.
_ | | Thread reaped.
X | | Thread reaped, exited with an error.
K | | Thread reaped, exited due to signal.
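If you want to see these status characters yourself, any small throwaway job will do. A minimal sketch (the job name, scratch file, and size are hypothetical choices of mine, not from the HOWTO):

```
# Run a short random-read job against a scratch file; while it runs,
# the status line should show "r" (random reads) for the thread.
fio --name=statusdemo --filename=/tmp/fio.test --size=64M --bs=4k \
    --rw=randread --ioengine=sync
```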
Job overview output
This gives you an overview of the jobs and the options used. It is useful to check this header if you only receive the results of a run but not the command-line call or the job file.
```
job-1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
job-2: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.2.9-26-g669e
Starting 2 processes
```
Data direction output
All details for each data direction are shown here. The most important numbers are:
- io=
- bw=
- iops=
- issued=
- lat=
The details are explained in the box below.
When fio is done (or interrupted by ctrl-c), it will show the data for each thread, group of threads, and disks in that order. For each data direction, the output looks like:

```
Client1 (g=0): err= 0:
  write: io= 32MB, bw= 666KB/s, iops=89, runt= 50320msec
    slat (msec): min= 0, max= 136, avg= 0.03, stdev= 1.92
    clat (msec): min= 0, max= 631, avg=48.50, stdev=86.82
    bw (KB/s) : min= 0, max= 1196, per=51.00%, avg=664.02, stdev=681.68
  cpu : usr=1.49%, sys=0.25%, ctx=7969, majf=0, minf=17
  IO depths : 1=0.1%, 2=0.3%, 4=0.5%, 8=99.0%, 16=0.0%, 32=0.0%, >32=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=0/32768, short=0/0
     lat (msec): 2=1.6%, 4=0.0%, 10=3.2%, 20=12.8%, 50=38.4%, 100=24.8%,
     lat (msec): 250=15.2%, 500=0.0%, 750=0.0%, 1000=0.0%, >=2048=0.0%
```

The fields mean:

- io= Number of megabytes of io performed.
- bw= Average bandwidth rate.
- iops= Average IOs performed per second.
- runt= The runtime of that thread.
- slat= Submission latency (avg being the average, stdev being the standard deviation). This is the time it took to submit the io. For sync io, the slat is really the completion latency, since queue/complete is one operation there. This value can be in milliseconds or microseconds; fio will choose the most appropriate base and print that. In the example above, milliseconds is the best scale. Note: in --minimal mode latencies are always expressed in microseconds.
- clat= Completion latency. Same names as slat; this denotes the time from submission to completion of the io pieces. For sync io, clat will usually be equal (or very close) to 0, as the time from submit to complete is basically just CPU time (the io has already been done, see the slat explanation).
- bw= Bandwidth. Same names as the xlat stats, but also includes an approximate percentage of the total aggregate bandwidth this thread received in its group. This last value is only really useful if the threads in this group are on the same disk, since they are then competing for disk access.
- cpu= CPU usage. User and system time, along with the number of context switches this thread went through, and finally the number of major and minor page faults.
- IO depths= The distribution of io depths over the job lifetime. The numbers are divided into powers of 2; each entry covers depths higher than the previous entry, so for example the 16= entry covers the range from 16 to 31.
- IO submit= How many pieces of IO were submitted in a single submit call. Each entry denotes that amount and below, until the previous entry – e.g., 8=100% means that we submitted anywhere in between 5-8 ios per submit call.
- IO complete= Like the submit numbers above, but for completions instead.
- IO issued= The number of read/write requests issued, and how many of them were short.
- IO latencies= The distribution of IO completion latencies. This is the time from when the IO leaves fio until it is completed. The numbers follow the same pattern as the IO depths, meaning that 2=1.6% means that 1.6% of the IO completed within 2 msecs, and 20=12.8% means that 12.8% of the IO took more than 10 msecs, but less than (or equal to) 20 msecs.
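One practical note on the --minimal flag mentioned above: besides forcing microseconds, it condenses the whole result into one semicolon-separated record per job, which is far easier to post-process than the human-readable box. A small sketch (the job file name is just an example):

```
# Terse, machine-readable output: one semicolon-separated line per job.
# Field positions depend on the terse version, so check the fio HOWTO
# before parsing specific columns.
fio --minimal fio-job-11.ini
```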
Group statistics
After each client has been listed, the group statistics are printed. They will look like this:

```
Run status group 0 (all jobs):
   READ: io=64MB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec
  WRITE: io=64MB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec
```

For each data direction, it prints:

- io= Number of megabytes of io performed.
- aggrb= Aggregate bandwidth of the threads in this group.
- minb= The minimum average bandwidth a thread saw.
- maxb= The maximum average bandwidth a thread saw.
- mint= The smallest runtime of the threads in that group.
- maxt= The longest runtime of the threads in that group.
Disk statistics
```
Disk stats (read/write):
  sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
```

Each value is printed for both reads and writes, with reads first. The numbers denote:

- ios= Number of ios performed by all groups.
- merge= Number of merges performed by the io scheduler.
- ticks= Number of ticks we kept the disk busy.
- in_queue= Total time spent in the disk queue.
- util= The disk utilization. A value of 100% means we kept the disk busy constantly; 50% would mean the disk was idling half of the time.
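If you want to sanity-check fio's disk statistics while a job is running, you can watch the same kernel counters live with iostat. A minimal sketch, assuming the sysstat package is installed and sda is the device under test (column names vary slightly between sysstat versions):

```
# Extended device statistics, refreshed every second; compare the %util
# column with fio's util= value.
iostat -x 1 sda
```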
Example: Interpreting the result/output of the SanDisk PX600-1000 and Samsung EVO 840
I ran the Linux raw peak-performance test job-fio-11.ini, which is included in the “fio” sample Linux files, on Testverse.
```
[global]
readwrite=rw
rwmixread=0
blocksize=4K
ioengine=libaio
numjobs=4
thread=1
direct=1
iodepth=32
iodepth_batch=16
iodepth_batch_complete=16
group_reporting=1
ramp_time=5
norandommap=1
description=fio random 4K write peak IOPS
time_based=1
runtime=30
randrepeat=0
write_bw_log=fio-job-11
write_lat_log=fio-job-11
write_iops_log=fio-job-11

[fio-job-11]
filename=${DISK} ; example /dev/fioa
cpus_allowed=1-4
```
This is a 4K write test meant to show the peak 4K write IOPS. Note that readwrite=rw with rwmixread=0 issues 100% writes, but strictly speaking rw is the mixed sequential pattern – a job that literally requests random writes would use readwrite=randwrite.
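A variant job file along those lines – my sketch, not part of the original sample files:

```
; fio-job-11-randwrite.ini - explicit random 4K write peak IOPS test
[global]
readwrite=randwrite   ; truly random writes
blocksize=4K
ioengine=libaio
numjobs=4
thread=1
direct=1
iodepth=32
group_reporting=1
time_based=1
runtime=30

[fio-job-11-randwrite]
filename=${DISK}      ; example /dev/fioa
```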
I invoked fio for the SanDisk PX600-1000:

```
sudo DISK=/dev/fioa fio fio-job-11.ini
```

and for the Samsung EVO 840:

```
sudo DISK=/dev/sda fio fio-job-11.ini
```
Result for SanDisk PX600-1000:
```
fio-job-11: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.9-26-g669e
Starting 4 threads

fio-job-11: (groupid=0, jobs=4): err= 0: pid=2377: Sun Aug 23 12:56:42 2015
  Description : [fio random 4K write peak IOPS]
  write: io=41119MB, bw=1370.5MB/s, iops=350728, runt= 30013msec
    slat (usec): min=11, max=230, avg=15.62, stdev= 1.79
    clat (usec): min=20, max=13505, avg=344.55, stdev=224.87
     lat (usec): min=66, max=13523, avg=360.23, stdev=224.84
    clat percentiles (usec):
     |  1.00th=[  262],  5.00th=[  262], 10.00th=[  262], 20.00th=[  266],
     | 30.00th=[  266], 40.00th=[  270], 50.00th=[  274], 60.00th=[  274],
     | 70.00th=[  278], 80.00th=[  282], 90.00th=[  374], 95.00th=[ 1020],
     | 99.00th=[ 1192], 99.50th=[ 1208], 99.90th=[ 1256], 99.95th=[ 1272],
     | 99.99th=[ 1304]
    bw (KB  /s): min=   12, max=356608, per=24.60%, avg=345090.77, stdev=45012.87
    lat (usec) : 50=0.01%, 100=0.01%, 250=0.24%, 500=91.01%, 750=0.04%
    lat (usec) : 1000=3.31%
    lat (msec) : 2=5.39%, 20=0.01%
  cpu          : usr=5.66%, sys=12.09%, ctx=2783241, majf=0, minf=32689
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=116.7%, >=64=0.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10526416/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: io=41119MB, aggrb=1370.5MB/s, minb=1370.5MB/s, maxb=1370.5MB/s, mint=30013msec, maxt=30013msec

Disk stats (read/write):
  fioa: ios=71/767604, merge=0/0, ticks=0/259476, in_queue=256816, util=94.63%
```
Result for Samsung EVO 840:
```
fio-job-11: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.9-26-g669e
Starting 4 threads

fio-job-11: (groupid=0, jobs=4): err= 0: pid=2881: Sun Aug 23 13:20:28 2015
  Description : [fio random 4K write peak IOPS]
  write: io=2453.5MB, bw=83721KB/s, iops=20926, runt= 30008msec
    slat (usec): min=11, max=63, avg=15.57, stdev= 4.97
    clat (msec): min=4, max=35, avg= 6.09, stdev= 2.30
     lat (msec): min=4, max=35, avg= 6.10, stdev= 2.30
    clat percentiles (usec):
     |  1.00th=[ 4640],  5.00th=[ 4704], 10.00th=[ 4768], 20.00th=[ 4832],
     | 30.00th=[ 4832], 40.00th=[ 5152], 50.00th=[ 5344], 60.00th=[ 5472],
     | 70.00th=[ 5664], 80.00th=[ 6240], 90.00th=[10176], 95.00th=[10688],
     | 99.00th=[12096], 99.50th=[15808], 99.90th=[30080], 99.95th=[32384],
     | 99.99th=[34560]
    bw (KB  /s): min=   12, max=24733, per=24.59%, avg=20583.40, stdev=2925.28
    lat (msec) : 10=88.51%, 20=11.17%, 50=0.33%
  cpu          : usr=1.00%, sys=0.79%, ctx=47811, majf=0, minf=11528
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=117.3%, >=64=0.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=627952/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: io=2453.5MB, aggrb=83721KB/s, minb=83721KB/s, maxb=83721KB/s, mint=30008msec, maxt=30008msec

Disk stats (read/write):
  sda: ios=91/45915, merge=0/688845, ticks=16/277508, in_queue=277588, util=99.81%
```
Part 1: Job overview output
```
fio-job-11: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.9-26-g669e
Starting 4 threads
```
- “g=0”
- this job belongs to group 0 – Groups can be used to aggregate job results.
- “rw=rw”
- the IO pattern is mixed reads and writes; together with rwmixread=0 from the job file this means 100% writes. Strictly speaking, rw is the sequential mixed pattern – a truly random pattern would be requested with randwrite/randrw (see the note above).
- “bs=4K-4K/4K-4K/4K-4K”
- the block size is 4K for all IO types (read/write/discard). Since this test only schedules writes, the important part for this job is the 4K-4K in the middle.
- “ioengine=libaio”
- the ioengine used is libaio, Linux native asynchronous IO, so many IOs can be kept in flight at once. Combined with direct=1 it bypasses the page cache, which makes it a poor test of file-system performance.
- “iodepth=32”
- there will be up to 32 IO units in flight against the device, i.e., a queue of up to 32 outstanding IOs most of the time. Note that iodepth only takes effect with asynchronous ioengines – see the snippet after this list.
- the version is “fio-2.2.9-26-g669e”
- 4 threads will be started.
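To see why iodepth only matters for asynchronous ioengines, here is a minimal comparison sketch (the job names and scratch file are hypothetical). With ioengine=sync, fio can only ever have one IO in flight per thread, no matter what iodepth says; with libaio the requested depth is actually reached:

```
; iodepth-demo.ini - same workload, async vs. sync engine
[global]
filename=/tmp/fio.test   ; hypothetical scratch file
size=256M
blocksize=4K
readwrite=randread
direct=1
iodepth=32

[async-engine]
ioengine=libaio          ; up to 32 IOs really in flight

[sync-engine]
stonewall                ; start only after the first job has finished
ioengine=sync            ; effective depth stays at 1
```

Compare the "IO depths" lines of the two result blocks: the libaio job should cluster at 32, the sync job at 1.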
Part 2: Data direction output
```
fio-job-11: (groupid=0, jobs=4): err= 0: pid=2377: Sun Aug 23 12:56:42 2015
  Description : [fio random 4K write peak IOPS]
  write: io=41119MB, bw=1370.5MB/s, iops=350728, runt= 30013msec
    slat (usec): min=11, max=230, avg=15.62, stdev= 1.79
    clat (usec): min=20, max=13505, avg=344.55, stdev=224.87
     lat (usec): min=66, max=13523, avg=360.23, stdev=224.84
    clat percentiles (usec):
     |  1.00th=[  262],  5.00th=[  262], 10.00th=[  262], 20.00th=[  266],
     | 30.00th=[  266], 40.00th=[  270], 50.00th=[  274], 60.00th=[  274],
     | 70.00th=[  278], 80.00th=[  282], 90.00th=[  374], 95.00th=[ 1020],
     | 99.00th=[ 1192], 99.50th=[ 1208], 99.90th=[ 1256], 99.95th=[ 1272],
     | 99.99th=[ 1304]
    bw (KB  /s): min=   12, max=356608, per=24.60%, avg=345090.77, stdev=45012.87
    lat (usec) : 50=0.01%, 100=0.01%, 250=0.24%, 500=91.01%, 750=0.04%
    lat (usec) : 1000=3.31%
    lat (msec) : 2=5.39%, 20=0.01%
  cpu          : usr=5.66%, sys=12.09%, ctx=2783241, majf=0, minf=32689
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=116.7%, >=64=0.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10526416/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32
```
The header is self-explanatory: job name = fio-job-11 …
The detailed IO statistics for the job:
- “write” (remember: the job schedules 100% writes)
- “io=41119MB”
- the number of MB transferred
- “bw=1370.5MB/s”
- data was written at an average speed of 1370.5MB per second
- “iops=350728”
- the average number of IOs per second (4K IOs in this case)
- “runt=30013msec”
- the job ran for ~30 seconds
- “slat”, “clat”, “lat”: min, max, avg, stdev
- slat means submission latency and represents the time it took to submit the IO to the kernel for processing.
- clat means completion latency and represents the time between submission to the kernel and completion of the IO, not including the submission latency.
- lat is the best metric, as it represents the whole latency an application would experience: avg slat + avg clat ≈ avg lat. The latency logs written by this job are a handy way to cross-check these values – see the sketch after this list.
- Keep an eye on whether the numbers are usec or msec! Compare the PX600-1000 with the EVO 840.
- See Al Tobey's blog for some more details.
- “clat percentiles” gives a detailed breakdown of what percentage of the IOs completed within which time frame. In this case: 99% of the IOs completed in <=1192 usec = ~1.2 msec. This value is often used to ignore the few spikes when testing. The maximum clat was 13505 usec, roughly 39x longer than the average of ~345 usec.
- “bw”: min, max, per, avg, stdev
- in this case the average bandwidth of this thread was 345090 KB/s = ~337 MB/s, which is per=24.60% of the aggregate bandwidth
- “lat” (the distribution lines): similar to the clat percentiles, but as buckets
- in this case 91.01% of the IOs completed in more than 250 usec but at most 500 usec, which is in line with the average latency of 360.23 usec. Only ~8.7% of the IOs took between 750 usec and 2 msec. Together that covers nearly 99.8% of all IO.
- “cpu”
- this line is dedicated to the CPU usage of the job while it was running
- “usr=5.66%”
- the percentage of CPU time the job spent at user level
- 100% would mean one CPU core fully loaded (what counts as a core depends on whether hyper-threading is on or off)
- “sys=12.09%”
- the percentage of CPU time the job spent at system/kernel level
- “ctx=2783241”
- the number of context switches this thread went through while running
- “majf=0” and “minf=32689”
- the number of major and minor page faults
- majf should be zero (see: Understanding page faults)
- “IO depths: … 32=116.7% …”
- this shows that the job was able to keep ~32 IO units in flight
- I am not sure why it is >100%
- “submit: … 16=100.0% …”
- shows how many IOs were submitted in a single submit call; here every call submitted between 9 and 16 IOs
- this is in line with the job file, which set iodepth_batch=16
- “complete: … 16=100.0% …”
- the same as submit, but for completion calls
- “issued: … total=r=0/w=10526416/d=0 …”
- 10526416 write IOs were issued, no reads, no discards, and none of them were short or dropped
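The job file also wrote latency logs (write_lat_log=fio-job-11), which can be used to cross-check the averages shown above, as mentioned in the lat bullet. A minimal sketch: the exact file name varies by fio version (e.g. fio-job-11_lat.log or fio-job-11_lat.1.log), and each log line has the form "time (msec), value (usec), data direction, block size":

```
# Average the latency column (field 2) of the total-latency log.
awk -F',' '{ sum += $2; n++ } END { if (n) printf "avg lat: %.2f usec over %d samples\n", sum/n, n }' fio-job-11_lat.1.log
```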
Part 3: Group statistics
```
Run status group 0 (all jobs):
  WRITE: io=41119MB, aggrb=1370.5MB/s, minb=1370.5MB/s, maxb=1370.5MB/s, mint=30013msec, maxt=30013msec
```
- WRITE
- “io=41119MB”
- the same amount of transferred MB as in the job statistics, because group 0 contains only this one (group-reported) job
- “aggrb=1370.5MB/s”
- aggregated bandwidth of all jobs/threads for group 0
- “minb=1370.5MB/s maxb=1370.5MB/s”
- the minimum and maximum average bandwidth a single thread saw. In this case the minimum equals the maximum – I don't think this is correct with four threads! Presumably group_reporting=1 merges the four threads into a single result (see the experiment after this list). I will clarify this.
- “mint=30013msec” and “maxt=30013msec”
- the smallest and longest runtime of the jobs in the group; identical here because the four threads are reported as one job
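To check the minb/maxb suspicion, the simplest experiment is to turn group reporting off, as referenced above. This is my suggestion, not something from the original post:

```
; in the [global] section of fio-job-11.ini, either delete the
; group_reporting=1 line or set:
group_reporting=0
; fio should then print one statistics block per thread, and the
; Run status line should show genuinely different minb/maxb values.
```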
Part 4: Disk statistics
```
Disk stats (read/write):
  fioa: ios=71/767604, merge=0/0, ticks=0/259476, in_queue=256816, util=94.63%
```
- “fioa: ios=71/767604”
- 71 read IOs and 767604 write IOs on /dev/fioa
- I am not sure where the 71 read IOs come from. I am pretty sure I didn't run anything in the background myself. Who knows?
- “merge=0/0” – the number of IOs merged by the IO scheduler
- no merges here
- “ticks=0/259476”
- the number of ticks we kept the drive busy – a sign that the device is saturated
- a tick is related to one jiffy; the next lines are only an approximation, read about the kernel timer for more details
- in this example I checked the value of CONFIG_HZ with `cat /boot/config-$(uname -r) | grep HZ`: CONFIG_HZ is set to 250, which means one tick = 1 second / 250 = 0.004s = 4ms
- 259476 ticks * 4ms = ~1038s ???
- I believe this is the cumulative wait time all IOs spent in the queue; if you increase only the iodepth, the value increases linearly
- ~1038s / 32 (iodepth) = ~32.4s, which is a little more than the runtime of ~30s
- “in_queue=256816”
- the total time spent in the disk queue
- “util=94.63%”
- a utilization of 94.63% means the drive was kept busy almost the entire run, i.e., nearly saturated by this workload
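fio reads these numbers from the kernel's per-device counters, so you can look at the raw source directly. A minimal sketch, assuming sda is the device (the field layout is documented in the kernel's Documentation/iostats.txt):

```
# After major/minor/name: reads completed, reads merged, sectors read,
# ms spent reading, writes completed, writes merged, sectors written,
# ms spent writing (-> ticks), IOs currently in flight, ms doing IO,
# and weighted ms in queue (-> in_queue).
grep ' sda ' /proc/diskstats
```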
This should give you a good idea of which parts of a result/output exist, and some insight into how to interpret them. And yes, it's a tough one.