
FIO (Flexible I/O Tester) Part9 – fio2gnuplot to visualize the output

The Linux build of “fio” ships with a tool called fio2gnuplot. This tool renders the output files of “fio” and uses gnuplot to generate nice graphs. Gnuplot is a freely distributed, portable, command-line driven graphing utility.

Example shows distribution of IOPS with different block sizes and different Read/Write Mix:

PX600-1000-IOPS-mes3DPlt

Requirements

I am using “fio” 2.2.10, which was released on 12.09.2015.

Since version 2.1.2, fio2gnuplot has been part of the “fio” release. To generate the graphics you need to install gnuplot.

How to generate the log files?

There are some “fio” options to generate log files.

  • write_bw_log=<Filename>
  • write_iops_log=<Filename>
  • write_lat_log=<Filename>
  • per_job_logs=0/1 (available in versions newer than 2.2.8, so not in the Windows build as of 16.09.2015)

write_bw_log generates a log file with the bandwidth details of the job, and the other options work accordingly. If you don’t set per_job_logs=0, one file is written per thread (numjobs=X). Most of the time this is not what you want, because you would like to generate graphics based on all threads. An issue I found is that the default patterns of fio2gnuplot (-b / -i) then do not work, because they search for files ending in *_bw.log and *_iops.log, while the per-thread files end in *_bw.X.log and *_iops.X.log. It should be fixed with this commit.

If per_job_logs=0 is set and all log file options have been set:

  • write_bw_log=fio-test
  • write_iops_log=fio-test
  • write_lat_log=fio-test

then 5 files will be generated: fio-test_bw.log, fio-test_iops.log, fio-test_lat.log, fio-test_slat.log and fio-test_clat.log (write_lat_log produces three files: total, submission and completion latency).
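A minimal job file that would produce these logs might look like the following sketch (the device, runtime, block size and read/write mix are placeholders, not the settings used for the graphs in this post):

```ini
[global]
ioengine=libaio
direct=1
time_based
runtime=60
; one aggregated log per job file instead of one log per thread
per_job_logs=0
write_bw_log=fio-test
write_iops_log=fio-test
write_lat_log=fio-test

[randrw-4k]
filename=/dev/sdb
rw=randrw
rwmixread=70
bs=4k
iodepth=32
numjobs=4
```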

What does a log file look like?

The first column is, I believe, the elapsed time in ms. The second column is the bandwidth in KB/s. The third column indicates the direction: 0 means the row relates to reads, 1 means it relates to writes. The fourth column is the block size in bytes, so 4096 means 4K.
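Purely to illustrate that layout (made-up numbers, not taken from my measurements), two lines of a bandwidth log would look like this:

```
1000, 51200, 0, 4096
2000, 48750, 1, 4096
```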

Using fio2gnuplot

fio2gnuplot works in two major phases. The first phase generates the input files for gnuplot and does some calculations on the data, such as the average, minimum and maximum.

Running fio2gnuplot -b searches for all bandwidth log files in the current directory and generates the input files for gnuplot. The option “-i” selects the default pattern for IOPS files. There is no default pattern for latency files.

fio2gnuplot_phase1

The second phase generates the graphics; the option “-g” is used for this. By default “-g” deletes the gnuplot input files. The option “-k” can be used to keep these files for later editing. If you want to make changes to the output, you can edit the gnuplot files, such as the mygraph file.
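In short, the two phases map to invocations like these (a sketch based on the options described above; check the tool’s help output for the exact options of your version):

```sh
# phase 1: search for *_bw.log files in the current directory and
# generate the gnuplot input files
fio2gnuplot -b

# phase 1 + 2: additionally render the graphs; -k keeps the generated
# gnuplot files (e.g. mygraph) for later editing instead of deleting them
fio2gnuplot -b -g -k
```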

fio2gnuplot_phase2

And this is the output of fio-test_bw-2Draw.png

fio-test_bw-2Draw

Using fio2gnuplot to compare files with the default pattern -b or -i

You can copy all log files into the same directory and call fio2gnuplot with the right pattern. I make use of “-b” for bandwidth comparisons.

fio2gnuplot_compare

And this is the output of compare-result-2Dsmooth.png

compare-result-2Dsmooth

Using fio2gnuplot to compare files with a custom pattern

Sometimes the default patterns will not work; for example, there is no pattern for the latency output. In this case you can specify your own pattern with the option “-p <pattern>” and provide a title. WARNING: Using the pattern “*.log” will raise an error. I fixed this, so in a future release it should work.
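For example, to compare latency logs (assuming the files end in _lat.log; in my build “-t” sets the graph title, adjust if your version names that option differently):

```sh
# custom pattern for the latency logs plus a title for the resulting graphs
fio2gnuplot -p '*_lat.log' -g -t 'Latency comparison'
```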

compare-result-pattern

And this is the output of compare-lat-2Dsmooth.png

compare-lat-2Dsmooth

Go Keepass2.

 

7. SQL Server Performance Tuning study with HammerDB – Flashsoft and PX600 unleash the full power

I have solved all the bottlenecks found since we started this performance tuning study, but now I can’t find any improvements that can be made without altering the schema or indexes, which is not allowed by the TPC-C rules. It is a similar situation to running a third-party application with a database you are not allowed to change. A great solution to improve the disk latency is Flash-based caching, because it is transparent to the application vendor. The advantage of Flashsoft 3.7 is that it provides both a READ and a WRITE cache; the write cache is the one which should help with this OLTP workload. Remember that Flashsoft can cache FC, iSCSI, NFS and local devices.

Phase 3 – Forming a hypothesis – Part 5

  • Based on observation and declaration form a hypothesis
    • Based on observation and the lessons I learned, I believe the TPM/NOPM values should increase if the disk access latency is reduced with the use of a READ/WRITE cache (Flashsoft).

Phase 4 – Define an appropriated method to test the hypothesis

  • 4.1 don’t define too complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
      • In this case I concentrate only on the TPM/NOPM values.
    • some metrics as key metrics
      • TPM/NOPM
    • the right level of details
    • an efficient approach in terms of time and results
      • Installing and configuring Flashsoft will take 30 min
    • a tool you fully understand
  • 4.3 document the defined method and setup a test plan

  I will run the following test plans:

Test plan 1

Implement a READ/WRITE cache for SQL Server based on Samsung 840 basic

Start HammerDB workload

Run ShowIOBottlenecks and Resource Monitor

Stop HammerDB workload and compare this run with the baseline

Analyze the ShowIOBottlenecks

Test plan 2

If ShowIOBottlenecks still shows wait events for disk latency I will use the PX600-1000 as READ/ WRITE cache device

Start HammerDB workload

Run ShowIOBottlenecks and Resource Monitor

Stop HammerDB workload and compare this run with the baseline

Phase 5 – Testing the hypothesis – Test Plan 1+2

  • 5.1 Run the test plan
  • avoid or don’t test if other workloads are running
  • run the test at least two times

I recorded a video when running the test plan 1.

  • System achieved 338486 SQL Server TPM at 73651 NOPM
  • System achieved 314510 SQL Server TPM at 68320 NOPM
  • System achieved 313778 SQL Server TPM at 68256 NOPM

The Resource Monitor showed that there is still a latency of around 10 ms, and ShowIOBottlenecks showed that there are still wait events for the write log, so I decided to use the PX600-1000.

I recorded a video when running the test plan 2.

  • System achieved 338989 SQL Server TPM at 73685 NOPM
  • System achieved 348164 SQL Server TPM at 75689 NOPM
  • System achieved 336965 SQL Server TPM at 73206 NOPM
  •  5.2 save the results
    • All results are saved into the log files

Phase 6 – Analysis of results – Test Plan 1+2

  • 6.2 Read and interpret all metrics
    • understand all metrics
    • compare metrics to basic/advanced baseline metrics
      • TEST RESULT Flashsoft Samsung 840:
        • System achieved (338486,314510,313778)=322258 SQL Server TPM at (73651,68320,68256)=  70076 NOPM
      • TEST RESULT Flashsoft SanDisk PX600-1000:

        • System achieved (338989,348164,336965)= 341373 SQL Server TPM at  (73685,75689,73206) = 74193 NOPM
      • TEST RESULT before:
        • System achieved 161882 SQL Server TPM at 35150 NOPM
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables even in this simple environment that this would take too much time. The approximation shows that, as long as I don’t make changes to the environment, the results should be stable.
    • concentrate on key metrics
      • While using Flashsoft with Samsung Basic 840 or SanDisk PX600-1000 I could nearly double the performance compared to the last run.
    •  is the result statistically correct?
      • No. The selection was only one point in time. I repeated the test a few times with a similar result, but still no.
  • 6.3 Visualize your data
  • 6.4 “Strange” results mean you need to go back to “Phase 4.2 or 1.1”
    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not go back to  “Phase 1.1”

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis is true. I doubled the performance by making use of the Flashsoft caching solution. I found that there is only a small difference between the Samsung and the SanDisk drive, although the SanDisk PX600-1000 should be much faster than the consumer SSD. The reason seems to be a new bottleneck I found: Page Latch waits are involved!
  • 7.2 Next Step
    • Is the hypothesis true?
      • Yes
      • if goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series where I’ll track down the Page Latch wait events and solve them.

6. SQL Server Performance Tuning study with HammerDB – Database Engine Tuning Advisor

In Part 5 an issue with the cost estimator has been solved and the HammerDB workload runs much faster. But what to tune now? Let’s give the Database Engine Tuning Advisor a chance for this performance tuning.

Phase 3 – Forming a hypothesis – Part 4

  • Based on observation and declaration form a hypothesis
    • Based on observation and the lessons I learned I believe the TPM/NOPM values should increase if we further tune SQL statements with the help of the Database Engine Tuning Advisor

Phase 4 – Define an appropriated method to test the hypothesis

  • 4.1 don’t define too complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
      • In this case I concentrate only on the TPM/NOPM values.
    • some metrics as key metrics
      • TPM/NOPM
    • the right level of details
    • an efficient approach in terms of time and results
      • Adding indexes may take 1h
    • a tool you fully understand
      • The Database Engine Tuning Advisor will be used to analyze the plan cache, and this tool will provide some advice on how to introduce indexes, partitions, etc.
  • 4.3 document the defined method and setup a test plan

  I will run the following test plan and analysis:

Test plan 1

Run the Database Engine Tuning Advisor and analyze the plan cache (last 1000 events)

Analyze the report

Maybe create indexes, partitions, statistics, etc.

Start HammerDB workload

Stop HammerDB workload and compare this run with the baseline

Phase 5 – Testing the hypothesis – Test Plan 1

  • 5.1 Run the test plan

    • avoid or don’t test if other workloads are running
    • run the test at least two times

I recorded a video when running test plan 1. As long as I stay in line with the TPC-C rules, the only Database Engine Tuning Advisor optimization available is creating new statistics.

  •  5.2 save the results
    • All results are saved into the log files

Phase 6 – Analysis of results – Test Plan 1

  • 6.2 Read and interpret all metrics
    • understand all metrics
    • compare metrics to basic/advanced baseline metrics
      • TEST RESULT NOW: System achieved 161882 SQL Server TPM at 35150 NOPM
      • TEST RESULT BEFORE: System achieved 161091 SQL Server TPM at 35013 NOPM
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables even in this simple environment that this would take too much time. The approximation shows that, as long as I don’t make changes to the environment myself, the results should be stable.
    • concentrate on key metrics
      • so no measurable changes
    •  is the result statistically correct?
      • No. The selection was only one point in time. I repeated the test a few times with a similar result, but still no.
  • 6.3 Visualize your data
  • 6.4 “Strange” results mean you need to go back to “Phase 4.2 or 1.1”
    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not go back to  “Phase 1.1”

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis could not really be tested. The Database Engine Tuning Advisor proposed changes which are not in line with the TPC-C rules, so we could not really tune the SQL statements. BUT when there are no more options left to tune, I will introduce new indexes.
  • 7.2 Next Step
    • Is the hypothesis true?
      • Not evaluated
      • if goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series. The end of the video showed that we should give Flashsoft another try to improve the disk latency.

5. SQL Server Performance Tuning study with HammerDB – TOP 10 most costly SQL statements

In the last post we found that the bottleneck does not seem to be related to the ACCESS_METHODS_DATASET_PARENT wait events. I have learned a few things by now! Tuning wait events should not be the first step in tuning a database. It has long been recommended to tune a database workload starting at the application and working down to the hardware. The reason is obvious: the performance gain you can get by using better hardware or optimizing your hardware is normally in a range between 5% and 100%, while tuning one SQL statement may increase the overall performance 10x, 50x or maybe 1000x.

Phase 3 – Forming a hypothesis – Part 3

  • Based on observation and declaration form a hypothesis
    • Based on observation that the bottleneck is related to CPU and workload I believe: “The TPM/NOPM should be increasing if we improve the TOP 10 most costly SQL statements or at least the most costly of all”

Phase 4 – Define an appropriated method to test the hypothesis

  • 4.1 don’t define too complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
    • some metrics as key metrics
      • total_worker_time for the most costly SQL statement will be declared as the key metric because the workload seems to be bounded by CPU
    • the right level of details
    • an efficient approach in terms of time and results
      • approx. 1 h
    • a tool you fully understand
      • Pinal Dave posted a nice script which I will use to list the TOP 10 most costly SQL statements and the used execution plan. I make use of the order by total_worker_time.
  • 4.3 document the defined method and setup a test plan

  I will run the following test plan and analysis:

Test plan 1

Run the TOP10 script.

Identify and analyze the most costly SQL statement of all (make use of the execution plan analyzer)

Tune the discovered SQL statement

Start HammerDB workload

Run the TOP10 script and confirm if the most costly SQL statement of all is improved

Stop HammerDB workload and compare this run with the baseline

Phase 5 – Testing the hypothesis – Test Plan 1

  • 5.1 Run the test plan

    • avoid or don’t test if other workloads are running
    • run the test at least two times

I run the TOP10 Script:

SQLServer2014_TOP10_before

The most costly statement here is, surprisingly, a SELECT statement. This is strange because an OLTP workload should spend most of its time trying to update/insert something.

After a short Internet search I found this blog post, which showed that this is related to the changes in the SQL Server Query Optimizer cardinality estimation process.

So should I add an index or change the Query Optimizer behavior? I decided to change the database compatibility level to get the pre-SQL Server 2014 legacy CE. The right way would be to add an index or use trace flag 9481, but these changes would not stay in line with the TPC-C rules!

Before the change, the execution plan for the SELECT looks like this, with a table scan of the stock table which costs a lot:

SQL Server2 014 Execution plan before

I changed the SQL Server Query Optimizer cardinality estimation for the tpcc database to pre-SQL Server 2014. After the change, sys.dm_exec_query_stats should be flushed.
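In T-SQL this change looks roughly like this (assuming the database is named tpcc; compatibility level 110 corresponds to SQL Server 2012 and therefore to the legacy cardinality estimator):

```sql
-- switch the tpcc database back to the pre-2014 (legacy) cardinality estimator
-- via the database compatibility level (110 = SQL Server 2012)
ALTER DATABASE tpcc SET COMPATIBILITY_LEVEL = 110;

-- clear the plan cache so that sys.dm_exec_query_stats starts fresh
DBCC FREEPROCCACHE;
```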

SQL_Change_compatibily-level

I started the HammerDB workload again, as shown in this video.

I run the TOP10 script:

SQLServer2014_TOP10_after

As you can see, there are still two phases while the workload is running: one which shows the CPU at nearly 90% and low disk access, and another with high (saturated) disk access and low CPU involvement.

I stopped the HammerDB workload and compared this run with the baseline.

  •  5.2 save the results
    • All results are saved into the log files

Phase 6 – Analysis of results – Test Plan 1

  • 6.2 Read and interpret all metrics
    • understand all metrics
      • sys.dm_exec_query_stats shows the most costly SQL statements; the SELECT which had been the issue no longer shows up at the top (see the key metrics below).

      • compare metrics to basic/advanced baseline metrics
        • TEST RESULT NOW : System achieved (149811,160772,172690)=161091 SQL Server TPM at (32541,34909,37590)= 35013 NOPM
        • TEST RESULT IN THE Beginning: System achieved 20820 SQL Server TPM at 4530 NOPM
      • is the result statistically correct?
        • No. The selection was only one point in time. I repeated the test a few times with a similar result, but still no.
      • has sensitivity analysis been done?
        • Just an approximation. There are so many variables even in this simple environment that this would take too much time. The approximation shows that, as long as I don’t make changes to the environment myself, the results should be stable.
      • concentrate on key metrics
        • total_worker_time for the most costly SQL statement has been reduced to 13,339,391 compared to 1,286,386,037 before. The statement is not in the TOP10 anymore.

           
  • 6.3 Visualize your data
    • The screen-shots will do the job.
  • 6.4 “Strange” results mean you need to go back to “Phase 4.2 or 1.1”
    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not go back to  “Phase 1.1”

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis has been proven right. The reached TPM=161091 and NOPM=35013 show that solving this bottleneck caused by the SQL Server Query Optimizer cardinality estimation has a big influence: the performance increased around 7x!
  • 7.2 Next Step
    • Is the hypothesis true?
      • Yes. 
      • if goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series because I am pretty sure there is much more I can tune.

4. SQL Server Performance Tuning study with HammerDB – solving ACCESS_METHODS_DATASET_PARENT

In the last post we found that the bottleneck does not seem to be related to the disk access latency, so I disabled Flashsoft 3.7 for now. The next part in this performance tuning study will show how we solve the wait events for ACCESS_METHODS_DATASET_PARENT.

Phase 3 – Forming a hypothesis – Part 2

  • Based on observation and declaration form a hypothesis
    • Based on observation and the last ShowIOBottlenecks run I believe: “The TPM/NOPM should be increasing if the wait events  Intra Query Parallelism – ACCESS_METHODS_DATASET_PARENT will be reduced/resolved”

Phase 4 – Define an appropriated method to test the hypothesis

  • 4.1 don’t define too complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
    • some metrics as key metrics
    • the right level of details
    • an efficient approach in terms of time and results
      • approx. 10 min
    • a tool you fully understand
  • 4.3 document the defined method and setup a test plan

  I will run the following test plan and analysis:

Test plan 1

This post by sqlskills points out that for ACCESS_METHODS_DATASET_PARENT waits I should investigate the MAXDOP setting.

I will check the Max Degree of Parallelism setting, which is 0 by default, and set it to a value of 2, because this server has HT active and we have 2 active cores.

Microsoft shows how to set the Max Degree of Parallelism for this instance.

I will run ShowIOBottlenecks while the HammerDB workload is running with Max Degree of Parallelism set to 2.

Phase 5 – Testing the hypothesis – Test Plan 1

  • 5.1 Run the test plan

    • avoid or don’t test if other workloads are running
    • run the test at least two times

I checked the actual Max Degree of Parallelism with this query:
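A sketch of such a check using sp_configure (an assumption on my part, but its output has exactly the columns shown in the table below):

```sql
-- 'max degree of parallelism' is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- current value (output shown below)
EXEC sp_configure 'max degree of parallelism';
```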

name                         minimum   maximum   config_value   run_value
max degree of parallelism    0         32767     0              0

Then I set the Max Degree of Parallelism to 2:
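A sketch of that change, following the Microsoft documentation referenced above:

```sql
-- set the instance-wide Max Degree of Parallelism to 2 and apply it
EXEC sp_configure 'max degree of parallelism', 2;
RECONFIGURE WITH OVERRIDE;
```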

The output has been:

I started the HammerDB workload again, as shown in this video.

  •  5.2 save the results
    • All results are saved into the log files

Phase 6 – Analysis of results – Test Plan 1

  • 6.2 Read and interpret all metrics
    • understand all metrics
      • The important metrics from ShowIOBottlenecks showed that, from a wait event point of view, there are two main phases while the workload is running:
      • This one shows that CPU and threading seem to be the bottleneck and the disk is not saturated: SQLServer2014_MaxDegreePar_active_CPU
      • This one shows that WRITELOG I/O seems to be the bottleneck and the CPU still has ~30% wait time: SQLServer2014_MaxDegreePar_active_WriteLog
  • compare metrics to basic/advanced baseline metrics
    • TEST RESULT NOW : System achieved 20227 SQL Server TPM at 4443 NOPM (Best run)
    • TEST RESULT IN THE Beginning: System achieved 20820 SQL Server TPM at 4530 NOPM
  • is the result statistically correct?
    • No. The selection was only one point in time. I repeated the test a few times with a similar result, but still no.
  • has sensitivity analysis been done?
    • Just an approximation. There are so many variables even in this simple environment that this would take too much time. The approximation shows that, as long as I don’t make changes to the environment myself, the results should be stable.
  • concentrate on key metrics
    • ACCESS_METHODS_DATASET_PARENT still arises occasionally (<=10%), but it seems to be solved for now.
  • 6.3 Visualize your data
    • The screen-shots will do the job.
  • 6.4 “Strange” results mean you need to go back to “Phase 4.2 or 1.1”
    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not go back to  “Phase 1.1”

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis has been proven wrong. The reached TPM=20170 and NOPM=4336 show that solving ACCESS_METHODS_DATASET_PARENT has nearly no influence at the moment.
    • The reasons why the hypothesis is wrong:
      • It seems we solved this bottleneck, but there is another issue which we have not discovered yet.
      • The MAXDOP setting of 2 will be kept in this case because it is recommended by Microsoft.
  • 7.2 Next Step
    • Is the hypothesis true?
      • No. 
      • if goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series. And believe me. The TPM and NOPM values will increase 🙂

3. SQL Server Performance Tuning study with HammerDB – Using 8PP – Part1

In the post 8PP I described a scientific approach to test a system. In this series I will follow 8PP and showcase how it can be used to fulfill a performance tuning of SQL Server for this HammerDB workload.

The workload I will tune is as follow:

HammerDB TPC-C autopilot workload which runs a timed driver script three times with 25 virtual users, a ramp-up time of 1 min and a runtime of 5 min. The minutes per test are set to 8 min to give the SQL Server a chance to settle after each run. The average TPM has been (20998,20178,21283) = 20820 and the average NOPM has been (4602,4380,4609) = 4530. Watch the beginning of this video to understand how I set up the workload.

Phase 1 – Observation

1.1 Understand the problem/issue

  • Talk to all responsible people if possible
    • I am responsible for the whole setup.
  • Is the problem/issue based on a real workload?
    • No! It is always important to know whether the workload is based on a real workload, because some synthetic tests are not well chosen to test a system. This is your chance to avoid spending time on senseless testing.
  • Is the evaluation technique appropriate?
    • In this case I accept that this test is synthetic and consider it is appropriate for my target.

1.2 Define your universe

  • If possible isolate the system as much as you can
    • The whole test is running on one PC. I could increase the isolation by uninstalling the unused 10 GbE NICs and the management NIC which I am using for RDP access. I consider that this should have nearly zero impact on my testing.
  • Make sure to write down exactly how your system/environment is build
    • Firmware, OS, driver, app versions, etc…

1.3 Define and run basic baseline tests (CPU,MEM,NET,STORAGE)

  • Define the basic tests and run them while the application is stopped
    • These tests are done to make sure that no major bottleneck is introduced by a badly configured hardware environment.
    • I used PerformanceTest 8.0 to check CPU and MEM.
      • CPU results
        • The CPU Mark reached only 5467. Remember only 2 out of 4 cores are active. I will repeat this test when all 4 cores are online again.
        • Passmark_comparicon
      • MEM results
        • The Memory Mark reached 2661 with a latency of 28.2 ns. Compared to the baseline of 29 ns it is configured correctly.
        • Passmark_comparicon_Mem
    • I skipped the NET test because it should not be involved in this testing.
    • I tested the SSD storage before:
  • Document the basic baseline tests
    • All basic baseline tests are documented here. The results seem to be in line with well-known tests from the Internet.
  • Compare to older basic baseline tests if any are available
    • There are no old basic baseline tests I could use to compare.

1.4 Describe the problem/issue in detail

  • Document the symptoms of the problem/issue
    • I started the HammerDB TPC-C autopilot workload, which runs a timed driver script three times with 25 virtual users, a ramp-up time of 1 min and a runtime of 5 min. The minutes per test are set to 8 min to give the SQL Server a chance to settle after each run. The average TPM has been (20998,20178,21283) = 20820 and the average NOPM has been (4602,4380,4609) = 4530.
  • Document the system behavior (CPU,MEM,NETWORK,Storage) while the problem/issue arise
    • While running the HammerDB workload I monitored the utilization of CPU, memory and network, which showed that the CPU is at ~100% utilization. There is no memory pressure and no traffic on the network. The disk showed a response time of up to ~30 ms and a throughput of up to ~5 MB/sec.
    • For sure this documentation could be more detailed, but it should be okay for now.

Phase 2 – Declaration of the end goal or issue

  • Officially declare the goal or issue
    • The goal is to increase the TPM/NOPM numbers of the HammerDB TPC-C workload with 25 virtual users, a ramp-up of 1 min and a runtime of 5 min to the highest possible value. I will resolve all bottlenecks, regardless of whether they are hardware, software, SQL Server options or SQL schema, as long as the change is transparent to the HammerDB driver script. This means it is okay to add an index, or make use of NAND flash or more CPUs, but it is not allowed to change the driver script itself, and it needs to run without errors.
  • Agree with all participants on this goal or issue
    • I agree with this goal 🙂 

Phase 3 – Forming a hypothesis – Part 1

  • Based on observation and declaration form a hypothesis
    • Based on observation 1.4 – I believe: “The TPM/NOPM should be increasing if the access time to the data on the disk will be faster”

Phase 4 – Define an appropriated method to test the hypothesis

  • 4.1 don’t define too complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
      • latency of database files
      • wait event time of IO in relation to the whole wait time
      • read/write throughput
      • read/write distribution
    • some metrics as key metrics
      • wait event time of IOs in relation to the whole wait time
    • the right level of details
    • an efficient approach in terms of time and results
      • approx. 2h
    • a tool you fully understand
  • 4.3 document the defined method and setup a test plan

  I will run the following test plan and analysis:

Test plan 1

I will run ShowIOBottlenecks while the HammerDB workload is running.

Prove whether there are significant wait events for disk access in relation to the whole wait time.
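ShowIOBottlenecks itself is based on sys.dm_os_wait_stats and sys.dm_os_latch_stats (see the conclusion of this post). Purely to illustrate the idea, and explicitly not the actual script, a minimal wait-statistics query could look like this:

```sql
-- rough sketch of the idea behind ShowIOBottlenecks (not the real script):
-- show the biggest contributors to the accumulated wait time
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS pct_of_total_wait_time
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
ORDER BY wait_time_ms DESC;
```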

Test plan 2

I will run the HammerDB workload and use the Resource Monitor to monitor disk access.

If there are significant wait events for disk access I will further check whether they are related to READ or WRITE.

If related to READ I will make use of ioTurbine Profiler which shows if a read cache could help here.

Test plan 3

Depending on the findings I will introduce a read/write cache with the help of Flashsoft 3.7 to cache the data files.

Again run ShowIOBottlenecks while the HammerDB workload is running.

Document my findings on this blog.

Phase 5 – Testing the hypothesis – Test Plan 1

  • 5.1 Run the test plan

    • avoid or don’t test if other workloads are running
    • run the test at least two times
  1. I recorded a short video to demonstrate how I used the ShowIOBottlenecks script to verify whether there is an I/O bottleneck.

  • 5.2 save the results
    • I saved the results of ShowIOBottlenecks to an xlsx file.

Phase 6 – Analysis of results – Test Plan 1

  • 6.2 Read and interpret all metrics
    • understand all metrics
      • The important metrics ShowIOBottlenecks showed are:
        • around 40% Intra Query Parallelism wait events for ACCESS_METHODS_DATASET_PARENT
        • around 25% CPU & Threading (CPU)
        • around 20% of the time the writelog IO
        • The latency to the data files seems to be high: ~30 ms
    • compare metrics to basic/advanced baseline metrics
      • There is no baseline so far
    • is the result statistically correct?
      • No. The selection was only one point in time. I repeated the test a few times with a similar result, but still no.
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables even in this simple environment that this would take too much time. The approximation shows that, as long as I don’t make changes to the environment myself, the results should be stable.
    • concentrate on key metrics
      • wait event time of IO is 20% of the whole wait time
  • 6.3 Visualize your data
    • HammerDB_initial_autopilot_run3x25_wait_distribution
  • 6.4 “Strange” results means you need to go back to “Phase 4.2 or 1.1”

    • The high Intra Query Parallelism could be considered strange. It means that another bottleneck exists which I should solve first. But I will ignore it for now.
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 5 – Testing the hypothesis – Test Plan 2

  • 5.1 Run the test plan
  1. I recorded a short video to demonstrate how I used the ShowIOBottlenecks script. At 2:00 and 3:35 I show how to display the read/write distribution with the Windows Resource Monitor.

  • 5.2 save the results
    • The read rate has been ~10,000 B/sec
    • The write rate has been ~2,752,000 B/sec

Phase 6 – Analysis of results – Test Plan 2

  • 6.2 Read and interpret all metrics
    • understand all metrics
      • read and write B/sec
    • compare metrics to basic/advanced baseline metrics
      • There is no baseline so far
    • is the result statistically correct?
      • No. The selection was only one point in time. I repeated the test a few times with a similar result, but still no.
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables even in this simple environment that this would take too much time. The approximation shows that, as long as I don’t make changes to the environment myself, the results should be stable.
    • concentrate on key metrics
      • The read B/sec is ~2.8% of all disk B/sec, so it does not look like a read cache will help.
      • Just for showcasing the ioTurbine Profiler:
        • I started the HammerDB workload again and monitored the D:\ device. I ignored 4.2 (the time budget) in this case, which should not be done in real environments.
  • 6.3 Visualize your data
    • read_write_distribution
    • watch the ioTurbine Profiler video !!!!
    • https://www.youtube.com/watch?v=O_eDnnIZ6wo
  • 6.4 “Strange” results means you need to go back to “Phase 4.2 or 1.1”

    • Nothing strange. It’s an OLTP workload so there should be more writes than reads.
  • 6.5 Present understandable graphics for your audience
    • See 6.3.

Phase 5 – Testing the hypothesis – Test Plan 3

  • 5.1 Run the test plan
  • The Analysis showed that faster writes could make sense and reduce the wait events for the write log which are up to 20% of the whole wait time. There are different options to achieve fast writes for the log.
    • Introduce a faster storage for the log file
    • Introduce a write cache for the log file
    • Tune the log file flush mechanisms and log file structures
    • Tune the application to avoid too many commits.
  • As defined, we are not able to tune the application. We could try to tune the log file itself, but the best bet should be faster storage or a write cache, because we know the D:\ is a slow HDD. In this case I will start with the write cache. The reason is that a write cache can be used transparently for a single device, logical device or pool, and for block devices provided via a SAN, even when using shared file systems like CSV or VMFS.
  1. I recorded a short video where I configure Flashsoft 3.7 to use the Samsung 840 Basic SSD as a read/write cache for the D:\ drive. Then I started HammerDB again, let the baseline run again and started the ShowIOBottlenecks script.
  • 5.2 save the results
    • The results are saved in the log files.

Phase 6 – Analysis of results – Test Plan 3

  • 6.2 Read and interpret all metrics
    • understand all metrics
    • compare metrics to basic/advanced baseline metrics
      • The baseline is documented in 1.4 with TPM=20820 and NOPM=4530
      • With Flashsoft 3.7 activated we reached TPM=20170 and NOPM=4336
    • is the result statistically correct?
      • More or less. The test ran three times in a row and the last run has been recorded.
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables even in this simple environment that this would take too much time. The approximation shows that, as long as I don’t make changes to the environment myself, the results should be stable.
    • concentrate on key metrics
      • The key metric is: wait event time of IOs in relation to the whole wait time. The video showed we could reduce the wait time for the write log to <=1%, down from up to 20%.
  • 6.3 Visualize your data
    • I skipped it because the result is simple to understand: 20% wait time for the write log dropped to <=1%.
  • 6.4 “Strange” results means you need to go back to “Phase 4.2 or 1.1”

    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not go back to  “Phase 1.1”

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis has been proven wrong. The TPM=20170 and NOPM=4336 I reached with Flashsoft 3.7 show that faster disk access has no influence at the moment; the workload even runs slower than before. I believe that is related to some CPU overhead introduced by Flashsoft, but that is not proven.
    • The reasons why the hypothesis is wrong:
      • I ignored strange values at Test plan 1 –  6.4
      • I failed to fully understand all metrics at Test plan 1 – 6.1
        • How much time is waiting time compared to the runtime? This is a difficult topic, because the wait times we see are taken from sys.dm_os_wait_stats and sys.dm_os_latch_stats. These values are aggregated over all tasks. These counters can only indicate that there are wait events and which of them may be a problem. In this case the disk access time is not the bottleneck.
        • In the video at 7:50 you can see that 41822 wait events occurred for Intra Query Parallelism and the avg waits per ms is shown as 340. The script ShowIOBottlenecks runs for ~5000 ms, so the value should be around 8.3. I found a bug in the script which I corrected here.
  • 7.2 Next Step
    • Is the hypothesis true?
      • No. 
      • if goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series. And believe me. The TPM and NOPM values will increase 🙂

Go VLC.