3. SQL Server Performance Tuning study with HammerDB – Using 8PP – Part1

SQL Server performance study with HammerDB

Full course:
  1. SQL Server Performance Tuning study with HammerDB – Setup SQL Server
  2. SQL Server Performance Tuning study with HammerDB – Setup HammerDB
  3. SQL Server Performance Tuning study with HammerDB – Using 8PP – Part1
  4. SQL Server Performance Tuning study with HammerDB – solving ACCESS_METHODS_DATASET_PARENT
  5. SQL Server Performance Tuning study with HammerDB – TOP 10 most costly SQL statements
  6. SQL Server Performance Tuning study with HammerDB – Database Engine Tuning Advisor
  7. SQL Server Performance Tuning study with HammerDB – Flashsoft and PX600 unleash the full power
  8. SQL Server Performance Tuning study with HammerDB – Solve PAGEIOLATCH latch contention

In the 8PP post I described a scientific approach to testing a system. In this series I will follow 8PP and showcase how it can be used to carry out performance tuning of SQL Server for a HammerDB workload.

The workload I will tune is as follows:

HammerDB TPC-C autopilot workload, which runs a timed driver script three times with 25 virtual users, a ramp-up time of 1 minute, and a runtime of 5 minutes. The minutes per test are set to 8 to give SQL Server a chance to settle after each run. The average TPM was (20998, 20178, 21283) = 20820 and the average NOPM was (4602, 4380, 4609) = 4530. Watch the beginning of this video to understand how I set up the workload.
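As a quick sanity check, the quoted baseline figures are simply the arithmetic mean of the three timed runs; a minimal sketch in Python:

```python
# Sanity check of the quoted baseline figures: the baseline TPM and NOPM
# are the arithmetic mean of the three timed autopilot runs.

def mean(runs):
    """Arithmetic mean of per-run results, rounded to the nearest integer."""
    return round(sum(runs) / len(runs))

tpm_runs = [20998, 20178, 21283]  # TPM of the three runs
nopm_runs = [4602, 4380, 4609]    # NOPM of the three runs

print(mean(tpm_runs))   # 20820
print(mean(nopm_runs))  # 4530
```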

Phase 1 – Observation

1.1 Understand the problem/issue

  • Talk to all responsible people if possible
    • I am responsible for the whole setup.
  • Is the problem/issue based on a real workload?
    • No! It is always important to know whether the workload is based on a real one, because some synthetic tests are poorly chosen for testing a system. This is your chance to avoid spending time on pointless testing.
  • Is the evaluation technique appropriate?
    • In this case I accept that this test is synthetic and consider it appropriate for my target.

1.2 Define your universe

  • If possible isolate the system as much as you can
    • The whole test is running on one PC. I could increase the isolation by uninstalling the unused 10GbE NICs and the management NIC which I am using for RDP access. I consider that this should have nearly zero impact on my testing.
  • Make sure to write down exactly how your system/environment is built
    • Firmware, OS, driver, app versions, etc…

1.3 Define and run basic baseline tests (CPU,MEM,NET,STORAGE)

  • Define the basic tests and run them while the application is stopped
    • These tests are done to make sure that no major bottleneck is introduced by a badly configured hardware environment.
    • I used PerformanceTest 8.0 to check CPU and MEM.
      • CPU results
        • The CPU Mark reached only 5467. Remember, only 2 out of 4 cores are active. I will repeat this test when all 4 cores are online again.
        • (Image: Passmark comparison)
      • MEM results
        • The Memory Mark reached 2661 with a latency of 28.2 ns. Compared to the baseline of 29 ns, it is configured correctly.
        • (Image: Passmark comparison – Memory)
    • I skipped the NET test because the network should not be involved in this testing.
    • I tested the SSD storage earlier.
  • Document the basic baseline tests
    • All basic baseline tests are documented here. The results seem to be in line with well-known tests from the Internet.
  • Compare to older basic baseline tests if any are available
    • There are no old basic baseline tests I could use for comparison.

1.4 Describe the problem/issue in detail

  • Document the symptoms of the problem/issue
    • I started the HammerDB TPC-C autopilot workload, which runs a timed driver script three times with 25 virtual users, a ramp-up time of 1 minute, and a runtime of 5 minutes. The minutes per test are set to 8 to give SQL Server a chance to settle after each run. The average TPM was (20998, 20178, 21283) = 20820 and the average NOPM was (4602, 4380, 4609) = 4530.
  • Document the system behavior (CPU,MEM,NETWORK,Storage) while the problem/issue arise
    • While running the HammerDB workload I monitored the utilization of CPU, memory, and network, which showed that the CPU is at ~100% load. There was no memory pressure and no traffic on the network. The disk showed response times of up to ~30 ms and throughput of up to ~5 MB/sec.
    • This documentation could certainly be more detailed, but it should be okay for now.

Phase 2 – Declaration of the end goal or issue

  • Officially declare the goal or issue
    • The goal is to increase the TPM/NOPM numbers of the HammerDB TPC-C workload with 25 virtual users, a ramp-up of 1 minute, and a runtime of 5 minutes to the highest possible value. I will resolve all bottlenecks, regardless of whether they involve hardware, software, SQL Server options, or the SQL schema, as long as the change is transparent to the HammerDB driver script. This means it is okay to add an index or to make use of NAND flash or more CPUs, but it is not allowed to change the driver script itself, and it needs to run without errors.
  • Agree with all participants on this goal or issue
    • I agree with this goal 🙂 

Phase 3 – Forming a hypothesis – Part 1

  • Based on observation and declaration form a hypothesis
    • Based on observation 1.4, I believe: “The TPM/NOPM should increase if the access time to the data on the disk becomes faster.”

Phase 4 – Define an appropriated method to test the hypothesis

  • 4.1 don’t define overly complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
      • latency of database files
      • wait event time of IO in relation to the whole wait time
      • read/write throughput
      • read/write distribution
    • some metrics as key metrics
      • wait event time of IOs in relation to the whole wait time
    • the right level of details
    • an efficient approach in terms of time and results
      • approx. 2h
    • a tool you fully understand
  • 4.3 document the defined method and setup a test plan

  I will run the following test plans and analyses:

Test plan 1

I will run ShowIOBottlenecks while the HammerDB workload is running.

Check whether there are significant wait events for disk access in relation to the whole wait time.
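The logic behind that check can be sketched as follows. SQL Server's wait counters in sys.dm_os_wait_stats are cumulative, so a script like ShowIOBottlenecks takes two snapshots, diffs them, and computes each wait type's share of the total wait time in the interval. This is a hypothetical Python sketch with made-up numbers, not the actual T-SQL of the script:

```python
# Hypothetical sketch of the diff-and-share logic: take two snapshots of the
# cumulative wait-time counters, subtract them, and express each wait type
# as a percentage of the total wait time in the sampled interval.
# All numbers are invented for illustration.

def wait_shares(before, after):
    """Percentage of total wait time per wait type between two snapshots."""
    delta = {w: after[w] - before[w] for w in after}
    total = sum(delta.values())
    return {w: round(100 * ms / total, 1) for w, ms in delta.items()}

snapshot_1 = {"WRITELOG": 1000, "PAGEIOLATCH_SH": 500, "CXPACKET": 2000}
snapshot_2 = {"WRITELOG": 3000, "PAGEIOLATCH_SH": 1500, "CXPACKET": 9000}

print(wait_shares(snapshot_1, snapshot_2))
# {'WRITELOG': 20.0, 'PAGEIOLATCH_SH': 10.0, 'CXPACKET': 70.0}
```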

Test plan 2

I will run the HammerDB workload and use the Resource Monitor to monitor disk access.

If there are significant wait events for disk access, I will further check whether they are related to READ or WRITE.

If related to READ, I will make use of ioTurbine Profiler, which shows whether a read cache could help here.

Test plan 3

Depending on the findings, I will introduce a read/write cache with the help of Flashsoft 3.7 to cache the data files.

Then I will again run ShowIOBottlenecks while the HammerDB workload is running.

Document my findings on this blog.

Phase 5 – Testing the hypothesis – Test Plan 1

  • 5.1 Run the test plan

    • avoid or don’t test if other workloads are running
    • run the test at least two times
  1. I recorded a short video to demonstrate how I used the ShowIOBottlenecks script to verify whether there is an IO bottleneck.

  • 5.2 save the results
    • I saved the results of ShowIOBottlenecks to an xlsx file.

Phase 6 – Analysis of results – Test Plan 1

  • 6.2 Read and interpret all metrics
    • understand all metrics
      • The important metrics ShowIOBottlenecks showed are:
        • around 40% Intra-Query Parallelism wait events for ACCESS_METHODS_DATASET_PARENT
        • around 25% CPU & Threading (CPU)
        • around 20% of the wait time spent on write log IO
        • The latency to the data files seems to be high: ~30 ms
    • compare metrics to basic/advanced baseline metrics
      • There is no baseline so far.
    • is the result statistically correct?
      • No. The selection was only one point in time. I repeated the test a few times with similar results, but strictly speaking the answer is still no.
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables, even in this simple environment, that a full analysis would take too much time. The approximation shows that as long as I don’t make changes to the environment myself, the results should be stable.
    • concentrate on key metrics
      • wait event time of IO is 20% of the whole wait time
  • 6.3 Visualize your data
    • HammerDB_initial_autopilot_run3x25_wait_distribution
  • 6.4 “Strange” results means you need to go back to “Phase 4.2 or 1.1”

    • The high Intra-Query Parallelism wait could be considered strange. It means that another bottleneck exists which I should solve first, but I will ignore it for now.
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 5 – Testing the hypothesis – Test Plan 2

  • 5.1 Run the test plan
  1. I recorded a short video to demonstrate how I used the ShowIOBottlenecks script. Twice I showed how to display the read/write distribution with Windows Resource Monitor, at 2:00 and 3:35.

  • 5.2 save the results
    • The read throughput has been ~10,000 B/sec
    • The write throughput has been ~2,752,000 B/sec
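Plugging the sampled figures into a quick calculation shows how small the read share is; the exact percentage varies by sample, but it stays in the low single digits, which supports the conclusion that a read cache has little to work with:

```python
# Read share of total disk throughput, using the Resource Monitor figures
# sampled above. Whatever the exact sample, the share stays far below what
# would justify a pure read cache for this write-heavy OLTP workload.

read_bps = 10_000      # ~10,000 B/sec read
write_bps = 2_752_000  # ~2,752,000 B/sec write

read_share_pct = 100 * read_bps / (read_bps + write_bps)
print(f"{read_share_pct:.2f}%")  # 0.36%
```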

Phase 6 – Analysis of results – Test Plan 2

  • 6.2 Read and interpret all metrics
    • understand all metrics
      • read and write B/sec
    • compare metrics to basic/advanced baseline metrics
      • There is no baseline so far.
    • is the result statistically correct?
      • No. The selection was only one point in time. I repeated the test a few times with similar results, but strictly speaking the answer is still no.
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables, even in this simple environment, that a full analysis would take too much time. The approximation shows that as long as I don’t make changes to the environment myself, the results should be stable.
    • concentrate on key metrics
      • The read B/sec is ~2.8% of all disk B/sec, so it does not look like a read cache will help.
      • Just to showcase the ioTurbine Profiler:
        • I started the HammerDB workload again and monitored the D:\ device. I ignored 4.2 (time) in this case, which should not be done in real environments.
  • 6.3 Visualize your data
    • read_write_distribution
    • Watch the ioTurbine Profiler video:
    • https://www.youtube.com/watch?v=O_eDnnIZ6wo
  • 6.4 “Strange” results means you need to go back to “Phase 4.2 or 1.1”

    • Nothing strange. It’s an OLTP workload, so there should be more writes than reads.
  • 6.5 Present understandable graphics for your audience
    • See 6.3.

Phase 5 – Testing the hypothesis – Test Plan 3

  • 5.1 Run the test plan
  • The analysis showed that faster writes could make sense and reduce the wait events for the write log, which account for up to 20% of the whole wait time. There are different options to achieve faster writes for the log:
    • Introduce a faster storage for the log file
    • Introduce a write cache for the log file
    • Tune the log file flush mechanisms and log file structures
    • Tune the application to avoid too many commits.
  • As defined, we are not able to tune the application. We could try to tune the log file itself, but the best bet should be faster storage or a write cache, because we know the D:\ drive is a slow HDD. In this case I will start with the write cache. The reason is that a write cache can be used transparently for a single device, logical device, or pool, and for block devices provided via a SAN, even when using shared file systems like CSV or VMFS.
  1. I recorded a short video in which I configure Flashsoft 3.7 to use the Samsung 840 Basic SSD as a read/write cache for the D:\ drive. Then I started HammerDB again, let the baseline run again, and started the ShowIOBottlenecks script.
  • 5.2 save the results
    • The results are saved in the log files.

Phase 6 – Analysis of results – Test Plan 3

  • 6.2 Read and interpret all metrics
    • understand all metrics
    • compare metrics to basic/advanced baseline metrics
      • The baseline is documented in 1.4 with TPM=20820 and NOPM=4530
      • With Flashsoft 3.7 activated we reached TPM=20170 and NOPM=4336
    • is the result statistically correct?
      • More or less. The test ran 3 times in a row and the last run has been recorded.
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables, even in this simple environment, that a full analysis would take too much time. The approximation shows that as long as I don’t make changes to the environment myself, the results should be stable.
    • concentrate on key metrics
      • The key metric is the wait event time of IOs in relation to the whole wait time. The video showed that we could reduce the wait time for write logs from up to 20% to <=1%.
  • 6.3 Visualize your data
    • I skipped it because it is simple to understand: 20% wait time for write log dropped to <=1%.
  • 6.4 “Strange” results means you need to go back to “Phase 4.2 or 1.1”

    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not, go back to “Phase 1.1”.

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis has been proven wrong. With the TPM=20170 and NOPM=4336 I reached with Flashsoft 3.7, the faster disk access seems to have no influence at the moment; the workload even runs slower than before. I believe this is related to some CPU overhead introduced by Flashsoft, but that’s not proven.
    • The reasons why the hypothesis is wrong:
      • I ignored strange values in Test plan 1 – 6.4
      • I failed to understand all metrics in Test plan 1 – 6.2
        • How much of the runtime is actually waiting time? This is a difficult topic because the wait times we see come from sys.dm_os_wait_stats and sys.dm_os_latch_stats. These values are aggregated over all tasks, so the counters can only indicate whether wait events occur and which ones may be a problem. In this case, disk access time is not the bottleneck.
        • In the video at 7:50 you can see that 41822 wait events occurred for Intra-Query Parallelism and the avg waits per ms is shown as 340. The script ShowIOBottlenecks runs for ~5000 ms, so the value should be roughly 8.4 (41822 / 5000). I found a bug in the script, which I corrected here.
  • 7.2 Next Step
    • Is the hypothesis true?
      • No. 
      • If the goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series. And believe me, the TPM and NOPM values will increase 🙂
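The plausibility check behind the script bug mentioned in 7.1 is simple arithmetic: dividing the observed wait-event count by the sampling interval of the script gives the expected average, far from the 340 the script displayed.

```python
# Plausibility check for the "avg waits per ms" value from the video:
# the expected average is the wait-event count divided by the sampling
# interval of the script (~5000 ms), i.e. roughly 8.4, not 340.

wait_events = 41822  # Intra-Query Parallelism wait events seen in the video
interval_ms = 5000   # approximate runtime of ShowIOBottlenecks

avg_waits_per_ms = wait_events / interval_ms
print(round(avg_waits_per_ms, 1))  # 8.4
```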

Go VLC.

 
