Category Archives: SQL Server

8. SQL Server Performance Tuning study with HammerDB – Solve PAGEIOLATCH latch contention

In the last part I found that there is a new bottleneck. It seems to be related to PAGEIOLATCH_SH and PAGEIOLATCH_EX waits. The exact values depend on the time slot measured by the ShowIOBottlenecks script; the picture shows >70 percent wait time.

[Screenshot: ShowIOBottlenecks output showing PAGEIOLATCH wait time]

To track down the latch contention wait events Microsoft provides a decent whitepaper. I used the following script and ran it several times to get an idea of which resources are blocked.

[Screenshot: waiting tasks showing a PAGEIOLATCH_SH wait on resource 5:1:240101]

The resource_description column returned by this script provides the resource description in the format <DatabaseID,FileID,PageID>. The name of the database associated with DatabaseID can be determined by passing the value of DatabaseID to the DB_NAME() function.
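The whitepaper script itself is only shown as a screenshot above, so here is a minimal sketch of the kind of query it is based on (my assumption, not the original script): list the tasks currently waiting on page (I/O) latches and resolve the DatabaseID part of resource_description with DB_NAME().

  -- Tasks currently waiting on page latches, including the blocked resource.
  SELECT session_id,
         wait_type,
         wait_duration_ms,
         resource_description          -- DatabaseID:FileID:PageID
  FROM   sys.dm_os_waiting_tasks
  WHERE  wait_type LIKE 'PAGEIOLATCH%'
     OR  wait_type LIKE 'PAGELATCH%';

  -- Resolve the DatabaseID to a database name (5 is the example ID from the screenshot).
  SELECT DB_NAME(5) AS database_name;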

First let's find out which table this is. This can be done by inspecting the page and retrieving the Metadata ObjectId.

[Screenshot: DBCC PAGE output]

The Metadata ObjectId is 373576369. Now it is easy to retrieve the related table name.
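The exact statements are only visible in the screenshots, but the two steps can be sketched like this (assuming DatabaseID 5, FileID 1 and PageID 240101 from the latch output above):

  -- Send DBCC output to the client and dump the page; the header contains "Metadata: ObjectId".
  DBCC TRACEON (3604);
  DBCC PAGE (5, 1, 240101, 3);

  -- Resolve the ObjectId to a table name (run inside the tpcc database).
  SELECT OBJECT_NAME(373576369) AS table_name;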


[Screenshot: resolving the Metadata ObjectId to the table name]

It is the “warehouse” table.

What is the bottleneck here?

[Screenshot: DBCC PAGE output of the warehouse page]

First of all, here is an explanation of the wait events:

PAGEIOLATCH_EX
Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Exclusive mode. Long waits may indicate problems with the disk subsystem.

PAGEIOLATCH_SH
Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Shared mode. Long waits may indicate problems with the disk subsystem.

In our case this means a lot of inserts/updates are done while running the TPC-C workload, and tasks wait on a shared or exclusive latch for this page! From inspecting the page we know it's the warehouse table, and we created the database with 33 warehouses in the beginning.

The page size in SQL Server is 8K and all 33 rows fit into a single page (m_slotCnt = 33). This means some operations cannot be parallelized!
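One way to double-check this, sketched here under the assumption that the database is named tpcc and the table is dbo.warehouse, is to look at the page count of the table:

  -- A single in-row data page would confirm that all 33 warehouse rows share one page.
  SELECT index_id, alloc_unit_type_desc, page_count, record_count
  FROM   sys.dm_db_index_physical_stats(DB_ID('tpcc'), OBJECT_ID('tpcc.dbo.warehouse'),
                                        NULL, NULL, 'DETAILED');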

To solve this I will change the “physical” design of this table, which is still in line with the TPC-C rules. There may be different ways to achieve this: I add a column, insert some text which forces SQL Server to restructure the pages, and then delete the column again.

[Screenshot: adding and dropping the padding column]
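The screenshot contains the actual statements; a minimal sketch of this trick could look like the following (the column name w_padding and the CHAR(6000) size are my own choices, not necessarily what was used):

  -- Add a wide fixed-length column; adding it is initially a metadata-only change.
  ALTER TABLE dbo.warehouse ADD w_padding CHAR(6000) NULL;

  -- Writing a value materializes the column in every row, so only one ~6 KB row
  -- still fits on an 8 KB page.
  UPDATE dbo.warehouse SET w_padding = 'x';

  -- Drop the helper column again; without a rebuild the rows keep their own pages.
  ALTER TABLE dbo.warehouse DROP COLUMN w_padding;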

Okay, now check whether m_slotCnt is 1, which means every row sits in its own page.

[Screenshot: DBCC PAGE output after the change (m_slotCnt = 1)]

It’s done.

[Screenshot: ShowIOBottlenecks after the change]

When running the workload again, the PAGEIOLATCH_SH and PAGEIOLATCH_EX wait events are nearly gone.

Before:

  • System achieved 338989 SQL Server TPM at 73685 NOPM
  • System achieved 348164 SQL Server TPM at 75689 NOPM
  • System achieved 336965 SQL Server TPM at 73206 NOPM

After:

  • System achieved 386324 SQL Server TPM at 83941 NOPM
  • System achieved 370919 SQL Server TPM at 80620 NOPM
  • System achieved 366426 SQL Server TPM at 79726 NOPM

The throughput increased slightly. Again I monitored that the CPU is at 100% while running. At this point I could continue to tune the SQL statements as I did in the last 2-3 posts. Remember, I started this SQL Server performance tuning with 20820 TPM at 4530 NOPM. This means more than 10x faster!

But the next step may be to add some hardware. This all runs on just 2 of the 4 available cores, as I wrote in the first part.

Go ChaosMonkey!

7. SQL Server Performance Tuning study with HammerDB – Flashsoft and PX600 unleash the full power

I have solved all bottlenecks found since we started this performance tuning study, but now I can’t find any improvements that could be made without altering the schema or indexes, which is not allowed by the TPC-C rules. It is a similar situation to running a third-party application with a database you are not allowed to change. A great way to improve disk latency transparently to the application vendor is flash-based caching. The advantage of Flashsoft 3.7 is that it provides a READ and a WRITE cache; the write cache is the one that should help with this OLTP workload. Remember, Flashsoft can cache FC, iSCSI, NFS and local devices.

Phase 3 – Forming a hypothesis – Part 5

  • Based on observation and declaration form a hypothesis
    • Based on observation and the lessons I learned, I believe the TPM/NOPM values should increase if the disk access latency is reduced by using a READ/WRITE cache (Flashsoft).

Phase 4 – Define an appropriate method to test the hypothesis

  • 4.1 don’t define too complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
      • In this case I concentrate only on the TPM/NOPM values.
    • some metrics as key metrics
      • TPM/NOPM
    • the right level of details
    • an efficient approach in terms of time and results
      • Installing and configuring Flashsoft will take 30 min
    • a tool you fully understand
  • 4.3 document the defined method and setup a test plan

  I will run the following test plans:

Test plan 1

Implement a READ/WRITE cache for SQL Server based on Samsung 840 basic

Start HammerDB workload

Run ShowIOBottlenecks and Resource Monitor

Stop HammerDB workload and compare this run with the baseline

Analyze the ShowIOBottlenecks

Test plan 2

If ShowIOBottlenecks still shows wait events for disk latency I will use the PX600-1000 as the READ/WRITE cache device

Start HammerDB workload

Run ShowIOBottlenecks and Resource Monitor

Stop HammerDB workload and compare this run with the baseline

Phase 5 – Testing the hypothesis – Test Plan 1+2

  • 5.1 Run the test plan
  • avoid or don’t test if other workloads are running
  • run the test at least two times

I recorded a video when running the test plan 1.

  • System achieved 338486 SQL Server TPM at 73651 NOPM
  • System achieved 314510 SQL Server TPM at 68320 NOPM
  • System achieved 313778 SQL Server TPM at 68256 NOPM

The Resource Monitor showed that there is still latency of around 10 ms and ShowIOBottlenecks shows that there are still wait events for the write log. So I decided to use the PX600-1000.

I recorded a video when running the test plan 2.

  • System achieved 338989 SQL Server TPM at 73685 NOPM
  • System achieved 348164 SQL Server TPM at 75689 NOPM
  • System achieved 336965 SQL Server TPM at 73206 NOPM
  •  5.2 save the results
    • All results are saved into the log files

Phase 6 – Analysis of results – Test Plan 1+2

  • 6.2 Read and interpret all metrics
    • understand all metrics
    • compare metrics to basic/advanced baseline metrics
      • TEST RESULT Flashsoft Samsung 840:
        • System achieved avg(338486, 314510, 313778) = 322258 SQL Server TPM at avg(73651, 68320, 68256) = 70076 NOPM
      • TEST RESULT Flashsoft SanDisk PX600-1000:

        • System achieved avg(338989, 348164, 336965) = 341373 SQL Server TPM at avg(73685, 75689, 73206) = 74193 NOPM
      • TEST RESULT before:
        • System achieved 161882 SQL Server TPM at 35150 NOPM
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables, even in this simple environment, that a full analysis would take too much time. The approximation shows that as long as I don’t make changes to the environment, the results should be stable.
    • concentrate on key metrics
      • While using Flashsoft with the Samsung 840 Basic or the SanDisk PX600-1000 I could nearly double the performance compared to the last run.
    •  is the result statistically correct?
      • No. The sample was only one point in time. I repeated the test a few times with similar results, but it is still not statistically sound.
  • 6.3 Visualize your data
  • 6.4 “Strange” results mean you need to go back to “Phase 4.2 or 1.1”
    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not go back to  “Phase 1.1”

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis is true. I doubled the performance by making use of the Flashsoft caching solution. I found that there is only a small difference between the Samsung and the SanDisk drive, although the SanDisk PX600-1000 should be much faster than the consumer SSD. The reason seems to be a new bottleneck I found: page latch waits are involved!
  • 7.2 Next Step
    • Is the hypothesis true?
      • Yes
      • if goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series where I’ll track down the Page Latch wait events and solve them.

6. SQL Server Performance Tuning study with HammerDB – Database Engine Tuning Advisor

In part 5 an issue with the cardinality estimator was solved and the HammerDB workload runs much faster. But what to tune now? Let’s give the Database Engine Tuning Advisor a chance.

Phase 3 – Forming a hypothesis – Part 4

  • Based on observation and declaration form a hypothesis
    • Based on observation and the lessons I learned I believe the TPM/NOPM values should increase if we further tune SQL statements with the help of the Database Engine Tuning Advisor

Phase 4 – Define an appropriate method to test the hypothesis

  • 4.1 don’t define too complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
      • In this case I concentrate only on the TPM/NOPM values.
    • some metrics as key metrics
      • TPM/NOPM
    • the right level of details
    • an efficient approach in terms of time and results
      • Adding indexes may take 1h
    • a tool you fully understand
      • The Database Engine Tuning Advisor will be used to analyze the plan cache, and this tool will provide advice on how to introduce indexes, partitions, etc.
  • 4.3 document the defined method and setup a test plan

  I will run the following test plan and analysis:

Test plan 1

Run the Database Engine Tuning Advisor and analyze the plan cache (last 1000 events)

Analyze the report

Maybe create indexes / partitions / statistics etc.

Start HammerDB workload

Stop HammerDB workload and compare this run with the baseline

Phase 5 – Testing the hypothesis – Test Plan 1

  • 5.1 Run the test plan

    • avoid or don’t test if other workloads are running
    • run the test at least two times

I recorded a video while running test plan 1. As long as I stay in line with the TPC-C rules, the only Database Engine Tuning Advisor optimization available is creating new statistics.

  •  5.2 save the results
    • All results are saved into the log files

Phase 6 – Analysis of results – Test Plan 1

  • 6.2 Read and interpret all metrics
    • understand all metrics
    • compare metrics to basic/advanced baseline metrics
      • TEST RESULT NOW: System achieved 161882 SQL Server TPM at 35150 NOPM
      • TEST RESULT BEFORE: System achieved 161091 SQL Server TPM at 35013 NOPM
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables, even in this simple environment, that a full analysis would take too much time. The approximation shows that as long as I don’t make changes myself to the environment, the results should be stable.
    • concentrate on key metrics
      • so no measurable changes
    •  is the result statistically correct?
      • No. The sample was only one point in time. I repeated the test a few times with similar results, but it is still not statistically sound.
  • 6.3 Visualize your data
  • 6.4 “Strange” results mean you need to go back to “Phase 4.2 or 1.1”
    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not go back to  “Phase 1.1”

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis could not really be tested. The Database Engine Tuning Advisor proposed changes which are not in line with the TPC-C rules, so we could not really tune the SQL statements. BUT when there are no more options left to tune, I will introduce new indexes.
  • 7.2 Next Step
    • Is the hypothesis true?
      • Not evaluated
      • if goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series. The end of the video showed that we should give Flashsoft another try to improve the disk latency.

5. SQL Server Performance Tuning study with HammerDB – TOP 10 most costly SQL statements

In the last post we found that the bottleneck does not seem to be related to the ACCESS_METHODS_DATASET_PARENT wait events. I learned a few points now! Tuning the wait events should not be the first step in tuning a database. For ages it has been recommended to tune a database workload starting from the application down to the hardware. The reason is obvious: the performance gain you can get from better or optimized hardware is normally in a range between 5% and 100%, while tuning one SQL statement may increase the overall performance 10x, 50x or maybe 1000x.

Phase 3 – Forming a hypothesis – Part 3

  • Based on observation and declaration form a hypothesis
    • Based on the observation that the bottleneck is related to CPU and workload, I believe: “The TPM/NOPM should increase if we improve the TOP 10 most costly SQL statements, or at least the most costly one of all.”

Phase 4 – Define an appropriate method to test the hypothesis

  • 4.1 don’t define too complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
    • some metrics as key metrics
      • total_worker_time for the most costly SQL statement will be declared as the key metric because the workload seems to be bounded by CPU
    • the right level of details
    • an efficient approach in terms of time and results
      • approx. 1 h
    • a tool you fully understand
      • Pinal Dave posted a nice script which I will use to list the TOP 10 most costly SQL statements together with the execution plans used. I order by total_worker_time.
  • 4.3 document the defined method and setup a test plan

  I will run the following test plan and analysis:

Test plan 1

Run the TOP10 script.

Identify and analyze the most costly SQL statement of all (make use of the execution plan analyzer)

Tune the discovered SQL statement

Start HammerDB workload

Run the TOP10 script and confirm if the most costly SQL statement of all is improved

Stop HammerDB workload and compare this run with the baseline

Phase 5 – Testing the hypothesis – Test Plan 1

  • 5.1 Run the test plan

    • avoid or don’t test if other workloads are running
    • run the test at least two times

I run the TOP10 Script:

[Screenshot: TOP 10 most costly SQL statements before the change]
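The screenshot shows the output of Pinal Dave’s script, which I am not reproducing here. As a rough sketch of the same idea (my assumption, not his code), the ten statements with the highest accumulated CPU time can be pulled from the plan cache like this:

  SELECT TOP (10)
         qs.total_worker_time,
         qs.execution_count,
         SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                   ((CASE qs.statement_end_offset
                         WHEN -1 THEN DATALENGTH(st.text)
                         ELSE qs.statement_end_offset
                     END - qs.statement_start_offset) / 2) + 1) AS statement_text,
         qp.query_plan
  FROM   sys.dm_exec_query_stats AS qs
  CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle)    AS st
  CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
  ORDER BY qs.total_worker_time DESC;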

The most costly statement here is, surprisingly, a select statement. This is strange because an OLTP workload should spend most of its time updating/inserting something.

After a short Internet search I found this blog post, which showed that this is related to the changes in the SQL Server Query Optimizer cardinality estimation process.

So, add an index or change the Query Optimizer behavior? I decided to change the database compatibility level to the pre-SQL Server 2014 legacy CE. The right way would be to add an index or use trace flag 9481, but these changes would not stay in line with the TPC-C rules!

The execution plan for this select statement looked like this before the change, with a table scan of the stock table which costs a lot:

[Screenshot: execution plan before the change, showing a table scan of the stock table]

I change the SQL Server Query Optimizer cardinality estimation for the tpcc database to the pre-SQL Server 2014 behavior. After the change, sys.dm_exec_query_stats should be flushed.

[Screenshot: changing the compatibility level]
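The screenshot shows the actual change; in essence it boils down to something like the following sketch (assuming the database is named tpcc and the compatibility-level route was used):

  -- Compatibility level 110 (SQL Server 2012) makes the database use the legacy
  -- (pre-2014) cardinality estimator.
  ALTER DATABASE tpcc SET COMPATIBILITY_LEVEL = 110;

  -- Clear the plan cache so sys.dm_exec_query_stats starts fresh for the next run.
  DBCC FREEPROCCACHE;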

I started the HammerDB workload again as shown in this video.

I run the TOP10 script:

[Screenshot: TOP 10 most costly SQL statements after the change]

As you can see there are still two phases while running: one which shows CPU at nearly 90% and low disk access, and another with high (saturated) disk access and low CPU usage.

I stopped the HammerDB workload and compared this run with the baseline.

  •  5.2 save the results
    • All results are saved into the log files

Phase 6 – Analysis of results – Test Plan 1

  • 6.2 Read and interpret all metrics
    • understand all metrics
      • sys.dm_exec_query_stats shows the most costly SQL statements. The select statement which had been the issue is no longer at the top of the list.

      • compare metrics to basic/advanced baseline metrics
        • TEST RESULT NOW: System achieved avg(149811, 160772, 172690) = 161091 SQL Server TPM at avg(32541, 34909, 37590) = 35013 NOPM
        • TEST RESULT IN THE BEGINNING: System achieved 20820 SQL Server TPM at 4530 NOPM
      • is the result statistically correct?
        • No. The sample was only one point in time. I repeated the test a few times with similar results, but it is still not statistically sound.
      • has sensitivity analysis been done?
        • Just an approximation. There are so many variables, even in this simple environment, that a full analysis would take too much time. The approximation shows that as long as I don’t make changes myself to the environment, the results should be stable.
      • concentrate on key metrics
        • total_worker_time for the most costly SQL statement has been reduced to 13,339,391 compared to 1,286,386,037 before. The statement is not in the TOP 10 anymore.

           
  • 6.3 Visualize your data
    • The screen-shots will do the job.
  • 6.4 “Strange” results mean you need to go back to “Phase 4.2 or 1.1”
    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not go back to  “Phase 1.1”

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis has been proven right. The TPM of 161091 and NOPM of 35013 reached show that solving this bottleneck caused by the SQL Server Query Optimizer cardinality estimation has a big influence. The performance increased around 7x!
  • 7.2 Next Step
    • Is the hypothesis true?
      • Yes. 
      • if goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series because I am pretty sure there is much more I can tune.

4. SQL Server Performance Tuning study with HammerDB – solving ACCESS_METHODS_DATASET_PARENT

In the last post we found that the bottleneck does not seem to be related to the disk access latency, so I disabled Flashsoft 3.7 for now. This part of the performance tuning study will show how we resolve the ACCESS_METHODS_DATASET_PARENT wait events.

Phase 3 – Forming a hypothesis – Part 2

  • Based on observation and declaration form a hypothesis
    • Based on observation and the last ShowIOBottlenecks run I believe: “The TPM/NOPM should increase if the Intra Query Parallelism – ACCESS_METHODS_DATASET_PARENT wait events are reduced/resolved.”

Phase 4 – Define an appropriate method to test the hypothesis

  • 4.1 don’t define too complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
    • some metrics as key metrics
    • the right level of details
    • an efficient approach in terms of time and results
      • approx. 10 min
    • a tool you fully understand
  • 4.3 document the defined method and setup a test plan

  I will run the following test plan and analysis:

Test plan 1

This post by SQLskills points out that for ACCESS_METHODS_DATASET_PARENT waits I should investigate the MAXDOP setting.

I will check the Max Degree of Parallelism setting, which should be 0 by default, and set it to a value of 2, because this server has Hyper-Threading active and we have 2 active cores.

Microsoft shows how to set the Max Degree of Parallelism for this instance.

I will run ShowIOBottlenecks while the HammerDB workload is running with Max Degree of Parallelism set to 2.

Phase 5 – Testing the hypothesis – Test Plan 1

  • 5.1 Run the test plan

    • avoid or don’t test if other workloads are running
    • run the test at least two times

I checked the actual Max Degree of Parallelism with this query:

name                        minimum  maximum  config_value  run_value
max degree of parallelism   0        32767    0             0

Then I set the Max Degree of Parallelism to 2:
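The exact statements are not shown as text in this post; a sketch of the change via sp_configure (max degree of parallelism is an advanced option, so it has to be made visible first) looks like this:

  EXEC sp_configure 'show advanced options', 1;
  RECONFIGURE;

  EXEC sp_configure 'max degree of parallelism', 2;
  RECONFIGURE;

  -- Read the setting back to verify that config_value and run_value are now 2.
  EXEC sp_configure 'max degree of parallelism';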

The output confirmed that the run value is now 2.

I started the HammerDB workload again as shown in this video.

  •  5.2 save the results
    • All results are saved into the log files

Phase 6 – Analysis of results – Test Plan 1

  • 6.2 Read and interpret all metrics
    • understand all metrics
      • The important metric ShowIOBottlenecks showed is that, from a wait event view, there are two main phases while running:
      • One shows that CPU and threading seem to be the bottleneck and the disk is not saturated. [Screenshot: ShowIOBottlenecks, CPU-bound phase]
      • The other shows that the I/O WRITELOG seems to be the bottleneck and the CPU still has ~30% wait time. [Screenshot: ShowIOBottlenecks, write-log-bound phase]
  • compare metrics to basic/advanced baseline metrics
    • TEST RESULT NOW : System achieved 20227 SQL Server TPM at 4443 NOPM (Best run)
    • TEST RESULT IN THE Beginning: System achieved 20820 SQL Server TPM at 4530 NOPM
  • is the result statistically correct?
    • No. The sample was only one point in time. I repeated the test a few times with similar results, but it is still not statistically sound.
  • has sensitivity analysis been done?
    • Just an approximation. There are so many variables, even in this simple environment, that a full analysis would take too much time. The approximation shows that as long as I don’t make changes myself to the environment, the results should be stable.
  • concentrate on key metrics
    • The ACCESS_METHODS_DATASET_PARENT wait still arises sometimes (<=10%), but it seems to be solved for now.
  • 6.3 Visualize your data
    • The screen-shots will do the job.
  • 6.4 “Strange” results mean you need to go back to “Phase 4.2 or 1.1”
    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not go back to  “Phase 1.1”

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis has been proven wrong. The TPM of 20170 and NOPM of 4336 reached show that solving the ACCESS_METHODS_DATASET_PARENT wait seems to have nearly no influence at the moment.
    • The reasons why the hypothesis is wrong:
      • It seems we solved this bottleneck, but there is another issue which we have not discovered yet.
      • The MAXDOP setting of 2 will be kept in this case because it is recommended by Microsoft.
  • 7.2 Next Step
    • Is the hypothesis true?
      • No. 
      • if goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series. And believe me. The TPM and NOPM values will increase 🙂

3. SQL Server Performance Tuning study with HammerDB – Using 8PP – Part1

In the 8PP post I described a scientific approach to testing a system. In this series I follow 8PP and showcase how it can be used to carry out performance tuning of SQL Server for this HammerDB workload.

The workload I will tune is as follows:

A HammerDB TPC-C autopilot workload which runs a timed driver script three times with 25 virtual users, a ramp-up time of 1 min and a runtime of 5 min. The minutes per test are set to 8 min to give SQL Server a chance to settle after each run. The average TPM has been avg(20998, 20178, 21283) = 20820 and the average NOPM has been avg(4602, 4380, 4609) = 4530. Watch the beginning of this video to understand how I set up the workload.

Phase 1 – Observation

1.1 Understand the problem/issue

  • Talk to all responsible people if possible
    • I am responsible for the whole setup.
  • Is the problem/issue based on a real workload?
    • No! It is always important to know whether the workload is based on a real workload, because some synthetic tests are poorly chosen for testing a system. This is your chance to avoid spending time on senseless testing.
  • Is the evaluation technique appropriate?
    • In this case I accept that this test is synthetic and consider it is appropriate for my target.

1.2 Define your universe

  • If possible isolate the system as much as you can
    • The whole test is running on one PC. I could increase the isolation by uninstalling the unused 10GbE NICs and the management NIC which I am using for RDP access, but I consider this to have nearly zero impact on my testing.
  • Make sure to write down exactly how your system/environment is build
    • Firmware, OS, driver, app versions, etc…

1.3 Define and run basic baseline tests (CPU,MEM,NET,STORAGE)

  • Define the basic tests and run them while the application is stopped
    • These tests are done to make sure that no major bottleneck is introduced by a badly configured hardware environment.
    • I used PerformanceTest 8.0 to check CPU and MEM.
      • CPU results
        • The CPU Mark reached only 5467. Remember only 2 out of 4 cores are active. I will repeat this test when all 4 cores are online again.
        • [Screenshot: PassMark CPU Mark comparison]
      • MEM results
        • The Memory Mark reached 2661 with a latency of 28.2 ns. Compared to the baseline of 29 ns it is configured correctly.
        • [Screenshot: PassMark Memory Mark comparison]
    • I skipped the NET test because it should not be involved in this testing.
    • I tested the SSD storage before:
  • Document the basic baseline tests
    • All basic baseline tests are documented here. The results seem to be in line with the well-known tests from the Internet.
  • Compare to older basic baseline tests if any are available
    • There are no old basic baseline tests I could use to compare.

1.4 Describe the problem/issue in detail

  • Document the symptoms of the problem/issue
    • I started the HammerDB TPC-C autopilot workload, which runs a timed driver script three times with 25 virtual users, a ramp-up time of 1 min and a runtime of 5 min. The minutes per test are set to 8 min to give SQL Server a chance to settle after each run. The average TPM has been avg(20998, 20178, 21283) = 20820 and the average NOPM has been avg(4602, 4380, 4609) = 4530.
  • Document the system behavior (CPU,MEM,NETWORK,Storage) while the problem/issue arise
    • While running the HammerDB workload I monitored the utilization of CPU, memory and network, which showed that the CPU is at ~100%. There is no memory pressure and no traffic on the network. The disk showed a response time of up to ~30 ms and a throughput of up to ~5 MB/sec.
    • For sure this documentation could be more detailed, but it should be okay for now.

Phase 2 – Declaration of the end goal or issue

  • Officially declare the goal or issue
    • The goal is to increase the TPM/NOPM numbers of the HammerDB TPC-C workload with 25 virtual users, a ramp-up of 1 min and a runtime of 5 min to the highest possible value. I will resolve all bottlenecks regardless of whether they are in hardware, software, SQL Server options or the SQL schema, as long as the change is transparent to the HammerDB driver script. This means it is okay to add an index or make use of NAND flash or more CPUs, but it is not allowed to change the driver script itself, and it needs to run without errors.
  • Agree with all participants on this goal or issue
    • I agree with this goal 🙂 

Phase 3 – Forming a hypothesis – Part 1

  • Based on observation and declaration form a hypothesis
    • Based on observation 1.4 I believe: “The TPM/NOPM should increase if the access time to the data on the disk becomes faster.”

Phase 4 – Define an appropriate method to test the hypothesis

  • 4.1 don’t define too complex methods
  • 4.2 choose … for testing the hypothesis
    • the right workload
      • original workload
    • the right metrics
      • latency of database files
      • wait event time of IO in relation to the whole wait time
      • read/write throughput
      • read/write distribution
    • some metrics as key metrics
      • wait event time of IOs in relation to the whole wait time
    • the right level of details
    • an efficient approach in terms of time and results
      • approx. 2h
    • a tool you fully understand
  • 4.3 document the defined method and setup a test plan

  I will run the following test plan and analysis:

Test plan 1

I will run ShowIOBottlenecks while the HammerDB workload is running.

Check whether there are significant wait events for disk access in relation to the whole wait time.
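ShowIOBottlenecks is a published script and is not reproduced here; purely as an illustration of the kind of aggregation it is based on (my assumption), each wait type’s share of the total wait time can be read from sys.dm_os_wait_stats:

  SELECT wait_type,
         wait_time_ms,
         CAST(100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS DECIMAL(5,2)) AS pct_of_total_wait_time
  FROM   sys.dm_os_wait_stats
  WHERE  wait_time_ms > 0
  ORDER BY wait_time_ms DESC;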

Test plan 2

I will run the HammerDB workload and use the Resource Monitor to monitor disk access.

If there are significant wait events for disk access I will further check whether they are related to READ or WRITE.

If related to READ I will make use of ioTurbine Profiler which shows if a read cache could help here.

Test plan 3

Depending on the findings I will introduce a read/write cache with the help of Flashsoft 3.7 to cache the data files.

Again run ShowIOBottlenecks while the HammerDB workload is running.

Document my findings on this blog.

Phase 5 – Testing the hypothesis – Test Plan 1

  • 5.1 Run the test plan

    • avoid or don’t test if other workloads are running
    • run the test at least two times
  1. I recorded a short video to demonstrate how I used the ShowIOBottlenecks script to verify whether there is an I/O bottleneck.

  • 5.2 save the results
    • I saved the results of ShowIOBottlenecks to an xlsx file.

Phase 6 – Analysis of results – Test Plan 1

  • 6.2 Read and interpret all metrics
    • understand all metrics
      • The important metrics ShowIOBottlenecks showed are:
        • around 40% Intra Query Parallelism wait events for ACCESS_METHODS_DATASET_PARENT
        • around 25% CPU & Threading (CPU)
        • around 20% of the time spent on write log I/O
        • The latency to the data files seems to be high: ~30 ms
    • compare metrics to basic/advanced baseline metrics
      • There is no baseline so far
    • is the result statistically correct?
      • No. The sample was only one point in time. I repeated the test a few times with similar results, but it is still not statistically sound.
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables, even in this simple environment, that a full analysis would take too much time. The approximation shows that as long as I don’t make changes myself to the environment, the results should be stable.
    • concentrate on key metrics
      • wait event time of I/O is 20% of the whole wait time
  • 6.3 Visualize your data
    • [Chart: wait event distribution of the initial autopilot run (3 × 25 virtual users)]
  • 6.4 “Strange” results means you need to go back to “Phase 4.2 or 1.1”

    • The high Intra Query Parallelism wait could be considered strange. It means that another bottleneck exists which I should solve first. But I will ignore it right now.
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 5 – Testing the hypothesis – Test Plan 2

  • 5.1 Run the test plan
  1. I recorded a short video to demonstrate how I used the ShowIOBottlenecks script. Twice I showed how to display the read/write distribution with the Windows Resource Monitor (at 2:00 and 3:35).

  • 5.2 save the results
    • The read has been ~10,000 B/sec
    • The write has been ~2,752,000 B/sec

Phase 6 – Analysis of results – Test Plan 2

  • 6.2 Read and interpret all metrics
    • understand all metrics
      • read and write B/sec
    • compare metrics to basic/advanced baseline metrics
      • There is no baseline so far
    • is the result statistically correct?
      • No. The sample was only one point in time. I repeated the test a few times with similar results, but it is still not statistically sound.
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables, even in this simple environment, that a full analysis would take too much time. The approximation shows that as long as I don’t make changes myself to the environment, the results should be stable.
    • concentrate on key metrics
      • The read B/sec is ~2.8% of all disk B/sec, so it does not look like a read cache will help.
      • Just for showcasing the ioTurbine Profiler:
        • I started the HammerDB workload again and monitored the D:\ device. I ignored 4.2 (time) in this case which should not be done in real environments.
  • 6.3 Visualize your data
    • [Chart: read/write distribution]
    • Watch the ioTurbine Profiler video: https://www.youtube.com/watch?v=O_eDnnIZ6wo
  • 6.4 “Strange” results means you need to go back to “Phase 4.2 or 1.1”

    • Nothing strange. It’s an OLTP workload so there should be more writes than reads.
  • 6.5 Present understandable graphics for your audience
    • See 6.3.

Phase 5 – Testing the hypothesis – Test Plan 3

  • 5.1 Run the test plan
  • The analysis showed that faster writes could make sense and could reduce the wait events for the write log, which make up to 20% of the whole wait time. There are different options to achieve faster writes for the log:
    • Introduce a faster storage for the log file
    • Introduce a write cache for the log file
    • Tune the log file flush mechanisms and log file structures
    • Tune the application to avoid too many commits.
  • As defined, we are not able to tune the application. We could try to tune the log file itself, but the best bet should be faster storage or a write cache, because we know the D:\ drive is a slow HDD. In this case I will start with the write cache. The reason is that a write cache can be used transparently for a single device, a logical device or a pool, and for block devices provided via a SAN, even when using shared file systems like CSV or VMFS.
  1. I recorded a short video where I configure Flashsoft 3.7 to use the Samsung 840 Basic SSD as a read/write cache for the D:\ drive. Then I started HammerDB again, let the baseline run again and started the ShowIOBottlenecks script.
  • 5.2 save the results
    • The results are saved in the log files.

Phase 6 – Analysis of results – Test Plan 3

  • 6.2 Read and interpret all metrics
    • understand all metrics
    • compare metrics to basic/advanced baseline metrics
      • The baseline is documented in 1.4 with TPM=20820 and NOPM=4530
      • With Flashsoft 3.7 activated we reached TPM=20170 and NOPM=4336
    • is the result statistically correct?
      • More or less. The test ran 3 times in a row and the last run has been recorded.
    • has sensitivity analysis been done?
      • Just an approximation. There are so many variables, even in this simple environment, that a full analysis would take too much time. The approximation shows that as long as I don’t make changes myself to the environment, the results should be stable.
    • concentrate on key metrics
      • The key metric is the wait event time of I/O in relation to the whole wait time. The video showed we could reduce the wait time for the write log, which had been up to 20%, to <=1%.
  • 6.3 Visualize your data
    • I skipped this because the result is simple to understand: 20% wait time for the write log dropped to <=1%.
  • 6.4 “Strange” results means you need to go back to “Phase 4.2 or 1.1”

    • nothing strange
  • 6.5 Present understandable graphics for your audience
    • Done.

Phase 7 – Conclusion

Is the goal or issue well defined? If not go back to  “Phase 1.1”

  • 7.1 Form a conclusion if and how the hypothesis achieved the goal or solved the issue!
    • The hypothesis has been proven wrong. The TPM of 20170 and NOPM of 4336 I reached with Flashsoft 3.7 show that faster disk access has no influence at the moment; the workload even runs slightly slower than before. I believe that is related to some CPU overhead introduced by Flashsoft, but that’s not proven.
    • The reasons why the hypothesis is wrong:
      • I ignored strange values at Test plan 1 –  6.4
      • I did not fully understand all metrics at Test plan 1 – 6.1
        • How much time is waiting time compared to the runtime? This is a difficult topic because the wait times we see come from sys.dm_os_wait_stats and sys.dm_os_latch_stats. These values are aggregated over all tasks, so they can only indicate that there are wait events and which one may be a problem. In this case disk access time is not the bottleneck.
        • In the video at 7:50 you can see that 41822 wait events occurred for Intra Query Parallelism and the avg waits per ms is shown as 340. The ShowIOBottlenecks script runs for ~5000 ms, so a value of about 8.3 (41822 / 5000) should appear there. I found a bug in the script which I corrected here.
  • 7.2 Next Step
    • Is the hypothesis true?
      • No. 
      • if goal/issue is not achieved/solved, form a new hypothesis.
        • I will form a new hypothesis in the next post of this series. And believe me. The TPM and NOPM values will increase 🙂

Go VLC.

 

2. SQL Server Performance Tuning study with HammerDB – Setup HammerDB

In the last post of this performance tuning study the setup of SQL Server was done, and it is up and running now. The next part is to install the workload generator HammerDB. I decided to use HammerDB because it's open source and implements the TPC-C benchmark defined by http://www.tpc.org/.

Setup HammerDB

I downloaded the version  HammerDB-2.18-Win-x86-64 from the website and installed it to the C:\ drive.

The important part of HammerDB is to specify how you set it up and how you run it.

After launching HammerDB I switched to SQL Server and chose TPC-C as the benchmark option.

[Screenshot: HammerDB benchmark options with TPC-C selected]

Schema Build with 33 warehouses

I used 33 warehouses in this configuration. There is a reason why it is 33 and not, for example, 100; I will cover this later. The build run took a while.

[Screenshot: HammerDB schema build settings]

[Screenshot: HammerDB schema build running]

HammerDB is installed and ready to run synthetic workloads.

Go webmin!

1. SQL Server Performance Tuning study with HammerDB – Setup SQL Server

Based on a project with Thomas Kejser last year, I started a new blog project to showcase a simple SQL Server performance tuning study. The target is to show some basic tuning options you could use to improve SQL Server performance. I will make use of the 8PP to follow a scientific approach to tuning. But a database system like SQL Server 2014 SP1 is a complex environment, and I will cover only a few of the monitoring and performance tuning topics SQL Server provides.

The Use Case

Learn SQL Server basic tuning based on the 8PP approach!

Because I don’t have a good real-world workload to tune, I will use the tool HammerDB to generate a TPC-C workload instead. The focus of this study is the tuning itself, so it’s not important which workload I use. I agree that the bottlenecks I will encounter may not represent real-world ones.

The Setup

Hardware:

I will make use of Testverse as the hardware platform.

IMPORTANT!!!!! The active CPU cores have been set to 2 instead of 4.

The reason for this is that I believe there will be a point in time where the CPU will be a bottleneck. Then I will have a chance to show what influence more CPU power will have.

All tests will run on this machine, so no network or other systems should be involved. Testverse is connected to the LAN and I will use RDP while running the tests. I keep an eye on the network part and consider it to have little to no influence on the testing.

The OS drive Samsung 840 Basic is replaced by OCZ Agility 2 OCZSSD2-2AGTE120G.

The test SSD/PCIe devices will be installed in a FOB (fresh-out-of-box) performance state in the beginning.

Operating System:

Windows 2012 R2 Datacenter Edition with all current patches installed (24.08.2015). Automatic updates will be deactivated for the test period. The Samsung EcoGreen F4 HDD is formatted with NTFS v3.1 with 512 bytes per sector and a 64K NTFS allocation unit size.

[Screenshot: Windows Disk Management]

SQL Server 2014 SP1:

SQL Server will be installed now. The instance root is set to the Samsung EcoGreen F4 HDD (D:\). One requirement is the .NET Framework 3.5 SP1, which can be installed via “Add Roles and Features”.

[Screenshot: .NET Framework 3.5 SP1 installation]

Feature selection

I select only the features which are really needed for this study, so basically it's the “Database Engine Service” and the “Management Tools”. Remember, I changed the instance root to the D:\ drive.

[Screenshot: SQL Server feature selection]

Instance Configuration

In this case I make use of the default instance.

[Screenshot: SQL Server instance configuration]

Server Configuration

I enable all SQL Server services and set the “Startup Type” to Automatic. The Agent and Browser services may be used later.

[Screenshot: SQL Server service account configuration]

Database Engine Configuration

I decided to use Mixed Mode. Last time I used HammerDB it was difficult to work with Windows authentication mode. The local administrator group will be SQL Server administrators as well; this is not best practice, but it will work.

[Screenshot: SQL Server authentication mode configuration]

Let’s check with the SQL Server Management Studio if we are able to connect and run a simple select.

[Screenshot: simple select in SQL Server Management Studio]
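Any trivial statement will do for this connectivity check; what exactly was run in the screenshot is my assumption, but something like this is enough:

  SELECT @@VERSION AS sql_server_version;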

Okay, SQL Server is up and running with the default configuration. I did not make changes to the default settings of SQL Server 2014 SP1 to make sure this test can be easily reproduced.

Go HammerDB!