Monthly Archives: October 2015

8. SQL Server Performance Tuning study with HammerDB – Solve PAGEIOLATCH latch contention

In the last part I found that there is a new bottleneck. It seems this is related to the PAGEIOLATCH_SH and PAGEIOLATCH_EX. The exact values depend on the time slots which is measured by the ShowIOBottlenecks script. The picture shows >70 percent wait time.

PagelatchIO

To track down the latch contention wait events Microsoft provides a decent whitepaper. I used the following script and run it several times to get an idea which resources are blocked.

PagelatchSH_5.1.240101

The resource_description column returned by this script provides the resource description in the format <DatabaseID,FileID,PageID> where the name of the database associated with DatabaseID can be determined by passing the value of DatabaseID to the DB_NAME () function.

First lets find out which table this is. This can be done via inspecting the the page and retrieving the Metadata ObjectId.

dbccpage

The metadata objectid is 373576369. Now it is easy to retrieve the related table name.


warehouse_tablename

It is the “warehouse” table.

What is the bottleneck here?

dbccpage

First of all this an explanation about the wait events:

PAGEIOLATCH_EX
Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Exclusive mode. Long waits may indicate problems with the disk subsystem.

PAGEIOLATCH_SH
Occurs when a task is waiting on a latch for a buffer that is in an I/O request. The latch request is in Shared mode. Long waits may indicate problems with the disk subsystem

In our case this means a lot of inserts/updates are done when running the TPC-C workload and a task waits on a latch for this page shared or exclusive! When inspecting this page we know its the warehouse table and we created the database with 33 warehouses in the beginning.

The page size in SQL server is 8K and the 33 rows all fit just in one page (m_slotcnt =33). This means some operations can no be parallelized!!

To solve this I will change the “physical” design of this table which is still in-line with the TPC-C rules. There may be different ways to achieve this. I add a column and insert some text which forces SQL server to restructure the pages and then delete the column.

add_drop

Okay now check if the m_slotCnt is 1 which means every row is in one page.

dbccpage_new

It’s done.

ShowIOBottleneck_solvedWarehouse

When running the workload again the PAGEIOLATCH_SH and PAGELATCHIO_EX wait events are nearly gone.

Before:

  • System achieved 338989 SQL Server TPM at 73685 NOPM
  • System achieved 348164 SQL Server TPM at 75689 NOPM
  • System achieved 336965 SQL Server TPM at 73206 NOPM

After:

  • System achieved 386324 SQL Server TPM at 83941 NOPM
  • System achieved 370919 SQL Server TPM at 80620 NOPM
  • System achieved 366426 SQL Server TPM at 79726 NOPM

The workload increased slightly. Again I monitored that CPU is at 100% when running. At this point I could continue to tune the SQL statements as I did the last 2-3 posts. Remember I started the SQL Server Performance Tuning with 20820 TPM at 4530 NOPM. This means more then 10x faster!

But the next step maybe to add some hardware. This all runs on just 2 of the 4 cores which are available as I wrote in the first part.

Go ChaosMonkey!