Stale statistics on a newly created temporary table in a stored procedure can lead to poor performance

When you create a temporary table you expect a new table with no past history (statistics based on past existence), this is not true if you have less than 6 updates to the temporary table. This might lead to poor performance of queries which are sensitive to the content of temporary tables.

I was optimizing SQL Server Performance at one of my customers who provides search functionality on their website. They use stored procedure with temporary table for the search. The performance of the search depended on who searched what in the past, option (recompile) by itself had no effect. Sometimes a simple search led to timeout because of non-optimal plan usage due to this behavior. This is not a plan caching issue rather temporary table statistics caching issue, which was part of the temporary object caching feature that was introduced in SQL Server 2005 and is also present in SQL Server 2008 and SQL Server 2012. In this customer case we implemented a workaround to avoid this issue (see below for example for workarounds).

When temporary tables are cached, the statistics are not newly created rather cached from the past and updated based on automatic update statistics threshold. Caching temporary tables/objects is good for performance, but caching stale statistics from the past is not optimal.

We can work around this issue by disabling temporary table caching by explicitly executing a DDL statement on the temporary table. One possibility is to execute an alter table statement, but this can lead to duplicate constraint name error on concurrent stored procedure execution. The other way to work around this is to create an index.

I think there might be many customers in such a situation without knowing that stale statistics are being cached along with temporary table leading to poor performance.

Ideal solution is to have more aggressive statistics update when the temporary table has less number of rows when temporary table caching is used. I will open a connect item to report this issue.

Meanwhile you can mitigate the issue by creating an index on the temporary table. You can monitor active temporary tables using Windows Server Performance Monitor counter: SQL Server: General Statistics->Active Temp Tables.

 The script to understand the issue and the workaround is listed below:

set nocount on

set statistics time off

set statistics io off

drop table tab7

go

create table tab7 (c1 int primary key clustered, c2 int, c3 char(200))

go

create index test on tab7(c2, c1, c3)

go

begin tran

declare @i int

set @i = 1

while @i <= 50000

begin

insert into tab7 values (@i, 1, ‘a’)

set @i = @i + 1

end

commit tran

go

insert into tab7 values (50001, 1, ‘a’)

go

checkpoint

go

drop proc test_slow

go

create proc test_slow @i int

as

begin

declare @j int

create table #temp1 (c1 int primary key)

insert into #temp1 (c1) select @i

select @j = t7.c1 from tab7 t7 inner join #temp1 t on (t7.c2 = t.c1)

end

go

dbcc dropcleanbuffers

set statistics time on

set statistics io on

go

–high reads as expected for parameter ’1′

exec test_slow 1

go

dbcc dropcleanbuffers

go

–high reads that are not expected for parameter ’2′

exec test_slow 2

go

drop proc test_with_recompile

go

create proc test_with_recompile @i int

as

begin

declare @j int

create table #temp1 (c1 int primary key)

insert into #temp1 (c1) select @i

select @j = t7.c1 from tab7 t7 inner join #temp1 t on (t7.c2 = t.c1)

option (recompile)

end

go

dbcc dropcleanbuffers

set statistics time on

set statistics io on

go

–high reads as expected for parameter ’1′

exec test_with_recompile 1

go

dbcc dropcleanbuffers

go

–high reads that are not expected for parameter ’2′

–low reads on 3rd execution as expected for parameter ’2′

exec test_with_recompile 2

go

drop proc test_with_alter_table_recompile

go

create proc test_with_alter_table_recompile @i int

as

begin

declare @j int

create table #temp1 (c1 int primary key)

–to avoid caching of temporary tables one can create a constraint

–but this might lead to duplicate constraint name error on concurrent usage

alter table #temp1 add constraint test123 unique(c1)

insert into #temp1 (c1) select @i

select @j = t7.c1 from tab7 t7 inner join #temp1 t on (t7.c2 = t.c1)

option (recompile)

end

go

dbcc dropcleanbuffers

set statistics time on

set statistics io on

go

–high reads as expected for parameter ’1′

exec test_with_alter_table_recompile 1

go

dbcc dropcleanbuffers

go

–low reads as expected for parameter ’2′

exec test_with_alter_table_recompile 2

go

drop proc test_with_index_recompile

go

create proc test_with_index_recompile @i int

as

begin

declare @j int

create table #temp1 (c1 int primary key)

–to avoid caching of temporary tables one can create an index

create index test on #temp1(c1)

insert into #temp1 (c1) select @i

select @j = t7.c1 from tab7 t7 inner join #temp1 t on (t7.c2 = t.c1)

option (recompile)

end

go

set statistics time on

set statistics io on

dbcc dropcleanbuffers

go

–high reads as expected for parameter ’1′

exec test_with_index_recompile 1

go

dbcc dropcleanbuffers

go

–low reads as expected for parameter ’2′

exec test_with_index_recompile 2

go

Plan Caching and Query Memory Part 2: When not to use stored procedure or other plan caching mechanisms like sp_executesql or prepared statement

SQL Server estimates Memory requirement at compile time, when stored procedure or other plan caching mechanisms like sp_executesql or prepared statement are used, the memory requirement is estimated based on first set of execution parameters. This is a common reason for spill over tempdb and hence poor performance. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union. This article covers Hash Match operations with examples. It is recommended to read Plan Caching and Query Memory Part I before this article which covers an introduction and Query memory for Sort. In most cases it is cheaper to pay for the compilation cost of dynamic queries than huge cost for spill over tempdb, unless memory requirement for a query does not change significantly based on predicates.

This article covers underestimation / overestimation of memory for Hash Match operation. Plan Caching and Query Memory Part I covers underestimation / overestimation for Sort. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance.

To read additional articles I wrote click here. To watch the webcasts click here.

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script. Most of these concepts are also covered in our webcasts: www.sqlworkshops.com/webcasts

Let’s create a Customer’s State table that has 99% of customers in NY and the rest 1% in WA.Customers table used in Part I of this article is also used here.To observe Hash Warning, enable ‘Hash Warning’ in SQL Profiler under Events ‘Errors and Warnings’.

–Example provided by www.sqlworkshops.com

drop table CustomersState

go

create table CustomersState(CustomerID int primary key, Address char(200), State char(2))

go

insert into CustomersState(CustomerID, Address) select CustomerID, ‘Address’ from Customers

update CustomersState set State = ‘NY’ where CustomerID % 100 != 1

update CustomersState set State = ‘WA’ where CustomerID % 100 = 1

go

update statistics CustomersState with fullscan

go

Let’s create a stored procedure that joins customers with CustomersState table with a predicate on State.

–Example provided by www.sqlworkshops.com

create proc CustomersByState @State char(2) as

begin

declare @CustomerID int

select @CustomerID = e.CustomerID from Customers e

inner join CustomersState es on (e.CustomerID = es.CustomerID)

where es.State = @State

option (maxdop 1)

end

go

Let’s execute the stored procedure first with parameter value ‘WA’ – which will select 1% of data.

set statistics time on

go

–Example provided by www.sqlworkshops.com

exec CustomersByState‘WA’

go

The stored procedure took 294 ms to complete.

The stored procedure was granted 6704 KB based on 8000 rows being estimated.

The estimated number of rows, 8000 is similar to actual number of rows 8000 and hence the memory estimation should be ok.

There was no Hash Warning in SQL Profiler.To observe Hash Warning, enable ‘Hash Warning’ in SQL Profiler under Events ‘Errors and Warnings’.

Now let’s execute the stored procedure with parameter value ‘NY’ – which will select 99% of data.

-Example provided by www.sqlworkshops.com

exec CustomersByState‘NY’

go

The stored procedure took 2922 ms to complete.

The stored procedure was granted 6704 KB based on 8000 rows being estimated.

The estimated number of rows, 8000 is way different from the actual number of rows 792000 because the estimation is based on the first set of parameter value supplied to the stored procedure which is ‘WA’ in our case. This underestimation will lead to spill over tempdb, resulting in poor performance.

There was Hash Warning (Recursion) in SQL Profiler.To observe Hash Warning, enable ‘Hash Warning’ in SQL Profiler under Events ‘Errors and Warnings’.

Let’s recompile the stored procedure and then let’s first execute the stored procedure with parameter value ‘NY’.
In a production instance it is not advisable to use sp_recompile instead one should use DBCC FREEPROCCACHE (plan_handle). This is due to locking issues involved with sp_recompile, refer to our webcasts, www.sqlworkshops.com/webcasts for further details.

exec sp_recompileCustomersByState

go

–Example provided by www.sqlworkshops.com

exec CustomersByState‘NY’

go

Now the stored procedure took only 1046 ms instead of 2922 ms.

The stored procedure was granted 146752 KB of memory.

The estimated number of rows, 792000 is similar to actual number of rows of 792000. Better performance of this stored procedure execution is due to better estimation of memory and avoiding spill over tempdb.

There was no Hash Warning in SQL Profiler.

Now let’s execute the stored procedure with parameter value ‘WA’.

–Example provided by www.sqlworkshops.com

exec CustomersByState‘WA’

go

The stored procedure took 351 ms to complete, higher than the previous execution time of 294 ms.

This stored procedure was granted more memory (146752 KB) than necessary (6704 KB) based on parameter value ‘NY’ for estimation (792000 rows) instead of parameter value ‘WA’ for estimation (8000 rows). This is because the estimation is based on the first set of parameter value supplied to the stored procedure which is ‘NY’ in this case. This overestimation leads to poor performance of this Hash Match operation, it might also affect the performance of other concurrently executing queries requiring memory and hence overestimation is not recommended.

The estimated number of rows, 792000 is much more than the actual number of rows of 8000.

Intermediate Summary: This issue can be avoided by not caching the plan for memory allocating queries. Other possibility is to use recompile hint or optimize for hint to allocate memory for predefined data range.Let’s recreate the stored procedure with recompile hint.

–Example provided by www.sqlworkshops.com

drop proc CustomersByState

go

create proc CustomersByState @State char(2) as

begin

declare @CustomerID int

select @CustomerID = e.CustomerID from Customers e

inner join CustomersState es on (e.CustomerID = es.CustomerID)

where es.State = @State

option (maxdop 1, recompile)

end

go

Let’s execute the stored procedure initially with parameter value ‘WA’ and then with parameter value ‘NY’.

–Example provided by www.sqlworkshops.com

exec CustomersByState‘WA’

go

exec CustomersByState‘NY’

go

The stored procedure took 297 ms and 1102 ms in line with previous optimal execution times.

The stored procedure with parameter value ‘WA’ has good estimation like before.

Estimated number of rows of 8000 is similar to actual number of rows of 8000.

The stored procedure with parameter value ‘NY’ also has good estimation and memory grant like before because the stored procedure was recompiled with current set of parameter values.

Estimated number of rows of 792000 is similar to actual number of rows of 792000.

The compilation time and compilation CPU of 1 ms is not expensive in this case compared to the performance benefit.

There was no Hash Warning in SQL Profiler.

Let’s recreate the stored procedure with optimize for hint of ‘NY’.

–Example provided by www.sqlworkshops.com

drop proc CustomersByState

go

create proc CustomersByState @State char(2) as

begin

declare @CustomerID int

select @CustomerID = e.CustomerID from Customers e

inner join CustomersState es on (e.CustomerID = es.CustomerID)

where es.State = @State

option (maxdop 1, optimize for (@State = ‘NY’))

end

go

Let’s execute the stored procedure initially with parameter value ‘WA’ and then with parameter value ‘NY’.

–Example provided by www.sqlworkshops.com

exec CustomersByState‘WA’

go

exec CustomersByState‘NY’

go

The stored procedure took 353 ms with parameter value ‘WA’, this is much slower than the optimal execution time of 294 ms we observed previously. This is because of overestimation of memory. The stored procedure with parameter value ‘NY’ has optimal execution time like before.

The stored procedure with parameter value ‘WA’ has overestimation of rows because of optimize for hint value of ‘NY’.

Unlike before, more memory was estimated to this stored procedure based on optimize for hint value ‘NY’.

The stored procedure with parameter value ‘NY’ has good estimation because of optimize for hint value of ‘NY’. Estimated number of rows of 792000 is similar to actual number of rows of 792000.

Optimal amount of memory was estimated to this stored procedure based on optimize for hint value ‘NY’.

There was no Hash Warning in SQL Profiler.

This article covers underestimation / overestimation of memory for Hash Match operation. Plan Caching and Query Memory Part I covers underestimation / overestimation for Sort. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance.

Summary: Cached plan might lead to underestimation or overestimation of memory because the memory is estimated based on first set of execution parameters. It is recommended not to cache the plan if the amount of memory required to execute the stored procedure has a wide range of possibilities. One can mitigate this by using recompile hint, but that will lead to compilation overhead. However, in most cases it might be ok to pay for compilation rather than spilling sort over tempdb which could be very expensive compared to compilation cost. The other possibility is to use optimize for hint, but in case one sorts more data than hinted by optimize for hint, this will still lead to spill. On the other side there is also the possibility of overestimation leading to unnecessary memory issues for other concurrently executing queries. In case of Hash Match operations, this overestimation of memory might lead to poor performance. When the values used in optimize for hint are archived from the database, the estimation will be wrong leading to worst performance, so one has to exercise caution before using optimize for hint, recompile hint is better in this case.

I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.

Disclaimer and copyright information: This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to Ramesh Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. Previously published under / http://sqlblogcasts.com/blogs/sqlworkshops/archive/2011/02/16/plan-caching-and-query-memory-part-ii-hash-match-when-not-to-use-stored-procedure-or-other-plan-caching-mechanisms-like-sp-executesql-or-prepared-statement.aspx / http://www.sqlworkshops.com/plancachingandquerymemory2.htm and in Level 400 Webcasts http://www.sqlworkshops.com/webcasts.

 

Workspace Memory / Query Memory Tuning – RESOURCE_SEMAPHORE / IO_COMPLETION / SLEEP_TASK Waits

SQL Server is configured to use a lot of memory, but my query is slow and not using all the memory available and it is spilling the sort or the hash match operation to tempdb, how can you tune the configuration and the query?

Memory allocating queries request memory based on estimation (ideal memory) and query memory (workspace memory) availability, when they don’t get the right amount of memory they spill to tempdb and lead to performance issues. Previous articles discussed ways to make the estimation better; this article discusses ways to address query memory availability.

You can find the ideal amount of memory a query needs (when the query is executing) using sys.dm_exec_query_memory_grants. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union.

select * from sys.dm_exec_query_memory_grants
go

Column ideal_memory_kb indicates the ideal amount of memory the query needs. This is based on estimation, this might be incorrect for various reasons including out of date statistics, in some cases (more common that you might think) under estimation of memory by the optimizer even when the statistics are up to date (refer to article http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html) and due to plan caching (refer to article Plan Caching and Query Memory). Some might recommend tweaking the index/column statistics in an undocumented way to inflate the estimation for additional memory, I suggest following the MSSQL Tip http://www.mssqltips.com/sqlservertip/1955 and webcast 1 and 2 at www.sqlworkshops.com/webcasts to understand the issue and the recommendations.

To read additional articles I wrote click here. To watch the webcasts click here.

The ideal amount of memory a query needs is based on estimation. Requested memory is based on ideal memory and maximum available workspace memory. There is a possibility the estimation is correct (let’s say you have up to date statistics and/or fixed the under estimation issues by following the above articles) but the available workspace memory is not enough and hence ideal amount is low. The next question is: what is available workspace memory.

Available workspace memory, also known as query memory, is the amount of memory available for common memory allocating queries that perform Sort and do Hash Match operations. This is automatically calculated based on your system configuration. You can monitor the currently available workspace memory and maximum workspace memory by executing the command dbcc memorystatus and looking for ‘Available’ and ‘Current Max’ under ‘Query Memory Objects (default)’ and ‘Small Query Memory Objects (default)’. Or using Performance Monitor counters ‘Granted Workspace Memory (KB)’ and ‘Maximum Workspace Memory (KB)’ under object Memory Manager. ‘Maximum Workspace Memory (KB)’ is the sum of ‘Current Max’ of both ‘Query Memory Objects (default)’ and ‘Small Query Memory Objects (default)’, note ‘Current Max’ is in pages (* 8 = KB).

Here is a query to find the ‘Maximum Workspace Memory (KB)’ using sys.dm_os_performance_counters, ‘Maximum Workspace Memory (KB)’ can be up to 75% of ‘Target Server Memory (KB)’.

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script.

–Example provided by www.sqlworkshops.com
select (select cntr_value
        from sys.dm_os_performance_counters
        where object_name like ‘%Memory Manager%’ and counter_name like ‘Maximum Workspace Memory (KB)%’) as ‘Maximum Workspace Memory (KB)’,
    (select cntr_value
        from sys.dm_os_performance_counters
        where object_name like ‘%Memory Manager%’ and counter_name like ‘Target Server Memory (KB)%’) as ‘Target Server Memory (KB)’,
   (select cntr_value
        from sys.dm_os_performance_counters
        where object_name like ‘%Memory Manager%’ and counter_name like ‘Maximum Workspace Memory (KB)%’) * 100.0 /
    (select cntr_value
        from sys.dm_os_performance_counters
        where object_name like ‘%Memory Manager%’ and counter_name like ‘Target Server Memory (KB)%’)  as Ratio
go

In my server, when Target Server Memory is 4096MB, Maximum Workspace Memory is 3077MB, which is about 75% of Target Server Memory.

By default a query will not request more than 25% of this Maximum Workspace Memory in SQL Server 2008 and above and this Memory Grant 25% can be changed using Resource Governor Workload Group settings. With SQL Server 2005 and below, this is 20% and cannot be changed without the support of Microsoft (with a combination of undocumented trace flag and changes to the configuration parameters).

Why should you care about 25%? You might have a customer executing a heavy reporting query in the night and they want to use more than 25% of Maximum Workspace Memory for that single query. Let’s assume the customer configured 4096MB for their SQL Server instance (the Target Server Memory might be less than the configured memory if there is memory pressure on the server), let’s say their workspace memory is 3077MB (‘Query Memory Objects (default)’->‘Current Max’ = 381175 pages = 3049400KB + ‘Small Query Memory Objects (default)’ ->‘Current Max’ = 12800 pages = 102400KB; 3049400KB + 102400KB = 3077MB. And this customer’s query executing the report is limited to a maximum of 3049400KB / 4 = 744MB.

Let’s say the customer’s query executing the report needs 1,470MB of query memory (ideal memory) in order not to spill the sort to tempdb, but the query will request only 744MB. In case the customer can set the ‘request_max_memory_grant_percent’ (Memory Grant %) of the Workload Group setting to 50%, then the query can request up to 1,488MB and in this case the query will request the ideal memory it needs which is 1,470MB.

SQL Server has memory grant queues based on cost, if queries need more memory and there is not enough memory available in the queue, then the query will wait, you can get additional details using the command dbcc memorystatus. If a query requests huge amount of memory, but doesn’t utilize it (due to over estimation), this memory will be reserved and cannot be used by other queries, in some cases this will lead to unnecessary memory grant waits. So one has to be very careful not to overestimate (also described in the webcast www.sqlworkshops.com/webcasts) too much memory as it will affect concurrency. When the query waits for memory, the wait type will be ‘RESOURCE_SEMAPHORE’.

Let’s set ‘max server memory (MB)’ to 4GB.

exec sp_configure‘max server memory (MB)’, 4096
go
reconfigure
go

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script.

Scenario 1:
This query will underestimate memory due to optimizer issues and will also request less memory (744MB) due to the 25% Resource Governor Workload Group Memory Grant setting. The query will be slow spilling the sort to tempdb.

When the sort spills to tempdb the wait type will be ‘IO_COMPLETION’ but when the Hash Match operation spills to tempdb the wait type will be ‘SLEEP_TASK’.

We are using option (maxdop 1) to disable parallelism, to learn more about monitoring and tuning parallel query execution, refer to the webcast www.sqlworkshops.com/webcasts and article Parallel Sort and Merge Join – Watch out for unpredictability in performance.

set statistics time on
go
–Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (maxdop 1)
go

Scenario 2:
This query will have better memory estimation due to the option clause with optimize for hint, but will still request less memory (744MB) due to the 25% Resource Governor Workload Group Memory Grant setting. The query will also be slow spilling the sort to tempdb.

set statistics time on
go
–Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
– Option optimize for is used to inflate memory request
– without this the query will spill the sort to tempdb due
– to query optimizer under estimation of memory.
– One possibility is to inflate the number of rows,
– the other possibility is to inflate the row size
– For more information refer to webcasts 1 & 2
– at www.sqlworkshops.com/webcasts.

go

Scenario 3:
This query will have better memory estimation due to the option clause with optimize for hint and will also request enough memory (1,470MB) due to the 50% Resource Governor Workload Group Memory Grant setting.The query will be fast with no spilling of sort to tempdb.

Please read the entire article and answer the challenge posted at the end of this article before changing the Resource Governor Workload Group Memory Grant setting in your production server as there are major disadvantages of changing the Resource Governor Workload Group Memory Grant setting, don’t do it without understanding the full picture.

alter workload group [default] with(request_max_memory_grant_percent=50)
go
alter
resource governor reconfigure
goset statistics time on
go
–Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
– Option optimize for is used to inflate memory request
– without this the query will spill the sort to tempdb due
– to query optimizer under estimation of memory.
– One possibility is to inflate the number of rows,
– the other possibility is to inflate the row size
– For more information refer to webcasts 1 & 2
– at www.sqlworkshops.com/webcasts.

go

Scenario 4:
Let’s execute 2 of this query concurrently in 2 sessions, with 25% Resource Governor Workload Group Memory Grant setting.

alter workload group [default] with(request_max_memory_grant_percent=25)
go
alter
resource governor reconfigure
go

Session 1:
Let’s execute this query in a loop and measure the performance of the query executed in session 2. This query will have better memory estimation due to the option clause with optimize for hint, but will still request less memory (744MB) due to the 25% Resource Governor Workload Group Memory Grant setting. The query will be spilling the sort to tempdb.

while 1=1
begin
–Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
– Option optimize for is used to inflate memory request
– without this the query will spill the sort to tempdb due
– to query optimizer under estimation of memory.
– One possibility is to inflate the number of rows,
– the other possibility is to inflate the row size
– For more information refer to webcasts 1 & 2
– at www.sqlworkshops.com/webcasts.

end
go

Session 2:
Let’s execute this query few times. This query will have better memory estimation due to the option clause with optimize for hint, but will still request less memory (744MB) due to the 25% Resource Governor Workload Group Memory Grant setting. The query will be slow spilling the sort to tempdb. This query might take up to twice the amount of time to complete compared to Scenario 2, because 2 queries from 2 sessions are concurrently spilling to tempdb. There is no wait for memory grants (no RESOURCE_SEMAPHORE waits).

set statistics time on
go
–Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
– Option optimize for is used to inflate memory request
– without this the query will spill the sort to tempdb due
– to query optimizer under estimation of memory.
– One possibility is to inflate the number of rows,
– the other possibility is to inflate the row size
– For more information refer to webcasts 1 & 2
– at www.sqlworkshops.com/webcasts.

go

Scenario 5:
Let’s execute 2 of this query concurrently in 2 sessions, with 50% Resource Governor Workload Group Memory Grant setting.

alter workload group [default] with(request_max_memory_grant_percent=50)
go
alter
resource governor reconfigure
go

Session 1:
Let’s execute this query in a loop and measure the performance of the query executed in session 2. This query will have better memory estimation due to the option clause with optimize for hint and will also request enough memory (1,470MB) due to the 50% Resource Governor Workload Group Memory Grant setting.The query will have no spilling of sort to tempdb.

while 1=1
begin
–Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
– Option optimize for is used to inflate memory request
– without this the query will spill the sort to tempdb due
– to query optimizer under estimation of memory.
– One possibility is to inflate the number of rows,
– the other possibility is to inflate the row size
– For more information refer to webcasts 1 & 2
– at www.sqlworkshops.com/webcasts.

end
go

Session 2:
Let’s execute this query few times. This query will have better memory estimation due to the option clause with optimize for hint and will also request enough memory (1,470MB) due to the 50% Resource Governor Workload Group Memory Grant setting.The query will be fast with no spilling of sort to tempdb. But this query might take up to twice the amount of time to complete compared to Scenario 3, because 2 queries from 2 sessions are concurrently requesting nearly 50% of workspace memory. There is wait for memory grants (RESOURCE_SEMAPHORE waits), SQL Server grants memory to one query at a time as the memory in the grant queue is not enough to grant simultaneously the requested memory to both queries.

set statistics time on
go
–Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
– Option optimize for is used to inflate memory request
– without this the query will spill the sort to tempdb due
– to query optimizer under estimation of memory.
– One possibility is to inflate the number of rows,
– the other possibility is to inflate the row size
– For more information refer to webcasts 1 & 2
– at www.sqlworkshops.com/webcasts.

go

Overall performance of scenario 5 is better than scenario 4 even though there was memory grant waits. In this case waiting for memory is far worse than spilling the sort to tempdb.

Scenario 6:
Let’s execute two queries with less memory requirement concurrently in two sessions (so memory can be granted for both queries simultaneously from the same memory grant queue), with 50% Resource Governor Workload Group Memory Grant setting.

Session 1:
Let’s execute this query in a loop and measure the performance of the query executed in session 2. This query will have better memory estimation due to the option clause with optimize for hint and will also request enough memory (1,102MB) due to the 50% Resource Governor Workload Group Memory Grant setting.The query will have no spilling of sort to tempdb.

while 1=1

begin
–Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 400000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 450000), maxdop 1)
– Option optimize for is used to inflate memory request
– without this the query will spill the sort to tempdb due
– to query optimizer under estimation of memory.
– One possibility is to inflate the number of rows,
– the other possibility is to inflate the row size
– For more information refer to webcasts 1 & 2
– at www.sqlworkshops.com/webcasts.

end
go

Session 2:
Let’s execute this query few times. This query will have better memory estimation due to the option clause with optimize for hint and will also request enough memory (1,102MB) due to the 50% Resource Governor Workload Group Memory Grant setting.The query will be fast with no spilling of sort to tempdb. There will be no wait for memory grants (RESOURCE_SEMAPHORE waits), SQL Server grants memory to both queries simultaneously as the memory in the grant queue is enough to grant the requested memory to both queries.

set statistics time on
go
–Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 400000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 450000), maxdop 1)
– Option optimize for is used to inflate memory request
– without this the query will spill the sort to tempdb due
– to query optimizer under estimation of memory.
– One possibility is to inflate the number of rows,
– the other possibility is to inflate the row size
– For more information refer to webcasts 1 & 2
– at www.sqlworkshops.com/webcasts.

go

 Challenge:
There are major disadvantages of changing the Resource Governor Workload Group Memory Grant setting, so don’t make any changes without understanding the full picture. I will cover this is the next article, meanwhile if you can guess some reasons, write to me: contacts.

I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.

Disclaimer and copyright information: This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to Ramesh Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. Previously published under / http://sqlblogcasts.com/blogs/sqlworkshops/archive/2011/10/14/workspace-memory-query-memory-tuning-resource-semaphore-io-completion-sleep-task-waits.aspxhttp://www.sqlworkshops.com/workspacememory.htm and in Level 400 Webcasts http://www.sqlworkshops.com/webcasts.

 

Plan Caching and Query Memory Part 1: When not to use stored procedure or other plan caching mechanisms like sp_executesql or prepared statement

The most common performance mistake SQL Server developers make:

SQL Server estimates memory requirement for queries at compilation time. This mechanism is fine for dynamic queries that need memory, but not for queries that cache the plan. With dynamic queries the plan is not reused for different set of parameters values / predicates and hence different amount of memory can be estimated based on different set of parameter values / predicates. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union. This article covers Sort with examples. It is recommended to read Plan Caching and Query Memory Part II after this article which covers Hash Match operations.

When the plan is cached by using stored procedure or other plan caching mechanisms like sp_executesql or prepared statement, SQL Server estimates memory requirement based on first set of execution parameters. Later when the same stored procedure is called with different set of parameter values, the same amount of memory is used to execute the stored procedure. This might lead to underestimation / overestimation of memory on plan reuse, overestimation of memory might not be a noticeable issue for Sort operations, but underestimation of memory will lead to spill over tempdb resulting in poor performance.

This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operation. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance.

To read additional articles I wrote click here. To watch the webcasts click here.

In most cases it is cheaper to pay for the compilation cost of dynamic queries than huge cost for spill over tempdb, unless memory requirement for a stored procedure does not change significantly based on predicates.

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script. Most of these concepts are also covered in our webcasts: www.sqlworkshops.com/webcasts

Enough theory, let’s see an example where we sort initially 1 month of data and then use the stored procedure to sort 6 months of data.

Let’s create a stored procedure that sorts customers by name within certain date range.

–Example provided by www.sqlworkshops.com

create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as

begin

declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime

select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c

where c.CreationDate between @CreationDateFrom and @CreationDateTo

order by c.CustomerName

option (maxdop 1)

end

go

Let’s execute the stored procedure initially with 1 month date range.

setstatistics time on

go

–Example provided by www.sqlworkshops.com

exec CustomersByCreationDate’2001-01-01′, ’2001-01-31′

go

The stored procedure took 48 ms to complete.

The stored procedure was granted 6656 KB based on 43199.9 rows being estimated.

The estimated number of rows, 43199.9 is similar to actual number of rows 43200 and hence the memory estimation should be ok.

There was no Sort Warnings in SQL Profiler.

Now let’s execute the stored procedure with 6 month date range.

–Example provided by www.sqlworkshops.com

exec CustomersByCreationDate’2001-01-01′, ’2001-06-30′

go

The stored procedure took 679 ms to complete.

The stored procedure was granted 6656 KB based on 43199.9 rows being estimated.

The estimated number of rows, 43199.9 is way different from the actual number of rows 259200 because the estimation is based on the first set of parameter value supplied to the stored procedure which is 1 month in our case. This underestimation will lead to sort spill over tempdb, resulting in poor performance.

There was Sort Warnings in SQL Profiler.

To monitor the amount of data written and read from tempdb, one can execute select num_of_bytes_written, num_of_bytes_read from sys.dm_io_virtual_file_stats(2, NULL) before and after the stored procedure execution, for additional information refer to the webcast: www.sqlworkshops.com/webcasts.

Let’s recompile the stored procedure and then let’s first execute the stored procedure with 6 month date range.

In a production instance it is not advisable to use sp_recompile instead one should use DBCC FREEPROCCACHE (plan_handle). This is due to locking issues involved with sp_recompile, refer to our webcasts for further details.

exec sp_recompileCustomersByCreationDate

go

–Example provided by www.sqlworkshops.com

exec CustomersByCreationDate’2001-01-01′, ’2001-06-30′

go

Now the stored procedure took only 294 ms instead of 679 ms.

The stored procedure was granted 26832 KB of memory.

The estimated number of rows, 259200 is similar to actual number of rows of 259200. Better performance of this stored procedure is due to better estimation of memory and avoiding sort spill over tempdb.

There was no Sort Warnings in SQL Profiler.

Now let’s execute the stored procedure with 1 month date range.

–Example provided by www.sqlworkshops.com

exec CustomersByCreationDate’2001-01-01′, ’2001-01-31′

go

The stored procedure took 49 ms to complete, similar to our very first stored procedure execution.


This stored procedure was granted more memory (26832 KB) than necessary memory (6656 KB) based on 6 months of data estimation (259200 rows) instead of 1 month of data estimation (43199.9 rows). This is because the estimation is based on the first set of parameter value supplied to the stored procedure which is 6 months in this case. This overestimation did not affect performance, but it might affect performance of other concurrent queries requiring memory and hence overestimation is not recommended. This overestimation might affect performance Hash Match operations, refer to article Plan Caching and Query Memory Part II for further details.

Let’s recompile the stored procedure and then let’s first execute the stored procedure with 2 day date range.

exec sp_recompileCustomersByCreationDate

go

–Example provided by www.sqlworkshops.com

exec CustomersByCreationDate’2001-01-01′, ’2001-01-02′

go

The stored procedure took 1 ms.

The stored procedure was granted 1024 KB based on 1440 rows being estimated.

There was no Sort Warnings in SQL Profiler.

Now let’s execute the stored procedure with 6 month date range.

–Example provided by www.sqlworkshops.com

exec CustomersByCreationDate’2001-01-01′, ’2001-06-30′

go

The stored procedure took 955 ms to complete, way higher than 679 ms or 294ms we noticed before.

The stored procedure was granted 1024 KB based on 1440 rows being estimated. But we noticed in the past this stored procedure with 6 month date range needed 26832 KB of memory to execute optimally without spill over tempdb. This is clear underestimation of memory and the reason for the very poor performance.

There was Sort Warnings in SQL Profiler. Unlike before this was a Multiple pass sort instead of Single pass sort. This occurs when granted memory is too low.

Intermediate Summary: This issue can be avoided by not caching the plan for memory allocating queries. Other possibility is to use recompile hint or optimize for hint to allocate memory for predefined date range.

Let’s recreate the stored procedure with recompile hint.

–Example provided by www.sqlworkshops.com

drop proc CustomersByCreationDate

go

create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as

begin

declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime

select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c

where c.CreationDate between @CreationDateFrom and @CreationDateTo

order by c.CustomerName

option (maxdop 1, recompile)

end

go

Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range.

–Example provided by www.sqlworkshops.com

exec CustomersByCreationDate’2001-01-01′, ’2001-01-30′

exec CustomersByCreationDate’2001-01-01′, ’2001-06-30′

go

The stored procedure took 48ms and 291 ms in line with previous optimal execution times.

The stored procedure with 1 month date range has good estimation like before.

The stored procedure with 6 month date range also has good estimation and memory grant like before because the query was recompiled with current set of parameter values.

The compilation time and compilation CPU of 1 ms is not expensive in this case compared to the performance benefit.

 

Let’s recreate the stored procedure with optimize for hint of 6 month date range.

–Example provided by www.sqlworkshops.com

drop proc CustomersByCreationDate

go

create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as

begin

declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime

select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c

where c.CreationDate between @CreationDateFrom and @CreationDateTo

order by c.CustomerName

option (maxdop 1, optimize for (@CreationDateFrom = ’2001-01-01′, @CreationDateTo =’2001-06-30′))

end

go

Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range.

–Example provided by www.sqlworkshops.com

exec CustomersByCreationDate’2001-01-01′, ’2001-01-30′

exec CustomersByCreationDate’2001-01-01′, ’2001-06-30′

go

The stored procedure took 48ms and 291 ms in line with previous optimal execution times.

The stored procedure with 1 month date range has overestimation of rows and memory. This is because we provided hint to optimize for 6 months of data.

The stored procedure with 6 month date range has good estimation and memory grant because we provided hint to optimize for 6 months of data.

Let’s execute the stored procedure with 12 month date range using the currently cashed plan for 6 month date range.

–Example provided by www.sqlworkshops.com

exec CustomersByCreationDate’2001-01-01′, ’2001-12-31′

go

The stored procedure took 1138 ms to complete.

2592000 rows were estimated based on optimize for hint value for 6 month date range. Actual number of rows is 524160 due to 12 month date range.

The stored procedure was granted enough memory to sort 6 month date range and not 12 month date range, so there will be spill over tempdb.


There was Sort Warnings in SQL Profiler.

As we see above, optimize for hint cannot guarantee enough memory and optimal performance compared to recompile hint.

This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operation. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance.

Summary: Cached plan might lead to underestimation or overestimation of memory because the memory is estimated based on first set of execution parameters. It is recommended not to cache the plan if the amount of memory required to execute the stored procedure has a wide range of possibilities. One can mitigate this by using recompile hint, but that will lead to compilation overhead. However, in most cases it might be ok to pay for compilation rather than spilling sort over tempdb which could be very expensive compared to compilation cost. The other possibility is to use optimize for hint, but in case one sorts more data than hinted by optimize for hint, this will still lead to spill. On the other side there is also the possibility of overestimation leading to unnecessary memory issues for other concurrently executing queries. In case of Hash Match operations, this overestimation of memory might lead to poor performance. When the values used in optimize for hint are archived from the database, the estimation will be wrong leading to worst performance, so one has to exercise caution before using optimize for hint, recompile hint is better in this case.

I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.

Disclaimer and copyright information: This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to Ramesh Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. Previously published under / http://sqlblogcasts.com/blogs/sqlworkshops/archive/2011/02/08/plan-caching-and-query-memory-part-i-when-not-to-use-stored-procedure-or-other-plan-caching-mechanisms-like-sp-executesql-or-prepared-statement-he-most-common-performance-mistake-sql-server-developers-make.aspx / http://www.sqlworkshops.com/plancachingandquerymemory.htm and in Level 400 Webcasts http://www.sqlworkshops.com/webcasts.

 

SQL Server IO handling mechanism can be severely affected by high CPU usage

Are you using SSD or SAN / NAS based storage solution and sporadically observe SQL Server experiencing high IO wait times or from time to time your DAS / HDD becomes very slow according to SQL Server statistics? Read on… I need your help to up vote my connect item – https://connect.microsoft.com/SQLServer/feedback/details/744650/sql-server-io-handling-mechanism-can-be-severely-affected-by-high-cpu-usage. Instead of taking few seconds, queries could take minutes/hours to complete when CPU is busy.

To read additional articles I wrote click here. To watch the webcasts click here.

In SQL Server when a query / request needs to read data that is not in data cache or when the request has to write to disk, like transaction log records, the request / task will queue up the IO operation and wait for it to complete (task in suspended state, this wait time is the resource wait time). When the IO operation is complete, the task will be queued to run on the CPU. If the CPU is busy executing other tasks, this task will wait (task in runnable state) until other tasks in the queue either complete or get suspended due to waits or exhaust their quantum of 4ms (this is the signal wait time, which along with resource wait time will increase the overall wait time). When the CPU becomes free, the task will finally be run on the CPU (task in running state).

The signal wait time can be up to 4ms per runnable task, this is by design. So if a CPU has 5 runnable tasks in the queue, then this query after the resource becomes available might wait up to a maximum of 5 X 4ms = 20ms in the runnable state (normally less as other tasks might not use the full quantum).

In case the CPU usage is high, let’s say many CPU intensive queries are running on the instance, there is a possibility that the IO operations that are completed at the Hardware and Operating System level are not yet processed by SQL Server, keeping the task in the resource wait state for longer than necessary. In case of an SSD, the IO operation might even complete in less than a millisecond, but it might take SQL Server 100s of milliseconds, for instance, to process the completed IO operation. For example, let’s say you have a user inserting 500 rows in individual transactions. When the transaction log is on an SSD or battery backed up controller that has write cache enabled, all of these inserts will complete in 100 to 200ms. With a CPU intensive parallel query executing across all CPU cores, the same inserts might take minutes to complete. WRITELOG wait time will be very high in this case (both under sys.dm_io_virtual_file_stats and sys.dm_os_wait_stats). In addition you will notice a large number of WAITELOG waits since log records are written by LOG WRITER and hence very high signal_wait_time_ms leading to more query delays. However, Performance Monitor Counter, PhysicalDisk, Avg. Disk sec/Write will report very low latency times.

Such delayed IO handling also occurs to read operations with artificially very high PAGEIOLATCH_SH wait time (with number of PAGEIOLATCH_SH waits remaining the same). This problem will manifest more and more as customers start using SSD based storage for SQL Server, since they drive the CPU usage to the limits with faster IOs. We have a few workarounds for specific scenarios, but we think Microsoft should resolve this issue at the product level. We have a connect item open – https://connect.microsoft.com/SQLServer/feedback/details/744650/sql-server-io-handling-mechanism-can-be-severely-affected-by-high-cpu-usage - (with example scripts) to reproduce this behavior, please up vote the item so the issue will be addressed by the SQL Server product team soon.