Plan Caching and Query Memory Part 2: When not to use stored procedure or other plan caching mechanisms like sp_executesql or prepared statement

SQL Server estimates memory requirements at compile time. When a stored procedure or another plan caching mechanism such as sp_executesql or a prepared statement is used, the memory requirement is estimated based on the first set of execution parameters. This is a common reason for spills to tempdb and hence poor performance. Common memory allocating queries are those that perform Sort and Hash Match operations such as Hash Join, Hash Aggregation or Hash Union. This article covers Hash Match operations with examples. It is recommended to read Plan Caching and Query Memory Part I, which covers an introduction and query memory for Sort, before this article. In most cases it is cheaper to pay the compilation cost of dynamic queries than the huge cost of a spill to tempdb, unless the memory requirement of a query does not change significantly with its predicates.

This article covers underestimation / overestimation of memory for Hash Match operations. Plan Caching and Query Memory Part I covers underestimation / overestimation for Sort. It is important to note that underestimation of memory for Sort and Hash Match operations leads to spills to tempdb and hence negatively impacts performance, while overestimation of memory affects the memory available to other concurrently executing queries. In addition, with Hash Match operations, overestimation of memory can itself lead to poor performance.

To read additional articles I wrote click here. To watch the webcasts click here.

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script. Most of these concepts are also covered in our webcasts: www.sqlworkshops.com/webcasts

Let’s create a Customer’s State table that has 99% of customers in NY and the remaining 1% in WA. The Customers table used in Part I of this article is also used here. To observe Hash Warnings, enable the ‘Hash Warning’ event under ‘Errors and Warnings’ in SQL Profiler.

--Example provided by www.sqlworkshops.com
drop table CustomersState
go
create table CustomersState(CustomerID int primary key, Address char(200), State char(2))
go
insert into CustomersState(CustomerID, Address) select CustomerID, 'Address' from Customers
update CustomersState set State = 'NY' where CustomerID % 100 != 1
update CustomersState set State = 'WA' where CustomerID % 100 = 1
go
update statistics CustomersState with fullscan
go

Let’s create a stored procedure that joins the Customers table with the CustomersState table, with a predicate on State.

--Example provided by www.sqlworkshops.com
create proc CustomersByState @State char(2) as
begin
declare @CustomerID int
select @CustomerID = e.CustomerID from Customers e
inner join CustomersState es on (e.CustomerID = es.CustomerID)
where es.State = @State
option (maxdop 1)
end
go

Let’s execute the stored procedure first with parameter value ‘WA’, which selects 1% of the data.

set statistics time on
go
--Example provided by www.sqlworkshops.com
exec CustomersByState 'WA'
go

The stored procedure took 294 ms to complete.

The stored procedure was granted 6704 KB based on 8000 rows being estimated.

The estimated number of rows, 8000, is close to the actual number of rows, 8000, and hence the memory estimation is adequate.
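One way to observe the grant while the stored procedure is executing is the memory grants DMV, queried from a second session (a minimal sketch; sys.dm_exec_query_memory_grants only returns rows for queries that are currently running or waiting for a grant):

--Run from another session while CustomersByState is executing;
--granted_memory_kb is the grant discussed above, ideal_memory_kb is what the optimizer wanted.
select session_id, requested_memory_kb, granted_memory_kb, ideal_memory_kb, used_memory_kb
from sys.dm_exec_query_memory_grants
go

The granted value is also visible as the Memory Grant property of the root operator in the actual execution plan.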

There was no Hash Warning in SQL Profiler.

Now let’s execute the stored procedure with parameter value ‘NY’ – which will select 99% of data.

--Example provided by www.sqlworkshops.com
exec CustomersByState 'NY'
go

The stored procedure took 2922 ms to complete.

The stored procedure was granted 6704 KB based on 8000 rows being estimated.

The estimated number of rows, 8000, is far from the actual number of rows, 792000, because the estimate is based on the first set of parameter values supplied to the stored procedure, which was ‘WA’ in our case. This underestimation leads to a spill to tempdb, resulting in poor performance.

There was a Hash Warning (Recursion) event in SQL Profiler.

Let’s recompile the stored procedure and then execute it first with parameter value ‘NY’.
In a production instance it is not advisable to use sp_recompile; instead one should use DBCC FREEPROCCACHE (plan_handle), because of the locking issues involved with sp_recompile. Refer to our webcasts, www.sqlworkshops.com/webcasts, for further details.
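As a minimal sketch (object name taken from this article), one way to locate the plan_handle of just this procedure’s cached plan and then free only that plan:

--Find the plan_handle for the CustomersByState plan
select cp.plan_handle, cp.usecounts, st.text
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where st.objectid = object_id('CustomersByState')
go
--Then free only that plan, substituting the plan_handle value returned above:
--dbcc freeproccache (0x...)
--go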

exec sp_recompile CustomersByState
go
--Example provided by www.sqlworkshops.com
exec CustomersByState 'NY'
go

Now the stored procedure took only 1046 ms instead of 2922 ms.

The stored procedure was granted 146752 KB of memory.

The estimated number of rows, 792000, matches the actual number of rows, 792000. The better performance of this execution is due to better memory estimation, which avoids the spill to tempdb.

There was no Hash Warning in SQL Profiler.

Now let’s execute the stored procedure with parameter value ‘WA’.

--Example provided by www.sqlworkshops.com
exec CustomersByState 'WA'
go

The stored procedure took 351 ms to complete, higher than the previous execution time of 294 ms.

This stored procedure was granted more memory (146752 KB) than necessary (6704 KB) because the estimate (792000 rows) was based on parameter value ‘NY’ instead of parameter value ‘WA’ (8000 rows); the estimate is based on the first set of parameter values supplied to the stored procedure, which was ‘NY’ in this case. This overestimation leads to poor performance of the Hash Match operation, and it can also affect the performance of other concurrently executing queries that need memory, so overestimation is not recommended.

The estimated number of rows, 792000, is much higher than the actual number of rows, 8000.

Intermediate Summary: This issue can be avoided by not caching the plan for memory allocating queries. Another possibility is to use the recompile hint, or the optimize for hint to allocate memory for a predefined data range. Let’s recreate the stored procedure with the recompile hint.

--Example provided by www.sqlworkshops.com
drop proc CustomersByState
go
create proc CustomersByState @State char(2) as
begin
declare @CustomerID int
select @CustomerID = e.CustomerID from Customers e
inner join CustomersState es on (e.CustomerID = es.CustomerID)
where es.State = @State
option (maxdop 1, recompile)
end
go

Let’s execute the stored procedure initially with parameter value ‘WA’ and then with parameter value ‘NY’.

--Example provided by www.sqlworkshops.com
exec CustomersByState 'WA'
go
exec CustomersByState 'NY'
go

The stored procedure took 297 ms and 1102 ms, in line with the previous optimal execution times.

The stored procedure with parameter value ‘WA’ has good estimation like before.

Estimated number of rows of 8000 is similar to actual number of rows of 8000.

The stored procedure with parameter value ‘NY’ also has good estimation and memory grant like before, because the stored procedure was recompiled with the current set of parameter values.

Estimated number of rows of 792000 is similar to actual number of rows of 792000.

The compilation time and compilation CPU of 1 ms is not expensive in this case compared to the performance benefit.

There was no Hash Warning in SQL Profiler.

Let’s recreate the stored procedure with the optimize for hint set to ‘NY’.

--Example provided by www.sqlworkshops.com
drop proc CustomersByState
go
create proc CustomersByState @State char(2) as
begin
declare @CustomerID int
select @CustomerID = e.CustomerID from Customers e
inner join CustomersState es on (e.CustomerID = es.CustomerID)
where es.State = @State
option (maxdop 1, optimize for (@State = 'NY'))
end
go

Let’s execute the stored procedure initially with parameter value ‘WA’ and then with parameter value ‘NY’.

--Example provided by www.sqlworkshops.com
exec CustomersByState 'WA'
go
exec CustomersByState 'NY'
go

The stored procedure took 353 ms with parameter value ‘WA’; this is much slower than the optimal execution time of 294 ms we observed previously, because of the overestimation of memory. The stored procedure with parameter value ‘NY’ has the optimal execution time as before.

The stored procedure with parameter value ‘WA’ has overestimation of rows because of the optimize for hint value ‘NY’.

Unlike before, more memory was granted to this stored procedure based on the optimize for hint value ‘NY’.

The stored procedure with parameter value ‘NY’ has good estimation because of the optimize for hint value ‘NY’. The estimated number of rows, 792000, matches the actual number of rows, 792000.

The optimal amount of memory was granted to this stored procedure based on the optimize for hint value ‘NY’.

There was no Hash Warning in SQL Profiler.

This article covers underestimation / overestimation of memory for Hash Match operations. Plan Caching and Query Memory Part I covers underestimation / overestimation for Sort. It is important to note that underestimation of memory for Sort and Hash Match operations leads to spills to tempdb and hence negatively impacts performance, while overestimation of memory affects the memory available to other concurrently executing queries. In addition, with Hash Match operations, overestimation of memory can itself lead to poor performance.

Summary: A cached plan can lead to underestimation or overestimation of memory because memory is estimated based on the first set of execution parameters. It is recommended not to cache the plan if the amount of memory required to execute the stored procedure has a wide range of possibilities. One can mitigate this by using the recompile hint, but that introduces compilation overhead; however, in most cases it is better to pay for compilation than to spill to tempdb, which can be very expensive compared to the compilation cost. The other possibility is to use the optimize for hint, but if a query processes more data than the hint covers it will still spill, and on the other side overestimation can cause unnecessary memory pressure on other concurrently executing queries. In the case of Hash Match operations, this overestimation of memory can itself lead to poor performance. Moreover, if the values used in the optimize for hint are later archived from the database, the estimate will be wrong, leading to worse performance, so exercise caution before using the optimize for hint; the recompile hint is better in this case.

I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.

Disclaimer and copyright information: This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to Ramesh Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. Previously published under / http://sqlblogcasts.com/blogs/sqlworkshops/archive/2011/02/16/plan-caching-and-query-memory-part-ii-hash-match-when-not-to-use-stored-procedure-or-other-plan-caching-mechanisms-like-sp-executesql-or-prepared-statement.aspx / http://www.sqlworkshops.com/plancachingandquerymemory2.htm and in Level 400 Webcasts http://www.sqlworkshops.com/webcasts.

 

Workspace Memory / Query Memory Tuning – RESOURCE_SEMAPHORE / IO_COMPLETION / SLEEP_TASK Waits

SQL Server is configured to use a lot of memory, but my query is slow, is not using all the memory available, and is spilling the sort or the hash match operation to tempdb. How can you tune the configuration and the query?

Memory allocating queries request memory based on the estimate (ideal memory) and on query memory (workspace memory) availability; when they don’t get the right amount of memory they spill to tempdb, which leads to performance issues. Previous articles discussed ways to improve the estimate; this article discusses ways to address query memory availability.

You can find the ideal amount of memory a query needs (while the query is executing) using sys.dm_exec_query_memory_grants. Common memory allocating queries are those that perform Sort and Hash Match operations such as Hash Join, Hash Aggregation or Hash Union.

select * from sys.dm_exec_query_memory_grants
go

Column ideal_memory_kb indicates the ideal amount of memory the query needs. This is based on the estimate, which can be incorrect for various reasons, including out-of-date statistics, underestimation of memory by the optimizer even when statistics are up to date (more common than you might think; refer to the article http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html) and plan caching (refer to the article Plan Caching and Query Memory). Some might recommend tweaking the index/column statistics in an undocumented way to inflate the estimate and get additional memory; I suggest following the MSSQL Tip http://www.mssqltips.com/sqlservertip/1955 and webcasts 1 and 2 at www.sqlworkshops.com/webcasts to understand the issue and the recommendations.
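For example, while a memory allocating query is executing, a narrower version of the query above makes an impending spill easier to spot (a sketch using documented columns of this DMV):

--A granted_memory_kb value well below ideal_memory_kb usually means
--the Sort or Hash Match operation will spill to tempdb.
select session_id, ideal_memory_kb, requested_memory_kb, granted_memory_kb, used_memory_kb
from sys.dm_exec_query_memory_grants
go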

To read additional articles I wrote click here. To watch the webcasts click here.

The ideal amount of memory a query needs is based on the estimate. The requested memory is based on the ideal memory and the maximum available workspace memory. There is a possibility that the estimate is correct (say you have up-to-date statistics and/or have fixed the underestimation issues by following the above articles) but the available workspace memory is not enough, so the requested amount ends up lower than the ideal amount. The next question is: what is available workspace memory?

Available workspace memory, also known as query memory, is the amount of memory available to common memory allocating queries that perform Sort and Hash Match operations. It is calculated automatically based on your system configuration. You can monitor the currently available workspace memory and the maximum workspace memory by executing dbcc memorystatus and looking for ‘Available’ and ‘Current Max’ under ‘Query Memory Objects (default)’ and ‘Small Query Memory Objects (default)’, or by using the Performance Monitor counters ‘Granted Workspace Memory (KB)’ and ‘Maximum Workspace Memory (KB)’ under the Memory Manager object. ‘Maximum Workspace Memory (KB)’ is the sum of ‘Current Max’ of both ‘Query Memory Objects (default)’ and ‘Small Query Memory Objects (default)’; note that ‘Current Max’ is reported in pages (multiply by 8 for KB).

Here is a query to find ‘Maximum Workspace Memory (KB)’ using sys.dm_os_performance_counters; ‘Maximum Workspace Memory (KB)’ can be up to 75% of ‘Target Server Memory (KB)’.

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script.

--Example provided by www.sqlworkshops.com
select (select cntr_value
        from sys.dm_os_performance_counters
        where object_name like '%Memory Manager%' and counter_name like 'Maximum Workspace Memory (KB)%') as 'Maximum Workspace Memory (KB)',
    (select cntr_value
        from sys.dm_os_performance_counters
        where object_name like '%Memory Manager%' and counter_name like 'Target Server Memory (KB)%') as 'Target Server Memory (KB)',
    (select cntr_value
        from sys.dm_os_performance_counters
        where object_name like '%Memory Manager%' and counter_name like 'Maximum Workspace Memory (KB)%') * 100.0 /
    (select cntr_value
        from sys.dm_os_performance_counters
        where object_name like '%Memory Manager%' and counter_name like 'Target Server Memory (KB)%') as Ratio
go

In my server, when Target Server Memory is 4096MB, Maximum Workspace Memory is 3077MB, which is about 75% of Target Server Memory.

By default a query will not request more than 25% of this Maximum Workspace Memory in SQL Server 2008 and above, and this 25% Memory Grant limit can be changed using the Resource Governor Workload Group settings. With SQL Server 2005 and below the limit is 20% and cannot be changed without the support of Microsoft (a combination of an undocumented trace flag and changes to the configuration parameters).

Why should you care about the 25%? You might have a customer executing a heavy reporting query in the night who wants to use more than 25% of Maximum Workspace Memory for that single query. Let’s assume the customer configured 4096MB for their SQL Server instance (the Target Server Memory might be less than the configured memory if there is memory pressure on the server) and their workspace memory is 3077MB (‘Query Memory Objects (default)’ ‘Current Max’ = 381175 pages = 3049400 KB, plus ‘Small Query Memory Objects (default)’ ‘Current Max’ = 12800 pages = 102400 KB; 3049400 KB + 102400 KB ≈ 3077 MB). This customer’s query executing the report is then limited to a maximum of 3049400 KB / 4 ≈ 744 MB.

Let’s say the customer’s query executing the report needs 1,470MB of query memory (ideal memory) in order not to spill the sort to tempdb, but the query will request only 744MB. If the customer sets ‘request_max_memory_grant_percent’ (Memory Grant %) of the Workload Group setting to 50%, the query can request up to 1,488MB, and in this case the query will request the ideal memory it needs, which is 1,470MB.
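To check the limit currently in effect, a quick sketch using the documented Resource Governor DMV:

--Memory Grant % currently in effect for each workload group
select name, request_max_memory_grant_percent
from sys.dm_resource_governor_workload_groups
go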

SQL Server has memory grant queues based on query cost; if a query needs more memory and there is not enough memory available in the queue, the query will wait. You can get additional details using the command dbcc memorystatus. If a query requests a huge amount of memory but doesn’t use it (due to overestimation), that memory is reserved and cannot be used by other queries, which in some cases leads to unnecessary memory grant waits. So one has to be very careful not to overestimate memory too much (also described in the webcasts at www.sqlworkshops.com/webcasts), as it affects concurrency. When a query waits for memory, the wait type will be ‘RESOURCE_SEMAPHORE’.
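A related sketch: instead of dbcc memorystatus you can look at the grant queues through sys.dm_exec_query_resource_semaphores, which returns one row per resource semaphore per resource pool:

--waiter_count shows queries waiting for a grant (RESOURCE_SEMAPHORE waits),
--grantee_count shows queries currently holding a grant.
select pool_id, resource_semaphore_id, target_memory_kb, available_memory_kb,
    granted_memory_kb, grantee_count, waiter_count
from sys.dm_exec_query_resource_semaphores
go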

Let’s set ‘max server memory (MB)’ to 4GB.

exec sp_configure 'max server memory (MB)', 4096
go
reconfigure
go

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script.

Scenario 1:
This query will underestimate memory due to optimizer issues and will also request less memory (744MB) due to the 25% Resource Governor Workload Group Memory Grant setting. The query will be slow spilling the sort to tempdb.

When a sort spills to tempdb the wait type will be ‘IO_COMPLETION’, but when a Hash Match operation spills to tempdb the wait type will be ‘SLEEP_TASK’.
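A small sketch for watching these waits at the instance level (sys.dm_os_wait_stats is cumulative since the last restart or clear, so compare snapshots taken before and after the query):

--Waits related to memory grants and to sort / hash spills
select wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
from sys.dm_os_wait_stats
where wait_type in ('RESOURCE_SEMAPHORE', 'IO_COMPLETION', 'SLEEP_TASK')
go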

We are using option (maxdop 1) to disable parallelism; to learn more about monitoring and tuning parallel query execution, refer to the webcast www.sqlworkshops.com/webcasts and the article Parallel Sort and Merge Join – Watch out for unpredictability in performance.

set statistics time on
go
--Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (maxdop 1)
go

Scenario 2:
This query will have better memory estimation due to the option clause with optimize for hint, but will still request less memory (744MB) due to the 25% Resource Governor Workload Group Memory Grant setting. The query will also be slow spilling the sort to tempdb.

set statistics time on
go
--Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
-- Option optimize for is used to inflate memory request
-- without this the query will spill the sort to tempdb due
-- to query optimizer under estimation of memory.
-- One possibility is to inflate the number of rows,
-- the other possibility is to inflate the row size
-- For more information refer to webcasts 1 & 2
-- at www.sqlworkshops.com/webcasts.
go

Scenario 3:
This query will have better memory estimation due to the option clause with the optimize for hint and will also request enough memory (1,470MB) due to the 50% Resource Governor Workload Group Memory Grant setting. The query will be fast, with no spill of the sort to tempdb.

Please read the entire article and answer the challenge posted at the end before changing the Resource Governor Workload Group Memory Grant setting on your production server; there are major disadvantages to changing this setting, so don’t do it without understanding the full picture.

alter workload group [default] with (request_max_memory_grant_percent=50)
go
alter resource governor reconfigure
go
set statistics time on
go
--Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
-- Option optimize for is used to inflate memory request
-- without this the query will spill the sort to tempdb due
-- to query optimizer under estimation of memory.
-- One possibility is to inflate the number of rows,
-- the other possibility is to inflate the row size
-- For more information refer to webcasts 1 & 2
-- at www.sqlworkshops.com/webcasts.
go

Scenario 4:
Let’s execute this query concurrently in 2 sessions, with the 25% Resource Governor Workload Group Memory Grant setting.

alter workload group [default] with (request_max_memory_grant_percent=25)
go
alter resource governor reconfigure
go

Session 1:
Let’s execute this query in a loop and measure the performance of the query executed in session 2. This query will have better memory estimation due to the option clause with the optimize for hint, but will still request less memory (744MB) due to the 25% Resource Governor Workload Group Memory Grant setting. The query will spill the sort to tempdb.

while 1=1
begin
--Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
-- Option optimize for is used to inflate memory request
-- without this the query will spill the sort to tempdb due
-- to query optimizer under estimation of memory.
-- One possibility is to inflate the number of rows,
-- the other possibility is to inflate the row size
-- For more information refer to webcasts 1 & 2
-- at www.sqlworkshops.com/webcasts.
end
go

Session 2:
Let’s execute this query a few times. This query will have better memory estimation due to the option clause with the optimize for hint, but will still request less memory (744MB) due to the 25% Resource Governor Workload Group Memory Grant setting. The query will be slow, spilling the sort to tempdb. It might take up to twice as long to complete compared to Scenario 2, because two queries from two sessions are concurrently spilling to tempdb. There is no wait for memory grants (no RESOURCE_SEMAPHORE waits).

set statistics time on
go
--Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
-- Option optimize for is used to inflate memory request
-- without this the query will spill the sort to tempdb due
-- to query optimizer under estimation of memory.
-- One possibility is to inflate the number of rows,
-- the other possibility is to inflate the row size
-- For more information refer to webcasts 1 & 2
-- at www.sqlworkshops.com/webcasts.
go

Scenario 5:
Let’s execute this query concurrently in 2 sessions, with the 50% Resource Governor Workload Group Memory Grant setting.

alter workload group [default] with (request_max_memory_grant_percent=50)
go
alter resource governor reconfigure
go

Session 1:
Let’s execute this query in a loop and measure the performance of the query executed in session 2. This query will have better memory estimation due to the option clause with the optimize for hint and will also request enough memory (1,470MB) due to the 50% Resource Governor Workload Group Memory Grant setting. The query will not spill the sort to tempdb.

while 1=1
begin
--Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
-- Option optimize for is used to inflate memory request
-- without this the query will spill the sort to tempdb due
-- to query optimizer under estimation of memory.
-- One possibility is to inflate the number of rows,
-- the other possibility is to inflate the row size
-- For more information refer to webcasts 1 & 2
-- at www.sqlworkshops.com/webcasts.
end
go

Session 2:
Let’s execute this query a few times. This query will have better memory estimation due to the option clause with the optimize for hint and will also request enough memory (1,470MB) due to the 50% Resource Governor Workload Group Memory Grant setting. The query will be fast, with no spill of the sort to tempdb. But it might take up to twice as long to complete compared to Scenario 3, because two queries from two sessions are concurrently requesting nearly 50% of the workspace memory each. There are waits for memory grants (RESOURCE_SEMAPHORE waits): SQL Server grants memory to one query at a time, as the memory in the grant queue is not enough to grant the requested memory to both queries simultaneously.

set statistics time on
go
--Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 500000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 600000), maxdop 1)
-- Option optimize for is used to inflate memory request
-- without this the query will spill the sort to tempdb due
-- to query optimizer under estimation of memory.
-- One possibility is to inflate the number of rows,
-- the other possibility is to inflate the row size
-- For more information refer to webcasts 1 & 2
-- at www.sqlworkshops.com/webcasts.
go

Overall performance of Scenario 5 is better than Scenario 4 even though there were memory grant waits. In this case spilling the sort to tempdb is far worse than waiting for memory.

Scenario 6:
Let’s execute two queries with smaller memory requirements concurrently in two sessions (so memory can be granted to both queries simultaneously from the same memory grant queue), with the 50% Resource Governor Workload Group Memory Grant setting.

Session 1:
Let’s execute this query in a loop and measure the performance of the query executed in session 2. This query will have better memory estimation due to the option clause with the optimize for hint and will also request enough memory (1,102MB) due to the 50% Resource Governor Workload Group Memory Grant setting. The query will not spill the sort to tempdb.

while 1=1
begin
--Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 400000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 450000), maxdop 1)
-- Option optimize for is used to inflate memory request
-- without this the query will spill the sort to tempdb due
-- to query optimizer under estimation of memory.
-- One possibility is to inflate the number of rows,
-- the other possibility is to inflate the row size
-- For more information refer to webcasts 1 & 2
-- at www.sqlworkshops.com/webcasts.
end
go

Session 2:
Let’s execute this query a few times. This query will have better memory estimation due to the option clause with the optimize for hint and will also request enough memory (1,102MB) due to the 50% Resource Governor Workload Group Memory Grant setting. The query will be fast, with no spill of the sort to tempdb. There will be no waits for memory grants (RESOURCE_SEMAPHORE waits): SQL Server grants memory to both queries simultaneously, as the memory in the grant queue is enough to grant the requested memory to both.

set statistics time on
go
--Example provided by www.sqlworkshops.com
declare @c1 int, @c2 int, @c3 char(2000)
declare @i int
set @i = 400000
select @c1 = c1, @c2 = c2, @c3 = c3
    from tab7
    where c1 < @i
    order by c2
    option (optimize for (@i = 450000), maxdop 1)
-- Option optimize for is used to inflate memory request
-- without this the query will spill the sort to tempdb due
-- to query optimizer under estimation of memory.
-- One possibility is to inflate the number of rows,
-- the other possibility is to inflate the row size
-- For more information refer to webcasts 1 & 2
-- at www.sqlworkshops.com/webcasts.
go

Challenge:
There are major disadvantages to changing the Resource Governor Workload Group Memory Grant setting, so don’t make any changes without understanding the full picture. I will cover this in the next article; meanwhile, if you can guess some of the reasons, write to me: contacts.

I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.

Disclaimer and copyright information: This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to Ramesh Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. Previously published under http://sqlblogcasts.com/blogs/sqlworkshops/archive/2011/10/14/workspace-memory-query-memory-tuning-resource-semaphore-io-completion-sleep-task-waits.aspx / http://www.sqlworkshops.com/workspacememory.htm and in Level 400 Webcasts http://www.sqlworkshops.com/webcasts.

 

Plan Caching and Query Memory Part 1: When not to use stored procedure or other plan caching mechanisms like sp_executesql or prepared statement

The most common performance mistake SQL Server developers make:

SQL Server estimates the memory requirement for queries at compilation time. This mechanism is fine for dynamic queries that need memory, but not for queries that cache the plan. With dynamic queries the plan is not reused for different sets of parameter values / predicates, so a different amount of memory can be estimated for each set of parameter values / predicates. Common memory allocating queries are those that perform Sort and Hash Match operations such as Hash Join, Hash Aggregation or Hash Union. This article covers Sort with examples. It is recommended to read Plan Caching and Query Memory Part II after this article, which covers Hash Match operations.

When the plan is cached by using a stored procedure or another plan caching mechanism like sp_executesql or a prepared statement, SQL Server estimates the memory requirement based on the first set of execution parameters. Later, when the same stored procedure is called with a different set of parameter values, the same amount of memory is used to execute it. This might lead to underestimation / overestimation of memory on plan reuse; overestimation of memory might not be a noticeable issue for Sort operations, but underestimation of memory will lead to a spill to tempdb, resulting in poor performance.

This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operations. It is important to note that underestimation of memory for Sort and Hash Match operations leads to spills to tempdb and hence negatively impacts performance, while overestimation of memory affects the memory available to other concurrently executing queries. In addition, with Hash Match operations, overestimation of memory can itself lead to poor performance.

To read additional articles I wrote click here. To watch the webcasts click here.

In most cases it is cheaper to pay the compilation cost of dynamic queries than the huge cost of a spill to tempdb, unless the memory requirement of the stored procedure does not change significantly with its predicates.

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script. Most of these concepts are also covered in our webcasts: www.sqlworkshops.com/webcasts

Enough theory, let’s see an example where we sort initially 1 month of data and then use the stored procedure to sort 6 months of data.

Let’s create a stored procedure that sorts customers by name within certain date range.

--Example provided by www.sqlworkshops.com
create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as
begin
declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime
select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c
where c.CreationDate between @CreationDateFrom and @CreationDateTo
order by c.CustomerName
option (maxdop 1)
end
go

Let’s execute the stored procedure initially with 1 month date range.

set statistics time on
go
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-01-31'
go

The stored procedure took 48 ms to complete.

The stored procedure was granted 6656 KB based on 43199.9 rows being estimated.

The estimated number of rows, 43199.9, is close to the actual number of rows, 43200, and hence the memory estimation is adequate.

There were no Sort Warnings in SQL Profiler.

Now let’s execute the stored procedure with 6 month date range.

--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go

The stored procedure took 679 ms to complete.

The stored procedure was granted 6656 KB based on 43199.9 rows being estimated.

The estimated number of rows, 43199.9, is far from the actual number of rows, 259200, because the estimate is based on the first set of parameter values supplied to the stored procedure, which was the 1 month range in our case. This underestimation will lead to the sort spilling to tempdb, resulting in poor performance.

There were Sort Warnings in SQL Profiler.

To monitor the amount of data written to and read from tempdb, one can execute select num_of_bytes_written, num_of_bytes_read from sys.dm_io_virtual_file_stats(2, NULL) before and after the stored procedure execution; for additional information refer to the webcast: www.sqlworkshops.com/webcasts.
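As a sketch of that before / after pattern (in sys.dm_io_virtual_file_stats, database_id 2 is tempdb and NULL means all of its files):

--Snapshot tempdb I/O before the execution
select sum(num_of_bytes_written) as bytes_written, sum(num_of_bytes_read) as bytes_read
from sys.dm_io_virtual_file_stats(2, NULL)
go
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go
--Snapshot again after; the difference approximates the data spilled to tempdb
--(plus any other concurrent tempdb activity)
select sum(num_of_bytes_written) as bytes_written, sum(num_of_bytes_read) as bytes_read
from sys.dm_io_virtual_file_stats(2, NULL)
go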

Let’s recompile the stored procedure and then execute it first with the 6 month date range.

In a production instance it is not advisable to use sp_recompile; instead one should use DBCC FREEPROCCACHE (plan_handle), because of the locking issues involved with sp_recompile. Refer to our webcasts for further details.

exec sp_recompile CustomersByCreationDate
go
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go

Now the stored procedure took only 294 ms instead of 679 ms.

The stored procedure was granted 26832 KB of memory.

The estimated number of rows, 259200, matches the actual number of rows, 259200. The better performance of this stored procedure execution is due to better memory estimation, avoiding the sort spill to tempdb.

There were no Sort Warnings in SQL Profiler.

Now let’s execute the stored procedure with 1 month date range.

--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-01-31'
go

The stored procedure took 49 ms to complete, similar to our very first stored procedure execution.


This stored procedure was granted more memory (26832 KB) than necessary (6656 KB) because the estimate was based on 6 months of data (259200 rows) instead of 1 month of data (43199.9 rows); the estimate is based on the first set of parameter values supplied to the stored procedure, which was 6 months in this case. This overestimation did not affect performance here, but it might affect the performance of other concurrent queries requiring memory, and hence overestimation is not recommended. Overestimation can also hurt the performance of Hash Match operations; refer to the article Plan Caching and Query Memory Part II for further details.

Let’s recompile the stored procedure and then execute it first with a 2 day date range.

exec sp_recompile CustomersByCreationDate
go
--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-01-02'
go

The stored procedure took 1 ms.

The stored procedure was granted 1024 KB based on 1440 rows being estimated.

There were no Sort Warnings in SQL Profiler.

Now let’s execute the stored procedure with 6 month date range.

--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go

The stored procedure took 955 ms to complete, way higher than the 679 ms or 294 ms we noticed before.

The stored procedure was granted 1024 KB based on 1440 rows being estimated. But we noticed earlier that this stored procedure with the 6 month date range needs 26832 KB of memory to execute optimally without spilling to tempdb. This is a clear underestimation of memory and the reason for the very poor performance.

There were Sort Warnings in SQL Profiler. Unlike before, this was a Multiple pass sort instead of a Single pass sort; this occurs when the granted memory is too low.

Intermediate Summary: This issue can be avoided by not caching the plan for memory allocating queries. Another possibility is to use the recompile hint or the optimize for hint to allocate memory for a predefined date range.

Let’s recreate the stored procedure with recompile hint.

--Example provided by www.sqlworkshops.com
drop proc CustomersByCreationDate
go
create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as
begin
declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime
select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c
where c.CreationDate between @CreationDateFrom and @CreationDateTo
order by c.CustomerName
option (maxdop 1, recompile)
end
go

Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range.

--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-01-30'
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go

The stored procedure took 48 ms and 291 ms, in line with the previous optimal execution times.

The stored procedure with the 1 month date range has good estimation like before.

The stored procedure with the 6 month date range also has good estimation and memory grant like before, because the query was recompiled with the current set of parameter values.

The compilation time and compilation CPU of 1 ms is not expensive in this case compared to the performance benefit.

 

Let’s recreate the stored procedure with the optimize for hint set to the 6 month date range.

--Example provided by www.sqlworkshops.com
drop proc CustomersByCreationDate
go
create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as
begin
declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime
select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c
where c.CreationDate between @CreationDateFrom and @CreationDateTo
order by c.CustomerName
option (maxdop 1, optimize for (@CreationDateFrom = '2001-01-01', @CreationDateTo = '2001-06-30'))
end
go

Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range.

--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-01-30'
exec CustomersByCreationDate '2001-01-01', '2001-06-30'
go

The stored procedure took 48 ms and 291 ms, in line with the previous optimal execution times.

The stored procedure with the 1 month date range has overestimation of rows and memory because we provided a hint to optimize for 6 months of data.

The stored procedure with the 6 month date range has good estimation and memory grant because we provided a hint to optimize for 6 months of data.

Let’s execute the stored procedure with a 12 month date range using the currently cached plan, which was optimized for the 6 month date range.

--Example provided by www.sqlworkshops.com
exec CustomersByCreationDate '2001-01-01', '2001-12-31'
go

The stored procedure took 1138 ms to complete.

259200 rows were estimated based on the optimize for hint value for the 6 month date range. The actual number of rows is 524160 due to the 12 month date range.

The stored procedure was granted only enough memory to sort the 6 month date range, not the 12 month date range, so there will be a spill to tempdb.

There were Sort Warnings in SQL Profiler.

As we see above, the optimize for hint cannot guarantee enough memory and optimal performance, unlike the recompile hint.
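For completeness, here is a minimal sketch of the other alternative this article mentions, not caching the plan at all: issuing the statement as a plain dynamic batch (not sp_executesql), so each date range gets its own compilation and therefore its own memory estimate. The exact formatting of the dynamic string is my own illustration; it assumes the same Customers table used throughout this article and the default parameterization setting, under which the changing literals produce a fresh compile.

--Example sketch: a fresh compile per execution, at the price of compilation time
declare @sql nvarchar(max), @CreationDateFrom datetime, @CreationDateTo datetime
select @CreationDateFrom = '2001-01-01', @CreationDateTo = '2001-06-30'
set @sql = N'declare @CustomerName varchar(48), @CreationDate datetime
select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c
    where c.CreationDate between ''' + convert(char(8), @CreationDateFrom, 112) + ''' and '''
    + convert(char(8), @CreationDateTo, 112) + '''
    order by c.CustomerName option (maxdop 1)'
exec (@sql)
go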

This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operations. It is important to note that underestimation of memory for Sort and Hash Match operations leads to spills to tempdb and hence negatively impacts performance, while overestimation of memory affects the memory available to other concurrently executing queries. In addition, with Hash Match operations, overestimation of memory can itself lead to poor performance.

Summary: A cached plan can lead to underestimation or overestimation of memory because memory is estimated based on the first set of execution parameters. It is recommended not to cache the plan if the amount of memory required to execute the stored procedure has a wide range of possibilities. One can mitigate this by using the recompile hint, but that introduces compilation overhead; however, in most cases it is better to pay for compilation than to spill the sort to tempdb, which can be very expensive compared to the compilation cost. The other possibility is to use the optimize for hint, but if one sorts more data than the hint covers, the query will still spill, and on the other side overestimation can cause unnecessary memory pressure on other concurrently executing queries. In the case of Hash Match operations, this overestimation of memory can itself lead to poor performance. Moreover, if the values used in the optimize for hint are later archived from the database, the estimate will be wrong, leading to worse performance, so exercise caution before using the optimize for hint; the recompile hint is better in this case.

I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.

Disclaimer and copyright information: This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to Ramesh Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. Previously published under / http://sqlblogcasts.com/blogs/sqlworkshops/archive/2011/02/08/plan-caching-and-query-memory-part-i-when-not-to-use-stored-procedure-or-other-plan-caching-mechanisms-like-sp-executesql-or-prepared-statement-he-most-common-performance-mistake-sql-server-developers-make.aspx / http://www.sqlworkshops.com/plancachingandquerymemory.htm and in Level 400 Webcasts http://www.sqlworkshops.com/webcasts.

 

SQL Server IO handling mechanism can be severely affected by high CPU usage

Are you using SSD or SAN / NAS based storage and sporadically observing SQL Server experiencing high IO wait times, or does your DAS / HDD become very slow from time to time according to SQL Server statistics? Read on… I need your help to up-vote my Connect item – https://connect.microsoft.com/SQLServer/feedback/details/744650/sql-server-io-handling-mechanism-can-be-severely-affected-by-high-cpu-usage. Instead of taking a few seconds, queries can take minutes or hours to complete when the CPU is busy.

To read additional articles I wrote click here. To watch the webcasts click here.

In SQL Server, when a query / request needs to read data that is not in the data cache, or when the request has to write to disk (transaction log records, for example), the request / task queues up the IO operation and waits for it to complete (the task is in the suspended state; this wait time is the resource wait time). When the IO operation is complete, the task is queued to run on the CPU. If the CPU is busy executing other tasks, this task waits (in the runnable state) until the other tasks in the queue either complete, get suspended due to waits, or exhaust their quantum of 4ms (this is the signal wait time, which along with the resource wait time increases the overall wait time). When the CPU becomes free, the task finally runs on the CPU (running state).

The signal wait time can be up to 4ms per runnable task; this is by design. So if a CPU has 5 runnable tasks in the queue, then after the resource becomes available this query might wait up to a maximum of 5 x 4ms = 20ms in the runnable state (normally less, as other tasks might not use their full quantum).
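A hedged sketch of how to gauge this from sys.dm_os_wait_stats (cumulative since restart, so look at the change over an interval rather than the absolute values):

--Share of total wait time spent in the runnable state waiting for CPU;
--a consistently high ratio suggests CPU pressure is delaying tasks whose I/Os have completed
select sum(signal_wait_time_ms) as signal_wait_time_ms,
    sum(wait_time_ms) as total_wait_time_ms,
    sum(signal_wait_time_ms) * 100.0 / nullif(sum(wait_time_ms), 0) as signal_wait_percent
from sys.dm_os_wait_stats
go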

In case the CPU usage is high, let’s say many CPU intensive queries are running on the instance, there is a possibility that IO operations that have completed at the hardware and operating system level are not yet processed by SQL Server, keeping the task in the resource wait state longer than necessary. On an SSD the IO operation might complete in less than a millisecond, but it might take SQL Server hundreds of milliseconds, for instance, to process the completed IO operation. For example, let’s say you have a user inserting 500 rows in individual transactions. When the transaction log is on an SSD or on a battery backed controller with write cache enabled, all of these inserts complete in 100 to 200ms. With a CPU intensive parallel query executing across all CPU cores, the same inserts might take minutes to complete. The WRITELOG wait time will be very high in this case (the log write stalls show up in sys.dm_io_virtual_file_stats and the WRITELOG wait time in sys.dm_os_wait_stats). In addition you will notice a large number of WRITELOG waits, since log records are written by the LOG WRITER, and hence very high signal_wait_time_ms leading to more query delays. However, the Performance Monitor counter PhysicalDisk: Avg. Disk sec/Write will report very low latency.

Such delayed IO handling also affects read operations, with artificially very high PAGEIOLATCH_SH wait times (while the number of PAGEIOLATCH_SH waits remains the same). This problem will manifest more and more as customers start using SSD based storage for SQL Server, since faster IOs drive CPU usage to the limit. We have a few workarounds for specific scenarios, but we think Microsoft should resolve this issue at the product level. We have a Connect item open – https://connect.microsoft.com/SQLServer/feedback/details/744650/sql-server-io-handling-mechanism-can-be-severely-affected-by-high-cpu-usage – (with example scripts) to reproduce this behavior; please up-vote the item so the issue is addressed by the SQL Server product team soon.

Prefetch – Querying at the speed of SAN

Prefetch is a mechanism with which SQL Server can fire up many I/O requests in parallel for Nested Loop join.

The SAN administrator says your data volume has a throughput capacity of 400MB/sec. But your long running query is waiting for I/Os (PAGEIOLATCH_SH) and Windows Performance Monitor shows your data volume is doing 4MB/sec. Where is the problem?

When SQL Server does a Nested Loop join, it might enable Prefetch. Prefetch is functionality where SQL Server fires up many I/O requests in parallel (many outstanding I/Os). Prefetch leads to better performance for Nested Loop join queries when there are lots of rows in the outer input table. 25 is the magic number: when SQL Server estimates fewer than 25 rows for the outer input table, Prefetch is disabled; when more than 25 rows are estimated, Prefetch is enabled.

To read additional articles I wrote click here. To watch the webcasts click here.

This is a perfect design (yes, I am an ex-Microsoft employee!): SQL Server wants to avoid Prefetch for lightweight queries, especially in an OLTP environment, as too many queries doing Prefetch will hurt performance. I remember from my days on the SQL Server development team that OLTP benchmarks, including the latest TPC-E, disable Prefetch with trace flag 8744. Write me an email and I will send you the link.

Why is this a problem? I observe this at many customers: the plan is cached with Prefetch disabled because the first execution resulted in an estimate of fewer than 25 rows for the outer input table. Then the plan is reused for parameters that result in more than 25 rows, actually thousands of rows, for the outer input table. In this case, since SQL Server cached the plan, Prefetch is disabled and SQL Server fires one I/O at a time. This leads to poor utilization of the SAN. Symptoms: the query waits for I/Os (PAGEIOLATCH_SH) for a long time (reading page after page), but the disk queue length is never more than 1.

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script.

--Example provided by www.sqlworkshops.com
--Create procedure that pulls orders based on City
drop proc RegionalOrdersProc
go
create proc RegionalOrdersProc @City char(20)
as
begin
declare @OrderID int, @OrderDetails char(200)
select @OrderID = o.OrderID, @OrderDetails = o.OrderDetails
from RegionalOrders ao inner join Orders o on (o.OrderID = ao.OrderID)
where City = @City
end
go

set statistics time on
go

--Example provided by www.sqlworkshops.com
--Execute the procedure with parameter SmallCity1
exec RegionalOrdersProc 'SmallCity1'
go

--Right click on Clustered Index Scan
--to look at the properties
--Estimated number of rows in the
--outer input table is less than 25

--Right click on Nested Loops
--to look at the properties
--Notice Prefetch is disabled (missing)

--Clear data cache
checkpoint
dbcc dropcleanbuffers
go

Think about a business running a billing job that loops over thousands of its customers using a stored procedure. If the first customer has placed fewer than 25 orders for the billing period, Prefetch is disabled and every customer is limited to using 1 spindle at a time.

--Example provided by www.sqlworkshops.com
--Execute the procedure with parameter BigCity
--We are using the cached plan
exec RegionalOrdersProc 'BigCity'
go

--Estimated number of rows in the
--outer input table is less than 25

--Notice Prefetch is disabled (missing)

--Query execution time 7 seconds
--What was the execution time?

--Clear procedure cache
--to trigger a new optimization
dbcc freeproccache
go

--Example provided by www.sqlworkshops.com
--Execute the procedure with parameter SmallCity2
exec RegionalOrdersProc 'SmallCity2'
go

--Clear data cache
checkpoint
dbcc dropcleanbuffers
go

--Example provided by www.sqlworkshops.com
--Execute the procedure with parameter BigCity
--We are using the cached plan
exec RegionalOrdersProc 'BigCity'
go

--Estimated number of rows in the
--outer input table is greater than 25

--Notice Prefetch is enabled

--Query execution time 3 seconds
--On a SAN it was less than a second
--What was the execution time?

--Example provided by www.sqlworkshops.com
--You can fix the issue by using any of the following
--hints
--Create procedure that pulls orders based on City

drop proc RegionalOrdersProc
go
create proc RegionalOrdersProc @City char(20)
as
begin
declare @OrderID int, @OrderDetails char(200)
select @OrderID = o.OrderID, @OrderDetails = o.OrderDetails
from RegionalOrders ao inner join Orders o on (o.OrderID = ao.OrderID)
where City = @City
--Hinting the optimizer to use SmallCity2 for estimation
option (optimize for (@City = 'SmallCity2'))
--Hinting the optimizer to estimate for the current parameters
--option (recompile)
--Hinting the optimizer not to use the histogram but rather the
--density for estimation (average of all 3 cities)
--option (optimize for (@City UNKNOWN))
--option (optimize for UNKNOWN)
end
go

--Example provided by www.sqlworkshops.com
--Execute the procedure with parameter SmallCity1
exec RegionalOrdersProc 'SmallCity1'
go

--Clear data cache
checkpoint
dbcc dropcleanbuffers
go

--Example provided by www.sqlworkshops.com
--Execute the procedure with parameter BigCity
--We are using the cached plan
exec RegionalOrdersProc 'BigCity'
go

--Notice Prefetch is enabled
--Estimated number of rows in the
--outer input table is greater than 25
--Query execution time 3 seconds
--On a SAN it was less than a second
--What was your execution time?

Some of you might think plan guides or plan freezing could solve this issue; well, they won’t help here. The estimate from your plan guide plan will be ignored by the optimizer. The optimizer makes a new estimate based on the current set of parameter values (compile time parameter values).

I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.

If you search for trace flag 8744 you will end up with KB article 920093, titled: Tuning options for SQL Server 2005 and SQL Server 2008 when running in high performance workloads. There it is explained that trace flag 8744 disables pre-fetching for the Nested Loops operator (with caution). Even though the topic looks interesting, ‘Tuning options for high performance workloads’, don’t try these in production. From my point of view this article is about documenting undocumented trace flags used in benchmarks; let’s leave it at that!

Disclaimer and copyright information:
This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners.
Copyright of this article belongs to Ramesh Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work.
This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk.

Previously published under http://sqlblogcasts.com/blogs/sqlworkshops/archive/2010/10/19/prefetch-querying-at-the-speed-of-san.aspx / http://www.mssqltips.com/sqlservertip/2173/speed-up-sql-server-queries-with-prefetch / http://www.sqlworkshops.com/prefetch.htm and in Level 400 Webcasts http://www.sqlworkshops.com/webcasts.