Data Compression in SQL Server – Pros and Cons

SQL Server supports row and page compression on tables, indexes, and partitions. Compression can reduce I/O and improve performance, but it also adds CPU overhead that in some cases outweighs the benefits. The SQL Server query optimizer does not cost the overhead of expanding compressed data, which can lead to plan regressions, as shown in the video.
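For reference, compression savings can be estimated before committing to a rebuild. The sketch below assumes a hypothetical table dbo.Orders and is not taken from the video:

--Estimate page compression savings for a hypothetical table dbo.Orders
exec sp_estimate_data_compression_savings
@schema_name = 'dbo', @object_name = 'Orders',
@index_id = NULL, @partition_number = NULL, @data_compression = 'PAGE'
go
--Enable page compression if the estimate looks worthwhile (watch CPU usage afterwards)
alter table dbo.Orders rebuild with (data_compression = page)
go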

The video below demonstrates the plan regression with a hands-on example. The SQLTest Tool simulates the hands-on labs for you, with no registration necessary, so you can practice the example online while watching the video.

Check out http://www.sqlvideo.com for additional hands-on videos.

Subscribe to our newsletter: https://newsletter.sqlworkshops.com

SQL Server Command Timeout – Application Timeout – Attention

When you use ODBC or SqlClient to access data from SQL Server, the query is cancelled by default if there is no response from the server within a certain period of time (30 seconds by default). ODBC or SqlClient starts a timer after sending the query to SQL Server; if no results arrive within 30 seconds, the query is cancelled and a timeout error is returned to the application. If a network packet containing query results is received within the 30 seconds and additional packets are expected (additional rows that could not fit in the first packet), the timer is restarted; if the next packet does not arrive within this 30-second period, the query is cancelled.

The common reasons for SQL Command timeouts are non-optimal schemas, inefficient queries that run for a long time, missing indexes, and lock waits. It is important to tune the queries rather than simply increase the SQL Command timeout setting.
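As a starting point for that tuning, a quick look at currently long-running requests and their waits can help. This is a generic sketch, not the script from the video:

--List active requests ordered by elapsed time, with their wait type and statement text
select r.session_id, r.status, r.wait_type, r.total_elapsed_time, t.text
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(r.sql_handle) t
where r.session_id <> @@spid
order by r.total_elapsed_time desc
go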

The video demonstrates an example where decreasing the network packet size eliminates the timeout. This demonstration is meant to show how network packets are involved in resetting the timer and to build a better understanding of timeouts. Changing the network packet size is not a solution for avoiding timeouts; tuning non-optimal schemas and inefficient queries, adding indexes, and avoiding lock waits are the proper solutions.

To identify the command that leads to a SQL Command timeout error, you can use Extended Events and monitor the ‘sqlserver.attention’ event. The Extended Events script used in the video is available at www.sqlvideo.com/xevents.
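If you want to set up such a session yourself, a minimal sketch along these lines works (the session name and file target are my assumptions; the actual script from the video may differ):

--Capture attention (cancel/timeout) events together with the statement text and client application
create event session AttentionMonitor on server
add event sqlserver.attention
(action (sqlserver.sql_text, sqlserver.client_app_name, sqlserver.session_id))
add target package0.event_file (set filename = N'AttentionMonitor')
go
alter event session AttentionMonitor on server state = start
go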

In SQL Server Management Studio, the SQL Command timeout can be changed using ‘Options’ in the connection dialog. By default it is set to ‘0’, which means no timeout. When a query or stored procedure is executing, clicking the ‘Cancel’ button (the red square) sends the same attention to SQL Server as a SQL Command timeout, which in turn generates the ‘sqlserver.attention’ event in Extended Events.

In the video below you can see an example of the SQL Command timeout. The SQLTest Tool simulates the hands-on labs for you, with no registration necessary, so you can practice the example online while watching the video.

Check out http://www.sqlvideo.com for additional hands-on videos.

Subscribe to our newsletter: https://newsletter.sqlworkshops.com

SQL Server 2012: Indirect Checkpoint (Target Recovery Time), get the real scoop

SQL Server 2012 brings a new feature called Indirect Checkpoint. You can read more about it here: http://msdn.microsoft.com/en-us/library/ms189573.aspx.

With Indirect Checkpoint, you get many small I/Os

A checkpoint normally writes large blocks of data to disk in a single write operation, up to 256KB, depending on the number of contiguous dirty pages in cache that need to be flushed to disk. Once you turn on the indirect checkpoint feature by setting target recovery time to a non-zero value, the checkpoint writes turn into single-page (8KB) writes.
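For reference, indirect checkpoint is enabled per database by setting a non-zero target recovery time; a minimal example against the sample database used later in this post:

--Enable indirect checkpoint on the sample database (a value of 0 reverts to automatic checkpoints)
alter database sqlworkshops set target_recovery_time = 60 seconds
go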

These single-page writes add up to too many I/Os. A few large I/Os are more efficient than many small I/Os, so this may slow down overall write throughput and performance. If you are hosting your server on cloud-based virtual servers, it also increases cost, since you pay by the number of I/O operations you incur.

Too many outstanding I/Os

Normally, checkpoint throttles the number of outstanding I/Os, setting a threshold based on write latency so as not to overload the storage subsystem. But once you enable indirect checkpoint, SQL Server can queue up hundreds of thousands of I/Os. This not only overloads the storage subsystem; when users want to read data from the database, their I/O requests are queued far behind the indirect checkpoint writes, and they may experience very slow response times and timeouts.
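While the repro script (below) runs, the pending I/O backlog can be observed with a rough query like this (a sketch, not part of the original repro):

--Count pending I/O requests per type and their average pending time in milliseconds
select io_type, count(*) as pending_requests, avg(io_pending_ms_ticks) as avg_pending_ms
from sys.dm_io_pending_io_requests
group by io_type
go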

You might also see errors like the following in the error log, because the large number of queued I/Os can take a long time to complete:

A time-out occurred while waiting for buffer latch -- type 4, bp 0000000283508040, page 1:1080, stat 0x60d, database id: 16, allocation unit Id: 0/281474976710656, task 0x00000002321BC558 : 0, waittime 300 seconds, flags 0x1a, owning task 0x00000002321BC558. Not continuing to wait.

High CPU usage

When you enable indirect checkpoint, you might encounter high CPU usage: one of the CPUs performing the writes might spin at 100%. This can be monitored by measuring DP_LIST spins as shown below; more than 4 billion spins were observed after running the test below just once. The writers also use much more CPU.

select * from sys.dm_os_spinlock_stats where name = 'DP_LIST'

You will notice the updates are much slower when indirect checkpoint is enabled.

In extreme cases, when you try to restart the SQL Server service, it takes a long time to stop. Meanwhile you can neither connect nor start the service, as the service is already in the process of stopping and is flushing the dirty pages before shutdown.
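To get a feel for how much work remains at that point, you can count dirty pages per database (a rough sketch; scanning sys.dm_os_buffer_descriptors can itself be slow on a large buffer pool):

--Count dirty (modified) pages currently held in the buffer pool, per database
select db_name(database_id) as database_name, count(*) as dirty_pages
from sys.dm_os_buffer_descriptors
where is_modified = 1
group by database_id
go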

You can use the scripts below to reproduce the scenario:

use master
if exists (select * from sys.databases where name = 'sqlworkshops')
drop database sqlworkshops
go
use master
create database sqlworkshops on primary (name = 'sqlworkshops_1', filename = 'P:\sqlworkshops_1.mdf', size = 5GB)
log on (name = 'sqlworkshops_log_1', filename = 'Q:\sqlworkshops_log_1.ldf', size = 64MB)
go
use master
alter database sqlworkshops set recovery simple
go
use sqlworkshops
if exists (select * from sys.objects where name = 'tab71')
drop table tab71
go
use sqlworkshops
create table tab71 (c1 int primary key clustered, c2 int, c3 char(2000))
insert into tab71 with (tablock) select top 500000 row_number() over (order by c1.object_id, c2.object_id) as c1, 1 as c2, replicate('a', 2000) as c3
from sys.all_columns c1, sys.all_columns c2
update statistics tab71 with fullscan
checkpoint
go
use sqlworkshops
if exists (select * from sys.objects where name = 'tab72')
drop table tab72
go
use sqlworkshops
create table tab72 (c1 int primary key clustered, c2 int, c3 char(2000))
insert into tab72 with (tablock) select top 500000 row_number() over (order by c1.object_id) as c1, 1 as c2, replicate('a', 2000) as c3
from sys.all_columns c1, sys.all_columns c2
update statistics tab72 with fullscan
checkpoint
go

 

--Session 1:
use sqlworkshops
update tab71 set c2 = 2
go
use sqlworkshops
update tab71 set c2 = 3
go

--Session 2 (optional):
use sqlworkshops
update tab72 set c2 = 2
go
use sqlworkshops
update tab72 set c2 = 3
go

--To monitor the dirty page poll waits:
select * from sys.dm_os_wait_stats
where wait_type = 'DIRTY_PAGE_POLL'
go

 

 

Stale statistics on a newly created temporary table in a stored procedure can lead to poor performance

When you create a temporary table, you expect a new table with no past history (no statistics based on a past existence). This is not true if you make fewer than 6 updates to the temporary table, which can lead to poor performance of queries that are sensitive to the content of temporary tables.

I was optimizing SQL Server performance at a customer who provides search functionality on their website. They use a stored procedure with a temporary table for the search. The performance of the search depended on who searched for what in the past; option (recompile) by itself had no effect. Sometimes a simple search led to a timeout because of non-optimal plan usage due to this behavior. This is not a plan caching issue but a temporary table statistics caching issue, which is part of the temporary object caching feature introduced in SQL Server 2005 and also present in SQL Server 2008 and SQL Server 2012. In this customer's case we implemented a workaround to avoid the issue (see the example workarounds below).

When temporary tables are cached, the statistics are not newly created; they are cached from past executions and updated based on the automatic update statistics threshold. Caching temporary tables/objects is good for performance, but caching stale statistics from the past is not optimal.

We can work around this issue by disabling temporary table caching, which is done by explicitly executing a DDL statement on the temporary table. One possibility is to execute an alter table statement, but this can lead to a duplicate constraint name error on concurrent stored procedure executions. The other way to work around it is to create an index.

I suspect many customers are in this situation without knowing that stale statistics are cached along with the temporary table, leading to poor performance.

The ideal solution would be a more aggressive statistics update when a cached temporary table has a small number of rows. I will open a Connect item to report this issue.

Meanwhile you can mitigate the issue by creating an index on the temporary table. You can monitor active temporary tables using the Windows Performance Monitor counter SQL Server: General Statistics -> Active Temp Tables.
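The same counter can also be read from T-SQL (a sketch; the object_name prefix varies with the instance name):

--Read the Active Temp Tables counter without opening Performance Monitor
select object_name, counter_name, cntr_value
from sys.dm_os_performance_counters
where counter_name = 'Active Temp Tables'
go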

 The script to understand the issue and the workaround is listed below:

set nocount on
set statistics time off
set statistics io off
drop table tab7
go
create table tab7 (c1 int primary key clustered, c2 int, c3 char(200))
go
create index test on tab7(c2, c1, c3)
go
begin tran
declare @i int
set @i = 1
while @i <= 50000
begin
insert into tab7 values (@i, 1, 'a')
set @i = @i + 1
end
commit tran
go
insert into tab7 values (50001, 1, 'a')
go
checkpoint
go

drop proc test_slow
go
create proc test_slow @i int
as
begin
declare @j int
create table #temp1 (c1 int primary key)
insert into #temp1 (c1) select @i
select @j = t7.c1 from tab7 t7 inner join #temp1 t on (t7.c2 = t.c1)
end
go
dbcc dropcleanbuffers
set statistics time on
set statistics io on
go
--high reads as expected for parameter '1'
exec test_slow 1
go
dbcc dropcleanbuffers
go
--high reads that are not expected for parameter '2'
exec test_slow 2
go
drop proc test_with_recompile
go
create proc test_with_recompile @i int
as
begin
declare @j int
create table #temp1 (c1 int primary key)
insert into #temp1 (c1) select @i
select @j = t7.c1 from tab7 t7 inner join #temp1 t on (t7.c2 = t.c1)
option (recompile)
end
go
dbcc dropcleanbuffers
set statistics time on
set statistics io on
go
--high reads as expected for parameter '1'
exec test_with_recompile 1
go
dbcc dropcleanbuffers
go
--high reads that are not expected for parameter '2'
--low reads on 3rd execution as expected for parameter '2'
exec test_with_recompile 2
go

drop proc test_with_alter_table_recompile
go
create proc test_with_alter_table_recompile @i int
as
begin
declare @j int
create table #temp1 (c1 int primary key)
--to avoid caching of temporary tables one can create a constraint
--but this might lead to duplicate constraint name error on concurrent usage
alter table #temp1 add constraint test123 unique(c1)
insert into #temp1 (c1) select @i
select @j = t7.c1 from tab7 t7 inner join #temp1 t on (t7.c2 = t.c1)
option (recompile)
end
go
dbcc dropcleanbuffers
set statistics time on
set statistics io on
go
--high reads as expected for parameter '1'
exec test_with_alter_table_recompile 1
go
dbcc dropcleanbuffers
go
--low reads as expected for parameter '2'
exec test_with_alter_table_recompile 2
go
drop proc test_with_index_recompile
go
create proc test_with_index_recompile @i int
as
begin
declare @j int
create table #temp1 (c1 int primary key)
--to avoid caching of temporary tables one can create an index
create index test on #temp1(c1)
insert into #temp1 (c1) select @i
select @j = t7.c1 from tab7 t7 inner join #temp1 t on (t7.c2 = t.c1)
option (recompile)
end
go
set statistics time on
set statistics io on
dbcc dropcleanbuffers
go
--high reads as expected for parameter '1'
exec test_with_index_recompile 1
go
dbcc dropcleanbuffers
go
--low reads as expected for parameter '2'
exec test_with_index_recompile 2
go

Plan Caching and Query Memory Part 2: When not to use stored procedure or other plan caching mechanisms like sp_executesql or prepared statement

SQL Server estimates the memory requirement for a query at compile time. When a stored procedure or another plan caching mechanism like sp_executesql or a prepared statement is used, the memory requirement is estimated based on the first set of execution parameters. This is a common reason for spills over to tempdb and hence poor performance. Common memory-allocating queries are those that perform Sorts and Hash Match operations like Hash Join, Hash Aggregation, or Hash Union. This article covers Hash Match operations with examples. It is recommended to read Plan Caching and Query Memory Part I, which covers an introduction and query memory for Sort, before this article. In most cases it is cheaper to pay the compilation cost of dynamic queries than the huge cost of spilling over to tempdb, unless the memory requirement for a query does not change significantly based on predicates.

This article covers underestimation and overestimation of memory for Hash Match operations; Plan Caching and Query Memory Part I covers underestimation and overestimation for Sort. It is important to note that underestimation of memory for Sort and Hash Match operations leads to spills over to tempdb and hence negatively impacts performance, while overestimation of memory affects the memory available to other concurrently executing queries. In addition, with Hash Match operations, overestimation of memory can itself lead to poor performance.
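While running the examples below, the requested versus granted memory can also be watched directly with a generic query like this (a sketch, independent of the examples in this article):

--Show memory requested, granted, and used by currently executing queries
select session_id, requested_memory_kb, granted_memory_kb, used_memory_kb, ideal_memory_kb
from sys.dm_exec_query_memory_grants
go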

To read additional articles I wrote click here. To watch the webcasts click here.

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script. Most of these concepts are also covered in our webcasts: www.sqlworkshops.com/webcasts

Let’s create a CustomersState table that has 99% of the customers in NY and the remaining 1% in WA. The Customers table used in Part I of this article is also used here. To observe Hash Warnings, enable ‘Hash Warning’ in SQL Profiler under Events ‘Errors and Warnings’.
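If you prefer Extended Events over SQL Profiler, a roughly equivalent session is sketched below (the session name and target are my assumptions):

--Capture hash warnings (recursion/bailout) with Extended Events instead of SQL Profiler
create event session HashWarnings on server
add event sqlserver.hash_warning
(action (sqlserver.sql_text, sqlserver.session_id))
add target package0.ring_buffer
go
alter event session HashWarnings on server state = start
go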

--Example provided by www.sqlworkshops.com
drop table CustomersState
go
create table CustomersState (CustomerID int primary key, Address char(200), State char(2))
go
insert into CustomersState (CustomerID, Address) select CustomerID, 'Address' from Customers
update CustomersState set State = 'NY' where CustomerID % 100 != 1
update CustomersState set State = 'WA' where CustomerID % 100 = 1
go
update statistics CustomersState with fullscan
go

Let’s create a stored procedure that joins customers with CustomersState table with a predicate on State.

--Example provided by www.sqlworkshops.com
create proc CustomersByState @State char(2) as
begin
declare @CustomerID int
select @CustomerID = e.CustomerID from Customers e
inner join CustomersState es on (e.CustomerID = es.CustomerID)
where es.State = @State
option (maxdop 1)
end
go

Let’s execute the stored procedure first with parameter value ‘WA’ – which will select 1% of data.

set statistics time on
go
--Example provided by www.sqlworkshops.com
exec CustomersByState 'WA'
go

The stored procedure took 294 ms to complete.

The stored procedure was granted 6704 KB based on 8000 rows being estimated.

The estimated number of rows (8000) matches the actual number of rows (8000), so the memory estimation is accurate.

There was no Hash Warning in SQL Profiler. To observe Hash Warnings, enable ‘Hash Warning’ in SQL Profiler under Events ‘Errors and Warnings’.

Now let’s execute the stored procedure with parameter value ‘NY’ – which will select 99% of data.

--Example provided by www.sqlworkshops.com
exec CustomersByState 'NY'
go

The stored procedure took 2922 ms to complete.

The stored procedure was granted 6704 KB based on 8000 rows being estimated.

The estimated number of rows (8000) is far from the actual number of rows (792000), because the estimate is based on the first set of parameter values supplied to the stored procedure, which was ‘WA’ in our case. This underestimation leads to a spill over to tempdb, resulting in poor performance.

There was a Hash Warning (Recursion) in SQL Profiler. To observe Hash Warnings, enable ‘Hash Warning’ in SQL Profiler under Events ‘Errors and Warnings’.

Let’s recompile the stored procedure and then execute it first with parameter value ‘NY’.
In a production instance it is not advisable to use sp_recompile; instead one should use DBCC FREEPROCCACHE (plan_handle), because of the locking issues involved with sp_recompile. Refer to our webcasts, www.sqlworkshops.com/webcasts, for further details.
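A hedged sketch of that production-friendly alternative is shown here (the dbo schema is assumed); the demo below still uses sp_recompile for simplicity:

--Evict only this procedure's cached plan instead of running sp_recompile
declare @plan_handle varbinary(64)
select @plan_handle = ps.plan_handle
from sys.dm_exec_procedure_stats ps
where ps.object_id = object_id('dbo.CustomersByState') and ps.database_id = db_id()
if @plan_handle is not null
dbcc freeproccache(@plan_handle)
go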

exec sp_recompile CustomersByState
go
--Example provided by www.sqlworkshops.com
exec CustomersByState 'NY'
go

Now the stored procedure took only 1046 ms instead of 2922 ms.

The stored procedure was granted 146752 KB of memory.

The estimated number of rows (792000) matches the actual number of rows (792000). The better performance of this execution is due to better memory estimation and avoiding the spill over to tempdb.

There was no Hash Warning in SQL Profiler.

Now let’s execute the stored procedure with parameter value ‘WA’.

--Example provided by www.sqlworkshops.com
exec CustomersByState 'WA'
go

The stored procedure took 351 ms to complete, higher than the previous execution time of 294 ms.

This time the stored procedure was granted more memory (146752 KB) than necessary (6704 KB), because the estimate was based on parameter value ‘NY’ (792000 rows) instead of parameter value ‘WA’ (8000 rows); the estimation is based on the first set of parameter values supplied to the stored procedure, which was ‘NY’ in this case. This overestimation leads to poor performance of the Hash Match operation, and it can also affect the performance of other concurrently executing queries that require memory, so overestimation is not recommended.

The estimated number of rows (792000) is much higher than the actual number of rows (8000).

Intermediate Summary: This issue can be avoided by not caching the plan for memory-allocating queries. Another possibility is to use the recompile hint or the optimize for hint to allocate memory for a predefined data range. Let’s recreate the stored procedure with the recompile hint.

--Example provided by www.sqlworkshops.com
drop proc CustomersByState
go
create proc CustomersByState @State char(2) as
begin
declare @CustomerID int
select @CustomerID = e.CustomerID from Customers e
inner join CustomersState es on (e.CustomerID = es.CustomerID)
where es.State = @State
option (maxdop 1, recompile)
end
go

Let’s execute the stored procedure initially with parameter value ‘WA’ and then with parameter value ‘NY’.

--Example provided by www.sqlworkshops.com
exec CustomersByState 'WA'
go
exec CustomersByState 'NY'
go

The stored procedure took 297 ms and 1102 ms in line with previous optimal execution times.

The stored procedure with parameter value ‘WA’ has good estimation like before.

Estimated number of rows of 8000 is similar to actual number of rows of 8000.

The stored procedure with parameter value ‘NY’ also has good estimation and memory grant like before because the stored procedure was recompiled with current set of parameter values.

Estimated number of rows of 792000 is similar to actual number of rows of 792000.

The compilation time and compilation CPU of 1 ms are not expensive in this case compared to the performance benefit.

There was no Hash Warning in SQL Profiler.

Let’s recreate the stored procedure with optimize for hint of ‘NY’.

--Example provided by www.sqlworkshops.com
drop proc CustomersByState
go
create proc CustomersByState @State char(2) as
begin
declare @CustomerID int
select @CustomerID = e.CustomerID from Customers e
inner join CustomersState es on (e.CustomerID = es.CustomerID)
where es.State = @State
option (maxdop 1, optimize for (@State = 'NY'))
end
go

Let’s execute the stored procedure initially with parameter value ‘WA’ and then with parameter value ‘NY’.

--Example provided by www.sqlworkshops.com
exec CustomersByState 'WA'
go
exec CustomersByState 'NY'
go

The stored procedure took 353 ms with parameter value ‘WA’; this is much slower than the optimal execution time of 294 ms we observed previously, because of the overestimation of memory. The stored procedure with parameter value ‘NY’ had an optimal execution time like before.

The stored procedure with parameter value ‘WA’ has overestimation of rows because of optimize for hint value of ‘NY’.

Unlike before, more memory was estimated for this stored procedure based on the optimize for hint value ‘NY’.

The stored procedure with parameter value ‘NY’ has good estimation because of optimize for hint value of ‘NY’. Estimated number of rows of 792000 is similar to actual number of rows of 792000.

An optimal amount of memory was estimated for this stored procedure based on the optimize for hint value ‘NY’.

There was no Hash Warning in SQL Profiler.

Summary: A cached plan can lead to underestimation or overestimation of memory, because the memory is estimated based on the first set of execution parameters. It is recommended not to cache the plan if the amount of memory required to execute the stored procedure has a wide range of possibilities. One can mitigate this by using the recompile hint, but that leads to compilation overhead; however, in most cases it is better to pay for compilation than to spill the sort over to tempdb, which can be very expensive compared to the compilation cost. The other possibility is to use the optimize for hint, but if a query sorts more data than the optimize for hint accounts for, it will still spill; on the other side there is also the possibility of overestimation, causing unnecessary memory pressure for other concurrently executing queries. In the case of Hash Match operations, this overestimation of memory can itself lead to poor performance. If the values used in the optimize for hint are later archived out of the database, the estimation will be wrong, leading to worse performance, so one has to exercise caution before using the optimize for hint; the recompile hint is better in that case.

I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts); I recommend you watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL scripts.

Disclaimer and copyright information: This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners. Copyright of this article belongs to Ramesh Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work. This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk. Previously published under / http://sqlblogcasts.com/blogs/sqlworkshops/archive/2011/02/16/plan-caching-and-query-memory-part-ii-hash-match-when-not-to-use-stored-procedure-or-other-plan-caching-mechanisms-like-sp-executesql-or-prepared-statement.aspx / http://www.sqlworkshops.com/plancachingandquerymemory2.htm and in Level 400 Webcasts http://www.sqlworkshops.com/webcasts.