Prefetch – Querying at the speed of SAN

Prefetch is a mechanism with which SQL Server can fire up many I/O requests in parallel for Nested Loop join.

The SAN administrator says your data volume has a throughput capacity of 400MB/sec. But your long running query is waiting for I/Os (PAGEIOLATCH_SH) and Windows Performance Monitor shows your data volume is doing 4MB/sec. Where is the problem?

When SQL Server does a Nested Loop join, it might enable Prefetch. Prefetch is a functionality where SQL Server fires up many I/O requests in parallel (many outstanding I/Os). Prefetch will lead to better performance for Nested Loop join queries when there are lots of rows in the outer input table. 25 is the magic number. When SQL Server estimates less than 25 rows for the outer input table Prefetch will be disabled. And when more than 25 rows are estimated Prefetch will be enabled.

To read additional articles I wrote click here. To watch the webcasts click here.

This is a perfect design (yes, I am an ex-Microsoft employee!); SQL Server wants to avoid Prefetch for light weight queries, especially in an OLTP environment as too many queries doing Prefetch will hurt performance. I remember from my days at SQL Server Development team, OLTP benchmarks including latest TPC-E disables Prefetch with trace flag 8744. Write me an email and I will send you the link.

Why this is a problem: I observe this at many customers; plan is cached with Prefetch disabled because first execution resulted in less than 25 rows estimation for the outer input table. Then the plan is used for parameters that are resulting in more than 25 rows, actually 1000s of rows for the outer input table. In this case since SQL Server cached the plan, Prefetch is disabled and SQL Server is firing up 1 I/O at a time. This leads to poor utilization of SAN. Symptoms: Query is waiting for I/Os (PAGEIOLATCH_SH) for a long time (reading page after page), but disk queue length is never more than 1.

The best way to learn is to practice. To create the below tables and reproduce the behavior, join the mailing list by using this link: www.sqlworkshops.com/ml and I will send you the table creation script.

–Example provided by www.sqlworkshops.com
–Create procedure that pulls orders based on City
drop proc RegionalOrdersProc
go
create proc RegionalOrdersProc @City char(20)
as
begin
declare @OrderID int, @OrderDetails char(200)
select @OrderID = o.OrderID, @OrderDetails = o.OrderDetails
from RegionalOrders ao inner join Orders o on (o.OrderID = ao.OrderID)
where City = @City
end
go

set statistics time on
go

–Example provided by www.sqlworkshops.com
–Execute the procedure with parameter SmallCity1
exec RegionalOrdersProc ‘SmallCity1′
go

–Right click on Clustered Index Scan
–to look at the properties
–Estimated number of rows in the
–outer input table is less than 25

–Right click on Nested Loops
–to look at the properties
–Notice Prefetch is disabled (missing)

–Clear data cache
checkpoint
dbcc dropcleanbuffers
go

Think about a business running a billing job looping over thousands of their customers using stored procedure. If the first customer has placed less than 25 orders for the billing period then Prefetch will be disabled and every customer is limited to using 1 spindle at a time.

–Example provided by www.sqlworkshops.com
–Execute the procedure with parameter BigCity
–We are using cached plan
exec RegionalOrdersProc ‘BigCity’
go

–Estimated number of rows in the
–outer input table is less than 25

–Notice Prefetch is disabled (missing)

–Query execution time 7 seconds
–What was the execution time?

–Clear procedure cache
–to trigger a new optimization
dbcc freeproccache
go

–Example provided by www.sqlworkshops.com
–Execute the procedure with parameter SmallCity2
exec RegionalOrdersProc ‘SmallCity2′
go

–Clear data cache
checkpoint
dbcc dropcleanbuffers
go

–Example provided by www.sqlworkshops.com
–Execute the procedure with parameter SmallCity
–We are using cached plan
exec RegionalOrdersProc ‘BigCity’
go

–Estimated number of rows in the
–outer input table is greater than 25

–Notice Prefetch is enabled

–Query execution time 3 seconds
–On a SAN it was less than a second
–What was the execution time?

–Example provided by www.sqlworkshops.com
–You can fix the issue by using any of the following
–hints
–Create procedure that pulls orders based on City

drop proc RegionalOrdersProc
go
create proc RegionalOrdersProc @City char(20)
as
begin
declare @OrderID int, @OrderDetails char(200)
select @OrderID = o.OrderID, @OrderDetails = o.OrderDetails
from RegionalOrders ao inner join Orders o on (o.OrderID = ao.OrderID)
where City = @City
–Hinting optimizer to use SmallCity2 for estimation
option (optimize for (@City = ‘SmallCity2′))
–Hinting optimizer to estimate for the currnet parameters
–option (recompile)
–Hinting optimize not to use histogram rather
–density for estimation (average of all 3 cities)
–option (optimize for (@City UNKNOWN))
–option (optimize for UNKNOWN)
end
go

–Example provided by www.sqlworkshops.com
–Execute the procedure with parameter SmallCity1
exec RegionalOrdersProc ‘SmallCity1′
go

–Clear data cache
checkpoint
dbcc dropcleanbuffers
go

–Example provided by www.sqlworkshops.com
–Execute the procedure with parameter BigCity
–We are using cached plan
exec RegionalOrdersProc ‘BigCity’
go

–Notice Prefetch is enabled
–Estimated number of rows in the
–outer input table is greater than 25
–Query execution time 3 seconds
–On a SAN it was less than a second
–What was your execution time?

Some of you might think plan guides or plan freezing might solve this issue, well it won’t help here. The estimation from your plan guide plan will be ignored by the optimizer. Optimizer makes new estimation based on current set of parameter values (compile time parameter values).

I explain these concepts with detailed examples in my webcasts (www.sqlworkshops.com/webcasts), I recommend you to watch them. The best way to learn is to practice. To create the above tables and reproduce the behavior, join the mailing list at www.sqlworkshops.com/ml and I will send you the relevant SQL Scripts.

If you search for trace flag 8744 you will end up with a KB Article 920093 titled: Tuning options for SQL Server 2005 and SQL Server 2008 when running in high performance workloads. There it is explained: Trace flag 8744 disables pre-fetching for the Nested Loops operator (with caution). Even though the topic looks interesting ‘Tuning options for high performance workloads’, don’t try these in production. From my point of view this article is about documenting undocumented trace flags used in benchmarks, let’s leave it at that!

Disclaimer and copyright information:
This article refers to organizations and products that may be the trademarks or registered trademarks of their various owners.
Copyright of this article belongs to Ramesh Meyyappan / www.sqlworkshops.com. You may freely use the ideas and concepts discussed in this article with acknowledgement (www.sqlworkshops.com), but you may not claim any of it as your own work.
This article is for informational purposes only; you use any of the suggestions given here entirely at your own risk.

Previously published under http://sqlblogcasts.com/blogs/sqlworkshops/archive/2010/10/19/prefetch-querying-at-the-speed-of-san.aspx / http://www.mssqltips.com/sqlservertip/2173/speed-up-sql-server-queries-with-prefetch / http://www.sqlworkshops.com/prefetch.htm and in Level 400 Webcasts http://www.sqlworkshops.com/webcasts.