TIL: Queries against linked servers... Be careful how you structure them!

Two different execution plans for queries against a linked server that are functionally identical

The fact that a query across a SQL linked server (to another SQL server) is sometimes off-loaded to the remote server and sometimes not even when the entire query is operating on the remote data is something that's probably well known to the DBAs of this world, but it was a bit of a surprise to me. Even with a very simple query, and very simple table behind it, the impact of not knowing where the query will be executed can be quite high!

Take a very simple table that contains a date field, though the actual table I've run the queries against contain more columns than this that shouldn't have a maetrial impact on the queries that I'm executing or teh write-up about it. Here's the SQL to create a simplified version of the table I've used for the queries:

CREATE TABLE [dbo].[Orchard_AuditTrail_AuditTrailEventRecord]
(
	[CreatedUtc] [datetime] NULL
)

Now lets get a count of records that were created more than 100 days ago, two ways, starting by using the DATEDIFF function

SELECT	COUNT(1)
FROM	[ORCHARD].[www.robertwray.co.uk].[dbo].[Orchard_AuditTrail_AuditTrailEventRecord]
WHERE	DATEDIFF(day, CreatedUtc, GETDATE()) > 100

The problem with this query is that the DATEDIFF function gets executed on the local SQL Server, rather than the remote one. In order for it to do this, the data that's being processed by the DATEDIFF has to be transferred over the wire between the two servers. The execution plan for query looks like this:

The execution plan for the query when using DATEDIFF in the where clause

Look at the 'Output List' and 'Remote Query', here we can see that the output from the Remote Query is the content of the CreatedUtc column for every single row in the table. If we write the query in a different way:

SELECT	COUNT(1)
FROM	[ORCHARD].[www.robertwray.co.uk].[dbo].[Orchard_AuditTrail_AuditTrailEventRecord]
WHERE	DATEADD(day, -100, GETDATE()) > CreatedUtc

Then we end up with a muich different plan:

The execution plan when using DATEADD instead of DATEDIFF is much better!

This shows the remote server performing all the work which is really waht we want to see. I've not found any documentation that makes it clear what causes a query to be remoted fully, or partially. In the absence of definitive guidance (which I'm sure must be out there somewhere!) it just reinforces how important it is to look at the execution plan for your queries to make sure they're as performant as they can be.

About Rob

I've been interested in computing since the day my Dad purchased his first business PC (an Amstrad PC 1640 for anyone interested) which introduced me to MS-DOS batch programming and BASIC.

My skillset has matured somewhat since then, which you'll probably see from the posts here. You can read a bit more about me on the about page of the site, or check out some of the other posts on my areas of interest.

No Comments

Add a Comment