TIL: Queries against linked servers... Be careful how you structure them!
The fact that a query across a SQL linked server (to another SQL server) is sometimes off-loaded to the remote server and sometimes not even when the entire query is operating on the remote data is something that's probably well known to the DBAs of this world, but it was a bit of a surprise to me. Even with a very simple query, and very simple table behind it, the impact of not knowing where the query will be executed can be quite high!
Take a very simple table that contains a date field, though the actual table I've run the queries against contain more columns than this that shouldn't have a maetrial impact on the queries that I'm executing or teh write-up about it. Here's the SQL to create a simplified version of the table I've used for the queries:
CREATE TABLE [dbo].[Orchard_AuditTrail_AuditTrailEventRecord] ( [CreatedUtc] [datetime] NULL )
Now lets get a count of records that were created more than 100 days ago, two ways, starting by using the DATEDIFF function
SELECT COUNT(1) FROM [ORCHARD].[www.robertwray.co.uk].[dbo].[Orchard_AuditTrail_AuditTrailEventRecord] WHERE DATEDIFF(day, CreatedUtc, GETDATE()) > 100
The problem with this query is that the DATEDIFF function gets executed on the local SQL Server, rather than the remote one. In order for it to do this, the data that's being processed by the DATEDIFF has to be transferred over the wire between the two servers. The execution plan for query looks like this:
Look at the 'Output List' and 'Remote Query', here we can see that the output from the Remote Query is the content of the CreatedUtc column for every single row in the table. If we write the query in a different way:
SELECT COUNT(1) FROM [ORCHARD].[www.robertwray.co.uk].[dbo].[Orchard_AuditTrail_AuditTrailEventRecord] WHERE DATEADD(day, -100, GETDATE()) > CreatedUtc
Then we end up with a muich different plan:
This shows the remote server performing all the work which is really waht we want to see. I've not found any documentation that makes it clear what causes a query to be remoted fully, or partially. In the absence of definitive guidance (which I'm sure must be out there somewhere!) it just reinforces how important it is to look at the execution plan for your queries to make sure they're as performant as they can be.