I remember a forum thread from a while back. The question was on how to get rid of the index scan that was in the query plan. Now that’s a poor question in the first place, as the scan might not be a problem, but it’s the first answer that really caught my attention.
Since the primary key is on an identity column, you can add a clause like ID > 0 to the query, then SQL will use an index seek.
Technically that’s correct. If the table has an identity column with the default properties (We’ll call it ID) and the clustered index is on that identity column, then a WHERE clause of the form WHERE ID > 0 AND <any other predicates on that table> can indeed execute with a clustered index seek (although it’s in no way guaranteed to do so). But is it a useful thing to do?
Time for a made up table and a test query.
CREATE TABLE dbo.Orders( OrderID INT IDENTITY(1,1) NOT NULL PRIMARY KEY, OrderDate DATETIME2(7) NULL, ShipmentRef CHAR(10) NULL, ShipmentDate DATE NULL, Status VARCHAR(20) NOT NULL );
That’ll do the job. And then a few hundred thousand rows via SQL Data Generator and we’re good to go.
And for a query that has a nasty index scan, how about
SELECT OrderDate, ShipmentDate, Status FROM dbo.Orders WHERE LTRIM(RTRIM(Status)) = 'Delivered';
Now, that’s running as a clustered index scan because the predicate’s not SARGable and besides, there’s no index on that column, but let’s pretend we don’t know that.
If I add a WHERE clause predicate that filters no row out, can I get a query plan with an index seek?
SELECT OrderDate, ShipmentDate, Status FROM dbo.Orders WHERE LTRIM(RTRIM(Status)) = 'Delivered' AND OrderID > 0;
Why yes, I can.
Op Success? Well…
The goal of performance tuning is to improve the performance of a query, not to change operators in a query plan. The plan is a tool, not a goal.
Have we, by adding a WHERE clause predicate that filters out no rows, improved performance of the query? This needs an extended events session to answer. Nothing fancy, just a sql_statement_completed event will do the trick.
I ran each query 10 times, copied the captured events into Excel and averaged them:
Query with just the LTRIM(RTRIM(Status)) = ‘Delivered’
Query with LTRIM(RTRIM(Status)) = ‘Delivered’ AND OrderID > 0
We haven’t tuned that query. I won’t say we’ve made it slower either, the differences are well within the error range on our measuring, but there’s definitely no meaningful performance gain.
There’s no gain because we haven’t changed how the query executes. A scan, and in this case it will be a scan of the entire index, will likely use the non-leaf levels of the b-tree to locate the logical first page of the leaf level, then will read the entire leaf level. The seek we managed to generate will use the b-tree to find the value 0 in the clustered index key, that’s what makes it a seek. Since the column is an identity starting at 1, that means the first row read will be on the logical first page of the leaf level, then it will read the entire leaf level.
Both will do the same amount of work, and so we haven’t done anything useful to the query by adding a WHERE clause that filters out no rows.
Scans are not always bad. If a query needs to read every row of a table, that’s a scan and effort shouldn’t be expended trying to make it an index seek.
To improve the performance of a query, we need to make changes that reduce the work needed to run the query. That often starts with reducing the amount of data that the query reads, by changing the query so that it can use indexes effectively and/or adding indexes to support the query. Not by adding pointless pieces to a query just to change plan operators from ones that are believed to be bad to ones that are believed to be good. Doing that is just a waste of time and effort.