In a similar vein to last week’s blog post… I heard an interesting comment recently. “Change that Column != 2 to a Column > 2 or Column < 2 combination, it can use indexes better.”
Sounds like something that clearly needs testing!
I’ll start with a simple numbers table:

CREATE TABLE dbo.Numbers (
  Number INT NOT NULL
);
ALTER TABLE dbo.Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number);
and put 1 million rows into it:

INSERT INTO dbo.Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2;
Let’s start, before we get into comparing things, with looking at the execution plan of a query with a != (or <>) operator.
SELECT Number
FROM dbo.Numbers
WHERE Number <> 12742; -- 12742 rather than 2, because 2 is on the first page of the index and I don’t want any special cases here
That’s kinda complicated for a query with one table and one predicate. Let’s look at it in pieces. The easiest place to start is the Clustered Index Seek. Its seek predicate is
Seek Keys: Start: [Test].[dbo].[Numbers].Number > Scalar Operator([Expr1009]), End: [Test].[dbo].[Numbers].Number < Scalar Operator([Expr1010])
Hmm…Looks like the parser/optimiser has already made our intended change for us. There’s some funky stuff in the top part of the plan, but what it’s essentially doing is generating two rows for the nested loop join, both with just the value that we’re excluding from the query, then the seek runs twice. I suspect that’s once for the less than 12742 and once for the greater than 12742 portions of the original predicate.
But let’s do the full due diligence; the plan may not tell the whole story.
The performance numbers for the inequality form of the query, gathered via Extended Events and aggregated with Excel are:
Logical reads: 1619
This is our baseline, the numbers we’re comparing against. If the comment mentioned at the beginning is correct, then the revised query should perform significantly better.
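For anyone who wants to repeat the test, this is a minimal sketch of the kind of Extended Events session I mean. The session and file names are my own choices, not anything special; the sql_statement_completed event exposes logical_reads, cpu_time and duration for each statement.

```sql
-- Minimal sketch: capture per-statement performance numbers.
-- Session and file names here are illustrative.
CREATE EVENT SESSION QueryPerf ON SERVER
ADD EVENT sqlserver.sql_statement_completed
ADD TARGET package0.event_file (SET filename = N'QueryPerf');
GO
ALTER EVENT SESSION QueryPerf ON SERVER STATE = START;
-- run the queries under test, then stop the session and
-- aggregate logical_reads from the captured events
```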
The revised query is:
SELECT Number
FROM dbo.Numbers
WHERE Number > 12742 OR Number < 12742;
The execution plan is much simpler: no constant scans, no joins, just a single index seek that executes once.
Is it better though?
Logical reads: 1619
No, it’s not.
Yes, we have a simpler plan, but we do not have a more efficient query. We’re still reading every page in the index and fetching all but one row of the table. The work required is the same, the performance characteristics are the same.
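We can sanity-check that 1619 logical reads really does mean the whole leaf level of the index. A query against sys.dm_db_index_physical_stats in DETAILED mode breaks the page count down per index level; I’d expect the level-0 (leaf) page count to come out close to the read figure above, with a handful of extra reads for the upper levels of the b-tree.

```sql
-- How many pages does the clustered index have at each level?
SELECT index_level, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Numbers'), 1, NULL, 'DETAILED');
```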
But, maybe, if the numbers aren’t unique and we’re excluding more than just one row it’ll be different.
That needs a slightly different table to test on.
CREATE TABLE dbo.MoreNumbers (
  SomeNumber INT NOT NULL,
  Filler CHAR(100)
);
CREATE CLUSTERED INDEX idx_MoreNumbers ON dbo.MoreNumbers (SomeNumber);
GO

INSERT INTO dbo.MoreNumbers (SomeNumber, Filler)
SELECT TOP (500000) NTILE(50) OVER (ORDER BY (SELECT 1)), ''
FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2;
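Since NTILE(50) splits 500,000 rows into 50 equal groups, each value should appear 10,000 times, which is easy enough to confirm:

```sql
-- Each of the 50 distinct values should show 10000 rows
SELECT SomeNumber, COUNT(*) AS RowsPerValue
FROM dbo.MoreNumbers
GROUP BY SomeNumber
ORDER BY SomeNumber;
```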
I’m just going to look at the performance characteristics this time. The execution plans are the same as for the earlier query. The two queries are:
SELECT * FROM dbo.MoreNumbers WHERE SomeNumber != 24;

SELECT * FROM dbo.MoreNumbers WHERE SomeNumber < 24 OR SomeNumber > 24;
Inequality form: Logical Reads 7624
Range form: Logical Reads 7624
Just like with the pointless WHERE clause predicate last week, we have a query change that has had no effect on the query performance. Now, to be honest, there are some query form changes that can improve performance. For example, converting a set of OR predicates to a UNION can sometimes improve query performance (and leave it unchanged in other cases), so these kinds of rewrites do need to be tested to see whether they’re useful.
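As a sketch of the kind of rewrite I mean (the table and columns here are entirely made up for illustration), an OR across two different columns can sometimes be split into a UNION so that each branch gets its own index seek:

```sql
-- Hypothetical table dbo.Orders with indexes on CustomerID and SalespersonID.
-- Original form: an OR across two columns.
SELECT OrderID
FROM dbo.Orders
WHERE CustomerID = 42 OR SalespersonID = 7;

-- Rewritten form: one seek per branch, UNION removes duplicates.
SELECT OrderID FROM dbo.Orders WHERE CustomerID = 42
UNION
SELECT OrderID FROM dbo.Orders WHERE SalespersonID = 7;
```

Whether that’s faster depends entirely on the indexes and the data, which is exactly why it has to be tested rather than assumed.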
More importantly though, those of us who are posting on forums and advising others have a responsibility to do these tests before we recommend changes to others, as they may very well not do them. If we don’t, we’re propagating myths and not helping the advancement of our field.