Is a scan a bad thing?

This one comes up from time to time, so I thought I’d have a go at addressing it.

Let’s imagine a hypothetical DBA who’s doing some performance tuning. He looks at a query plan for a moderately complex query and panics because there’s a couple of index scans and he wants to rather see index seeks.

Is that correct, are index scans bad and index seeks good?

Well, most of the time yes. Most of the time a scan is a problem and indicates a missing index or a query problem, but there are other times that it’s the most optimal way to get the required rows from the table.

I’ve previously looked at the case where the index seeks actually reads the the entire table, in this post I’m going to be evaluating some common query constructs to see when a seek really is the most optimal operator.

Let’s start with the simplest case, and I’m going to use the AdventureWorks database for these queries.

[source:sql]select ProductID, Name from Production.Product[/source]

In this case I get an index scan on the AK_Product_Name index and that makes perfect sense. I’m asking for all the rows in the table. there is no way that SQL can use a seek to execute that query. For there to be a seek, there has to be a SARGable predicate within the query that can be used for the seek.

Now, how about this one

[source:sql]SELECT ProductID, Name from Production.Product WHERE UPPER(Name) LIKE ‘MOUNTAIN%'[/source]

There’s a predicate and it’s on a column that’s indexed, but we still get an index scan. The scan is there because that predicate is not a SARGable one because there is a function on the column. Since the DB in question is not case sensitive, there’s no need for that particular function and, if it’s removed, we get an index seek. So in this case, the scan is not optimal and we can convert it into a much more efficient index seek with a small modification.

So far, so good.

Let’s try adding another column to this select.

[source:sql]SELECT ProductID, Name, ProductNumber from Production.Product WHERE Name LIKE ‘MOUNTAIN%'[/source]

Again I have a scan, and not just an index scan, I have a clustered index scan. A read of the entire table. Why? The predicate is SARGable and there is an index with that as the leading column.

The index on ProductNumber is not covering for this query (it doesn’t have the columns ProductNumber in it). SQL has decided that, based on the number of rows returned, it’s better for it to scan the cluster than to seek on the noncluster and do a large number of lookups.

In this particular case, the scan of the cluster is better than the seeks and bookmark lookups. When tested with an index hint, the scan of the cluster did 15 logical reads and the seek with book mark lookup did 79. So, without widening the index, here the scan is the optimal way to run this query.

One more…

[source:sql]select p.Name, Sum(sod.LineTotal) AS TotalPerProduct
from Sales.SalesOrderDetail sod
inner join Sales.SalesOrderHeader soh on sod.SalesOrderID = soh.SalesOrderID
inner join Production.Product p on sod.ProductID = p.ProductID
where SalesPersonID = 277
Group by p.name[/source]

Here I’ve got two seeks, on the sales order tables, and a full index scan on the product table, specifically of the index on product name.

The scan is there because of the number of seeks that would otherwise be required to execute this. A seek can only return a single value or a range of values. If a seek is required to evalute a join, that seek has to run once for each row in the other table. This is the classic nested loop join. In this case, there are 246 rows in the resultset that the products table needs to be joined to. That means, if SQL evaluated this with seeks, it would have to do 246 seeks. There are only 504 rows in total in the products table.

To test this, I can use the ForceSeek hint (SQL 2008 only) to force a seek on the clustered index (the one on the join column) and compare the IOs

Scan of Products:
Table ‘Product’. Scan count 1, logical reads 5, physical reads 0

Forced seek of Products:
Table ‘Product’. Scan count 0, logical reads 492, physical reads 0

That’s a major difference in the number of IOs.

So, in conclusion…. Scans are not the ideal query operators, usually are not optimal and can indicate missing indexes or poorly written queries. However there are times that scanning an index or even a table is the most optimal way of processing a query, so if there’s a query that has an index/table scan in it, maybe spend a few minutes understanding why the scan’s there in the first place, before spending time trying to get rid of it.

11 Comments

Chris S 29 July 2009 at 10:47

Great article again Gail, I was just talking to someone about this the other day, and now I can point them to a great read.
Thanks

Reply ↓
Joe Smith 29 July 2009 at 11:59

The SQL markup got a bit borked in the middle of your article. Any chance of fixing it up because it makes it very hard to read. Otherwise it is a very nice article, and yet another thing for me to keep in mind when trying to clean up my old SQL.

Reply ↓
Gail 29 July 2009 at 15:32

Fixed. Sorry about that.

Reply ↓
Donovan 30 July 2009 at 14:47

Great read does bring insight to a common misconception

Reply ↓
Pingback: Something for the weekend 31/07/09 | John Sansom - SQL Server DBA in the UK
SQLDenis 31 July 2009 at 20:38

Gail, sometimes a scan will be exactly the same as a seek. See here for some code, execution plans and statistics IO http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/is-an-index-scan-always-better-or-faster

I know that it probably doesn’t make a lot of sense but people should not always panic when they see a scan 🙂

Reply ↓
Gail 31 July 2009 at 23:49

Yeah, I wrote about seeks that do the same amount of work as scans a while back. https://www.sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/

The point of this post was just want you said, scans are not always bad, don’t panic and try to get rid of a scan without understanding why it’s there first.

Reply ↓
DMJM 2 August 2009 at 23:54

Regarding the second query where the UPPER function was superfluous – I switched the collation on some columns where case was significant. My user is not going to assume case-insensitivity in her queries and I as database developer am not going to assume the database collation will remain case-insensitive. I would rather trade performance for the possibility of a future subtle error.

Reply ↓
Pingback: Weekly Link Post 104 « Rhonda Tipton’s WebLog
Pingback: Log Buffer #157: a Carnical of the Vanities for DBAs | Pythian Group Blog
stephanie b 20 February 2012 at 18:28

Is a scan a bad thing ?
As always, ‘It depends’.

Reply ↓

Is a scan a bad thing?

Share this:

11 Comments

Leave a Comment Cancel reply