Is a scan a bad thing?

This one comes up from time to time, so I thought I’d have a go at addressing it.

Let’s imagine a hypothetical DBA who’s doing some performance tuning. He looks at a query plan for a moderately complex query and panics because there’s a couple of index scans and he wants to rather see index seeks.

Is that correct, are index scans bad and index seeks good?

Well, most of the time yes. Most of the time a scan is a problem and indicates a missing index or a query problem, but there are other times that it’s the most optimal way to get the required rows from the table.

I’ve previously looked at the case where the index seeks actually reads the the entire table, in this post I’m going to be evaluating some common query constructs to see when a seek really is the most optimal operator.

Let’s start with the simplest case, and I’m going to use the AdventureWorks database for these queries.

[source:sql]select ProductID, Name from Production.Product[/source]

In this case I get an index scan on the AK_Product_Name index and that makes perfect sense. I’m asking for all the rows in the table. there is no way that SQL can use a seek to execute that query. For there to be a seek, there has to be a SARGable predicate within the query that can be used for the seek.

Now, how about this one

[source:sql]SELECT ProductID, Name from Production.Product WHERE UPPER(Name) LIKE ‘MOUNTAIN%'[/source]

There’s a predicate and it’s on a column that’s indexed, but we still get an index scan. The scan is there because that predicate is not a SARGable one because there is a function on the column. Since the DB in question is not case sensitive, there’s no need for that particular function and, if it’s removed, we get an index seek. So in this case, the scan is not optimal and we can convert it into a much more efficient index seek with a small modification.

So far, so good.

Let’s try  adding another column to this select.

[source:sql]SELECT ProductID, Name, ProductNumber from Production.Product WHERE Name LIKE ‘MOUNTAIN%'[/source]

Again I have a scan, and not just an index scan, I have a clustered index scan. A read of the entire table. Why? The predicate is SARGable and there is an index with that as the leading column.

The index on ProductNumber is not covering for this query (it doesn’t have the columns ProductNumber in it). SQL has decided that, based on the number of rows returned, it’s better for it to scan the cluster than to seek on the noncluster and do a large number of lookups.

In this particular case, the scan of the cluster is better than the seeks and bookmark lookups. When tested with an index hint, the scan of the cluster did 15 logical reads and the seek with book mark lookup did 79. So, without widening the index, here the scan is the optimal way to run this query.

One more…

[source:sql]select p.Name, Sum(sod.LineTotal) AS TotalPerProduct
from Sales.SalesOrderDetail sod
inner join Sales.SalesOrderHeader soh on sod.SalesOrderID = soh.SalesOrderID
inner join Production.Product p on sod.ProductID = p.ProductID
where SalesPersonID = 277
Group by p.name[/source]

Here I’ve got two seeks, on the sales order tables, and a full index scan on the product table, specifically of the index on product name.

The scan is there because of the number of seeks that would otherwise be required to execute this. A seek can only return a single value or a range of values. If a seek is required to evalute a join, that seek has to run once for each row in the other table. This is the classic nested loop join. In this case, there are 246 rows in the resultset that the products table needs to be joined to. That means, if SQL evaluated this with seeks, it would have to do 246 seeks. There are only 504 rows in total in the products table.

To test this, I can use the ForceSeek hint (SQL 2008 only) to force a seek on the clustered index (the one on the join column) and compare the IOs

Scan of Products:
Table ‘Product’. Scan count 1, logical reads 5, physical reads 0

Forced seek of Products:
Table ‘Product’. Scan count 0, logical reads 492, physical reads 0

That’s a major difference in the number of IOs.

So, in conclusion…. Scans are not the ideal query operators, usually are not optimal and can indicate missing indexes or poorly written queries. However there are times that scanning an index or even a table is the most optimal way of processing a query, so if there’s a query that has an index/table scan in it, maybe spend a few minutes understanding why the scan’s there in the first place, before spending time trying to get rid of it.

11 Comments

  1. Chris S

    Great article again Gail, I was just talking to someone about this the other day, and now I can point them to a great read.
    Thanks

    Reply
  2. Joe Smith

    The SQL markup got a bit borked in the middle of your article. Any chance of fixing it up because it makes it very hard to read. Otherwise it is a very nice article, and yet another thing for me to keep in mind when trying to clean up my old SQL.

    Reply
  3. Gail

    Fixed. Sorry about that.

    Reply
  4. Donovan

    Great read does bring insight to a common misconception

    Reply
  5. Pingback: Something for the weekend 31/07/09 | John Sansom - SQL Server DBA in the UK

  6. SQLDenis

    Gail, sometimes a scan will be exactly the same as a seek. See here for some code, execution plans and statistics IO http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/is-an-index-scan-always-better-or-faster

    I know that it probably doesn’t make a lot of sense but people should not always panic when they see a scan 🙂

    Reply
  7. Gail

    Yeah, I wrote about seeks that do the same amount of work as scans a while back. https://www.sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/

    The point of this post was just want you said, scans are not always bad, don’t panic and try to get rid of a scan without understanding why it’s there first.

    Reply
  8. DMJM

    Regarding the second query where the UPPER function was superfluous – I switched the collation on some columns where case was significant. My user is not going to assume case-insensitivity in her queries and I as database developer am not going to assume the database collation will remain case-insensitive. I would rather trade performance for the possibility of a future subtle error.

    Reply
  9. Pingback: Weekly Link Post 104 « Rhonda Tipton’s WebLog

  10. Pingback: Log Buffer #157: a Carnical of the Vanities for DBAs | Pythian Group Blog

  11. stephanie b

    Is a scan a bad thing ?
    As always, ‘It depends’.

    Reply

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.