Does an index scan always read the entire index?

No.

That’s a bit short for a blog post, so let me explain. First, the difference between a seek and a scan.

A seek is an operation which navigates down the index’s b-tree looking for a row or for the start/end of a range of rows. A seek requires a predicate and that predicate must be of the form that can be used as a search argument (SARGable).

A scan is a read of the leaf level of an index, possibly also reading the intermediate pages as well.

The key there is that a seek requires a predicate. If there’s no predicate, we cannot have a seek and hence must have a scan.

Let’s look at a couple of examples. I’m going to use a simple Numbers table as it’s perfectly adequate for what we’re doing here.

CREATE TABLE Numbers (
  Number INT NOT NULL PRIMARY KEY CLUSTERED
);

INSERT INTO Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM msdb.sys.columns c1 CROSS JOIN msdb.sys.columns c2;

With the table created, I want to look at how many pages that clustered index has. It won’t be a huge number, the table is very small.

SELECT OBJECT_NAME(object_id) AS TableName, index_level, page_count, record_count, avg_record_size_in_bytes
  FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Numbers'),1,NULL, 'Detailed');

IndexPages

Three level deep clustered index, 1608 pages at the leaf and a total of 1614 pages in all levels.

Let’s start with something basic:

SET STATISTICS IO ON;
GO

SELECT * FROM Numbers;

FullTableScan

Table ‘Numbers’. Scan count 1, logical reads 1615, physical reads 0

That read the entire index, every page at every level. The extra page was likely the IAM page for that index. That’s kinda what we expect a scan to be.

For contrast, let’s try an obvious seek.

SELECT * FROM Numbers WHERE Number = 137;

Seek

Table ‘Numbers’. Scan count 0, logical reads 3, physical reads 0.

A single-row seek operation does three reads, which makes sense since we have a three-level deep clustered index.

Now, what about this?

SELECT TOP (1) * FROM Numbers;

It can’t be a seek operation, there’s no predicate. The only way this can be implemented is with a scan.

Top1Scan

It is indeed a scan, but did it read the entire table?

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

No. A scan of the entire index is over 1600 pages. This query read three. It’s a scan, but it’s a scan which stopped after reading one page of the leaf (the other two are likely the root and intermediate pages, used to locate the first page in the leaf).

The scan read one row and then stopped, because that’s all that was needed. It did that, because there was a 1 row row-goal added to the query. For more details on row goals, see Paul White’s article on the subject.

There’s other cases where a scan won’t read the entire index leaf level too.

Aggregations, well MIN and MAX of an indexed column:

SELECT MIN(Number) FROM Numbers;

IndexScanMin

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

EXISTS:

IF EXISTS(SELECT 1 FROM Numbers)
SELECT 1;

ExistsScan

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

In conclusion, a seek operation requires a predicate. Without a predicate, a query has to be evaluated with a scan, but a scan doesn’t always mean that the entire table is read.

9 Comments

  1. sateesh

    Thank you Gail.Your Posts are very well explained and easy to understand.Thanks for sharing knowledge

    Reply
  2. Holo

    You could also add SELECT TOP(1000) or something to show that the number of pages is > 3 and < 1600

    Reply
  3. Farhan

    Very interesting post

    Reply
  4. Sharon rimer

    Nice post

    Reply
  5. Joerg Hencke

    Thank you so much Gail!
    As a dba, I was recently asked to to give an internal session on sql-server-related do’s and don’ts for our .net-developers. What you presented here is a perfect starter for getting familiar with the topic of query optimization!
    Thanks a lot for this and your community related works (espescially for my favorite: https://www.simple-talk.com/sql/t-sql-programming/how-to-confuse-the-sql-server-query-optimizer/)!
    Joerg

    Reply
  6. Eddie

    Gail, So good! My only question after reading this blog is why does “SELECT * FROM Numbers WHERE Number = 137;” result in ZERO scans? “Table ‘Numbers’. Scan count 0, logical reads 3, physical reads 0.”

    How can data possibly be returned if scan count is zero?

    Reply
    1. Gail (Post author)

      Because Scans is not the number of times the table has been accessed. It’s meaningless, ignore it.

      Reply
  7. Edd

    Ok, then I am confused. BOL state that scan in set statistics IO refers to the “Number of seeks/scans started after reaching the leaf level in any direction to retrieve all the values to construct the final dataset for the output.”

    found here:
    https://docs.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql

    Why does MS state that scan count refers to the leaf level pages that have been access if that not what this is?

    Thanks,
    Eddie

    Reply
    1. Gail (Post author)

      No idea, I didn’t write the docs. I guess that’s what it’s supposed to be, but as you can see from the 0 scans, it’s not.

      Reply

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.