One wide index or multiple narrow indexes?

TSQL2sDay150x150 Or “If one index is good, surely many indexes (indexes? indices? indi?) will be better

This is a question that comes up very often on the forums. Something along the lines of:

I have a query with multiple where clause conditions on a table. Should I create one index for each condition, or one index with all the columns in it?

The question basically boils down to this: Which is more optimal and more likely for the optimiser to pick, a single seek operation against a wide index that seeks on all three conditions in one go, or three seek operations against three indexes followed by a join to get back the final set of rows.

One thing to keep in mind is that one of the jobs of an index is to reduce the number of rows in consideration for a query as early as possible in the query’s execution.

So let’s take a made-up example. Let’s say we have a table with a number of columns in it. A query is run against that table with three conditions in the where clause

WHERE ColA = @A AND ColB = @B AND ColC = @C

Let’s further say that 1000 rows qualify for the condition ColA = @A, 15000 rows qualify for ColB = @B and 30000 rows qualify for ColC = @C. The total number of rows that qualify for all three conditions is 25.

Which sounds like it would be more efficient?

  • Seek on an index with all three columns and retrieve just 25 rows
  • Seek on an index on ColA, retrieve 1000 rows, seek on an index on ColB, retrieve 15000 rows, seek on an index on ColC, retrieve 30000 rows then join the three result-sets together to get the desired 25 rows (called an Index Intersection)

Time for some tests to find out.

CREATE TABLE TestingIndexUsage (
id INT IDENTITY PRIMARY KEY,
FilterColumn1 INT,
FilterColumn2 INT,
FilterColumn3 INT,
Filler CHAR(500) DEFAULT ''-- simulate other columns in the table.
)
GO

INSERT INTO TestingIndexUsage (FilterColumn1, FilterColumn2, FilterColumn3)
SELECT TOP ( 1000000 )
ABS(CHECKSUM(NEWID()))%200,
ABS(CHECKSUM(NEWID()))%40,
ABS(CHECKSUM(NEWID()))%20
FROM msdb.sys.columns a CROSS JOIN msdb.sys.columns b
GO

First off, I’m going to create three individual indexes on the three filter columns and see what kind of plan SQL comes up with.

CREATE INDEX idx_Temp_FilterColumn1 ON dbo.TestingIndexUsage (FilterColumn1)
CREATE INDEX idx_Temp_FilterColumn2 ON dbo.TestingIndexUsage (FilterColumn2)
CREATE INDEX idx_Temp_FilterColumn3 ON dbo.TestingIndexUsage (FilterColumn3)

And the query…

SELECT ID FROM dbo.TestingIndexUsage
WHERE FilterColumn1 = 68 -- 4993 matching rows
AND FilterColumn2 = 26 -- 24818 matching rows
AND FilterColumn3 = 3  -- 49915 matching rows

The comments show how many rows each predicate returns alone. Combined they return 19 rows.

The plan shows the semi-expected index intersection. Seeks on 3 indexes, two merge join operators to join the three resultsets into one.

IndexIntersection

But what about the performance characteristics?

Table ‘TestingIndexUsage’. Scan count 3, logical reads 150, physical reads 0.

SQL Server Execution Times:
CPU time = 422 ms,  elapsed time = 435 ms.

The reads aren’t very high (as the indexes are extremely narrow), but that CPU time is not exactly low. Almost half a second on the CPU to return 19 rows from a 1 million row table? Not good, especially if this is going to run often.

Right, so that’s the three separate indexes. What about the case of a single index with all three columns. In this case, because all three are SARGable equality predicates, the order of the columns isn’t critical for index usage, so I’ll put them in order of selectivity.

DROP INDEX idx_Temp_FilterColumn1 ON dbo.TestingIndexUsage
DROP INDEX idx_Temp_FilterColumn2 ON dbo.TestingIndexUsage
DROP INDEX idx_Temp_FilterColumn3 ON dbo.TestingIndexUsage

CREATE INDEX idx_Temp_FilterColumn123 ON dbo.TestingIndexUsage (FilterColumn1, FilterColumn2, FilterColumn3)

And run the query again.

As kinda expected, the execution plan has a single index seek operation. Exec plan looks cleaner, what do the performance characteristics say?

Table ‘TestingIndexUsage’. Scan count 1, logical reads 3, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

Just about says it all. 147 fewer reads and a 100% reduction in CPU cost. The reduction in reads isn’t going to make a major difference, the reads were low anyway, but the reduction in CPU cost is going to make an impact if this query is frequently run.

So what can we conclude from this?

The optimal index for a query with multiple conditions in the where clause is a single index with all the columns that are used in the where clause in it. The order of these columns may matter, depending on how they are used in the where clause (see Equality predicates and Inequality predicates)

SQL can use multiple indexes on a single table (Index Intersection), but it’s not the most efficient option. It’s worth nothing that SQL won’t always chose to do the index intersection. It may quite well decide that a table/clustered index scan is faster than the multiple seeks and joins that the intersection will do. Or, if one of the conditions is very selective, it may decide to seek on one of the indexes, do key lookups to fetch the rest of the columns and then do secondary filters to evaluate the rest of the predicates.

Now it may not always be possible to create a perfect index for all queries on a table, so in some cases, especially for less important queries, having multiple indexes that SQL can seek and intersect may be adequate, but for the more critical, more frequently run queries you probably want a single index with the appropriate columns.

As an aside, this is why the often-mentioned index ‘strategy’ of a single column index on each column of a table is near-useless and certainly not worth the title ‘strategy’.

31 Comments

  1. Klaus Aschenbrenner

    Hi,

    What would be the ideal index design, when each column in the WHERE clause is optional, eg. “WHERE ColA = @A AND ColC = @C”, or “WHERE ColB = @B AND ColC = @C”? In this case it would be the better option to work with 3 separate indexes, or?

    Thanks

    -Klaus

    Reply
  2. Pradeep Singh

    Thanks for this article. Very clear explaination of the difference between these two types of index usage.

    Reply
  3. Gail (Post author)

    Klaus, in that cast you’d probably want 2 different 2-column wide indexes. (A, C) and (B, C), but it really depends on how the queries are designed and working, and how selective those columns are.

    What you want is the minimum number of indexes to fully (if possible) support the queries.

    Reply
  4. Andy Leonard

    Great post!

    Question: Does INCLUDE impact this? What are your thoughts on creating an index on FilterColumn1 that includes values for FilterColumn2 and FilterColumn3?

    :{> Andy

    Reply
  5. Gail (Post author)

    Not if all three columns are filtered on. With FilterColumns 2 and 3 only as include columns, SQL can’t (obviously) seek on them. It can only seek on Column1.

    Now sure, those columns are in the index, so no index intersection or additional lookups are required, but SQL still has to retrieve far more data than is actually needed and then filter out the rows it doesn’t need.

    It goes back to one of my key principles for indexes – allow SQL to reduce the number of rows as much as possible as early as possible.

    Would you prefer to read all the Browns in the telephone directory looking for M. Brown who lives in Benoni, or would you prefer to be able to go straight to Brown, M. Benoni?

    Reply
  6. Thomas LeBlanc

    Gila,

    Always detailed and informative, great post. Thanks for helping those in need of solid direction on SQL Server.

    Thomas LeBlanc
    TheSmilingDBA

    Reply
  7. Pingback: T-SQL Tuesday #10 Round up, An index of participants | Michael J. Swart

  8. Steve Dorris

    Gail,
    Not only a great article, but excellent examples. I’ve learned a great deal about the internals of SQL Server from all of your articles. Thanks for sharing!
    Steve

    Reply
  9. Gail (Post author)

    Glad you’re finding them useful.

    Reply
  10. Eddie Zambrana

    Hi Claus

    The one index solution will still be the best choice even if you have three separate where clauses. The reason for this is that the execution plan will be the same for each of the where statements and you get the added benefit of having only one index for Insert, Deletions and Additions which will save a lot of time.

    Hope it helps out.

    Reply
  11. Gail (Post author)

    I don’t agree.

    If the index is on three columns (A, B, C), any query that does not filter on A cannot seek on that index. See https://www.sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

    Reply
  12. Mark

    I’ve wondered about these questions myself… Thank you for the post and for answering the questions in the comments Gail.

    Reply
  13. JJEugene

    So well written, easy to understand, and to the point. Thanks.

    Reply
  14. Khalid Flynn

    Excellent article, thanks for sharing.

    Reply
  15. Prashant

    Great Explanation..!!!!!

    Reply
  16. Sean Smith

    Direct, to the point, and brings about a lot of clarity. Great stuff!

    Reply
  17. Gail (Post author)

    Thanks. Glad you found it useful.

    Reply
  18. MadTester

    Once again your knowledge puts DBA’s to shame.

    Reply
  19. Anurag

    Excellent.. !!!

    Reply
  20. Utsav

    What a learning it was to read this article. I have a question here from architecture point of view.When you are not sure that what queries will be written over created table; what should be indexing strategy? We are using independent indexes in such cases and I think I am going to follow it even after getting convinced from this article.

    Reply
  21. Gail (Post author)

    That’s not all that uncommon a scenario.

    You trace and monitor and create indexes to support the most frequent, most important queries that you pick up in your monitoring.

    Reply
  22. Aaron

    Post is really enlightening. How do i create indexes for table A, on which i sometimes i query on 3 columns and sometimes i query on 2 columns. From the above Post i should create 2 Index one have 3 columns and another one which has 2 columns ? Please advise

    Reply
  23. Aaron

    Forgot to mention that Table A has only Col1, Col2 and Col3 columns.

    Reply
  24. Gail (Post author)

    I suggest you post that as a forum question somewhere like SQLServerCentral. Without details of what then where clauses actually look like, that’s very hard to answer.

    See https://www.sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/ and https://www.sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/

    Reply
  25. laifukang

    Well, I was curious as to what multi-index advantages MS SQL Server had over MySQL… turns out none. Thanks for the analysis.

    Reply
  26. Unmesh

    I have a table which has around 50 columns. But there are only 6 columns on which we perform regular search. When I am trying to create a non clustered index on it I am getting the warning of 900 bytes. This would be a problem as we also do bulk update/inserts. So can onyone help me in finding a solution?

    If I put few columns out of those 6 in INCLUDE, will that cause any performance issue?

    Lets say the datatypes for the columns are –

    ColumnA (nvarchar(255),null)
    ColumnB (Int, null)
    ColumnC (nvarchar(255),null)
    ColumnD (Datetime,Null)
    ColumnE (nvarchar(255),null)

    NOTE: All these above columns are used in the same query to retrieve data. Please let me know an efficient way of doing it.

    Reply
  27. JK Wood

    @Unmesh:

    Do your nvarchar columns HAVE to be 255 characters? If you’re not using it, it’s a waste of space. Figure out a length that fits your data, then use it. That should make your NCI creation much easier – this is a data type problem, not an indexing problem.

    Reply
  28. Kerneels Roos

    Great post and nice comments, thanks!

    However, the single wide index would not be the best choice for a query where the predicates of the where clause are disjunctive (combined with OR) in contrast with to your example where they are conjunctive (AND), i.e.:

    WHERE col1 = @col1 OR col2 = @col2 OR col3 = @col3

    Intuitively one can see that the single wide index would not be of much use here, since finding the sub branch where col1 = @col1 means very little in terms of col2 and col3 (they may or may not match in the sub branch of the index tree, and might or might not match even where col1 != @col1).

    I suspect three seperate indexes would be a better choice here.

    – Kerneels

    Reply
    1. Gail (Post author)
  29. moojjoo

    In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query. The term is derived from a contraction of Search ARGument ABLE. Interesting to learn something new.

    Reply
  30. Sudhanshu

    very nice post Gail, good explained with example.

    Reply

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.