SQL University: Advanced Indexing – Filtered Indexes

Welcome back to day 2 of Advanced Indexing. Today we’re going to look at a feature that was added in SQL Server 2008 – filtered indexes.

In versions previous, indexes were always on the entire table. An index would always have the same number of rows as the table it was built on did (which is why COUNT(*) can just scan the smallest index on the table)

With filtered indexes, it’s possible to have an index that’s built on a subset of the rows in the table. The definition for a filtered index contains a WHERE clause predicate that determines if a row in the table will be in the index or not.

This can be a major advantage on really large tables where most queries are only interested in a small fraction of the table. A normal index would be based on the entire table regardless of the fact that most of the table is of no interest, meaning the index would be larger than necessary, deeper than necessary and take up more space than would be ideal. With a filtered index on just the interesting portion of the table, the index size is kept to a minimum, meaning it’s shallower than an index on the entire table and hence more efficient.

A simple example of a filtered index would be

CREATE NONCLUSTERED INDEX idx_Example
ON Account (AccountNumber)
WHERE Active = 1;

There are two main uses of a filtered index:
1) Enforcing moderately complex uniqueness requirements
2) Supporting queries that filter on common subsets of a table

Filtered indexes and unique columns

One very interesting use of filtered indexes is in enforcing uniqueness over portions of a table. One requirement that come up again and again is to have a nullable column that must have unique entries in it, but whose entries are optional. Basically, the column must be unique or null. Sounds easy, but the problem is that a unique index allows only one null. So much for nulls not being equal to anything including other nulls.

Prior to SQL 2008 implementing such a constraint required computed columns, indexed views or triggers. With SQL 2008’s filtered indexes, it’s trivial.

CREATE UNIQUE NONCLUSTERED INDEX idx_SomeTable_SomeColumn
ON SomeTable (SomeColumn)
WHERE SomeColumn IS NOT NULL;

It has to be a unique index not a unique constraint as indexes can be filtered, constraints cannot.

This can be extended to various forms of moderately complex unique requirements and is certainly an improvement over using indexed views or complex calculated columns (or just trusting the application to do things right).

Supporting queries

The really interesting use of filtered indexes though is for supporting queries. Here filtered indexes are very useful in cases where a queries against a table frequently include a specific filter.

A couple common cases of this are table that flag rows as active or inactive and most queries are interested in only the active rows, or in a database design where deletes are logical (an IsDeleted column) and almost every query filters for rows not marked as deleted.

Let’s have a look at a couple examples here. I’m not using AdventureWorks because the database design doesn’t include these kinds of patterns. The table design is given at the end of the post and a SQLDataGenerator project file is attached.

First let’s look at a simple example. This table stores support calls, and this query is looking for all recent open calls logged to one of the support people.

SELECT CallID, LogDate, AssignedTo
FROM dbo.CallLog AS cl
WHERE CallStatus = 'Open'
AND AssignedTo = 42
AND LogDate > DATEADD(ww,-1,GETDATE());

Now, we could create a normal nonclustered index with CallStatus, AssignedTo and LogDate in the key, but let’s say that while the AssignedTo and LogDate filters change for this query, the filter is always, always, always for CallStatus = ‘Open’. This table has around 200 open calls and 100000 closed calls. Creating an index with the closed calls as well is just wasting space and time, no one’s interested. So, what I can do is this:

CREATE NONCLUSTERED INDEX idx_CallLog_AssignedToLogDate
ON dbo.CallLog (AssignedTo, LogDate)
WHERE CallStatus = 'Open';

FilteredIndex1

One thing to note here is that if the query filter exactly matches the index filter, the index doesn’t need to have that column as either a key or include column. It’s not being selected and the filter is entirely taken care of with the indexes filter.

It’s worth noting that there’s bug relating to this, specifically around filtered indexes with an IS NULL filter. See http://connect.microsoft.com/SQLServer/feedback/details/454744/filtered-index-not-used-and-key-lookup-with-no-output/ The bug is still unfixed in the latest CTP of SQL Server 2012.

Let’s have a look at a second example, where the query’s filter is a subset of the index’s filter.

Let’s say that a very frequent query is for the urgent or urgent and high priority calls (severity 1 and 2). So these are two common queries:

SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity < 3 -- urgent and high
AND AssignedTo = 1;

SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity = 1  -- urgent
AND AssignedTo = 1;

So, given that, I can create a filtered index on the larger of those ranges

CREATE NONCLUSTERED INDEX idx_CallLog_Severity
ON dbo.CallLog (AssignedTo,    Severity)
INCLUDE (CallStatus, LogDate)
WHERE Severity < 3

FilteredIndex2

In this case the filtered index can be used, but the column that’s being filtered on must also be in the index key, because the second query is filtering for a subset of the rows that the index contains.

An examination of the properties of the index seeks shows that for the first query there’s only one seek predicate – AssignedTo, whereas for the second query there are two seek predicates – AssignedTo and Severity

FilteredIndex3 FilteredIndex4

There are some limitations around filtered indexes and the matching of the filters. There are cases where a query and a filtered index have predicates that are logically equivalent, but where the filtered index can’t be used.

An example of this is not hard to generate. Let’s try a table that has an IsDeleted bit column (defined as not nullable)

CREATE TABLE Users (
UserID INT IDENTITY PRIMARY KEY,
UserName VARCHAR(50),
DepartmentID INT NOT NULL,
IsDeleted BIT NOT NULL DEFAULT 0
);

CREATE NONCLUSTERED INDEX idx_Users_DepartmentID
ON dbo.Users (DepartmentID)
INCLUDE (UserName)
WHERE IsDeleted = 0;

The IsDeleted column is a non-nullable bit column. Hence it can only have two possible values, 0 and 1. Hence, these two queries are completely equivalent in their results

SELECT UserName, DepartmentID
FROM dbo.Users
WHERE IsDeleted = 0 AND DepartmentID = 3;

SELECT UserName, DepartmentID
FROM dbo.Users
WHERE IsDeleted != 1 AND DepartmentID = 3;

FilteredIndex7

The first one uses the filtered index, the second does not. The second one scans the clustered index because despite the filter essentially being the same as the index filter, it’s not the same and the query hence cannot use the filtered index.

Another limitation has to do with parametrisation. If the query is passed in a parametrised form, or is subject to simple or forced parametrisation, then by the time that the optimiser gets the query there may not be sufficient information to tell that a filtered index is usable or not.

If we imagine the case of the table with the IsDeleted column again, a query that has a filter IsDeleted = 0 is definitely capable of using a filtered index that has the predicate IsDeleted = 0, but if the query gets parametrised and arrives at the optimiser in the form IsDeleted = @p1, there’s no way that query can match the filtered index because the value of @p1 on future executions could be 0, 1, NULL or 42, and in any case other than 0, if the cached plan used the filtered index it would produce incorrect results.

We can see this by setting the database parametrisation to forced and re-running an earlier example.

With parametrisation simple and the index with a filter on severity < 3, these two queries produce different execution plans

SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity = 1
AND AssignedTo = 1;

SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity = 4
AND AssignedTo = 1;

FilteredIndex5

However if the database is set for forced parameterisation, then the query is only seen in its parametrised form, and both queries have the same plan, one that does not use the filtered index

ALTER DATABASE Testing SET PARAMETERIZATION FORCED;
GO

SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity = 1
AND AssignedTo = 1;

SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity = 4
AND AssignedTo = 1;

FilteredIndex6

Right, so, homework for today. Given the table design below and the following queries, design one filtered index that both queries can use effectively (a filtered index that doesn’t filter out any rows is not an acceptable answer). Assume those are fixed queries that are frequently run with exactly that structure and exactly those values.

CREATE TABLE CallLog (
CallID INT IDENTITY PRIMARY KEY,
CallStatus CHAR(6) NOT NULL,
LogDate DATETIME NOT NULL,
LastUpdateDate DATETIME,
Title VARCHAR(500),
Severity TINYINT,
AssignedTo INT,
UserID INT
);

SELECT CallID, CallStatus, AssignedTo
FROM dbo.CallLog
WHERE CallStatus = 'Open' AND LastUpdateDate IS NOT NULL AND Severity = 1;

SELECT CallID, LogDate, LastUpdateDate FROM dbo.CallLog
WHERE LastUpdateDate < DATEADD(dd,7,GETDATE()) AND Severity IN (1,2);

FilteredIndexes (SQL DataGenerator project)

Answers for Monday’s homework:

1) Yes, this can use an index for both filter and group by. Index key columns would be (TransactionType, ReferenceOrderID, ProductID), in that order, and index include columns would be (Quantity, ActualCost)

2) No, because of the inequality we can use an index to support filtering or aggregating but not both. So there will either be a plan with an index seek and a hash aggregate (or sort and stream aggregate) or a plan with an index scan and a stream aggregate, but there’s no way to get a seek and stream aggregate without a sort

5 Comments

  1. tobi

    I reported the bit column problem a while ago because it frequently comes up when using OR mappers. If it does, usually the only workaround is to fall back to manual SQL.

    http://connect.microsoft.com/SQLServer/feedback/details/690623/usage-of-filtered-index-on-bit-column-depends-on-exact-sql-expression-used-in-where-clause

    Please vote for it if you also think it is important to improve this.

    Reply
  2. Alex Feng

    Homework:
    CREATE NONCLUSTERED INDEX idx_
    ON dbo.CallLog(Severity, LastUpdateDate,CallStatus)
    INCLUDE (LogDate, AssignedTo)
    WHERE Severity < 3;

    Reply
  3. jermteam

    this is a great post! helped me a lot about handling null values while still making unique to the others that has values.

    Reply
  4. Pingback: Something for the Weekend - SQL Server Links 11/11/11

  5. Mike DiSibio

    “if the query filter exactly matches the index filter, the index doesn’t need to have that column as either a key or include column”

    BOL listed this rule as well and for the life of me I could not make heads or tails of what they were trying to say until I read your examples. Thank you for that! Pretty much only clear explanation I’ve found.

    Reply

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.