All indexes are unique

Well, that’s a rather contentious title. There are probably several people shaking their heads at this point. Let me explain.

I was listening to a podcast with Kimberly Tripp this morning, and she mentioned this briefly. I thought it would be a good discussion to end a short series on indexes and selectivity.

The Clustered Index

A clustered index has to be unique, because the clustering key acts as the row’s location in the table. If the index is not defined as unique, SQL will make it unique by adding a uniquifier, a 4-byte integer that’s hidden behind the scenes and is added when necessary to make the clustered index unique.

It’s not documented anywhere clearly, but it is mentioned in a couple of places. From msdn:

If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

So all clustered indexes are unique.

The Nonclustered Index

A nonclustered index contains, in addition to the index key and any include columns, a pointer to the actual row. This is so that the row can be retrieved when other columns are needed for a query (A bookmark lookup)

When the table has a clustered index, this pointer is the clustered index key. When the table does not have a clustered index, the pointer is the RID, a combination of file ID, page ID and slot index (which gives the row’s logical position on the page). These pointers are not just stored at the leaf level of the index, they’re stored at the higher levels as well, something that a bit of poking with DBCC Page can easily verify. (Unless the nonclustered index is defined unique, in which case it’s just at the lead level)

As was proven above, the clustering key is unique. The RID, since it points to the row’s actual position, is also unique. There’s no way that two rows can be in the same place on a page.

Hence, since part of the nonclustered index is unique, the entire index has to be unique.

So all nonclustered indexes are also unique.

Q.E.D.

11 Comments

  1. Grant Fritchey

    That’s a really weird way to think about it, but you know what, it’s utterly true. Well done.

    Reply
  2. Gail

    The main intention was to make people think about index structure. So many questions on indexing are simple if the structure if the index is understood.

    Reply
  3. Pingback: Now syndicating Gail Shaw and Kendal Van Dyke | Brent Ozar - SQL Server DBA

  4. Pingback: SQL Server Uniqueifier Values and the Number of the Beast

  5. Henrico

    just came across this post, really opened my eyes to indexes (know it sounds very dramatic) 😛

    Reply
  6. Victor

    Hmmm…

    This has been documented in SQL Server 7.0 courseware.

    Reply
    1. Gail

      It’s in the 2005/2008 documentation as well, if one is willing to read through lots of different topics.

      Reply
  7. Rich Yarger

    Hello – have been recently reading your posts, and wanted to ask a very rookie-like question…When is it good to use a NonClustered Index on the Primary Key of a table, over a Clustered one? I am about to deploy a new setup, and went through to verify that the Primary Keys on all of my tables were in fact – Clustered. They weren’t, and when I asked why the only explanation I got was s shoulder shrug. These are programmers that put it together.

    So needless to say – I changed them all to Clustered Index Primary Keys. Can you think of a reason or time when it would be a good thing to go with a NonClustered Primary Key?

    Thanks and keep up the great work!

    Reply
  8. Gail (Post author)

    There are many times that it’s a good thing to have a nonclustered primary key. Main one is when there’s a better place for the clustered index (eg because of access path or data type(s) of the primary key)

    Unfortunately there is no simple, always applicable, straightforward answer here.

    Just make sure that if the PK is nonclustered, there is a clustered index somewhere else.

    Reply
  9. Richard

    Hi Gail,

    You say that in a non-unique, non-clustered, index the clustered index keys appear both in the intermediate & leaf levels. In your excellent Redgate session (https://www.youtube.com/watch?v=y_bl9dArtmA) you say the same thing, then say that you might explain why later on, and then (unless I missed it) don’t!

    Could you please explain why clustered index keys appear throughout a non-clustered, non-unique, index, but only appear in the leaf rows of a non-clustered, unique, index?

    Many thanks.

    Reply
    1. Gail (Post author)

      For deletes and updates. In a unique index, there’s never any need to read more than one leaf page to locate a specific row (assume that you’re updating the NC indexes after a delete/update, and hence have all the key values).
      In a non-unique, you might have the same key values on multiple pages, so adding the cluster key to the upper levels ensures minimal reads to locate rows in the NC index for update/delete

      Reply

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.