All indexes are unique

Well, that’s a rather contentious title. There are probably several people shaking their heads at this point. Let me explain.

I was listening to a podcast with Kimberly Tripp this morning, and she mentioned this briefly. I thought it would be a good discussion to end a short series on indexes and selectivity.

The Clustered Index

A clustered index has to be unique, because the clustering key acts as the row’s location in the table. If the index is not defined as unique, SQL will make it unique by adding a uniquifier, a 4-byte integer that’s hidden behind the scenes and is added when necessary to make the clustered index unique.

It’s not documented anywhere clearly, but it is mentioned in a couple of places. From msdn:

If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

So all clustered indexes are unique.

The Nonclustered Index

A nonclustered index contains, in addition to the index key and any include columns, a pointer to the actual row. This is so that the row can be retrieved when other columns are needed for a query (A bookmark lookup)

When the table has a clustered index, this pointer is the clustered index key. When the table does not have a clustered index, the pointer is the RID, a combination of file ID, page ID and slot index (which gives the row’s logical position on the page). These pointers are not just stored at the leaf level of the index, they’re stored at the higher levels as well, something that a bit of poking with DBCC Page can easily verify. (Unless the nonclustered index is defined unique, in which case it’s just at the lead level)

As was proven above, the clustering key is unique. The RID, since it points to the row’s actual position, is also unique. There’s no way that two rows can be in the same place on a page.

Hence, since part of the nonclustered index is unique, the entire index has to be unique.

So all nonclustered indexes are also unique.

Q.E.D.

11 Comments

Grant Fritchey 10 February 2009 at 16:00

That’s a really weird way to think about it, but you know what, it’s utterly true. Well done.

Reply ↓
Gail 10 February 2009 at 18:37

The main intention was to make people think about index structure. So many questions on indexing are simple if the structure if the index is understood.

Reply ↓
Pingback: Now syndicating Gail Shaw and Kendal Van Dyke | Brent Ozar - SQL Server DBA
Pingback: SQL Server Uniqueifier Values and the Number of the Beast
Henrico 10 November 2009 at 14:59

just came across this post, really opened my eyes to indexes (know it sounds very dramatic) 😛

Reply ↓
Victor 18 December 2009 at 07:37

Hmmm…

This has been documented in SQL Server 7.0 courseware.

Reply ↓
1. Gail 18 December 2009 at 18:02
  
  It’s in the 2005/2008 documentation as well, if one is willing to read through lots of different topics.
  
  Reply ↓
Rich Yarger 21 January 2011 at 20:44

Hello – have been recently reading your posts, and wanted to ask a very rookie-like question…When is it good to use a NonClustered Index on the Primary Key of a table, over a Clustered one? I am about to deploy a new setup, and went through to verify that the Primary Keys on all of my tables were in fact – Clustered. They weren’t, and when I asked why the only explanation I got was s shoulder shrug. These are programmers that put it together.

So needless to say – I changed them all to Clustered Index Primary Keys. Can you think of a reason or time when it would be a good thing to go with a NonClustered Primary Key?

Thanks and keep up the great work!

Reply ↓
Gail (Post author)21 January 2011 at 23:18

There are many times that it’s a good thing to have a nonclustered primary key. Main one is when there’s a better place for the clustered index (eg because of access path or data type(s) of the primary key)

Unfortunately there is no simple, always applicable, straightforward answer here.

Just make sure that if the PK is nonclustered, there is a clustered index somewhere else.

Reply ↓
Richard 25 July 2019 at 17:42

Hi Gail,

You say that in a non-unique, non-clustered, index the clustered index keys appear both in the intermediate & leaf levels. In your excellent Redgate session (https://www.youtube.com/watch?v=y_bl9dArtmA) you say the same thing, then say that you might explain why later on, and then (unless I missed it) don’t!

Could you please explain why clustered index keys appear throughout a non-clustered, non-unique, index, but only appear in the leaf rows of a non-clustered, unique, index?

Many thanks.

Reply ↓
1. Gail (Post author)3 August 2019 at 17:53
  
  For deletes and updates. In a unique index, there’s never any need to read more than one leaf page to locate a specific row (assume that you’re updating the NC indexes after a delete/update, and hence have all the key values).
  In a non-unique, you might have the same key values on multiple pages, so adding the cluster key to the upper levels ensures minimal reads to locate rows in the NC index for update/delete
  
  Reply ↓

All indexes are unique

Share this:

11 Comments

Leave a Comment Cancel reply