Monthly Archive: January 2009

A Bookmark lookup, by any other name…

I think I may have confused some people by talking about bookmark lookups. I’ll attempt to clarify things.

The operator that I’m talking about is the one that fetches extra columns from the clustered index when the nonclustered index that’s used to retrieve the rows doesn’t have all of the columns required.

In SQL 2000, that operator appeared in the execution plan as a bookmark lookup and it appeared as follows:

In SQL 2005, the name was changed, and the bookmark lookup no longer appeared in the execution plan. In it’s place appeared a clustered index seek, joined back to the original index seek by a nested loop join. It appeared as follows (the highlighted operator is the ‘bookmark lookup’)

This change made it harder to see what was going on as clustered index seeks are usually considered ‘good’. The trick to see if it is actually a ‘bookmark lookup’ is to look at the objects involved. When the nonclustered index seek and the clustered index seek are both on the same table, then it’s a ‘bookmark lookup’

I don’t recall what this appeared as when the base table was a heap, not a cluster.

In SQL 2005 SP2, the name of the operator was changed again, now appearing as a key lookup (when the base table has a clustered index) or a RID lookup (when the base table is a heap). It now looks like this:

The thing to note is that it’s not the version of the server that’s important. The format of the XML for the exec plan has not changed since SQL 2005 RTM (I can and have created a .sqlplan file from SQL 2008 and opened that file in SQL 2005’s management studio).

It’s the version of management studio that affects how the execution plans are displayed. If the server is SQL 2005 SP3, but the client tools are still RTM, the bookmark lookup will appear as a clustered index seek. Another reason to patch the client as well as the server

I hope that clears up some of the confusion around the naming. So, in future, what should I refer to this as? A bookmark lookup? A Key lookup?

Bad Advice

It’s no secret that I’ve been fairly active on a couple of SQL forums for a couple of years. In that time I’ve seen all sorts of behaviour on forums, good and bad. By this point, most of that just doesn’t bother me any longer. There is one thing however that still gets me angry when I see it. Blatently bad advice.

Now, I’m not talking about first-attempt solutions that solve half of the query problem, I’m not talking about honest mistakes and I’m not talking about attempted answers to questions so vague they’re near-impossible to understand. I’m talking about advice that’s so bad it’s dangerous. I’m talking about things like this:

Q: My transaction log’s very large. What can I do to fix this?
A: Stop SQL, find the ldf file, delete it and then start SQL

Q: Dropping a clustered index on a large table takes a long time using drop index. Is there a faster way?
A: Run sp_configure “allow updates, 1 and then run delete from sysindexes where id = OBJECT_ID(‘MyTableName’) AND indid = 1

Ouch!

There’s two main problems with bad advice.

Firstly, the person asking possibly doesn’t understand enought to realise the advice is risky, and if they follow it without testing they could end up in a much worse situation than they were. Dependiong on the circumstances they may end up in trouble with their boss, they may even end up getting fired. Whether they realise the advice is bad before or after trying it out, it’ll erode their faith both in the forum and in the person who provided the information. That leads to the the second problem.

The second problem is damaged credibility and reputation. A good reputation is so hard to get in this industry and so easy to lose. Bad advice damages the forum’s reputation and the reputation of the person posting the advice. Brent Ozar puts it very well: “Being an expert means having credibility. It doesn’t matter how much you know if people don’t trust your answers.”

Then, of course, some other person on the forum has to come along and correct the bad advice and explain why it’s bad so that people who find the thread via google won’t think it’s useful

Bottom line, bad advice helps no one and hurts a number of people.

</soapbox>

Index columns, selectivity and equality predicates

Or “Which column goes first?

There’s a common piece of advice given about columns in an index key that says that the most selective column should go first. I’m not going to say that’s incorrect, because it’s not. The problem is that it’s often given without any explanation as to why the most selective column should go first, nor are the other considerations for index key order mentioned.

This can lead to misunderstandings like, in the extreme case, where one person after hearing that advice went and added the primary key column as the leading column of every single nonclustered index (because it’s highly selective), and then wondered why his database performance decreased dramatically.

The comment about selectivity is because of the way SQL keeps statistics on indexes (see my post on statistics for more info on what they are). SQL only keeps the histogram for the first column of the index. That means that it only knows the actual distribution of values of the first column. If the first column is not selective, the index may not be used. However, that’s not the whole story.

SQL also, in addition to the histogram, keeps density values for all of the left-based subsets of the index keys. So, for a 3 column index key, SQL knows the density of the first column, of the first and second and of all three. The density is, in a nutshell, a value that shows how unique the set of columns is. It’s 1/(distinct values). The value can be seen for any index using DBCC Show_Statistics with the DENSITY_VECTOR option.

This means, while SQL only knows the actual data distribution of the first column, it does know, on average, how many rows will be returned by an equality match on any left-based subset of the index keys

So, what’s my rule for the order of columns in an index key? Put the most selective columns first, when all other considerations are equal.

(more…)

SQL Server usergroup – Jan meeting

The January meeting of the SA SQL user group will be on Tuesday 20th of Jan 2009 at 18h30 at the Microsoft Offices in Bryanston.

We’re still trying to finalise a speaker, there’ll be an update on that in a day or two. We definitely will have someone, that’s not in doubt.

Please let me know if you’ll be attending. You can RSVP by mail, by replying to this post, or by sending me a PM either from SADeveloper.net or SQLServerCentral.com (Username – GilaMonster)

Update:

The speaker we were hoping to line up isn’t available, so I’ll be presenting again.

I’ll be doing the session that I presented at the PASS conference in Seattle last year – The dirty dozen… 12 Ways to write badly performing T-SQL.

Seek or scan?

One very common question that I see on the forums is on index seeks and index scans. A query is resulting in a table/clustered index scan, even though there’s an index on one or more of the columns been searched on.

One of the more common reasons that this happens is because the index in question is not covering, and SQL has determined that the cost of doing the lookups to fetch the extra columns is higher than the cost of scanning the entire table.

If an index does not cover a query, then bookmark lookups are required to get the additional columns, bookmark lookups are run one row at a time, and are seeks on the clustered index. Hence it’s clear that bookmark lookups on a large number of rows are exceedingly expensive and that is why SQL will switch to a clustered index/table scan when lookups are required on a significant percentage of the rows in the table.

So, what constitutes a significant percentage of the rows in the table? 50%? 20%? 10%?

(more…)

Goals for 2009

So… Goals for the next 6 months.

  • Finish reading the IT books that I’m currently busy with and read at least one more
  • Watch a webcast each month.
  • Listen to a podcast each week. This is more difficult now, since I’m not commuting every day
  • Get up to speed with Sharepoint.
  • Write at least two SQL articles to be published on the web (like at SQL Server Central)
  • Write two certification exams. At least one of those will be SQL Server
  • Get my MCT certification.
  • Get the experiments for my Master’s thesis designed. This depends on whether I’m allowed to re-register this year, after the lack of progress last year.

And my goals for the local SQL Usergroup for the next 6 months:

  • Monthly meetings with at least four different speakers
  • A SQL 2008 launch event in March or April
  • A website for the usergroup
  • At least 15-20 regular members