PASS Summit Session Votes

I blogged a week ago about the sessions that I submitted for the PASS Summit this year. At the point that I scheduled the post I hadn’t seen the announcement from PASS, hence a second post on the same topic.

This year, you can vote for sessions that you want to see. How much influence this will have over the Program Committee isn’t defined anywhere, but anything is better than nothing.

Voting closes on the 20th May (Friday), so if you have any interest in what gets presented at PASS Summit this year, go over to the speaker preference survey and vote.

Oh, and to make things easier, if you want to vote for any of my sessions, they’re here

On Transactions, errors and rollbacks

Do errors encountered within a transaction result in a rollback?

It seems, at first, to be a simple question with an obvious answer. Transactions are supposed to be atomic, either the entire transaction completes or none of it completes.

Maybe too simple…

CREATE TABLE TestingTransactionRollbacks (
 ) ;
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
-- Fails. Cannot insert null into a non-null column
INSERT INTO TestingTransactionRollbacks (ID)
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
-- fails. Duplicate key
INSERT INTO TestingTransactionRollbacks (ID)
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
SELECT ID, SomeDate FROM TestingTransactionRollbacks
DROP TABLE TestingTransactionRollbacks

If a transaction rolled back at the first failure, that final select would return no rows. But it doesn’t, it returns 3 rows. The failure of the individual statements was ignored and the transaction completed and committed. If that had been an important business process, not a made-up example, that could have some nasty consequences for transactional consistency of data.

What’s really going on here? Aren’t transactions supposed to be atomic? Isn’t SQL supposed to roll them back if they don’t complete successfully?

Well, kinda.


Pass summit abstracts

I missed the Pass summit last November for a number of reasons, but I’m hoping I can attend this year. Given that, I submitted a number of abstracts for consideration.

The limits on number were as follows: Up to 4 abstracts for regular, spotlight or half-day sessions (ie main conference sessions) plus, providing some requirements are met, up to 2 abstracts for pre-con sessions.

I submitted five abstracts, four were ones I’ve been thinking about for some time, the last was a last minute surprise.

Bad plan! Sit!

Bad execution plans are the bane of database performance everywhere they crop up. But what is a bad execution plan? How do you identify one in your system and, once identified how do you go about fixing it?

In this top-rated session from the 24 Hours of PASS we’ll look at some things that make a plan ‘bad’, how you might detect such plans and various methods of fixing the problem, both immediately and long-term.

I wasn’t planning on submitting this one to be honest. It’s one I did for the 24 hours of Pass back in February, and I didn’t think it appropriate to redo at the Summit the same year. Some people at Pass disagreed with me on that. The session was one of the top 3 from 24 hours of Pass and as such got a guaranteed slot at Pass summit, and a chance for a Spotlight slot.

Is that a parameter I smell?

All too often a forum post on erratic query performance is met with a reply ‘Oh, it’s parameter sniffing. You can fix it with .’ The problem with that answer, even if it has identified the cause, is that it’s only part true. Parameter sniffing is not simply a problem that needs fixing, it’s an essential part of well-performing queries. In most cases.

Come to this session to learn what Parameter Sniffing really is and why it’s a good thing, most of the time. Learn how to identify the scenarios where it’s not good, why a feature that is supposed to improve query performance sometimes degrades it, and what your options are for resolving the problems when they do occur.

Performance improvements in 60 min or less. Guaranteed

The system is slow, users cry
It’s impacting our bottom line
In meetings the curses fly
The situation is far from fine

While up on the IT floor
The DBA tears his hair
He’ll soon be shown the door
If he cannot the performance repair,

It often seems, from looking at forum posts and client requests, that the steady-state of databases is ‘too slow’ and all too often the people who are tasked with resolving performance problems are overwhelmed by the shear scope of the problem, and aren’t really sure where to start.

In this demo-heavy session, we’ll look at a fictional company’s database and website and work through finding the worst offenders in terms of poor performance, identifying the causes of the problems and at least starting to get the queries to perform better and the users to stop phoning and complaining.

Yes, part of the abstract is written in verse. It’s something I was threatening to do since last year, and I decided that it shouldn’t harm my chances, much.

Dos and don’ts of database corruption

Database corruption is one of the worst things you can encounter as a DBA. It can result in downtime, data loss, and unhappy users. What’s scary about corruption is that it can strike out of the blue and with no warning. If maintenance is not being done regularly on the database it’s easy for corruption to go unnoticed until it’s too late to repair without losing data.
In this session we’ll look at

  • Easy maintenance operations you should be running right now to ensure the fastest possible identification and resolution of corruption
  • Best practices for handling a database that you suspect may be corrupted
  • Common actions that can worsen the problem.
  • Appropriate steps to take and methods of recovery

I did this session initially for Quest, for their Pain of the Week webcast back in February. I enjoyed doing it a lot, and it got some good feedback, so I decided to submit it for Summit as a contrast to my usual performance-related presentations.

All about Execution Plans

Last, but far from least…

Grant Fritchey (blog | twitter) and I submitted a join pre-con session on execution plans. This is an area Grant is a well-known expert on, having written a book on it, and we’re hoping that we’ll be given the chance to devote a full day to this topic.

The key to understanding how SQL Server is processing your queries is the execution plan.

This full day session focuses on the execution plan. We will start right at the beginning and talk about the compile process. We’ll also go over how, and more importantly, why, plans are stored in cache and how they are removed.

We’ll spend time exploring the key differences between actual and estimated plans, and why those descriptions are more than a little misleading. We’ll also show you assorted methods to obtain a query’s execution plan and what the differences and tradeoffs of each are.

A full day class on execution plans would not be complete without spending time learning to reading them. You’ll learn where to find useful information in execution plans, what the common operators are and how to decipher the sometimes cryptic messages the plans are sending to you. We’ll also debunk some myths surrounding query operators and execution plans.

All of this is meant to further your understanding of how queries work in order to improve the queries you’re responsible for. With this in mind, we’ll show how you can use execution plans to tune queries. All of the information presented will be taken from real world examples. We’ll build on the information through the day so that at the end, after following us through multiple examples at your own computer, you’ll have a stronger understanding of how to read, interpret and actually use execution plans in your day-to-day job.

Sql Saturday #83

imageFrikkie Bosch opened the day with a short keynote giving an overview of Denali, talking mostly from a marketing point of view about the major features coming in the next version.

The virtual sessions were very popular, we don’t get many good international speakers down in our part of the world and the opportunity to listen to some new speakers was well received.

Benjamin kicked off the virtual sessions with his ’10 query optimiser topics for better performance’. He covered parameter sniffing, showing how the optimiser estimates row counts based on statistics and the sniffed value of parameters, cardinality estimation errors (poor stats or missing statistics) and scalar expressions (hard for the optimiser to guess rows affected). Also discussed were the missing index DMVs and DTA (the database tuning advisor)

My session on poorly performing SQL went very well. It’s a session I’ve done several times and enjoy thoroughly every time. The audience was great, lots of good questions, good comments and good fun. I ran a bit short on time to discuss the fixes for the multiple execution path problem, so I’ll just direct anyone reading to a blog post on it:

Big thanks to all the organisers, all the speakers, all the attendees. Special thanks to the international speakers who either stayed up late or woke early to give their sessions.



SQLSat83-3 SQLSat83-4

SQLSat83-5 SQLSat83-6





Indexing for ORs

All of the indexing strategy posts I’ve written in the past have been concerned with predicates combined with ANDs. That’s only one half of the possibilities though. There’s the case of predicates combines with ORs, and the guidelines for indexing that work well with ANDs don’t work with ORs

When dealing with predicates combined with AND, the predicates are cumulative, each one operates to further reduce the resultset.

For this reason, multi-column indexes support multiple predicates combined with AND operators.

If we look at a quick example, consider the following.

CREATE TABLE Customers (
  Surname VARCHAR(30) NOT NULL,
  FirstName VARCHAR(30),
  Title VARCHAR(5),
  CustomerType CHAR(1) NOT NULL,
CREATE INDEX idx_Customers_SurnameFirstName ON Customers (Surname, FirstName);

Again I’m going to be lazy and get SQLDataGenerator to generate a few rows.

With that two column index on those columns and a query that looks for Surname = ‘Kelley’ AND Name = ‘Rick’, SQL can do a double column seek to go directly to the start of the range then just read down the index to the end of the range, basically until it finds the first row that it’s not interested in.

So how does that that differ when the operator is an OR?

The main difference is that with an OR, the predicates are independent. The second doesn’t serve to reduce the recordset, but rather to expand it. It’s similar to evaluating two separate predicates and combining the result. Let’s have a look at that 2 column index again when the two predicates are combined with an OR.

  FROM Customers
  WHERE Surname = 'Kelley' OR FirstName = 'Rick';

If we try to use that index to evaluate Surname = ‘Kelley’ OR Name = ‘Rick’, there’s a problem. While the first of those predicates can be evaluated with a seek (it’s a sargable predicate on the left-most column of an index), the second predicate cannot. It’s sargable, but it is on the second column of the index (and for the moment let’s assume there are no other indexes on the table). Seeks are only possible if the predicate filters on a left-based subset of the index key.

Hence to evaluate that predicate SQL will have to do an index scan. Since it has to do a scan to evaluate the one predicate, it won’t bother also doing a seek to evaluate the first predicate as it can also evaluate that during the scan.

Hence, in this case, the query will execute with a single index scan.


So how do we get this query to rather seek?


Meme Monday: I Got 99 SQL Problems And the Disk Ain’t One

From Tom LaRock’s latest idea

For a change I’m not going to mention performance (well, not more than once anyway) and I’m not going to preach backups or recoverability. Nor am I going to talk about technical issues like many others are doing.

Instead I’m going to briefly mention several (well, 9) problems that I’ve seen a lot in last year or so. These certainly aren’t specific to databases, but I’ve been seeing them with regards to databases, so…

Poor/missing development environment

A development environment that’s months out of date with production, has different databases from production (eg has databases that in the production environment are on different servers) will cause problems. Either code will never work in dev (eg missing linked servers) and hence can’t be properly tested or works fine in dev and fails in production.

Either way, the appropriate response from the senior architect when told that the development environment is months out of date is not "Well, yes. I could have told you that. So what?"

Lax security

When everyone, from developers to help desk to business analysis has full sysadmin access, things are going to go wrong, sooner or later. Probably sooner. From new databases that no one knows anything about, to missing databases, altered settings that break backups or other maintenance, code changes that ‘no one’ made, missing data, etc.

This kind of setup is a bloody pain to fix; without strong support from management it’s near-impossible to fix. Far better to start correctly if possible

Lack of change control

There must be some control over what changes are made, by who, to what and when. Developers should not be just deploying their changes whenever they feel like it with no regard to what other people are doing, whether the system is in use or not, or who is inconvenienced.

Likewise there needs to be some record of what was done, when and why. If no one knows what is being done then when something breaks (and it’s when, not if) there will be no way to tell where to start looking.



To TOP or not to TOP an EXISTS

Earlier this year I had a look at a query pattern that I often see on forums and in production code, that of the Distinct within an IN subquery. Today I’m going to look at a similar patters, that being the use of TOP 1 within an EXISTS subquery.

Three tests. First a straightforward exists with no correlation (no where clause linking it to an outer query). Second, an exists with a complex query (one with a non-sargable where clause and a group by and having). Third an exists subquery correlated to the outer query.

Table structures are nice and simple, in fact, for ease I’m going to use the same tables as I did back on the exists, in and inner join tests. Code to create and populate the tables it attached to the end of the post.

First up, a simple exists query, in an IF, just to be different.

IF EXISTS (SELECT 1 FROM PrimaryTable_Medium)
PRINT 'Exists'

IF EXISTS (SELECT TOP (1) 1 FROM PrimaryTable_Medium)
PRINT 'Exists too'

For a benchmark, a SELECT 1 FROM PrimaryTable_Medium has the following IO characteristics

Table ‘PrimaryTable_Medium’. Scan count 1, logical reads 89, physical reads 0.

SQL Server Execution Times:
CPU time = 15 ms,  elapsed time = 510 ms.

Ignore the elapsed time, that’s likely mostly from displaying the records. I’m going to focus mostly on the CPU and IO.

Execution plans of the two exists variations are absolutely identical.


The index operators are scans because there is no way they could be anything else, there’s no predicate so a seek is not possible. That said, it’s not a full index scan. The properties of the Index Scan show 1 row only (actual and estimated). So SQL did not read the entire index, just enough to evaluate the EXISTS, and that’s what it did in both cases. IO stats confirm that.


Statistics, row estimations and the ascending date column

SQL’s auto-updating statistics go a fair way to making SQL Server a self-tuning database engine and in many cases they do a reasonably good job

However there’s one place where the statistics’ auto-update fails often and badly. That’s on the large table with an ascending column where the common queries are looking for the latest rows.

Let’s have a look at a common scenario.

We have a large table (imaginatively called ‘Transactions’) with a date time column (even more imaginatively called ‘TransactionDate’). This table gets about 80,000 new records a day and currently has around 8,000,000 records in it. So we can say roughly that another 1% is added to the table size daily. No records are updated and there’s a monthly purge of old data so that the total size remains about the same. A reasonably common real-life scenario.

AccountNumber CHAR(8),
AccountType CHAR(2),
AccountHolder VARCHAR(50),
Filler CHAR(50) -- simulating other columns

CREATE TABLE Transactions (
TransactionType CHAR(2),
Amount NUMERIC(18,6),
Filler CHAR(150) -- Simulating other columns
CREATE CLUSTERED INDEX idx_Transactions_TransactionDate
ON Transactions (TransactionDate)

ON Transactions (AccountID)

ON Accounts (AccountType)

-- Using RedGate's SQLDataGenerator to generate some data for this.

Accounts Transactions

Day 1 of the month, the indexes have just been rebuilt (after the data purge) and the statistics associated with those have been updated. The latest value in the TransactionDate column is ‘2011/01/31’ and the last value in the statistics histogram is ‘2011/01/31’. Life is good.

Day 2 of the month, there have been 80,000 new records added for the previous day. Only 1% of the table has been updated, so the automatic statistics update would not have triggered. The latest value in the TransactionDate column is ‘2011/02/01’ and the last value in the statistics histogram is ‘2011/01/31’. Doesn’t look like a problem.

Fast forwards another couple of days. Day 5 of the month. By this point 300,000 rows have been added since the beginning of the month. This amounts to around 5% of the table. Hence the statistics auto-update (triggered at 20%) still would not have run. The latest value in the TransactionDate column is ‘2011/02/04’ and the last value in the statistics histogram is ‘2011/01/31’. Starting to look less than ideal.

So, what kind of effect does this have on the queries against that table? (more…)

Q & A from 24 Hours of PASS session

Earlier this week I took part in the 24 Hours of PASS live webcast event. There were far more questions at the end of the presentation than what I could answer online, so the answers to the rest are given here. (I have edited some of the questions for spelling, grammar and readability). My slide deck and demos I was using are linked at the end. I’m not including a backup of the DB, as it’s 60MB. The schema is there, as is a SQL Data Generator project.

Q: Is the problem here that addition of checking if the variable is null? (This was related to catch-all queries. )

No, not specifically. The problem is the query pattern of <column> = <parameter> OR <parameter> = <constant> . It’s not specific to the case where that constant is NULL, though that is the more common form that I’ve seen.

Q: What are the good alternatives for catch-all queries?

If you are on SQL 2008 SP2 or later you can add OPTION(RECOMPILE) to the catch all query and get a plan that is optimal for each particular set of parameters.

On SQL 2005 and earlier, even with a recompile you would not get an optimal plan. This is because the optimizer was required to create a plan that was safe for reuse, even if it was never going to be reused.

On builds on SQL 2008 prior to SP1, the use of OPTION (RECOMPILE) worked, but there was a bug related to that which could produce inaccurate results, so use with caution.

On builds of SQL 2008 between SP1 and SP2 (or SP1 CU5), the behaviour of a catch-all with recompile was the same as for SQL 2005

If you’re using SQL 2005 (or horrors SQL 2000) or a build of SQL 2008 that did not have the recompile options working, you can use dynamic SQL (there are advantages to dynamic SQL even on the latest builds of SQL 2008). Use the parameters passed to build up a string that just filtered on the columns for which parameters were passed, then use sp_executesql to run the parameterised dynamic SQL statement.

Details are available on my blog –

Q: If you don’t have access to run the execution plan, but have to have DBAs run it for you, is there a good way to have it captured?

Absolutely. In SQL 2005 and above, the execution plan is an XML document. You can save the plan as a .sqlplan file. That file can then be copied, emailed, etc. You can open it in management studio and SSMS will display the graphical plan.

Or, you could download Plan Explorer and open the plan in that.


Full backups, the log chain and the COPY_ONLY option.

There have been a large number of posts made regarding the interaction between full database backups and the log chain (like this one). However there still seems to be a lot of confusion around regarding how they interact. I doubt I can clear up the confusion with yet another blog post, but I’ll try.

One that seems to come up now and again is the COPY_ONLY Option on backups. Typically the comment is something like ‘Use the COPY_ONLY option when taking ad-hoc full backups so that you don’t impact the log backups.’ Now we know from the blog posts linked above that full backups don’t ever break the log chain (and I’m not going to run yet more tests to prove it) so what is the copy only option there for?

Books Online states the following regarding the COPY_ONLY option for backups – “Specifies that the backup is a copy-only backup, which does not affect the normal sequence of backups. A copy-only backup is created independently of your regularly scheduled, conventional backups. A copy-only backup does not affect your overall backup and restore procedures for the database.”

Well, that doesn’t clear things up much. It does however go on to say this: “When used with BACKUP DATABASE, the COPY_ONLY option creates a full backup that cannot serve as a differential base. The differential bitmap is not updated, and differential backups behave as if the copy-only backup does not exist. Subsequent differential backups use the most recent conventional full backup as their base.”

So it’s not the log chain that copy only is there to not affect, it’s the differential base. Let’s test and see how it works.