Latest Posts

Revisiting catch-all queries

I originally wrote about catch-all queries early in 2009, just as something that I’d seen several times in client code. It turned into the 3rd most popular post ever on my blog.

A lot’s changed since 2009. When I wrote the original post, most production servers were SQL 2005 or SQL 2000. SQL 2008 had been out less than a year and its fix for catch-all queries, the RECOMPILE hint, didn’t even work properly (it had an incorrect results bug in RTM, was pulled in SP1 and fixed in SP2)

As such, my feelings on how to solve the problem with catch-all queries has changed over the years.

Before I get to solutions, let’s start with the root cause of the problem with catch-all queries – plan caching and the need for plans to be safe for reuse.

Let’s take a sample query. I’ll use the same one I used in the original post.

(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE (ProductID = @Product Or @Product IS NULL)
AND (ReferenceOrderID = @OrderID OR @OrderID Is NULL)
AND (TransactionType = @TransactionType OR @TransactionType Is NULL)
AND (Quantity = @Qty Or @Qty is null)

There are two nonclustered indexes on the TransactionHistory table, one on ProductID, one on ReferenceOrderID and ReferenceLineID.

For the initial discussion, let’s just consider two of the clauses in the WHERE. I’ll leave the other two in the stored proc, but they won’t be used.

WHERE (ProductID = @Product Or @Product IS NULL)
AND (ReferenceOrderID = @OrderID OR @OrderID Is NULL)

We would expect, if the ProductID parameter is passed, to get a seek using the index on ProductID, if the ReferenceOrderID parameter is passed, to get a seek using the index on ReferenceOrderID, and if both are passed, then either an index intersection or a seek on one of the indexes, key lookup and secondary filter for the other, plus, in all cases, a key lookup to fetch the columns for the SELECT.

That’s not what we get (I cleared the plan cache before running each of these).



The expected indexes are used, but they’re used for scans not seeks. Why? Let’s just consider the second plan for a bit.

The index aren’t used for seeks, because plans must be safe for reuse. If a plan was generated with an index seek, seeking for ReferenceOrderID = @OrderID, and that plan was cached and reused later when @OrderID was NULL, we’d get incorrect results. ReferenceOrderID = NULL matches no records.

And so we have index scans with the full predicate (ReferenceOrderID = @OrderID OR @OrderID Is NULL) applied after the index is read.

This is not particularly efficient, as the properties on the index seek shows.


The entire index, all 113443 rows were read, to return a single row. Not ideal, but it’s far from the largest problem with this form of query.

The plan’s got an index scan on the index on ReferenceOrderID, and then a key lookup back to the clustered index. That key lookup has a secondary filter on it, (ProductID = @Product Or @Product IS NULL). The optimiser assumed that a small number of rows would be returned from the index seek on ReferenceOrderID (1.47 to be specific), and hence the key lookup would be cheap, but that’s not going to be the case if the plan is reused with a ProductID passed to it instead of a ReferenceOrderID.

Before we look at that, the performance characteristics for the procedure being called with the ReferenceOrderID parameter are:


The duration and CPU are both in microseconds, making this a very fast query, despite the index scan.

Now, without clearing the plan cache, I’m going to run the procedure with only the ProductID parameter passed.


CPU’s gone from an average of 8ms to around 120ms. Duration has gone from average around 6ms to about 125ms and reads have jumped from 271 (2 MB of data processed) to 340 597 (2.6 GB of data processed)

And this is for a table that has 113k records and a query that returned 4 rows.

The key lookup, which was fine when an OrderID was passed, is not fine when @OrderID is NULL and the index scan returns the entire table.



The plans that the optimiser has come up with for this query form aren’t stable. They’re safe for reuse, they have to be, but performance-wise they’re not stable.

But, maybe it’s just this form of query, there are other ways to write queries with multiple optional parameters.

Let’s try the CASE and COALESCE forms.

CREATE PROCEDURE SearchHistory_Coalesce
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE ProductID = COALESCE(@Product, ProductID)
AND ReferenceOrderID = COALESCE(@OrderID, ReferenceOrderID)
AND TransactionType = COALESCE(@TransactionType, TransactionType)
AND Quantity = COALESCE(@Qty, Quantity)

(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE ProductID = CASE WHEN @Product IS NULL THEN ProductID ELSE @Product END
AND ReferenceOrderID = CASE WHEN @OrderID IS NULL THEN ReferenceOrderID ELSE @OrderID END
AND TransactionType = CASE WHEN @TransactionType IS NULL THEN TransactionType ELSE @TransactionType END
AND Quantity = CASE WHEN @Qty IS NULL THEN Quantity ELSE @Qty END



These both give us full table scans, rather than the index scan/key lookup we saw earlier. That means their performance will be predictable and consistent no matter what parameter values are used. Consistently bad, but at least consistent.

It’s also worth noting that neither of these will return correct results if there are NULL values in the columns used in the WHERE clause (because NULL != NULL). Thanks to Hugo Kornelis (b | t) for pointing this out.

And then two more forms that were mentioned in comments on the original post, slightly more complicated:

CREATE PROCEDURE SearchHistory_Case2
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
SELECT  ProductID,
FROM    Production.TransactionHistory
WHEN @Product = ProductID THEN 1
END) = 1
WHEN @OrderID = ReferenceOrderID THEN 1
END) = 1
AND (CASE WHEN @TransactionType IS NULL THEN 1
WHEN @TransactionType = TransactionType THEN 1
END) = 1
WHEN @Qty = Quantity THEN 1
END) = 1

CREATE PROCEDURE SearchHistory_Complex
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
SELECT  ProductID,
FROM    Production.TransactionHistory
WHERE ((ProductID = @Product AND @Product IS NOT NULL) OR (@Product IS NULL))
AND ((ReferenceOrderID = @OrderID AND @OrderID IS NOT NULL) OR (@OrderID IS NULL))
AND ((TransactionType = @TransactionType AND @TransactionType IS NOT NULL) OR (@TransactionType IS NULL))
AND ((Quantity = @Qty AND @Qty IS NOT NULL) OR (@Qty IS NULL))

These two give the same execution plans as the first form we looked at, index scan and key lookup.

Performance-wise, we’re got two different categories of query. We’ve got some queries where the execution plan contains an index scan on one or other index on the table (depending on parameters passed) and a key lookup, and others where the execution plan contains a table scan (clustered index scan) no matter what parameters are passed.

But how do they perform? To test that, I’m going to start with an empty plan cache and run each query form 10 times with just the OrderID being passed and then 10 times with just the ProductID being passed, and aggregate the results.

Procedure Parameter CPU (ms) Duration (ms) Reads
SearchHistory OrderID 5.2 50 271
ProductID 123 173 340597
SearchHistory_Coalesce OrderID 7.8 43 805
ProductID 9.4 45 805
SearchHistory_Case OrderID 12.5 55 805
ProductID 7.8 60 804
SearchHistory_Case2 OrderID 10.5 48 272
ProductID 128 163 340597
SearchHistory_Complex OrderID 7.8 40 272
ProductID 127 173 340597


The query forms that had the clustered index scan in the plan have consistent performance. On large tables it will be consistently bad, it is a full table scan, but it will at least be consistent.

The query form that had the key lookup have erratic performance, no real surprise there, key lookups don’t scale well and looking up every single row in the table is going to hurt. And note that if I ran the queries in the reverse order on an empty plan cache, the queries with the ProductID passed would be fast and the queries with the OrderID would be slow.

So how do we fix this?

When I first wrote about this problem 7 years ago, I recommended using dynamic SQL and discussed the dynamic SQL solution in detail. The dynamic SQL solution still works very well, it’s not my preferred solution any longer however.

What is, is the RECOMPILE hint.

Yes, it does cause increased CPU usage due to the recompiles (and I know I’m likely to get called irresponsible and worse for recommending it), but in *most* cases that won’t be a huge problem. And if it is, use dynamic SQL.

I recommend considering the RECOMPILE hint first because it’s faster to implement and far easier to read. Dynamic SQL is harder to debug because of the lack of syntax highlighting and the increased complexity of the code. In the last 4 years, I’ve only had one case where I went for the dynamic SQL solution for a catch-all query, and that was on a server that was already high on CPU, with a query that ran many times a second.

From SQL 2008 SP2/SQL 2008 R2 onwards, the recompile hint relaxes the requirement that the generated plan be safe for reuse, since it’s never going to be reused. This firstly means that the plans generated for the queries can be the optimal forms, index seeks rather than index scans, and secondly will be optimal for the parameter values passed.


And performance-wise?


Reads down, duration down and CPU down even though we’re recompiling the plan on every execution (though this is quite a simple query, so we shouldn’t expect a lot of CPU to generate the plan).

How about the other forms, do they also improve with the RECOMPILE hint added? As I did before, I’m going to run each 10 times and aggregate the results, that after adding the RECOMPILE hint to each.

Procedure Parameter CPU (ms) Duration (ms) Reads
SearchHistory OrderID 0 1.3 28
ProductID 0 1.2 19
SearchHistory_Coalesce OrderID 6.2 1.2 28
ProductID 3.2 1.2 19
SearchHistory_Case OrderID 1.6 1.3 28
ProductID 0 1.2 19
SearchHistory_Case2 OrderID 7.8 15.6 232
ProductID 7.8 11.7 279
SearchHistory_Complex OrderID 1.5 1.4 28
ProductID 0 1.2 19


What can we conclude from that?

One thing we note is that the second form of case statement has a higher CPU, duration and reads than any other. If we look at the plan, it’s still running as an index scan/key lookup, despite the recompile hint.

The second thing is that the more complex forms perform much the same as the simpler forms, we don’t gain anything by adding more complex predicates to ‘guide’ the optimiser.

Third, the coalesce form might use slightly more CPU than the other forms, but I’d need to test a lot more to say that conclusively. The numbers we’ve got are small enough that there might well be measuring errors comparable to the number itself.

Hence, when this query form is needed, stick to the simpler forms of the query, avoid adding unnecessary predicates to ‘help’ the optimiser. Test the query with NULLs in the filtered columns, make sure it works as intended.

Consider the RECOMPILE hint first, over dynamic SQL, to make it perform well. If the query has long compile times or runs very frequently, then use dynamic SQL, but don’t automatically discount the recompile hint for fear of the overhead. In many cases it’s not that bad.

Homebuilt sequential columns

I gave my introductory session on transactions at all three of the South African SQL Saturdays in 2016, as well as at SQL Saturday Oregon in October 2017, and something that came up in most of them was the ‘manual sequence’, the idea of using a column in a table to store a max value and using that in place of an identity column or sequence object.

To be clear, I don’t think this is a good idea. The identity column works well if a sequential series of numbers are needed. If the number sequence needs to that spans tables, then the sequence object is a good replacement.

But, there will always be some requirements that insist on gap-less sequences, or insist on not using identity (probably for ‘compatibility’ reasons), so let’s see how to do it properly.

To start, the common attempt (taken from a random Stack Overflow answer)

SET @next = (SELECT (MAX(id) + 1) FROM Table1)

VALUES (@next)

or, a slightly different form

SELECT @next = SequenceNumber + 1 FROM Table1

SET SequenceNumber = @Next;

— Then use @Next in another table for an insert

This doesn’t work. Oh, to be sure it’ll work in testing, but once we get some heavy concurrent access, its flaws become apparent.

To test the first one, I’m going to use a table that just has an ID (supposed to be unique) and a second column to record which session_id did the insert

CREATE TABLE TestSequence (
SessionID INT

And then run this 100 times from 10 different sessions

SET @next = (SELECT (MAX(ManualID) + 1) FROM TestSequence)

INSERT INTO TestSequence
VALUES (@next, @@SPID)


And it doesn’t work because the select statement takes a shared lock. Shared locks are shared, and so multiple sessions can read the same max value from the table, then write back that same value+1 to the table, either generating duplicate rows or primary key/unique constraint violations (hopefully the latter)

So how do we fix it?

One option is to wrap the two statement in a transaction and add the UPDLOCK hint to the select. This ensures that no one else will be able to read the same max value from the table, but depending on indexes it could also cause some blocking and resultant slow queries.

Another way is to make the insert (or update) and the select a single atomic operation, by returning the inserted (or updated) value from the insert (or update) statement. We can use the OUTPUT clause for this.

Option one would have code similar to this:


SET @next = (SELECT (MAX (ManualID) + 1) FROM TestSequence WITH (TABLOCKX, HOLDLOCK));

INSERT  INTO TestSequence
VALUES  (@next, @@SPID);


And option 2 looks like

INSERT INTO TestSequence
OUTPUT inserted.ManualID

The locking hints are, unfortunately, necessary. I tried several variations with less restrictive hints and they either:
– Produced duplicates
– Deadlocked when the table was small
– Deadlocked all the time

None of which are desired, hence the use of an exclusive table lock to serialise access. Of course, the restrictive locks will make this slow under concurrent usage. An index on ManualID will help, a bit.

Now we can test both of those the same way we tested the first version. An easy way to see whether there are any duplicates is to check the count and the distinct count.


To reiterate something I said earlier, I do not recommend using this. Identity columns, with their gaps, are fine for the majority of cases, especially the use of them for artificial primary keys. Artificial keys, if used, are meaningless numbers that should not be exposed to users, and hence gaps should be irrelevant.

The need for a gap-less sequence, stored in the table, should be an exceptional one, not a common one.

Obsessing over query operator costs

A common problem when looking at execution plans is attributing too much meaning and value of the costs of operators.

The percentages shown for operators in query plans are based on costs generated by the query optimiser. They are not times, they are not CPU usage, they are not IO.

The bigger problem is that they can be completely incorrect.

Before digging into the why of incorrect percentages, let’s take a step back and look at why those costs exist.

The SQL query optimiser is a cost-based optimiser. It generates good plans by estimating costs for each query operator and then trying to minimise the total cost of the plan. These costs are based on estimated row counts and heuristics.

The costs we see in the query plan are these compilation time cost estimates. They’re compilation-time estimations, which means that they won’t change between one execution of a query using a particular plan and another query using the same plan, even if the parameter values are different, even if the row counts through the operators are different.

Since the estimations are partially based on those row counts, that means that any time the query runs with row counts different to what were estimated, the costs will be wrong.

Let’s look at a quick example of that.


There are no customers with an ID of 0, so the plan is generated with an estimation of one row being returned by the index seek, and one row looked up to the clustered index. Those are the only two operators that do any real work in that plan, and each is estimated to read and fetch just one row, so each gets an estimation of 50% of the cost of the entire query (0.0033 it be specific)

Run the same query with a different parameter value, plans are reused and so the costs are the same.


That parameter returns 28 rows, the index seek is probably much the same cost, because one row or 28 continuous rows aren’t that different in work needed. The key lookup is a different matter. It’s a single-row seek always, so to look up 28 rows it has to execute 28 times, and hence do 28 times the work. It’s definitely no longer 50% of the work of executing the query.

The costs still show 50%, because they were generated for the 0-row case and displayed here. They’re not run-time costs, they’re compile time, tied to the plan.

Another thing can make the cost estimations inaccurate, and that’s incorrect costing calculations by the optimiser. Scalar user-defined functions are the easiest example there.


The first query there, the one that’s apparently 15% of the cost of the batch, runs in 3.2 seconds. The second runs in 270 ms.

The optimiser gives scalar UDFs a very low cost (they have their own plans, with costs in them though) and so the costs for the rest of the query and batch are meaningless.

The costs in a plan may give some idea what’s going on, but they’re not always correct, and should not be obsessed over, especially not when the plan’s a simple one with only a couple of operators. After all, the cost percentages add to 100% (usually).

Books of 2017

Right, beginning of a new year, so time to look at what I read in the last year and what of it I can recommend.

Please ignore that it’s almost February. I’m going to pretend that the year starts with February, that way I don’t have to think about where January went.

First thing to mention is a change of tracking method. The blog plugin I was using to track my books doesn’t work under HTTPS. It gives odd errors when adding or updating books. I wasn’t in the mood for debugging php, instead the entire tracking of books and reading dates has been moved to GoodReads (with a few hiccups along the way)

That does make it easy to get the retrospective of 2017 at least:

77 books read, my goal was 75 and I was surprised that this time I made the goal (helped partially by a week vacation in December where I read 10 books)

There were a few of the year’s books that I consider standouts.

Caliban’s War, Abaddon’s Gate and Cibola Burns

The Expanse series is fantastic, and in these three books things go from bad (proto-molecule infested moon crashes into Venus) to worse, to weird, to really, really bad.

Complex characters, complex plot, very little black and white and every solution to a problem creates a few more problems along the way.

Ordination and StillBright

I’m a sucker for paladins. A friend pointed out a few years back that most of my D&D characters have paladin tendencies. Case in point, my current character is a 5e LG Fighter with 2 levels of Cleric.

These are the story about a battle-weary knight who, after being chosen by a forgotten goddess, strives to become a beacon of hope to the war-torn world

Stiger’s Tigers, The Tiger, The Tiger’s Fate

Roman-type empire in a fantasy world. Junior legion officer thrown to the wolves and manages to change the world in the process.

I like the ‘roman legions in a fantasy world’ idea and, other than this and Codex Alera, I haven’t seen good treatments of the idea.

Arcanum Unbounded

Short stories by Brandon Sanderson in the Cosmere shared universe.
Enough said.


For 2018 I’m setting the same goal, 75 books. I have piles of unread books all over the house, so I’m not short on material, just on time.

Hunting for the True Location, with Machine Learning

Some context first.

My company puts on a year end function every year. It’s at some resort or other, and the important thing for this post is that we’re not told the location in advance. We find out when we get there (by bus).

What we are told, about a month ahead of the event, is approximate distances from 3-4 locations. These are where the bus pickup sites are. The locations are:

  • Head Office
  • Near Clearwater Mall
  • Fourways
  • Centurion

The distances given aren’t correct. And, as a result, there’s usually several attempts by various people to figure out where the year end function will be in advance.

I thought I’d join in this year, using some machine learning on those distances.

Now, I should mention that this is a very poor use for ML. Mainly because of a lack of data. I should have hundreds of data points for a decent prediction. I have 2 or 3 data points, for 4 different locations. Still, it’s what I have to work with.

First, the starting data. The distances for this year are:

  • Clearwater mall: 63 KM
  • Centurion: 56 KM
  • Fourways: 43 KM
  • HQ: 20 KM
  • Cape Town: 1447 KM

I’m going to ignore Cape Town for training, as it only had a distance previously specified in 2015, and so I only have one piece of data.

Plotting this on a map makes it clear that the distances have been ‘massaged’ (I’m plotting ‘as the bird flies’, not driving distance for ease of plotting, I’ll use driving distances for the training)


Let’s look at previous years.


Actual location: Seasons Sport and Spa (pin on the map below)

Actual distances calculated with Google Maps, driving distance, shortest route.

  • Clearwater: Given distance – 80KM. Actual distance – 67KM
  • Centurion : Given distance – 88KM. Actual distance – 47KM
  • Fourways : Given distance – 68KM. Actual distance – 52KM
  • HQ: Given distance – 115KM. Actual distance – 75KM



Actual location: Vaal River Country Lodge.

Actual distances calculated with Google Maps, driving distance, shortest route.

  • Clearwater Mall: Given distance –  51KM. Actual distance – 79KM
  • Centurion: Given distance – 110KM. Actual distance – 118KM
  • Fourways: Given distance – 89KM. Actual distance – 97KM



Actual location: Askari Game Lodge.

Actual distances calculated with Google Maps, driving distance, shortest route.

  • Clearwater Mall: Given distance –  52KM. Actual distance – 79KM
  • HQ: Given distance – 90KM. Actual distance – 118KM


With that, I have the following training data:

Location Given Distance Error in Distance (Given – Actual)
Clearwater Mall 80 13
Clearwater Mall 51 -28
Clearwater Mall 52 -27
Centurion 88 41
Centurion 110 -8
Fourways 68 -6
Fourways 89 -8
HQ 115 40
HQ 90 -28

Now to stick those into a linear regression and see if I can predict the error on this year’s measurements.

I need to mention that with so little data, the accuracy of the linear regression is going to be very low. I’m as likely to get the correct results from linear regression as I am to get correct results from rolling a couple of d20s.

That said, onwards to untrustworthy results.

Once the starting values are loaded into R, creating a simple model is as easy as

m <- lm(Error ~ Location + Distance, data=YEF)

Then load up this year’s values into another data frame, and predict.

predict(m, YEFPredict)

The errors come out as:

  • Clearwater Mall: -12
  • Centurion: -18
  • Fourways: -35
  • HQ: -60

Giving final estimated distances (Given – Error) as

  • Clearwater Mall: 75KM
  • Centurion: 74KM
  • Fourways: 78KM
  • HQ: 80KM


Maybe I should have stuck to using dice.

Books of 2016

I set myself a reading goal of 75 books for last year, and managed 73. I’m not overly happy about that, there were months where I barely managed to read anything


The full list, with Amazon links is at the end of this post, I’ll mention a few of the standout books first.

Dust and Light, and its sequel Ash and Silver

A novel magic system, complex politics, a war, an ancient mystery and the main character is slap in the middle of all of them, and he doesn’t remember why.

An interesting theme in these is on memory and what we are if our memory is stripped away.

Halting State

Near-future Scotland. The book starts with a bank robbery, and the suspects are a bunch of orcs and a dragon. The robbery occurred in a persistent, online world, and the police are a little out of their depth. It gets more complicated from there.

Song for Arbonne

A beautifully written story of the land of Arbonne, land of troubadours and joglars and courtly love, worshipping a goddess and ruled by a Queen; and a land to the north where only the warrior god is worshipped and the king and high priest have sworn to conquer Arbonne.

Pandora’s Star

The first story of the Commonwealth saga, a futuristic society where space travel is almost unknown as wormholes link the worlds of the commonwealth together, and where people can live forever thanks to memory implants and rejuvenation techniques.

It all starts when an astronomer observes a star disappearing, enveloped in an instant by some form of Dyson sphere.

The Bands of Mourning

The last in the sequel series to Mistborn, we return to the world of Allomancy and mists. It’s hundreds of years after the end of “Hero of Ages”, the world is in an early Industrial Age.

This book completes the adventures of Wax and Wayne, started in Allow of Law and continued in Shadows of Self.

City of Stairs and its sequel City of Blades

Another completely different fantasy setting. For centuries the Divinities had ruled and protected the continent, their miracles feeding the people, protecting them, etc. Then on one day, the Divinities were killed and civilisation on the continent collapsed.

Almost 100 years later strange things with a divine feel to them are happening and must be investigated.

What If?

A book full of strange questions and well-researched answers, such as “What would happen if the Earth stopped spinning?” (Hint: Bad things would happen), or “What would happen if you tried to hit a baseball travelling at 90 percent of the speed of light?” (Hint: Bad things would happen).

It’s hilarious, it’s well-researched, it’s fantastic.

Full list:

Dust and Light: A Sanctuary Novel by Carol Berg
Skin Game: A Novel of the Dresden Files by Jim Butcher
Sacrifice (Star Wars: Legacy of the Force, Book 5) by Karen Traviss
Inferno (Star Wars: Legacy of the Force, Book 6) by Troy Denning
Fury (Star Wars: Legacy of the Force, Book 7) by Aaron Allston
Revelation (Star Wars: Legacy of the Force, Book 8) by Karen Traviss
The Republic of Thieves (Gentleman Bastards) by Scott Lynch
Ash and Silver: A Sanctuary Novel by Carol Berg
Arthur (The Pendragon Cycle, Book 3) by Stephen R. Lawhead
City of Stairs (The Divine Cities) by Robert Jackson Bennett
This May Go On Your Permanent Record by Kelly Swails
Words of Radiance: Part Two (The Stormlight Archive) by Brandon Sanderson
Rookie Privateer (Privateer Tales) (Volume 1) by Jamie McFarlane
Invincible (Star Wars: Legacy of the Force, Book 9) by Troy Denning
Shattered: The Iron Druid Chronicles by Kevin Hearne
White Tiger (Dark Heavens, Book 1) by Kylie Chan
Something More Than Night by Ian Tregillis
Crown of Renewal (Legend of Paksenarrion) by Elizabeth Moon
Halting State (Ace Science Fiction) by Charles Stross
The Crimson Campaign (The Powder Mage Trilogy) by Brian McClellan
The Long Way Down (Daniel Faust) (Volume 1) by Craig Schaefer
London Falling by Paul Cornell
Learning R by Richard Cotton
Song for Arbonne by Guy Gavriel Kay
Pendragon (The Pendragon Cycle, Book 4) by Stephen R. Lawhead
The First Casualty by Mike Moscoe
Dragons In The Stars (Star Rigger) by Jeffrey A. Carver
Girl on the Moon by Jack McDonald Burnett
Skinwalker (Jane Yellowrock, Book 1) by Faith Hunter
Terms of Enlistment (Frontlines) by Marko Kloos
Rath’s Deception (The Janus Group) (Volume 1) by Piers Platt
Valour by John Gwynne
Death from the Skies!: The Science Behind the End of the World by Philip Plait Ph.D.
The Fabric of the Cosmos: Space, Time, and the Texture of Reality by Brian Greene
Flex by Ferrett Steinmetz
Sword Coast Adventurer’s Guide by Wizards RPG Team
Lines of Departure (Frontlines) by Marko Kloos
The Dark Ability (Volume 1) by D.K. Holmberg
Shadows of Self: A Mistborn Novel by Brandon Sanderson
ATLAS (ATLAS Series) by Isaac Hooke
Pandora’s Star (The Commonwealth Saga) by Peter F. Hamilton
Interim Errantry: Three Tales of the Young Wizards by Diane Duane
Leviathan Wakes (The Expanse Book 1) by James S.A. Corey
Path of Destruction (Star Wars: Darth Bane, Book 1) by Drew Karpyshyn
Manifold: Time by Stephen Baxter
The Bands of Mourning: A Mistborn Novel by Brandon Sanderson
Physics of the Future: How Science Will Shape Human Destiny and Our Daily Lives by the Year 2100 by Michio Kaku
Ruin (The Faithful and the Fallen) by John Gwynne
Virtual Destruction: Craig Kreident (Craig Kreident Thrillers) (Volume 1) by Kevin J Anderson, Doug Beason
Before the Awakening (Star Wars) by Greg Rucka
The Weapon of a Jedi: A Luke Skywalker Adventure by Jason Fry
Parley (Privateer Tales) (Volume 3) by Jamie McFarlane
Calamity (The Reckoners) by Brandon Sanderson
Grail (The Pendragon Cycle, Book 5) by Stephen R. Lawhead
Into the Black (Odyssey One) by Evan Currie
Avalon:: The Return of King Arthur by Stephen R. Lawhead
Meeting Infinity by Gregory Benford, James S.A. Corey, Madeline Ashby, Aliette de Bodard, Kameron Hurley, John Barnes, S
Desert Rising by Kelley Grant
Deepsix by Jack McDevitt
The Steel Remains by Richard Morgan
Child of the Daystar (The Wings of War Book 1) by Bryce O’Connor
The Terran Privateer (Duchy of Terra) (Volume 1) by Glynn Stewart
Throne Of Jade by Naomi Novik
Wireless by Charles Stross
Outriders by Jay Posey
The Vorrh by Brian Catling
The Engines Of God by Jack McDevitt
Parallel Worlds: A Journey Through Creation, Higher Dimensions, and the Future of the Cosmos by Michio Kaku
What If?: Serious Scientific Answers to Absurd Hypothetical Questions by Randall Munroe
City of Blades (The Divine Cities) by Robert Jackson Bennett
To Hold the Bridge: Tales from the Old Kingdom and Beyond by Garath Nix
Footfall by Larry Niven, Jerry Pournelle
Brandon Sanderson’s White Sand Volume 1 by Brandon Sanderson, Rik Hoskin

What is a SARGable predicate?

‘SARGable’ is a weird term. It gets bandied around a lot when talking about indexes and whether queries can seek on indexes. The term’s an abbreviation, ‘SARG’ stands for Search ARGument, and it means that the predicate can be executed using an index seek.

Lovely. So a predicate must be SARGable to be able to use an index seek, and it must be able to use an index seek to be SARGable. A completely circular definition.

So what does it actually mean for a predicate to be SARGable? (and we’ll assume for this discussion that there are suitable indexes available)

The most general form for a predicate is <expression> <operator> <expression>. To be SARGable, a predicate must, on one side, have a column, not an expression on a column. So, <column> <operator> <expression>

WHERE Number = 42;


WHERE Number + 0 = 42;


WHERE Number = 42 + 0;


Any1 function on a column will prevent an index seek from happening, even if the function would not change the column’s value or the way the operator is applied, as seen in the above case. Zero added to an integer doesn’t change the value of the column, but is still sufficient to prevent an index seek operation from happening.

While I haven’t yet found any production code where the predicate is of the form ‘Column + 0’ = @Value’, I have seen many cases where there are less obvious cases of functions on columns that do nothing other than to prevent index seeks.

UPPER(Column) = UPPER(@Variable) in a case-insensitive database is one of them, RTRIM(COLUMN) = @Variable is another. SQL ignores trailing spaces when comparing strings.

The other requirement for a predicate to be SARGable, for SQL Server at least, is that the column and expression are of the same data type or, if the data types differ, such that the expression will be implicitly converted to the data type of the column.

WHERE StringColumn = 0;


WHERE StringColumn = ‘0’;


There are some exceptions here. Comparing a DATE column to a DATETIME value would normally implicitly convert the column to DATETIME (more precise data type), but that doesn’t cause index scans. Neither does comparing an ascii column to a unicode string, at least in some collations.

In generally though, conversions should be explicit and decided on by the developer, not left up to what SQL server decides.

What about operators?

The majority are fine. Equality, Inequality, IN (with a list of values), IS NULL all allow index usage. EXIST and IN with a subquery are treated like joins, which may or may not use indexes depending on the join type chosen.

LIKE is a slight special case. Predicates with LIKE are only SARGable if the wildcard is not at the start of the string.

SELECT 1 FROM SomeStrings


SELECT 1 FROM SomeStrings


There are blog posts that claim that adding NOT makes a predicate non-SARGable. In the general case that’s not true.

WHERE NOT Number > 100;


WHERE NOT Number <= 100;


WHERE NOT Number = 137;


These index seeks are returning most of the table, but there’s nothing in the definition of ‘SARGable’ that requires small portions of the table to be returned.

That’s mostly that for SARGable in SQL Server. It’s mostly about having no functions on the column and no implicit conversions of the column.

(1) An explicit CAST of a DATE column to DATETIME still leaves the predicate SARGable. This is an exception that’s been specifically coded into the optimiser.

I’m a Pluralsight author!

My first course, Identifying & Fixing Performance Issues Caused by Parameter Sniffing, was published two weeks ago. It won’t be the last.

Recording the course was a voyage of discovery. Until then, I’d only ever done live presentations, blog posts and articles. I initially thought that the recording would be similar to live presentations, but it’s nothing close.

For presentations, I do them off-the-cuff. Oh, I rehearse them beforehand, but I have no script, no speakers notes, no cue cards. My slides are as much for me, to direct what I’m talking about at each point, as they are for the audience. Two presentations using the same slide deck are not going to be the same.

I initially tried that method for recording, and it was a mess. Because the mistakes and half-sentences and corrections that are fine in a live presentation are not fine with a recording. The recording has to be near-perfect, all those little mistakes have to be edited out and re-recorded.

The first recording I did I had to record three times and there were still places wrong.

So I switched to scripting the entire thing, and then just ‘reading’ the script, taking care that it didn’t sound like reading a script. That works much better, but the time to write the script is huge. I speak at roughly 160 words/minute. A full page in Word, with default font and spacing, is around 600 words. 30 minutes of recording means around 8-9 pages of script.

The recording, I’ve found, is the least time-consuming part of the exercise (which is good, because it’s only quiet enough to record after 8PM, I live just off a busy road with a school a block away)

The editing is the most tedious. 20 minutes of finished video requires around 40 minutes or more of recording and probably 1.5-2 hours of editing, more if it was a demo.

The demos are still a problem and one where I need to figure out a good process. What I did for this course was to record the video of the demo, do a quick edit to take out mistakes, then record the voice over the top, then edit them together. While it works, it’s a monumental pain. A 15 minute demo took 3 hours of editing to put together.

For the next one I’ll try recording the demos, video and audio, in smaller chunks. Hopefully will make it easier to piece the audio and video together. The finished clips can easily be edited together at the end. Hopefully that’ll make the demos less of a pain.

Pass Summit 2016 abstract reviews

Following on Steve’s blog post on summit abstracts, I decided to publish mine.

My comments are not intended as an attack on the program committee, it’s an incredibly hard job that you couldn’t pay me to do (I’ve done similar work before and hated it). What I hope comes out of this, and the other posts which Steve reviewed, are that comments are more constructive and can be used to improve the abstract.

When the initial mails were sent out, I had one general session and one lightning talk accepted. Since then, I’ve been asked to also present one of the sessions that was listed as alternate.

So, without any editing, the abstracts and comments:

All about Indexes (half-day, level 300, declined)

Indexes are essential to good database performance, but it can be hard to decide what indexes to create and the &#39;rules&#39; around indexes often appear to be vague or downright contradictory.

In this session we&#39;ll dive deep into indexes, have a look at their architecture and internal structure and how that affects the way that indexes are used in query execution. We’ll look at why clustered indexes are recommended on almost all tables and how their architecture affects the choice of columns. We’ll look at nonclustered indexes; their architecture and how query design affects what indexes should be created to support various queries.


Abstract starts off really well, but last sentence has some structure/repetition issues. Level might be appropriate, but I wonder if in a 3 hour session you can cover enough detail to justify a 300 level.

Nice abstract and overall flow between title, abstract and goals.

Abstract seems well written and gives decent insight into session contents. It would be better to mention 2014 & 2016 features to make the session more attractive. The topic of indexes is somewhat overdone and may have some difficulty attracting attendees. Goals are decent but somewhat generic and could benefit from being more tangible. 25% demo on this topic seems a bit low to keep attendees engaged for a 3 hour session.

Need more meat in abstract. Are you going to talk about column, filtered or other index types? Only 25% demos, you have 3 hours???

Could cover more breadth in 3 hours. Filtered, sparse, columnstore etc.

I don’t see this for a half-day session. The goal are just too weak. And only 25% demo’s? You mean you are going to be talking for 3 hours and demoing for an hour? zzzzz. Need to flesh this abstract out a lot more.

Subjective: For a 3h long session, even at lvl300 , would expect a bit more demos to keep the attendees from nodding off…

Interesting combination. One wonders if I can cover enough material in 3 hours to make it a level 300 and others imply that there’s not enough material to spend 3 hours on.

Thing is, I’ve given this presentation twice before, as a 3 hour session. Pass Summit in Charlotte and SQLBits in London. It fits into 3 hours, providing there aren’t too many questions, that is. If I was going to add clustered and nonclustered columnstores and Hekaton’s range and hash indexes (and the interactions between them, eg non-clustered rowstore index on a clustered columnstore), then it would be a full day precon at least.

There are no 2014 or 2016 features mentioned, because the rowstore indexes didn’t change much in either version. Sure, I could call out the increased key size, but that’s not exactly abstract material. Maybe a footnote on one slide.

As for the topic being overdone and will have difficulty attracting attendees, there are only two sessions on rowstore indexes this year, a precon (Kendra Little) and a 100-level general session (Kathi Kellenberger), plus a few internals presentations and performance-related presentations that include indexing. Every time I’ve given an indexing presentation at a SQLSaturday, Pass Summit or SQLBits I’ve had a packed room. Sure, it isn’t shiny and new, but it’s probably relevant to most systems we design, develop and administer.

“You mean you are going to be talking for 3 hours and demoing for an hour? “ <snark_mode = on> No, I planned to do the middle hour as a mime act. </snark_mode>

Last time I checked, a 3 hour session meant talking for 3 hours. I’m sorry that indexing sounds so dry and boring that attendees are sure to fall asleep during it. I haven’t yet had anyone falling asleep in my indexing sessions. Maybe everyone who did, did so quietly and I didn’t notice. Maybe I should bring fireworks next time.

Go, Go, Query Store! (general session, level 200, accepted)

One of the hardest things to do in SQL is to identify the cause of a sudden degradation in performance. The DMVs don’t persist information over a restart of the instance and, unless there was already some query benchmarking (and there almost never is), answering the question of how the queries behaved last week needs a time machine. Up until now, that is. The addition of the Query Store to SQL Server 2016 makes identifying and resolving performance regressions a breeze.

In this session we’ll take a look at what the Query Store is and how it works, before diving into a scenario where overall performance suddenly degraded, and we’ll see why Query Store is the best new feature in SQL Server 2016, bar none.


Topic: Great topic. SQL Server 2016 and new shiny features like QS will definitely draw a crowd.
Abstract: Very well written with clear supportive goals.
Subjective: This sounds like a great session I would like to attend. The one issue I have is that it is listed as only 25% demo. Demos are the best way to who off QS functionality. 50% demo would be better.

I would like to attend this session

The outline seems to clearly describe the contents of the presentation. The topic and goals should be compelling to attendees. The target audience should be big enough to support this session. There appears to be a reasonable amount of live demonstrations in relation to the topic being presented.

Abstract: detailed, clear
topic: relevant and new, one of the more interesting feature of sql server 2016
Subjective rating: a good session

Abstract: well written, good topic
Topic: good topic, eye catching
Subjective: good session, would be interested in this

Nothing much to say here. The session will likely be closer to 50% demos, the original plan was 25%, but when I built the slides for SQL Saturday Iceland, I ended up putting more demos in and less slides.

Is that a parameter I smell? (general session, level 300, declined)

All too often a forum post on erratic query performance is met with a reply &quot;Oh, it&#39;s parameter sniffing. You can fix it with &lt;insert random solution here&gt;.&quot; The problem with answer, even if it has identified the cause, is that it’s only part true. Parameter sniffing is not simply a problem that needs fixing; it&#39;s an essential part of well-performing queries. In most cases.

Come to this session to learn what Parameter Sniffing really is and why it’s a good thing, most of the time. Learn how to identify the scenarios where it’s not good, why a feature that is supposed to improve query performance sometimes degrades it, and what your options are for resolving the problems when they do occur.


Abstract: Well presented, explains purpose of session.
Topic: catchy title, still timely and relevant.
Subjective: Nice mix of demo for this topic.

title not cute, if someone does not know about this they would not come Good abstract and good goals

Abstract: Grammar is a bit rough making the abstract difficult to read.

topic is decent . while it not “latest” or “hot”, it address a real life problem that developers can face. Abstract is a bit too generic and could benefit from more tangible details. Goals are okay but somewhat generic. demo % is decent.

Thanks for submitting a session that covers a really hot button topic in database queries! Looking forward to seeing it.

Again, fair enough, nothing much I can say about these comments.

On Transactions and Atomic Operations (general session, level 200, declined)

If there’s one thing that we, as SQL developers, do, it&#39;s not use enough transactions.

Transactions are critical when multiple changes need to be made entirely or not at all, but even given that it’s rare to see transactions used at all in most production code

In this session, we&#39;ll look at what transactions are and why we should use them. We&#39;ll explore the effects transactions have on locking and the transaction log. We&#39;ll investigate methods of handling errors and undoing data modifications, and we&#39;ll see why nested transactions are a lie.


Abstract: Grammar in first sentence is difficult to read. Too many commas.

good topic, need more meat in abstract and goals.

topic may not be “latest” but its a foundation subject and of relevance to most database developers. abstract clearly conveys what to expect from the session. goals are somewhat terse and could benefit from tangible details. demo % is a bit low.

Atomic Operations is in the title, but not used/defined in the abstract itself.

Abstract: Second sentence is a bit run-on and incoherent.
Topic: Fit for purpose, but likely a bit niche, as the author already seems to fears.
Subjective: Would have expected at least 50% demo – after all the easiest way to understand the need for TRANS is to see data buggered up by colliding updates.

Yes, I do have a tendency to torture my commas (they always confess at the end) and play with grammar. I will try to remember next year to write with simple, plain grammar. I do need to flesh this out a bit for next time it’s submitted anywhere.

I’m not sure what in the abstract conveys my fear that it’s a niche session, I don’t think it is. From working on various clients’ production systems over the last few years, I very seldom see explicit transactions. It’s something we don’t do enough, not a niche topic.

I’m also not sure that the reviewer meant by ‘colliding updates’. Updates, like all statements, are always part of a transaction, even if it is just one automatically started and committed. If two sessions were to try and update the same column, same rows at the same time to different values, the outcome will be as if one or the other had run, not a mixture of the two. And wrapping the update in an explicit transaction won’t change that behaviour

The Many Latencies of TempDB (general session, level 400, originally alternate, later accepted)

TempDB gets a bad rap when it comes to performance and scalability, and it’s all-too-often well deserved. A badly configured TempDB can have devastating effects on throughput. Combine that with poor queries and, well, you didn’t have any plans for the weekend, right?

In this session we’ll look at some common causes of TempDB contention, both query-based and configuration-based. We’ll look at guidelines for configuring TempDB and when and why you’d make various changes, and we’ll cover more ways to monitor TempDB than you can shake a stick at.

Now, about those weekend plans…


Abstract: Clear and well written abstract. No prerequisite listed.
Topic: Topic and goals seem like they would be appealing to attendees.
Subjective: Sounds like a good session that people would benefit from.

Abstract: interesting
Topic: relevant
Subjecttive rating: compelling, high level

Abstract: The outline and details of this abstract are well written
Topic: This is a good topic
Subjective: I may attend this session

The session prerequisites are helpful. A great topic and the Abstract is almost good. It is funny but not enough information about the session.

Session prerequisites: TBC?
Goal 1: Correct the “&#39;”
Objective: I would like to attend this session.

The pre-reqs here were a mistake on my part. Despite checking and re-checking the abstracts, that mistake slipped through (tbc = to be completed).

I would love to have corrected the “&#39;”, as well as the &quot; and &gt; that appeared scattered through the abstracts, but they appeared on submit and editing afterwards didn’t help. Something, somewhere in the website messed up the HTML encoding a bit.

Watch a query run. Run, query, run! (Lightning talk, level 100, declined)

Previously if you wanted to get any run-time statistics for a query, you had to include the actual execution plan and run the query to completion. No more! New in SQL 2016 is the live query statistics that let you watch the execution of a query, in real time, and see when the operators run and where the data flows.

In this lightning talk we’ll look at how the live query statistics works and discuss some scenarios where this will really help debugging strange query behaviour.


Perfect topic/abstract for Lightning Talk, sign me up!

interesting and current topic. well written abstract with good details of session contents. Goals are terse but clear.

Sounds like a good topic for a Lightning Talk.

Great topic and good intro to query store. Hopefully this talk will provide concise knowledge.
Topic is good, will interest a large amount of attendess.
Level is excellent – it’s a new feature so 100 level is great. I would have liked to see mention of query store in the title.

good topic

New topic on new technology. Small enough to be interesting

Interesting and great topic. Abstract and Goals talks to each other. New features is always a good place to grab peoples attention.

The comment on wanting query store to appear in the title is mystifying, because this session has nothing to do with query store. It’s showing off the Live Query Statistics (which, while nice, is too small to warrant anything more than a lightning session alone)

Why Temporal Tables and Stretch Database Are Best Friends (Lightning talk, level 200, accepted)

SQL Server 2016 introduced, among several other new features, Temporal tables and Stretch database.

From a distance, they appear not to have much to do with each other. Temporal tables allow you to query the table as it was at a point in time. Stretch allows for tables to span the earthed SQL Server and the cloud.

In this short session we’ll look at why these two features work spectacularly well together and why a stretched temporal table makes perfect sense.


Clear understanding of the objective, and as a “new” feature people will drawn to this.

Perfect Lightning Talk abstract, focused topic and something new with the latest version of SQL Server.

Sounds interesting!

interesting topic – new and relevant. Abstract is brief but gives good insight into session contents. Goals are terse, but clear and tangible. No demos may be off-putting to some attendees.

Earthed SQL is an interesting expression and certainly made me think.
There’s two new features here – is that too much for a 10 minute lightening talk?
If good connection is made early between the two then this could be good
Three goals seem to sum up the presentation well.

Try just one topic in 10 minutes or create a 75 or 3 hour session on new features of 2016

No real “pull” for a short session.

“No demos may be off-putting”. Um, what? This is a 10-minute lightning talk, I’m not demoing features in 10 minutes and I’d be very surprised if attendees expected demos in a 10 minute session.

I can’t take credit for the “Earthed SQL” expression, that one’s from Rimma’s Keynote presentation in 2014

SQL Server 2016 features: R services

One of the more interesting features in SQL 2016 is the integration of the R language.

For those who haven’t seen it before, R is a statistical and data analysis language. It’s been around for ages, and has become popular in recent years.

R looks something like this (and I make no promises that this is well-written R). Taken from a morse-code related challenge

MessageLetters <- str_split(Message, "")

MessageEncoded <- list(1:length(MessageLetters))

ListOfDots <- lapply(lapply(c(MaxCharacterLength:1), function(x)".", times = x)), function(x) str_c(x, collapse=''))
ListOfDashes <- lapply(lapply(c(MaxCharacterLength:1), function(x)"-", times = x)), function(x) str_c(x, collapse=''))

If you’re interested in learning R, I found the Learning R book to be very good.

SQL 2016 offers the ability to run R from a SQL Server session. It’s not that SQL suddenly understands R, it doesn’t. Instead it can call out to the R runtime, pass data to it and get data back

Installing the R components are very easy.


And there’s an extra licence to accept.


It’s worth noting that the pre-installed Azure gallery image for RC3 does not include the R services. Whether the RTM one will or not remains to be seen, but I’d suggest installing manually for now.

Once installed, it has to be enabled with sp_configure.

EXEC sp_configure 'external scripts enabled', 1

It’s not currently very intuitive to use. The current way R code is run is similar to dynamic SQL, with the same inherent difficulties in debugging.

EXEC sp_execute_external_script
  @language = N'R',
  @script = N'data(iris)
    OutputDataSet <- head(iris)'
  WITH RESULT SETS (([Sepal.Length] NUMERIC(4,2) NOT NULL, [Sepal.Width] NUMERIC(4,2) NOT NULL, [Petal.Length] NUMERIC(4,2) NOT NULL, [Petal.Width]  NUMERIC(4,2) NOT NULL, [Species] VARCHAR(30)));

It’s possible to pass data in as well, using a parameter named @input_data_1 (there’s no @input_data_2) and from what I can tell from the documentation @parameter1, which takes a comma-delimited list of values for parameters defined with @params. There’s no examples using these that I can find, so it’s a little unclear how they precisely work.

See and for more details.

It’s not fast. The above piece of T-SQL took ~4 seconds to execute. This is on an Azure A3 VM. Not a great machine admittedly, but the R code, which just returns the first 6 rows of a built-in data set, ran in under a second on my desktop. This is likely not something you’ll be doing as part of an OLTP process.

I hope this external_script method is temporary. It’s ugly, hard to troubleshoot, and it means I have to write my R somewhere else, probably R Studio, maybe Visual Studio, and move it over once tested and working. I’d much rather see something like

  WITH Language = 'R' -- or USQL or Python or …

Maybe in SQL Server 2020?