I heard this one over at SSC a while back. “Avoid IF statements in stored procedures as they result in recompiles”
Ok, it sounds reasonable, if the optimiser optimises for the execution path taken on first execution it’ll have to go back and optimise the other paths when they are executed.
But is that really how it works? Now that I’ve spend some time looking at how the cache monitoring trace events behave, it’s possible to find out using those.
Let’s start with a simple example
CREATE TABLE Tbl1 ( ID INT ); CREATE TABLE Tbl2 ( ID VARCHAR(10), SomeDate DATETIME ); GO CREATE PROCEDURE TestingRecompiles (@SomeParam INT) AS IF (@SomeParam = 1) SELECT ID FROM Tbl1; ELSE SELECT SomeDate FROM Tbl2; GO
Simple enough. First execution will be with the parameter value of 1. I’m going to use Profiler to see what’s happening. Events traced are SP:CacheInsert, T-SQL:StmtRecompile and the XML Plan for query compile, so I can see exactly what plan was generated. I’m using the ‘For Query Compile’ event so that I can catch the plan at optimisation time, not at execution time.
With that trace running, I’m going to run the proc once with a value of 1 for the parameter. Important to note is that there is no cached plan for the procedure at this point.
I have two plan for query compile events, one cache insert for the procedure and a batch completed. I want to focus on the two compile events.
The first one shows a table scan on tbl1. The second shows another table scan, on tbl2. So it appears that both branches of the IF statement went through the optimiser and had plans created for them.
Before jumping to any conclusions, I want to see what’s sitting in the plan cache for this procedure.
One plan for that procedure and the plan shows a conditional (the if) and query operators for both branches.
So far it looks like the plan that was generated on the first execution contains plans for all of the statements and is sufficient for any execution, but to be sure, let’s see what happens when I run the query again with a different parameter.
No query compile, no cache insert, no recompile statement. So in this case there was no recompile from the IF.
Before drawing any conclusions, I want to check a more complex procedure, as the plans here were Trivial. Maybe full optimisation will show a different result.
USE Adventureworks GO CREATE PROCEDURE TestingRecompiles2 (@SomeParam INT) AS IF (@SomeParam = 1) SELECT p.[Title], p.[FirstName], p.[MiddleName], p.[LastName] FROM [HumanResources].[Employee] e INNER JOIN [Person].[Person] p ON p.[BusinessEntityID] = e.[BusinessEntityID] INNER JOIN [Person].[BusinessEntityAddress] bea ON bea.[BusinessEntityID] = e.[BusinessEntityID] INNER JOIN [Person].[Address] a ON a.[AddressID] = bea.[AddressID] ELSE SELECT s.[Name], a.[AddressLine1],a.[AddressLine2],a.[City],a.[PostalCode] FROM [Sales].[Store] s INNER JOIN [Person].[BusinessEntityAddress] bea ON bea.[BusinessEntityID] = s.[BusinessEntityID] INNER JOIN [Person].[Address] a ON a.[AddressID] = bea.[AddressID] GO
These queries are complex enough to go through full optimisation, but the profiler trace shows the same as before
Two showplan events prior to the completion of the first execution, one for each branch of the IF, one cache insert event and no showplan events, no recompile events when the second execution (with a different parameter value) began. (I’ll leave examining the plan in cache as an exercise to the reader)
One more, just to be thorough.
CREATE PROCEDURE TestingRecompiles3 AS IF EXISTS (SELECT 1 FROM tbl1 WHERE ID>0) SELECT ID FROM tbl1 ELSE SELECT ID FROM tbl2 GO INSERT INTO tbl1 (ID) -- Pre-populate with 1 row so that stats-based recompiles don't confuse the issue. VALUES (0) GO EXEC TestingRecompiles3 GO INSERT INTO tbl1 (ID) VALUES (1) GO EXEC TestingRecompiles3 GO
Much the same as we saw in the earlier tests. This time there are three showplan events, one for the subquery in the IF and one for each branch. Again there is one CacheInsert for the entire procedure and no recompile events on the second execution
So what is actually happening?
When the optimiser receives a batch or object to generate execution plans for it does not execute any portion of that code. That’s not its job, that’s what the Query Execution engine is there for. The optimiser’s job is to optimise each statement within the procedure or batch(1), to generate execution plans that are good enough and, in most cases, are reusable.
When it comes across an IF statement, the optimiser is not going to execute the expression to see which branch will be taken on this execution(2). It’s going to optimise both branches because it has no idea which one will be executed on this or subsequent executions.
For each statement within the procedure, the optimiser generates the best plan it can find based on the parameter values for the current execution. This is not necessarily ideal as it can end up optimising queries for parameter values that they will never be called with. As can be imagined, this can generate some not-so-optimal execution plans. I wrote about this problem some time back, so I’m not going to touch on it again
So, in conclusion… Do IF statements cause recompiles? No. The optimiser processes all branches of a conditional no matter which branch will be taken during execution, even if the branch can never be executed. Use of IF statements can however result in some really awful plans and make you sincerely wish they had caused a recompile. Use with caution (or recompile hints) if plans can be radically different depending on parameter values on the first execution.
(1) There are cases where the optimiser won’t optimise a statement within a procedure during the initial optimisation and will leave the statement to be optimised later (called deferred compile), but this is not due to IF statements. The one thing that immediately comes to mind that does cause this behaviour is when the statement depends on an object that does not exist at the point that the first optimisation is done, such as when a table is created within the procedure and then used within the same procedure.
(2) As proof that the optimiser doesn’t evaluate conditional expressions, consider the following:
IF (42/0 = 137) SELECT name, sys.objects.type_desc FROM sys.objects
If that is executed, it fails with a divide by zero error (as expected). Requesting the estimated execution plan succeeds and returns a valid execution plan and executing it with the profiler trace that I used above (adding in the user error event) shows the Showplan for query compile (as the select within the IF is optimised), then a cache insert for that plan, and then only is the divide by zero error thrown, just before the SQL:BatchCompleted event.