Or “Shotgun query tuning”
There have been a fair few forum questions in recent months asking for help in removing index scans, loop joins, sorts or other presumed-slow query operators. There have been just as many asking how to replace a subquery with a join, or a join with a subquery, or to make similar changes to a query, usually for performance reasons.
The first question that I have to ask when looking at requests like that is “Why?”
Why is removing a particular query operator the goal? Why is changing a where clause predicate the goal? If it’s to make the query faster, has the query been examined, and has it been confirmed that the query operator or predicate in question really is the problem?
The title of this post refers to a comment I’ve seen again and again in blogs or articles about front-end development. “Premature optimisation is the root of all evil.” It’s true in the database field as well.
While optimisation is very important in database development, trying to optimise queries without any idea where the problem with the query lies, or even whether the query is a problem at all, is about as effective at fixing a database performance problem as using a shotgun from 100 metres is at killing mosquitoes. If you hit the problem, it’s by sheer luck and nothing else.
There are two sides to this problem.
The first is spending time during development optimising a query (or stored procedure) without any idea whether or not the query is inefficient, and without any way of telling whether or not the changes made are an improvement.
Firstly, this is a waste of time that could be better spent developing other queries. Secondly, it creates the incorrect impression that the queries have been optimised when in fact nothing of the sort has been done.
The second is when, on a production database that is performing badly, queries are modified almost at random in an attempt to fix the performance problem quickly.
This almost never works. It wastes time fixing things that very likely weren’t broken in the first place, all the while the database performance deteriorates and management curses SQL Server as ‘non-scalable’.
So, what is the right approach for the above two scenarios?
- Don’t optimise queries without knowing if they need it.
- Don’t optimise queries without knowing if they need it. (1)
When writing queries and stored procedures, test them against a representative data set on a server under a representative workload, and evaluate their performance characteristics to see whether they are acceptable. If the query’s performance characteristics are acceptable, then that query requires no optimisation. (2)
This doesn’t mean write bad code and push it to production. It means write good, solid code, following accepted coding standards, ensure that it runs acceptably against production-volumes of data, and do not spend hours or days trying to get it running a couple of milliseconds faster.
And if the query doesn’t perform acceptably, identify the problematic portion and fix that; don’t flail around rewriting bits of the query in the hope that the problem will magically go away.
The execution plan is the primary tool here, along with the output of Statistics IO.
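As a minimal sketch of gathering that evidence (the table and query here are hypothetical, stand-ins for the query under investigation):

```sql
-- Report logical and physical reads per table, plus elapsed and CPU time.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Hypothetical query; substitute the one being investigated.
SELECT OrderID, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerID = 42;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Run with “Include Actual Execution Plan” enabled in Management Studio (Ctrl+M). The STATISTICS IO output shows the scan count and logical reads for each table the query touched; the tables with the highest logical reads, and the highest-cost operators in the plan, are where to focus, rather than on whichever operator merely looks suspicious.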
Fixing existing code
When evaluating existing databases with known performance problems, limit the performance tuning to queries that really are performing badly and need optimisation. It’s often the case that fixing the top 5-10 worst-performing queries will have a massive effect on overall system performance, far more than tuning twice that number of queries that aren’t really a problem.
The best tool for finding which queries really are the worst offenders is SQL Trace.
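As a sketch of ranking the offenders, assuming a trace file that captured the RPC:Completed and SQL:BatchCompleted events (the file path is hypothetical; point it at your own .trc file):

```sql
-- Load a saved trace file and rank queries by cumulative reads.
SELECT TOP (10)
    CAST(TextData AS nvarchar(max)) AS QueryText,
    COUNT(*)      AS Executions,
    SUM(Reads)    AS TotalReads,
    SUM(Duration) AS TotalDurationMicroseconds,
    SUM(CPU)      AS TotalCPUMilliseconds
FROM fn_trace_gettable(N'C:\Traces\PerfTrace.trc', DEFAULT)
WHERE EventClass IN (10, 12)  -- RPC:Completed, SQL:BatchCompleted
GROUP BY CAST(TextData AS nvarchar(max))
ORDER BY SUM(Reads) DESC;
```

Ordering by SUM(Reads) finds the queries doing the most cumulative I/O; ordering by SUM(Duration) or SUM(CPU) instead surfaces a different (and equally useful) set of candidates.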
When looking at queries that are a problem, identify the portions that are inefficient and target attempts at optimisation towards those problems.
Optimise if necessary.
(1) No, that wasn’t a typo.
(2) At that time. Later changes to schema or data volume may require existing queries to be revised.
For more details on exactly how to identify problematic queries, refer to the series I wrote at Simple Talk last year.