Identifying inaccurate statistics

I wrote previously about statistics, what they’re needed for and I briefly mentioned what can happen when they’re inaccurate.

One question I’ve seen asked a few times is on how to identify stats that are inaccurate, what DMV to use. The bad new is that there is no DMV that identifies inaccurate statistics. The Stats_Date function can be used to see when the stats were last updated, but that doesn’t say if that was too long ago ort not. If a table is readonly, or is changed twice a year, statistics that are four months old are perfectly valid.

The rowmodcnt column in sysindexes can give an approximate count of the number of rows that have changed since the stats were last updated, although sysindexes is deprecated and will disappear in the next version of SQL and the rowmodcnt column is no longer completely accurate. The rowmodcnt however is just a count of changes. It gives no indication of whether that number is too high and causing a problem. In some cases 50% of the table may have changed before there’s a problem with queries. In other cases 5% is enough to cause problems.

The only certain way to identify statistics that sufficiently inaccurate as to cause problems is to see the effect on queries. The typical sympton is a query that runs well one day and terribly the next with no changes to schema and only normal changes to data. Identifying those cases requires that there’s a benchmark of how queries normally run and that it’s possible to identify abnormal performance.

If a stats update (sampled or with full scan) fixes the performance problem, then it confirms that the problem was indeed statistics.

A look at the execution plan of a badly-performing query can also identify statistics problems. If the estimated and actual rows affected by an index seek are very different (more than about 10%) then it can indicate that the statistics are old or inaccurate. I say can, because there are other reasons why the estimates are incorrect.

The truth is, whether stats are inaccurate or not depends more on the specific queries that are using the statistics than anything else. It’s even possible that two queries running against the table will be affected completely differently by old stats. One may run terribly and the other not be affected at all. It all depends on what the queries do and how the optimiser optimises them.

4 Comments

  1. Adam

    Hi, I just wanted to simply ask about other reasons why the estimates can vary from actual rows. I, for example, have often situation that there is estimate 1 row, and actual more than 30000, and in consequence there is nested loop. It seems that is, somewhat, optimal for this case, but I’m still concerned about this.
    And thanks for your articles, these are brilliant!

    Reply
  2. Gail (Post author)

    Well first you need to check whether that is correct or not (https://www.sqlinthewild.co.za/index.php/2009/09/22/estimated-rows-actual-rows-and-execution-count/)

    It could be the use of a table variable (estimates 1 row), it could be a multi-statement table-valued function for the same reason. Multiple predicates can also result in incorrect estimates as the optimiser assumes they are independent, when they may not be. Use of a function in a predicate. Lots of possible reasons

    Reply
  3. Kiran

    I have a doubt regarding Performance issue with select statement on a Varchar max type Value in the below query. This query is running fine for 6 hours and after that it became too slow. The actual and estimated number of rows in 1 and Row size is 4035B.Can you suggest me on y this is occuring?

    SELECT [VALUE] as Token FROM [Configuration] WHERE [Key]= ‘Token’

    Reply
    1. Gail (Post author)

      Please ask this on a SQL forum like SQLServerCentral or DBA Stack Exchange. It’s not related to this post, and cannot be debugged without a lot more information

      Reply

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.