Or “Why are all my processes waiting on memory? There’s tonnes of memory”
It’s probably not news that there was a fairly nasty memory-related bug in SQL 2005 RTM and SP1, related to the relaxing of limits on cache size, specifically the TokenAndPermUserStore cache.
On systems with large amounts of memory (20GB+) and frequent ad-hoc queries or significant usage of dynamic SQL, the cache can grow quite large, and by quite large I’m talking upwards of 2GB. I think I saw the cache at close on 8GB at one time on one of my servers.
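If you want to see how large the cache is on your own server, a query along these lines should do it. It's a minimal sketch that assumes the SQL 2005 column names in sys.dm_os_memory_clerks (single_pages_kb and multi_pages_kb); the output alias is my own choice.

```sql
-- Current size of the security token cache, in KB.
-- Assumes the SQL 2005 layout of sys.dm_os_memory_clerks.
SELECT SUM(single_pages_kb + multi_pages_kb) AS token_cache_kb
FROM sys.dm_os_memory_clerks
WHERE name = 'TokenAndPermUserStore';
```

Running it a few times over the course of a day gives a rough idea of how fast the cache grows.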
The problem with this is that it takes quite a bit of time to search through several GB of cache to find the required tokens. Making matters worse, access to that cache is synchronised, so only a single thread may have access at a time.
The main symptom of that problem is lots of CMEMTHREAD waits without an apparent wait resource, along with higher-than-normal CPU usage.
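To check for that symptom, the wait DMVs are the obvious place to look. The following is a rough sketch rather than a full diagnostic: the first query shows the accumulated CMEMTHREAD totals (cumulative since the last restart or since the stats were cleared), the second shows tasks waiting on it right now.

```sql
-- Accumulated CMEMTHREAD waits since the wait stats were last reset.
SELECT wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CMEMTHREAD';

-- Tasks waiting on CMEMTHREAD at this moment.
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type = 'CMEMTHREAD';
```

Because the first set of numbers is cumulative, compare two snapshots taken some minutes apart rather than reading a single result in isolation.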
But that problem was fixed in SP2 with a change to the caching behaviour. Right?
Well, not really.
While investigating memory-related waits on one of my servers, I ran into a few blog entries that seemed to suggest that the caching bug was not fixed in SP2, not completely anyway.
Troubleshooting Performance issues with TokenAndPermUserStore in SQL2005 x64
Wesley’s blog points to an entry in the list of fixes included in the 4th cumulative update that relates to the token cache. Interestingly, there was no KB article for it.
To sum things up: if you’ve got a SQL 2005 server with lots of memory, you’re running a build before 3179 and you’re seeing lots of CMEMTHREAD waits, you may be running into this bug. If flushing the cache (DBCC FREESYSTEMCACHE ('TokenAndPermUserStore')) makes the problems go away, then it’s almost certain. Get hold of the latest cumulative update and see if it helps.
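The build check and the flush test can be sketched like this. The build number extraction assumes the usual four-part version string (for example 9.00.3042.00), and the column aliases are just illustrative. Treat the flush as a test rather than a fix, since the cache will fill up again afterwards.

```sql
-- Which build is this? The third part of the version string is the build number.
SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level,
       CAST(PARSENAME(CAST(SERVERPROPERTY('ProductVersion') AS nvarchar(128)), 2) AS int)
           AS build_number;

-- Flush only the security token cache, then watch whether the waits drop off.
DBCC FREESYSTEMCACHE ('TokenAndPermUserStore');
```

If build_number comes back below 3179 on a 9.00 server and the flush brings relief, that matches the pattern described above.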
P.S. There has been at least one report that the problem persists even in the latest build. If I find out more about that, I’ll post it.