A couple weeks ago I presented to the DBA Fundamentals virtual chapter. The presentation was recorded and is available from their site.
While I answered some questions during the presentation, I couldn’t answer all of them. Hence this blog post with the rest of the questions and some answers.
Q1: Is monitoring any different in Azure SQL DB?
A1: Completely different. What I was talking about when the question was asked was perfmon counters and wait stats. Since you don’t have access to the server with the SQL DB, you can’t run perfmon. Even if you could, there’s unknown other workloads on the server which would make any such monitoring useless. Instead you can use the DMV
sys.dm_db_resource_stats, which gives you the resource consumption relative to the maximum allowed for the tier that you’re paying for. For more details, see https://azure.microsoft.com/en-us/blog/azure-sql-database-introduces-new-near-real-time-performance-metrics/
The wait stats can be monitored with the DMV sys.dm_db_wait_stats, instead of sys.dm_os_wait_stats that you’d use on an earthed SQL Server. See https://msdn.microsoft.com/en-us/library/dn269834.aspx
Q2: What interval should we use for perfmon and how long should it be run?
A2: Personally I’m happy using the 15 second default in most cases. Perfmon has minimal overhead and the files aren’t large. If I’m trying to pin down an intermittent issue I’ll reduce the time, but I’ll very rarely increase it.
When analysing a server, I want minimum a day and that’s bare minimum. A week is good, that way I can see trends over several days and not be caught out by any non-standard workloads on one day.
Q3: Use performance monitor or sys.dm_os_performance_counters
Perfmon. Running a job every 15 seconds is hard and only the SQL counters are available through the DMV, so I’ll just use a performance monitor counter trace and save out as a binary file.
Q4: Is high CXPacket a problem?
By itself, all CXPacket waits mean is that queries are running in parallel. To determine whether that’s a problem or not requires looking at the queries that are running in parallel and seeing whether they should be, or whether they should be serial.
Most cases I’ve seen recently with very high CXPacket waits and very high Access_Methods_Dataset_Parent latch waits have been a result of inefficient queries and poor indexing, not a problem with parallelism itself.
Q5: What should average PLE be?
The higher the better. It measures how long, on average, a page stays in cache. Lower numbers mean more churn of the buffer pool. There’s no one number where above is good and below is bad.