When is a critical IO error not a critical IO error?

When it succeeds on the second try.

Anyone who’s done any research at all on database corruption and IO problems has probably seen an example of the 823 and 824 errors.

Error 823 is a physical IO error. It means that, for some reason, the OS was incapable of completing the requested IO. It may mean that the disk has failed, the disk is missing, a filter driver is misbehaving, or a number of other things. On SQL 2005 it looks something like this:

Error: 823, Severity: 24, State: 2.
The operating system returned error 21(The device is not ready.) to SQL Server during a read at offset 0x0000d2364e2000 in file ‘D:\Data\SomeDB.mdf’. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Error 823 is classified as a severity 24, the second highest severity error that SQL has. As the error message says, it’s a very serious condition.

Error 824 is a logical IO error. It means that the page requested was read from disk successfully, but there was something wrong with the page. The page header may have invalid values in it, the checksum may be incorrect, the torn page bits may be incorrect, etc. On SQL 2005, it looks something like this:

Msg 824, Level 24, State 2, Line 1.
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7532c420; actual: 0x320e4240). It occurred during a read of page (1:342) in database ID 15 at offset 0x00000020e24000 in file ‘D:\Data\SomeDB.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Error 824 is also classified a severity 24, which is appropriate, seeing as it’s saying that some pages within the data file are absolute garbage.

So far, all well and good. They’re severe errors with very high severities and will be picked up by agent alerts. But there’s another IO error that’s less well known and harder to notice, though in my opinion, no less severe.

Error 825 is called a read-retry error. What it means is that an IO was requested and either the IO failed or the resulting page was corrupt. SQL then tried a couple more times and on either the 2nd, 3rd or 4th try, the IO succeeded.

The error looks something like this:

Msg 825, Level 10, State 2, Line 1.
A read of the file ‘D:\Data\SomeDB.mdf’ at offset 0x00000020e24000 succeeded after failing 1 time(s) with error: incorrect checksum (expected: 0x7532c420; actual: 0x320e4240). Additional messages in the SQL Server error log and system event log may provide more detail. This error condition threatens database integrity and must be corrected. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

The big thing to note about this is that it’s a severity 10. An informational message. That means that agent alerts set for the higher severity errors will completely miss this, and intermittent IO errors is perhaps not something that DBAs want to be ignorant of. The read may have eventually succeeded this time, but there’s no guarantee that it will eventually succeed the next time it’s tried.

1 Comment

  1. mahendhran

    What could be the possible root cause for the msg below. Please advise. Thanks.
    causeMsg 823, Level 24, State 2, Line 2
    The operating system returned error 1117(failed to retrieve text for this error. Reason: 15105) to SQL Server during a read at offset
    0x000000003a2000 in file ‘C:\ProgramData\MicroEngine\xPortalNet v1\Server\Database\XPNDB.mdf’. Additional messages in the SQL Server
    error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity
    and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors;
    for more information, see SQL Server Books Online.

    Reply

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.