Channel: Forum SQL Server Database Engine

Error chain starting with a write failure (OS error 665) during DBCC CHECKDB caused the instance to restart with error 3449


Hello,

We experienced the following error chain (extracted from the SQL Server logs), which ultimately caused our instance to restart and one of our primary AGs to fail over, resulting in unexpected (albeit brief) downtime. We had never seen a CHECKDB shut down an instance, or thought it was possible. Does anyone have any ideas?

From the error log (all on spid 408):

The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x0000034a90e000 in file 'redacted:\database.mdf:MSSQL_DBCC12'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
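As a quick sanity check, the hex write offset in the 665 message can be decoded into a byte position and an 8 KB page number. The sketch below is a hypothetical Python helper (`decode_offset` is an illustrative name, not part of any SQL Server tool). It assumes 8 KB pages and assumes that the sparse snapshot file ':MSSQL_DBCC12' uses the same page layout as the source .mdf.

```python
# Hypothetical helper: decodes the hex write offset from the OS error 665
# message into bytes, GiB, and an 8 KB page number.
# Assumption: the sparse snapshot file uses the same 8 KB page layout as the
# source .mdf, so the page number corresponds to the same page in the data file.
PAGE_SIZE = 8192  # SQL Server data pages are 8 KB


def decode_offset(hex_offset: str) -> dict:
    offset = int(hex_offset, 16)  # int() accepts the "0x" prefix when base is 16
    return {
        "bytes": offset,
        "gib": offset / 2**30,
        "page": offset // PAGE_SIZE,
        "page_aligned": offset % PAGE_SIZE == 0,
    }


info = decode_offset("0x0000034a90e000")
print(info["bytes"], round(info["gib"], 2), info["page"], info["page_aligned"])
# prints: 14135910400 13.17 1725575 True
```

For the offset in the post, this works out to roughly 13.2 GiB into the file, on an exact 8 KB page boundary (page 1725575 of that file).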

Error: 3314, Severity: 17, State: 3. During undoing of a logged operation in database 'database', an error occurred at log record ID (742678:3728:21). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.

Error: 831, Severity: 20, State: 1. Unable to deallocate a kept page.


Error: 3449, Severity: 21, State: 1. SQL Server must shut down in order to recover a database (database ID 12). The database is either a user database that could not be shut down or a system database. Restart SQL Server. If the database fails to recover after another startup, repair or restore the database.

About two minutes later, spid 10 logged: SQL Server was unable to close sessions and connections in a reasonable amount of time and is aborting "polite" shutdown.
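To see where in the chain the errors became fatal, the quoted error-log lines can be parsed for their Error/Severity/State triple. This is a minimal Python sketch that assumes the lines are available as plain strings in the "Error: N, Severity: S, State: T." format quoted above; `parse_errors` and `first_fatal` are hypothetical helper names. The threshold of 20 reflects SQL Server's documented convention that severity levels 20 through 24 are fatal errors.

```python
import re

# Matches the "Error: N, Severity: S, State: T." prefix seen in the quoted log lines.
ERROR_RE = re.compile(r"Error: (\d+), Severity: (\d+), State: (\d+)\.")

# Abbreviated copies of the three numbered errors from the post, in log order.
LOG_LINES = [
    "Error: 3314, Severity: 17, State: 3. During undoing of a logged operation ...",
    "Error: 831, Severity: 20, State: 1. Unable to deallocate a kept page.",
    "Error: 3449, Severity: 21, State: 1. SQL Server must shut down in order to recover a database ...",
]


def parse_errors(lines):
    """Return (error, severity, state) tuples for every line that carries one."""
    chain = []
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            chain.append(tuple(int(g) for g in match.groups()))
    return chain


def first_fatal(chain, threshold=20):
    """Return the first entry whose severity is at or above the fatal threshold."""
    for entry in chain:
        if entry[1] >= threshold:
            return entry
    return None


chain = parse_errors(LOG_LINES)
for error, severity, state in chain:
    print(f"{error:>5}  severity {severity:>2}  state {state}")
print("first fatal:", first_fatal(chain))
```

On these lines, the first fatal entry is error 831 (severity 20). It is followed by the severity 21 error 3449 that forced the shutdown.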

Note that we do not have a database with an ID of 12. I've assumed it refers to the internal snapshot that DBCC was using (the file name in the 665 message ends in ':MSSQL_DBCC12'), though I haven't been able to confirm or rule that out yet.

Several databases had completed their CHECKDB without issue (including one large database which takes ~2 hours to complete).  The database which was being checked was under moderate load at the time, and was participating in an availability group as the primary replica, with synchronous commits going to a secondary.

We're running SQL2012 SP1 CU3 with hotfix KB2832017 (11.0.3350 x64 Enterprise).

When searching, we found some references to the "kept page" error, but they all seemed to relate to SQL Server 2005 and were fixed by a service pack. We also investigated the file system filter driver route (as per Jonathan's blog entry), but we don't have any filters present.

This is the first time we've had the issue after several weeks of very similar workload patterns during the CHECKDB, and we're out of ideas for investigation.

One thing we have decided to do is reschedule the CHECKDB for this database to a time when fewer updates are happening, and pay the CPU price during business hours (when the database has almost no writes), which our servers have the capacity for. We're hoping this prevents a recurrence, but we would like to better understand what happened. So far, tests on development hardware (pushing write-heavy workloads through a database while running a concurrent CHECKDB) haven't reproduced the issue, though we hit IO bottlenecks on dev that don't exist in production.

Any thoughts, suggestions, or comments are welcome.

Kind Regards,

Tim

References:

Kept Page - www.sqlskills.com/blogs/paul/not-a-checkdb-error-unable-to-deallocate-a-kept-page/
Jonathan's blog entry - sqlblog.com/blogs/jonathan_kehayias/archive/2009/12/10/a-tale-of-checkdb-failures-cause-by-3rd-party-file-system-drivers.aspx

[Sorry for the lack of a hyperlink - account not yet verified]
