Hi all,
I have a system where SQL Server's memory is being paged out to disk, and the server slows down badly while this happens.
The system runs SQL 2005 on an active/passive cluster (2 node) with 48GB RAM on each node - SQL’s ‘Max Server Memory’ property is set to 40GB.
Here is what I've managed to gather so far (I'm not a SQL expert, so please correct me if any of the assumptions below are wrong):
Some SQL Server allocations are made outside of the buffer pool memory, meaning the ‘Max Server Memory’ value is _NOT_ the total memory in use for the entire SQL process.
I think this is where it’s falling down: we set the value to 40GB, thinking that left 8GB for Windows and other processes, but in fact SQL is using 40GB plus some other value.
That ‘some other value’ is any allocation larger than the 8 KB page size. These are handled by the multi page allocator, and anything it allocates comes from outside of the SQL buffer pool (40GB in our case).
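For anyone wanting to check the same thing, the split between buffer pool and multi-page memory can be sketched with a query like the one below. It assumes the SQL 2005 DMV sys.dm_os_memory_clerks and its single_pages_kb / multi_pages_kb columns, and it needs VIEW SERVER STATE permission. The grouping and ordering are just my choice for illustration.

```sql
-- Memory per clerk type, split by allocator.
-- Column names are those of SQL Server 2005 to 2008 R2; later versions
-- merged single_pages_kb and multi_pages_kb into a single pages_kb column.
SELECT type,
       SUM(single_pages_kb) AS single_pages_kb,  -- taken from inside the buffer pool
       SUM(multi_pages_kb)  AS multi_pages_kb    -- allocated outside the buffer pool
FROM sys.dm_os_memory_clerks
GROUP BY type
HAVING SUM(multi_pages_kb) > 0
ORDER BY SUM(multi_pages_kb) DESC;
```

If the assumption above holds, the clerk at the top of this list is the one eating into the headroom left for Windows.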
I have been digging in the SQL log from when the server paged out, which has the output from DBCC MemoryStatus dumped to it (other output snipped):
Memory Manager
VM Reserved = 66697516 KB
VM Committed = 48680308 KB <-- 46.4 GB RAM has been committed by SQL
AWE Allocated = 0 KB
Reserved Memory = 1024 KB
Reserved Memory In Use = 0 KB
MEMORYCLERK_SQLBUFFERPOOL (Total)
VM Reserved = 50413568 KB
VM Committed = 38655576 KB <-- 36.8GB RAM is committed for the Buffer Pool (from our 40GB value)
AWE Allocated = 0 KB
SM Reserved = 0 KB
SM Committed = 0 KB
SinglePage Allocator = 0 KB
MultiPage Allocator = 392 KB
CACHESTORE_OBJCP (Total)
VM Reserved = 0 KB
VM Committed = 0 KB
AWE Allocated = 0 KB
SM Reserved = 0 KB
SM Committed = 0 KB
SinglePage Allocator = 1071288 KB
MultiPage Allocator = 9749176 KB <-- 9.3GB RAM is in use by Object Plans
This seems to be what eats up all our remaining RAM and causes SQL to begin paging out. (A few other multi page allocators consume a small amount, ~300MB in total, which brings us to our total memory consumption of 46.4GB.)
About half an hour before the system paged out, I happened to be looking into the problem and had run:
Select * from master.dbo.syscacheobjects where pagesused > 1 order by pagesused desc
and DBCC MemoryStatus
CACHESTORE_OBJCP was using just over 2GB at that time, and then over 9GB half an hour later. The first query above showed around 5 single-use prepared plans that accounted for most of that 2GB, and a whole lot of other plans using hardly anything.
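A rough equivalent of the syscacheobjects query using the SQL 2005 DMVs might look like the sketch below. It assumes sys.dm_exec_cached_plans and sys.dm_exec_sql_text are available (they need VIEW SERVER STATE), and the TOP 20 and usecounts = 1 filters are arbitrary choices of mine to surface large single-use plans.

```sql
-- Largest single-use plans currently in the plan cache, with their SQL text.
SELECT TOP 20
       cp.objtype,                               -- e.g. Prepared, Adhoc, Proc
       cp.cacheobjtype,
       cp.usecounts,
       cp.size_in_bytes / 1024 AS size_kb,
       st.text AS sql_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.usecounts = 1                           -- assumption: only single-use plans are of interest
ORDER BY cp.size_in_bytes DESC;
```

Unlike syscacheobjects, this reports the plan size in bytes rather than in 8 KB pages, which makes it easier to compare against the DBCC MemoryStatus figures.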
Unfortunately when the system pages out it becomes totally unresponsive, so I couldn’t run any queries when the system was in the process of dying.
It always seems to be CACHESTORE_OBJCP taking up the largest chunk of memory when the server runs into this problem, but once it becomes responsive again, memory usage is low and I can't see what caused it.
I have lowered the ‘Max Server Memory’ value on the SQL instance to 35GB in the hope that a slightly smaller buffer pool will leave more headroom for SQL's requirements outside of the buffer pool, but I'm not sure whether this is the correct course of action.
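For completeness, the same change expressed in T-SQL would be something like the sketch below. It assumes the setting is made through sp_configure rather than the Management Studio properties dialog; the value is in MB, so 35GB is 35 * 1024 = 35840.

```sql
-- 'max server memory (MB)' is an advanced option, so expose those first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Cap the buffer pool at 35GB (35 * 1024 MB).
EXEC sp_configure 'max server memory (MB)', 35840;
RECONFIGURE;
```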
I'd really appreciate some advice from some SQL gurus on how I can track down WHAT uses all the memory inside the CACHESTORE_OBJCP area, since when it is large, I invariably cannot run any queries against the server to figure out what the problem might be :(
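One idea for getting around the "can't query it while it's dying" problem would be to log the cache sizes to a table on a schedule, so the history is there afterwards. Below is a minimal sketch, assuming the SQL 2005 DMV sys.dm_os_memory_cache_counters; the table name dbo.PlanCacheSnapshot is hypothetical, and the INSERT would need to be run from something like a SQL Agent job every minute or so.

```sql
-- Hypothetical logging table; the name and column layout are illustrative only.
CREATE TABLE dbo.PlanCacheSnapshot (
    captured_at     DATETIME      NOT NULL DEFAULT GETDATE(),
    cache_type      NVARCHAR(60)  NULL,
    cache_name      NVARCHAR(256) NULL,
    single_pages_kb BIGINT        NULL,
    multi_pages_kb  BIGINT        NULL,
    entries_count   BIGINT        NULL
);

-- Run this part on a schedule to record the object plan and SQL plan caches.
INSERT INTO dbo.PlanCacheSnapshot
       (cache_type, cache_name, single_pages_kb, multi_pages_kb, entries_count)
SELECT type, name, single_pages_kb, multi_pages_kb, entries_count
FROM sys.dm_os_memory_cache_counters
WHERE type IN ('CACHESTORE_OBJCP', 'CACHESTORE_SQLCP');
```

This only records how big each cache is over time, not which plans are in it, but it should at least show how quickly CACHESTORE_OBJCP grows before the server stops responding.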