I've been chasing around performance issues on this new system for a few weeks now, and it's been frustrating. I wonder if this is why.
I isolated it down to a semi-random little update statement:
    update dbo.myhottable
    set anattribute = f.anattribute
    from dbo.myhottable t
    inner join dbo.thefulltable f on t.mypk = f.thepk;
This was taking 3 seconds to update 21k rows. That didn't seem right unless I fell through a time warp to 1994. Profiler said it was CPU limited, with about 34k logical reads. CPU limited, three seconds, 21k rows, integer PK to PK, say what? Even hot, all the data in buffers?
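For anyone wanting to reproduce these numbers without Profiler, here is a minimal sketch for a query window, assuming the table and column names from the statement above. The begin tran / rollback wrapper is an assumption added so the test can be repeated without keeping the changes; it was not part of the original test.

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
set statistics io on;
set statistics time on;

begin tran;  -- assumption: wrapper added so the update can be re-run on the same data

update dbo.myhottable
set anattribute = f.anattribute
from dbo.myhottable t
inner join dbo.thefulltable f on t.mypk = f.thepk;

rollback;    -- note: the rollback gets its own timing line in the output

set statistics time off;
set statistics io off;
```

The Messages tab then shows logical reads per table plus CPU time and elapsed time, which should line up with what Profiler reports for the statement.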
So I did what one never wants to do, I gave SQL a gentle hint: inner LOOP join.
That did the trick, took the 3.0 seconds down to about 0.3 seconds. Yes, it increased the logical reads to 120k, but so what? Also why, but so what?
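For reference, a sketch of what the hinted statement looks like, again assuming the names from the original update. The second form is an alternative that was not tested in this post: a join hint written inside the FROM clause also forces the join order as written, while the query-level OPTION (LOOP JOIN) asks for nested loops on every join and leaves the order to the optimizer.

```sql
-- Join hint: forces nested loops for this join and enforces the written join order.
update dbo.myhottable
set anattribute = f.anattribute
from dbo.myhottable t
inner loop join dbo.thefulltable f on t.mypk = f.thepk;

-- Query-level alternative (not what was measured above):
-- nested loops for all joins, join order still chosen by the optimizer.
update dbo.myhottable
set anattribute = f.anattribute
from dbo.myhottable t
inner join dbo.thefulltable f on t.mypk = f.thepk
option (loop join);
```

With only two tables the two forms will often produce the same plan, but the difference matters once more joins are involved.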
So, what is the problem here? Looks like SQL optimized for resources AND NOT TIME. I've caught it doing this before, but this seems really pathological.
Here's the thing: thefulltable has 6m rows. The plan scans a 6m-row index containing thepk and anattribute and does a hash match. That's a pretty nasty ratio, scanning 6m rows to hash match to 21k. And it turns out to be a bad idea.
So, why is SQL Server going that way, and is there some magic word I can give it to bias it globally in my direction, CPU/duration?
--
I'm jumping to the conclusion here that I have a lot more of these going on, because some other tricks I usually use have not been succeeding.
I thought I just might have a difficult app here because the data is very lumpy and the statistics would give bad guidance, but in this case it's PK to PK so there should be no ambiguity.
SQL Server 2008 R2 Standard, 64-bit, 8 GB max memory on a 12 GB server with 4 cores, on Windows Server 2008 Standard under VMware, fwiw.
Josh
ps - I should mention we have maxdop=1, which I never like to do, but on just four cores ... hey, maybe I did fall through a time warp, to 2005 at least! The thing is, when I override that with option (maxdop 0), guess what: it uses a parallel plan that runs in about 1.5 seconds. IOW the single-thread plan is poor by any standards. And before you ask: if I use both maxdop 0 *and* the LOOP hint, I get the best plan of all, but I probably don't want to be overriding the maxdop.
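A sketch of the per-statement override described in the ps, assuming the same names as the original update. The MAXDOP query hint applies to this one statement only and leaves the server-wide setting alone; the first query simply reads the current server settings from sys.configurations.

```sql
-- Check the server-wide parallelism settings currently in effect.
select name, value_in_use
from sys.configurations
where name in ('max degree of parallelism', 'cost threshold for parallelism');

-- Override maxdop for this statement only, combined with the loop join hint.
update dbo.myhottable
set anattribute = f.anattribute
from dbo.myhottable t
inner loop join dbo.thefulltable f on t.mypk = f.thepk
option (maxdop 0);
```

Dropping the word loop from the join gives the maxdop-only variant, which is the one that ran in about 1.5 seconds.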