Channel: Forum SQL Server Database Engine

SQL Server is lazy, how do I motivate it?


I have a very large DB with fact tables with billions of rows.

My problem is that I can't get SQL Server to make use of the hardware I have. Before anyone states the obvious: of course I have looked for bottlenecks... but I have really high-performance hardware and have tried this on several configurations...

The DB is about 1 TB page-compressed, so it is not reasonable to cache it all in RAM...

I typically see my requests "simmering" at 3-4% CPU and IO throughput of less than 50 MB/s... It is very annoying... it just sits there...

I do see the IO queue run up to 100 or even 1000, so it is processing a lot of IOPS. Why? I suspect it is because the plan generates millions of seek operations.

But even when I remove all indexes it keeps producing this typical result.

I have one really high-performance server with 256 GB RAM, 30+ CPUs and 500 GB of RAID 5 SSDs... I also have access to a SAN with another 10 GB/s of capacity... I know it is capable of more than 50,000 IOPS, but...

A complete table scan of a month (my partition size) of data (roughly 10-15 GB) should take less than a few seconds... so 12 months should be computed in less than a minute....

A typical benchmark would take a minute or two to aggregate.

It's 3NF normalized (star schema), date-partitioned, partitioned across logical and physical files, RAID-striped, etc. I think I've tried EVERY possible type of index scheme there is to handle this.

Queries are aggregations inner-joined with reference table selections (I use a tree structure, so selecting the root expands to the entire reference table....). Queries specify 3-7 "dimensions", joining the same number of reference tables.

The problem is that even the optimizer's recommendations, adding the proposed non-clustered indexes, typically LOWER performance. The indexes seem to push SQL Server into seek mode rather than scan mode, which actually seems to lower performance!

I managed to set up a partitioned multi-column (all foreign keys) non-clustered index (with all data columns included) on an indexed view.
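For anyone reproducing this, a minimal sketch of that indexed-view setup (all table, view, and column names here are hypothetical, not my actual schema):

```sql
-- The view must be schema-bound and use COUNT_BIG(*) to be indexable.
CREATE VIEW dbo.vFactAgg
WITH SCHEMABINDING
AS
SELECT DateKey, SourceKey, ProductKey,
       SUM(Value)   AS SumValue,
       COUNT_BIG(*) AS RowCnt
FROM dbo.FactTable
GROUP BY DateKey, SourceKey, ProductKey;
GO
-- A unique clustered index materializes the view...
CREATE UNIQUE CLUSTERED INDEX IX_vFactAgg
    ON dbo.vFactAgg (DateKey, SourceKey, ProductKey);
-- ...and a covering non-clustered index with INCLUDE carries the data columns.
CREATE NONCLUSTERED INDEX IX_vFactAgg_Covering
    ON dbo.vFactAgg (SourceKey, ProductKey, DateKey)
    INCLUDE (SumValue, RowCnt);
```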

I then get an index scan, which produces 100% CPU load and 300 MB/s of IO reads when I hint the index and add NOEXPAND, and I get quite reasonable performance then.

I get this on a typical reference search with, say, 5-10 million results. It completes in a few minutes on a 5-disk RAID 5 HDD array on SQL Server 2008 R2, but only IF I hint the multi-column index!
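The hinting pattern that produces the fast scan looks roughly like this (view and index names are hypothetical; NOEXPAND forces the optimizer to read the indexed view's materialized rows instead of expanding it back to the base tables):

```sql
SELECT DateKey, SUM(SumValue) AS Total
FROM dbo.vFactAgg WITH (NOEXPAND, INDEX (IX_vFactAgg_Covering))
WHERE DateKey BETWEEN 20120101 AND 20121231
GROUP BY DateKey;
```

On Enterprise Edition the optimizer can match indexed views automatically, but as described above it rarely does so here without the hint.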

HEY, wait a minute, that's MY OLD server and my old hardware... it's outperforming the SSDs and 2012...!

If I let SQL Server make its own choices it starts to do index seeks instead... and I get 3% CPU load and the HDD array peaks at 4-5 MB/s with 5,000 IOPS.

When I migrate this to SQL Server 2012 and my high-performance SSD array... it does nothing of the sort: 3% CPU load and 50 MB/s (yes, the disks are faster...)

* Columnstore indexes do NOTHING other than possibly lower performance...

* Clustered indexes have little or no impact.
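For completeness, the 2012-era columnstore option looks like the sketch below (table and column names are hypothetical). Note that SQL Server 2012 supports only NONCLUSTERED columnstore indexes, and creating one makes the table read-only until it is dropped or disabled:

```sql
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_Fact_CS
    ON dbo.FactTable (DateKey, SourceKey, ProductKey, Value);
```

Whether it helps depends heavily on the plan shape: the big gains come from batch-mode execution, which only applies to certain operators, so the actual execution plan is worth checking for "Batch" execution mode before concluding the index is useless.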

Statistics appear completely useless: two keys are heavily skewed (recent dates and one particular source account for 99% of the data), but I only see the impact of this in queries with seek operations, and those are part of the problem... so...

I would really need a way to hint and force an index scan, every single time!
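There is in fact a hint for exactly this: since SQL Server 2008 R2 SP1 the FORCESCAN table hint tells the optimizer to use only scans on that table's access path. A sketch (table and index names are hypothetical):

```sql
SELECT SourceKey, SUM(Value) AS Total
FROM dbo.FactTable WITH (FORCESCAN)
WHERE DateKey >= 20120101
GROUP BY SourceKey;

-- It can also be combined with an index hint to pin both
-- the index and the access method:
-- ... FROM dbo.FactTable WITH (FORCESCAN, INDEX (IX_Fact_Covering)) ...
```

When the query text cannot be changed, the same hint can be attached externally via a plan guide.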

I understand that SQL Server was built for millions of requests picking out needles in haystacks rather than for processing and accumulating large amounts of data...

What bothers me is that it is not even making an effort, it just sits there apparently idling...

I used to bring the sub-results out into a temp table and then do my accumulations 4 times on that subset, but that caused a lot of disk writes (even with an SSD tempdb it turned out to be really slow). It turns out that accumulating 3 times with the full selection is faster.

Is there any apparent "switch" or mode that I have missed ?

Why does it work only on an Indexed View and not the source table ?

Something I'd like to try is to turn things upside down...

What if I index the fact columns instead (the ones that are aggregated & grouped)... a smart algorithm would use a binning technique, one bin per value... with the selection columns as included columns....
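One way to sketch this "upside down" idea with standard features is a persisted computed column as the bin, indexed with the dimension keys as included columns (all names here are hypothetical):

```sql
-- Bin the fact value; PERSISTED makes it indexable and precomputed.
ALTER TABLE dbo.FactTable
    ADD ValueBin AS CAST(FLOOR(Value / 10) AS int) PERSISTED;

-- Index the bin, carrying the selection columns so the
-- aggregation can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_Fact_ValueBin
    ON dbo.FactTable (ValueBin)
    INCLUDE (DateKey, SourceKey, ProductKey);
```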

More practically, I would like performance on this type of query...

I know this doesn't work...but I would like it to....

select Min(Value), Count(PKvalue), NTILE(100) OVER (order by Value) as ntl
from FACTTable
inner join ....
where ....
group by ntl

Since the NTILE is specific to every join and where clause I can't precompute it...

Doing this in 2 stages takes forever... So I end up doing a frequency trick instead...

Select Count(PKvalue), Min(Value) .... group by Value

or variants with FLOOR(Value/10) or ROUND(Value, 3), depending on the semantics and accuracy...
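Expanded, the frequency trick can be written as one pass over the fact table plus window math on the small frequency set, which recovers an NTILE-style cumulative position without a second big scan (names are hypothetical; the running SUM OVER with a ROWS frame needs SQL Server 2012+):

```sql
;WITH Freq AS (
    -- One scan of the big table, reduced to per-bin counts.
    SELECT FLOOR(Value / 10) AS Bin,
           COUNT_BIG(*)      AS Cnt,
           MIN(Value)        AS MinValue
    FROM dbo.FactTable
    -- the INNER JOINs / WHERE from the original query go here
    GROUP BY FLOOR(Value / 10)
)
SELECT Bin, Cnt, MinValue,
       -- Cumulative count over the tiny frequency set,
       -- from which percentile boundaries can be derived.
       SUM(Cnt) OVER (ORDER BY Bin ROWS UNBOUNDED PRECEDING) AS CumCnt
FROM Freq
ORDER BY Bin;
```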

It still generates about 100,000k records for a single value, and when I do vectors it just explodes...

Pulling the data out and doing it in code is also not a great idea...

Do I have to switch to SSAS to get performance on this ?

