Hi all,
We're trying to come up with a new partitioning strategy for our large fact tables, as we are outgrowing our current approach. We're also very keen to use columnstore indexes, as they significantly improve our query performance.
Here's what we have:
- We store sales data for various stores in a fact table. Obvious measures like quantity, C.O.G.S., price sold, etc. Obvious dimensions like store ID, product ID, territory ID, etc.
- Our fact table holds approximately 1.5 billion rows and is growing by 15-20 million rows per day.
- Every day, we receive a full historical data set (5+ years, up to the present) for 50-100 new stores.
- The full data set we receive from stores must be immediately available for reporting upon ingestion.
- Every day, we receive live data throughout the day in trickles from the existing stores we have data for.
- Some of the live data we receive consists of alterations to sales from previous days, or late sales that are back-dated to previous days.
- We present real-time data to clients, so latency has to be very low. Currently, we use a combination of MOLAP for the historical data and ROLAP for any data we receive today (i.e. bulk loads of new stores and trickling live data).
- We want to avoid significant regular outages where we merge/process data (system should be available 24/7).
Since columnstore indexes prevent updates on a table, we are trying to work out a partitioning scheme that will let us have historical data in one table with a columnstore index and new data in another.
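To make the idea concrete, here is a rough T-SQL sketch of the two-table layout we have in mind. All table, column, index, and view names are hypothetical placeholders rather than our real schema, and the date partitioning is left out for brevity.

```sql
-- Hypothetical names throughout; partitioning omitted to keep the sketch short.

-- Historical data: read-only once the columnstore index exists.
CREATE TABLE dbo.FactSales_History (
    SaleDate    date          NOT NULL,
    StoreID     int           NOT NULL,
    ProductID   int           NOT NULL,
    TerritoryID int           NOT NULL,
    Quantity    int           NOT NULL,
    COGS        decimal(18,4) NOT NULL,
    PriceSold   decimal(18,4) NOT NULL
);

CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales_History
    ON dbo.FactSales_History
       (SaleDate, StoreID, ProductID, TerritoryID, Quantity, COGS, PriceSold);

-- Today's data (bulk loads and trickle feed): plain rowstore, fully updatable.
CREATE TABLE dbo.FactSales_Delta (
    SaleDate    date          NOT NULL,
    StoreID     int           NOT NULL,
    ProductID   int           NOT NULL,
    TerritoryID int           NOT NULL,
    Quantity    int           NOT NULL,
    COGS        decimal(18,4) NOT NULL,
    PriceSold   decimal(18,4) NOT NULL
);
GO

-- Reports would query the view so they see both tables as one.
CREATE VIEW dbo.vFactSales
AS
SELECT SaleDate, StoreID, ProductID, TerritoryID, Quantity, COGS, PriceSold
FROM dbo.FactSales_History
UNION ALL
SELECT SaleDate, StoreID, ProductID, TerritoryID, Quantity, COGS, PriceSold
FROM dbo.FactSales_Delta;
GO
```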
The problem is this: if we partition by date (which makes sense, since queries are generally on the most recent data), how do we get the data we receive that day (bulk loads and trickling live data) into the columnstore table? We would have to bulk insert the data across, which means disabling the columnstore index during the insert and then rebuilding the entire index afterwards. Reports can't be generated during that period.
If we were receiving only new data for a single day, we could switch a partition into the archived table; however, this is not the case.
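For clarity, this is roughly the switch-in pattern we mean, sketched in T-SQL. It assumes a history table `dbo.FactSales_History` partitioned on `SaleDate` by a partition function `pfSalesDate`, with a partition-aligned columnstore index and an empty target partition. Those names, the column list, and the example date are all hypothetical.

```sql
-- Hypothetical names and date; assumes the target partition is empty and
-- that the staging table sits on the same filegroup as that partition.

-- 1. Staging table with the same columns as the history table.
CREATE TABLE dbo.FactSales_Stage (
    SaleDate    date          NOT NULL,
    StoreID     int           NOT NULL,
    ProductID   int           NOT NULL,
    TerritoryID int           NOT NULL,
    Quantity    int           NOT NULL,
    COGS        decimal(18,4) NOT NULL,
    PriceSold   decimal(18,4) NOT NULL
);

-- 2. Bulk load the single day's rows into dbo.FactSales_Stage here.

-- 3. Build a columnstore index matching the one on the history table.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales_Stage
    ON dbo.FactSales_Stage
       (SaleDate, StoreID, ProductID, TerritoryID, Quantity, COGS, PriceSold);

-- 4. Prove to SQL Server that every row belongs in the target partition.
ALTER TABLE dbo.FactSales_Stage WITH CHECK
    ADD CONSTRAINT ck_FactSales_Stage_Day
    CHECK (SaleDate >= '20130102' AND SaleDate < '20130103');

-- 5. Metadata-only switch into the matching partition of the history table.
DECLARE @p int;
SET @p = $PARTITION.pfSalesDate('20130102');

ALTER TABLE dbo.FactSales_Stage
    SWITCH TO dbo.FactSales_History PARTITION @p;
```

This only works when a load maps cleanly onto one empty partition, which is exactly what our 5+ year store loads and back-dated corrections do not do.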
We'd love to use columnstore indexes and take advantage of some better partitioning, but it seems like there's no highly available way to accommodate what we're trying to achieve.
Any thoughts or advice would be greatly appreciated!