Search This Blog

Wednesday, July 5, 2023

Wait a minute, CLUSTERED COLUMNSTORE indexes are not ordered?

Wait a minute, CLUSTERED COLUMNSTORE indexes are not ordered?

Just to reiterate what's in the title of this post, what I mean is that the CLUSTERED COLUMNSTORE indexes are not ordered, while NONCLUSTERED COLUMNSTORE created indexes are on regular, ROWSTORE tables are. This should be evident from the fact that the  SQL Server does not even allow you to specify column list when creating a clustered columnstore index. (Correction: An option to create/rebuild an ordered CCI is available in SQL Server 2022).

If order of the data is important (and it is for query performance if that's the problem you want to fix or prevent from happening in future), its up to the data architect to figure out how to order the data and make sure that order is enforced by any processes that writes to the table.  Especially if that table is mostly (like close to 100% of the time) suppose to have data additions in right/intended order and NO DELETES or UPDATES or with much less frequency.

For existing clustered columnstore index, one strategy that I have employed to fix poorly ordered data to my needs, (without having to build or recreate tables and structures) :

  1. Drop the existing columnstore index.
  2. Create a clustered rowstore index with key columns that match the data order for the table.
  3. Create a new clustered columnstore index with the DROP_EXISTING=ON option.


For example Code:

-- Step 1
DROP INDEX CCI_my_table ON my_table;
-- Step 2
CREATE CLUSTERED INDEX CCI_my_table ON my_table ([TimeStamp]);
-- Step 3
CREATE CLUSTERED COLUMNSTORE INDEX CCI_my_table ON my_table WITH (DROP_EXISTING=ON, MAXDOP=1);

So what is the catch? Why shouldn't I make this part of my regular database maintenance task if I know I have or will have poorly ordered data in my clustered columnstore indexes?  Well, dropping a clustered columnstore index is not a quick operation. It requires converting the columnstore structure into a b-tree/heap structure, this requires more computing resources than dropping a clustered rowstore index.

If data in your clustered columnstore indexes are becoming unordered quickly, then I think you need a different solution. This method of fixing the data order on a columnstore index should be mainly reserved for less frequent cases/scenarios where fragmentation is too high, enough to negatively impact analytic processes.

Although much more complex, if the table is also partitioned, which is a technique often combined with columnstore indexes, you can use this method for specific partitions.