Columnstore indexes – In Details

Posted by

What is a columnstore index?

A columnstore index is a technology for storing, retrieving, and managing data by using a columnar data format, called a columnstore.

Key terms and concepts

Columnstore – physically stored in a column-wise data format.

Rowstore – physically stored in a row-wise data format. 

Rowgroup – A rowgroup is a group of rows that are compressed into columnstore format at the same time

Column segment – A column segment is a column of data from within the rowgroup.

  • Each rowgroup contains one column segment for every column in the table.
  • Each column segment is compressed together and stored on physical media.
  • There’s metadata with each segment to allow for fast elimination of segments without reading them.

Clustered columnstore index – A clustered columnstore index is the physical storage for the entire table.

To reduce fragmentation of the column segments and improve performance, the columnstore index might store some data temporarily into a clustered index called a deltastore and a B-tree list of IDs for deleted rows.

Delta rowgroup – A delta rowgroup is a clustered B-tree index that’s used only with columnstore indexes. It improves columnstore compression and performance by storing rows until the number of rows reaches a threshold (1,048,576 rows) and are then moved into the columnstore.

Deltastore – A columnstore index can have more than one delta rowgroup. All of the delta rowgroups are collectively called the deltastore.

Nonclustered columnstore index – A nonclustered columnstore index and a clustered columnstore index function the same. The difference is that a nonclustered index is a secondary index that’s created on a rowstore table, but a clustered columnstore index is the primary storage for the entire table.The nonclustered index contains a copy of part or all of the rows and columns in the underlying table. The index is defined as one or more columns of the table and has an optional condition that filters the rows.A nonclustered columnstore index enables real-time operational analytics where the OLTP workload uses the underlying clustered index while analytics run concurrently on the columnstore index.

Why should I use a columnstore index?

  • It has high compression rates. I/O bottlenecks in your system are minimized or eliminated, and memory footprint is reduced significantly.
  • High compression rates improve query performance by using a smaller in-memory footprint. 
  • Query performance can improve because SQL Server can perform more query and data operations in memory.
  • Batch execution improves query performance, typically by two to four times, by processing multiple rows together.
  • Queries often select only a few columns from a table, which reduces total I/O from the physical media.

How do I choose between a rowstore index and a columnstore index?

Rowstore indexes perform best on queries that seek into the data, when searching for a particular value, or for queries on a small range of values. They tend to require mostly table seeks instead of table scans.

Columnstore indexes give high performance gains for analytic queries that scan large amounts of data, especially on large tables. They tend to require full table scans rather than table seeks.

Can I combine rowstore and columnstore on the same table?

  • Yes. Beginning with SQL Server 2016 (13.x), you can create an updatable nonclustered columnstore index on a rowstore table. 
  • The columnstore index stores a copy of the selected columns, so you need extra space for this data, but the selected data is compressed on average 10 times.
  • Beginning with SQL Server 2016 (13.x), you can have one or more nonclustered rowstore indexes on a columnstore index and perform efficient table seeks on the underlying columnstore. 
Inline Feedbacks
View all comments
Would love your thoughts, please comment.x