Difference between NONCLUSTERED COLUMNSTORE INDEX and CLUSTERED COLUMNSTORE INDEX in azure synapse sql

Posted by

1. Columnstore Indexes:

  • Purpose: Designed for high-performance analytics and data warehousing workloads.
  • Storage: Store data in column-wise format instead of traditional row-wise format.
  • Benefits:
    • Significant compression (up to 10x), reducing storage costs.
    • Faster query performance for aggregations and large scans, especially on large datasets.

2. CLUSTERED COLUMNSTORE INDEX (CCI):

  • Main storage structure: Organizes the entire table data in columnstore format.
  • Default in Dedicated SQL Pool: Created automatically for every table, optimizing for analytics.
  • Benefits:
    • Best for typical analytical workloads with frequent aggregations and scans.
    • Maximizes compression and query performance.
  • Considerations:
    • Can slow down frequent updates and inserts, as data needs to be re-organized.

3. NONCLUSTERED COLUMNSTORE INDEX (NCCI):

  • Secondary index: Created on specific columns while the table’s primary storage remains rowstore.
  • Used for:
    • Selectively improving query performance on certain columns without impacting the entire table.
    • Maintaining a rowstore structure for transactional workloads while enhancing analytics on specific columns.
  • Benefits:
    • Flexibility for mixed workloads with both transactional and analytical needs.
    • Less impact on updates and inserts compared to CCI.

Key Differences:

FeatureCLUSTERED COLUMNSTORE INDEX (CCI)NONCLUSTERED COLUMNSTORE INDEX (NCCI)
Storage StructurePrimary storage for the entire tableSecondary index on specific columns
Default CreationCreated automatically in Dedicated SQL PoolCreated manually
Impact on Updates/InsertsCan slow down due to data reorganizationLess impact
Use CasesTypical analytical workloads with frequent aggregations and scansMixed workloads with both transactional and analytical needs

Choosing the Right Index:

  • CCI: Ideal for primarily analytical workloads with frequent aggregations and scans.
  • NCCI: Suitable for mixed workloads, selective performance optimization, and balancing transactional needs with analytics.

Additional Points:

  • You can create multiple NCCIs on a table for different query patterns.
  • Synapse Serverless SQL Pool supports only NCCIs, as it’s designed for query-specific optimization.
  • Consider factors like query patterns, workload type, update frequency, and storage requirements when choosing the appropriate index.

Example 1: CCI (Clustered Columnstore Index)

Scenario: A large fact table in a data warehouse storing sales transactions for analysis.

SQL Statements:

CREATE TABLE SalesTransactions
(
    TransactionID INT,
    ProductID INT,
    CustomerID INT,
    Date DATE,
    SalesAmount DECIMAL(18,2)
);

-- CCI created automatically in Dedicated SQL Pool

Benefits:

  • Optimal compression for large dataset.
  • Fast aggregations and scans for analytical queries like:
    • SELECT SUM(SalesAmount) FROM SalesTransactions WHERE Date BETWEEN '2023-01-01' AND '2024-01-08'

Example 2: NCCI (Nonclustered Columnstore Index)

Scenario: A table used for both transactional updates and analytical queries.

SQL Statements:

CREATE TABLE CustomerOrders
(
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    OrderDetails VARCHAR(MAX)
);

CREATE NONCLUSTERED COLUMNSTORE INDEX IX_CustomerOrders_OrderDate
ON CustomerOrders (OrderDate);

Benefits:

  • Rowstore structure maintained for fast updates and inserts.
  • Enhanced query performance for analytics on OrderDate column, like:
    • SELECT COUNT(*) FROM CustomerOrders WHERE OrderDate >= '2024-01-01'
  • Less impact on transactional operations compared to CCI.

Example 3: Multiple NCCIs

Scenario: A table with frequent queries involving different column combinations.

SQL Statements:

CREATE NONCLUSTERED COLUMNSTORE INDEX IX_Sales_ProductID
ON SalesTransactions (ProductID);

CREATE NONCLUSTERED COLUMNSTORE INDEX IX_Sales_CustomerID
ON SalesTransactions (CustomerID);

Benefits:

  • Optimized performance for various query patterns without impacting the entire table.
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x