Difference between NONCLUSTERED COLUMNSTORE INDEX and CLUSTERED COLUMNSTORE INDEX in azure synapse sql

Mohammad Gufran Jahangir January 16, 2024 0

1. Columnstore Indexes:

Purpose: Designed for high-performance analytics and data warehousing workloads.
Storage: Store data in column-wise format instead of traditional row-wise format.
Benefits:
- Significant compression (up to 10x), reducing storage costs.
- Faster query performance for aggregations and large scans, especially on large datasets.

2. CLUSTERED COLUMNSTORE INDEX (CCI):

Main storage structure: Organizes the entire table data in columnstore format.
Default in Dedicated SQL Pool: Created automatically for every table, optimizing for analytics.
Benefits:
- Best for typical analytical workloads with frequent aggregations and scans.
- Maximizes compression and query performance.
Considerations:
- Can slow down frequent updates and inserts, as data needs to be re-organized.

3. NONCLUSTERED COLUMNSTORE INDEX (NCCI):

Secondary index: Created on specific columns while the table’s primary storage remains rowstore.
Used for:
- Selectively improving query performance on certain columns without impacting the entire table.
- Maintaining a rowstore structure for transactional workloads while enhancing analytics on specific columns.
Benefits:
- Flexibility for mixed workloads with both transactional and analytical needs.
- Less impact on updates and inserts compared to CCI.

Key Differences:

Feature	CLUSTERED COLUMNSTORE INDEX (CCI)	NONCLUSTERED COLUMNSTORE INDEX (NCCI)
Storage Structure	Primary storage for the entire table	Secondary index on specific columns
Default Creation	Created automatically in Dedicated SQL Pool	Created manually
Impact on Updates/Inserts	Can slow down due to data reorganization	Less impact
Use Cases	Typical analytical workloads with frequent aggregations and scans	Mixed workloads with both transactional and analytical needs

Choosing the Right Index:

CCI: Ideal for primarily analytical workloads with frequent aggregations and scans.
NCCI: Suitable for mixed workloads, selective performance optimization, and balancing transactional needs with analytics.

Additional Points:

You can create multiple NCCIs on a table for different query patterns.
Synapse Serverless SQL Pool supports only NCCIs, as it’s designed for query-specific optimization.
Consider factors like query patterns, workload type, update frequency, and storage requirements when choosing the appropriate index.

Example 1: CCI (Clustered Columnstore Index)

Scenario: A large fact table in a data warehouse storing sales transactions for analysis.

SQL Statements:

CREATE TABLE SalesTransactions
(
    TransactionID INT,
    ProductID INT,
    CustomerID INT,
    Date DATE,
    SalesAmount DECIMAL(18,2)
);

-- CCI created automatically in Dedicated SQL Pool

Benefits:

Optimal compression for large dataset.
Fast aggregations and scans for analytical queries like:
- SELECT SUM(SalesAmount) FROM SalesTransactions WHERE Date BETWEEN '2023-01-01' AND '2024-01-08'

Example 2: NCCI (Nonclustered Columnstore Index)

Scenario: A table used for both transactional updates and analytical queries.

SQL Statements:

CREATE TABLE CustomerOrders
(
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    OrderDetails VARCHAR(MAX)
);

CREATE NONCLUSTERED COLUMNSTORE INDEX IX_CustomerOrders_OrderDate
ON CustomerOrders (OrderDate);

Benefits:

Rowstore structure maintained for fast updates and inserts.
Enhanced query performance for analytics on OrderDate column, like:
- SELECT COUNT(*) FROM CustomerOrders WHERE OrderDate >= '2024-01-01'
Less impact on transactional operations compared to CCI.

Example 3: Multiple NCCIs

Scenario: A table with frequent queries involving different column combinations.

SQL Statements:

CREATE NONCLUSTERED COLUMNSTORE INDEX IX_Sales_ProductID
ON SalesTransactions (ProductID);

CREATE NONCLUSTERED COLUMNSTORE INDEX IX_Sales_CustomerID
ON SalesTransactions (CustomerID);

Benefits:

Optimized performance for various query patterns without impacting the entire table.

Mohammad Gufran Jahangir

Category:

Uncategorized