How to Add Partitioning to an Existing Table in Azure SQL Database

Mohammad Gufran Jahangir November 23, 2024 0

Adding partitioning to an existing table in Azure SQL Database requires the following steps. Unfortunately, you cannot directly apply partitioning to an already populated table. Instead, you must create a new partitioned table and migrate the data into it.

Here’s the step-by-step process:

Table of Contents

Steps to Add Partitioning to an Existing Table

1. Define a Partition Function

The partition function maps the rows of the table to specific partitions based on the partition key value.

CREATE PARTITION FUNCTION MyPartitionFunction (DATETIME)
AS RANGE LEFT FOR VALUES ('2023-01-01', '2023-06-01', '2023-12-31');

In this example:

Rows with dates before or equal to 2023-01-01 go to the first partition.
Rows between 2023-01-02 and 2023-06-01 go to the second partition, and so on.

2. Define a Partition Scheme

The partition scheme specifies where the partitions reside. In Azure SQL Database, all partitions reside in the same filegroup.

CREATE PARTITION SCHEME MyPartitionScheme
AS PARTITION MyPartitionFunction
ALL TO ([PRIMARY]);

3. Create a New Partitioned Table

Create a new table with the same schema as your existing table but include the partition scheme.

CREATE TABLE NewTableName (
    Column1 INT,
    Column2 NVARCHAR(100),
    BusinessDate DATETIME, -- Partition Key
    OtherColumns NVARCHAR(100)
)
ON MyPartitionScheme(BusinessDate); -- Use the partition scheme

4. Migrate Data from the Existing Table

Use INSERT INTO ... SELECT to move data from the existing table into the new partitioned table.

INSERT INTO NewTableName (Column1, Column2, BusinessDate, OtherColumns)
SELECT Column1, Column2, BusinessDate, OtherColumns
FROM ExistingTableName;

5. Drop the Old Table (Optional)

Once the data is successfully migrated, and you’ve verified it, you can drop the old table.

DROP TABLE ExistingTableName;

6. Rename the New Table (Optional)

If needed, rename the new table to the name of the original table.

EXEC sp_rename 'NewTableName', 'ExistingTableName';

7. Recreate Indexes

Recreate any indexes from the old table on the new partitioned table to optimize query performance.

Key Points to Remember

Partition Key: Choose a column for partitioning that is frequently used in query filters.
Partition Elimination: Queries must use the partition key in the WHERE clause to benefit from partition pruning.
Azure SQL Constraints:
- Azure SQL Database stores all partitions in the primary filegroup.
- Partition switching (e.g., ALTER TABLE ... SWITCH) is restricted in some scenarios.

Why Can’t You Partition an Existing Table Directly?

Partitioning changes the underlying storage structure of the table, and this cannot be applied to a table that already exists. Partitioning is integral to table design, and enabling it requires creating a new table with a partitioned design.

Partitioning to an existing table without recreating the table by creating a clustered index

In Azure SQL Database, you can apply partitioning to an existing table without recreating the table by creating a clustered index on the table and associating it with a partition scheme. This approach enables partitioning while preserving the existing data structure.

Steps to Partition an Existing Table Using a Clustered Index

1. Create a Partition Function

Define how the data will be divided into partitions based on a range of values.

CREATE PARTITION FUNCTION INSERT_DATE_PF (DATE)
AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

This defines the partition function that splits the data into partitions based on the INSERT_DATE column.

2. Create a Partition Scheme

Map the partition function to storage. In Azure SQL Database, storage management is automated, so all partitions use the default storage ([PRIMARY]).

CREATE PARTITION SCHEME INSERT_DATE_PS
AS PARTITION INSERT_DATE_PF ALL TO ([PRIMARY]);

This creates a partition scheme that maps each partition to the PRIMARY filegroup.

3. Create a Clustered Index

Create a clustered index on the table using the partition scheme.

CREATE CLUSTERED INDEX cdx 
ON [Opera].[test_partition](INSERT_DATE) 
ON INSERT_DATE_PS(INSERT_DATE);

This step:

Applies the partition scheme to the table.
Physically reorganizes the table data into partitions based on the INSERT_DATE column.

Key Points

Preserves Table Structure:
- The table itself is not dropped or recreated. Partitioning is applied directly by building the clustered index.
Clustered Index Requirement:
- Partitioning requires the table to have a clustered index.
- If the table already has a clustered index, it must be dropped first using:

DROP INDEX [ExistingClusteredIndexName] ON [Opera].[test_partition];

3.Data Distribution:

Data is redistributed across partitions based on the INSERT_DATE column and the boundaries defined in the partition function.

4.Non-Clustered Indexes:

Non-clustered indexes must be recreated after applying the clustered index with partitioning.

Limitations

Clustered Index Requirement:
- You cannot partition a table without a clustered index. Applying partitioning inherently reorganizes the data storage, which is achieved through the clustered index.
Rebuilding Non-Clustered Indexes:
- If the table has non-clustered indexes, you must drop and recreate them after applying the clustered index.
Performance Impact:
- Applying a clustered index with partitioning can be resource-intensive for large tables, so perform this operation during a maintenance window.
15,000 Partition Limit:
- Azure SQL Database enforces a maximum of 15,000 partitions per table.

Advantages

No Table Recreation:
- Unlike the full recreation process, this method avoids the need to create a new table and migrate data.
Direct Partitioning:
- Enables partitioning directly on an existing table, simplifying the process.

Creating a clustered index with partitioning in Azure SQL Database can include additional columns in the index

when creating a clustered index with partitioning in Azure SQL Database, you can include additional columns in the index to define the order of rows within each partition. However, these additional columns are not part of the partitioning key; they are only used for sorting and searching within the partitions.

Key Points:

Partition Key vs. Index Key:
- The partition key determines how data is distributed across partitions.
- Additional columns in the index key determine how data is sorted within each partition.
Order of Columns:
- The partition key must be included in the index key, and it must come first.
- Other columns can be added after the partition key in the index definition.

Syntax for Adding Additional Columns

Here’s how to create a clustered index with additional columns while adding partitioning:

CREATE CLUSTERED INDEX cdx 
ON [SchemaName].[TableName](PartitionKeyColumn, AdditionalColumn1, AdditionalColumn2)
ON PartitionSchemeName(PartitionKeyColumn);

PartitionKeyColumn: Column used for partitioning (must be included first).
AdditionalColumn1, AdditionalColumn2: Columns used for sorting within partitions.
PartitionSchemeName: The partition scheme that maps partitions to storage.

Example: Adding Multiple Columns in a Partitioned Clustered Index

Scenario:

You have a table Sales with the following columns:

SaleDate (used for partitioning)
Region
ProductID
Amount

You want to partition the table by SaleDate and include Region and ProductID in the clustered index for efficient querying.

Steps:

Create Partition Function: Define partitions based on SaleDate.

CREATE PARTITION FUNCTION SaleDatePF (DATE)
AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

2. Create Partition Scheme: Map the partition function to the default storage.

CREATE PARTITION SCHEME SaleDatePS
AS PARTITION SaleDatePF ALL TO ([PRIMARY]);

3. Create the Clustered Index: Include additional columns (Region, ProductID) in the index key.

CREATE CLUSTERED INDEX cdx 
ON Sales(SaleDate, Region, ProductID)
ON SaleDatePS(SaleDate);

How It Works

Partitioning:
- Data is distributed into partitions based on the SaleDate column.
- Example:
  - Partition 1: Data before 2022-01-01
  - Partition 2: Data between 2022-01-01 and 2023-01-01
  - Partition 3: Data between 2023-01-01 and 2024-01-01
  - Partition 4: Data after 2024-01-01
Indexing:
- Within each partition, rows are sorted by SaleDate, then by Region, and finally by ProductID.

Limitations and Considerations

Partition Key Required:
- The partition key must be included in the clustered index and as the first column.
Additional Columns Are Optional:
- Adding more columns is optional but can improve performance for specific queries by supporting better sorting and searching.
Index Size:
- Including many columns in the clustered index increases its size, which can impact storage and query performance.
Non-Clustered Indexes:
- Non-clustered indexes must be recreated after applying a partitioned clustered index.
Query Compatibility:
- Ensure your queries use the partition key and additional indexed columns to benefit from partitioning and indexing.

Querying the Partitioned Table

After adding a partitioned clustered index, optimize queries to use the indexed columns. For example:

SELECT *
FROM Sales
WHERE SaleDate BETWEEN '2022-01-01' AND '2022-12-31'
  AND Region = 'North'
  AND ProductID = 101;

This query benefits from:

Partition elimination: Only the relevant partition(s) are scanned.
Index usage: Within the partition, rows are efficiently located using Region and ProductID.

Difference between assigning a partition while creating a table versus creating a clustered index

The primary difference between assigning a partition while creating a table versus creating a clustered index and assigning the partition in Azure SQL Database lies in how and when partitioning is applied to the data in the table. Here’s an explanation of the differences with examples:

1. Assigning Partition While Creating the Table

How It Works

When you assign a partition scheme during table creation, the table is immediately partitioned based on the partition function and scheme.
The partitioning logic is built directly into the table’s storage structure from the start.

Example

-- Create Partition Function
CREATE PARTITION FUNCTION SalesPartitionFunction (DATE)
AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

-- Create Partition Scheme
CREATE PARTITION SCHEME SalesPartitionScheme
AS PARTITION SalesPartitionFunction ALL TO ([PRIMARY]);

-- Create Table with Partition
CREATE TABLE Sales (
    SaleID INT,
    SaleDate DATE,
    Amount DECIMAL(10, 2)
) ON SalesPartitionScheme(SaleDate);

The Sales table is partitioned at the moment it is created, and data is physically stored in partitions based on the SaleDate column.

Advantages

Immediate Partitioning:
- The table is inherently partitioned, and every row inserted into the table is immediately placed in the correct partition.
Simplicity:
- There is no need to create a clustered index later to enable partitioning.
Better for Initial Design:
- Ideal when creating new tables with a clear partitioning strategy.

Limitations

If the table already exists, it cannot be repartitioned without recreating it.
If you need to repartition later, the process involves dropping and recreating the table.

2. Creating a Clustered Index and Assigning a Partition

How It Works

When a table is created without partitioning, you can later assign partitioning by creating a clustered index on the table and linking it to a partition scheme.
The partitioning is applied as part of the index creation, and the table’s data is physically reorganized into partitions.

Example

-- Create Partition Function
CREATE PARTITION FUNCTION SalesPartitionFunction (DATE)
AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

-- Create Partition Scheme
CREATE PARTITION SCHEME SalesPartitionScheme
AS PARTITION SalesPartitionFunction ALL TO ([PRIMARY]);

-- Create Non-Partitioned Table
CREATE TABLE Sales (
    SaleID INT,
    SaleDate DATE,
    Amount DECIMAL(10, 2)
);

-- Add Clustered Index to Partition the Table
CREATE CLUSTERED INDEX cdx_Sales
ON Sales(SaleDate)
ON SalesPartitionScheme(SaleDate);

The CREATE CLUSTERED INDEX statement partitions the existing Sales table based on the SaleDate column and the partition scheme.

Advantages

Partitioning Existing Tables:
- Useful when you have an existing non-partitioned table and want to partition it without recreating the table.
Data Reorganization:
- Data is automatically redistributed into the appropriate partitions as the clustered index is created.

Limitations

Requires a Clustered Index:
- You must create a clustered index to enable partitioning.
- If the table already has a clustered index, it must be dropped before creating the partitioned index.
Performance Overhead:
- Partitioning an existing table involves reorganizing data, which can be time-consuming and resource-intensive, especially for large tables.
Non-Clustered Indexes Must Be Rebuilt:
- Any non-clustered indexes on the table must be dropped and recreated after partitioning.

Key Differences

Feature	Partition During Table Creation	Partition with Clustered Index
When Partitioning Occurs	At the time the table is created	When the clustered index is created on the table
Requires Clustered Index	No, partitioning is inherent	Yes, partitioning is applied through the clustered index
For New/Existing Tables	Best for new tables	Best for existing tables
Performance Impact	No additional reorganization needed	Data is reorganized into partitions during index creation
Non-Clustered Indexes	No impact	Must be dropped and recreated after partitioning
Ease of Implementation	Simpler, no need to manage indexes	Requires managing clustered index

Mohammad Gufran Jahangir

Tags: Add Partitioning to an Existing Table, Azure SQL Database

Category: