,

Partition Table in Azure SQL Database

Posted by

Partitioning tables in Azure SQL Database allows you to manage large tables more efficiently by splitting them into smaller, manageable partitions. These partitions are based on a column (commonly a date column) that serves as the partition key.

Key Steps to Create a Partitioned Table in Azure SQL Database

  1. Enable Partitioning with a Filegroup: In Azure SQL Database, partitions are stored on the same filegroup (unlike on-premises SQL Server, where multiple filegroups can be used).

Example: Partition a Table by Year Using a Date Column

Step 1: Create a Partition Function

The partition function defines how the data will be split across partitions.

CREATE PARTITION FUNCTION YearPartitionFunction (DATE)
AS RANGE LEFT FOR VALUES 
    ('2022-12-31', '2023-12-31', '2024-12-31');
  • RANGE LEFT: Specifies that values equal to the boundary go to the lower partition.
  • The boundaries ('2022-12-31', '2023-12-31') determine the year split.

Step 2: Create a Partition Scheme

The partition scheme maps partitions to filegroups. In Azure SQL Database, all partitions map to the PRIMARY filegroup

CREATE PARTITION SCHEME YearPartitionScheme
AS PARTITION YearPartitionFunction
ALL TO ([PRIMARY]);

Step 3: Create a Partitioned Table

Define the table and assign it to the partition scheme using the partition column.

CREATE TABLE FinancialTransactions (
    TransactionID INT NOT NULL PRIMARY KEY,
    BusinessDate DATE NOT NULL,
    Amount DECIMAL(18,2)
)
ON YearPartitionScheme (BusinessDate);
  • ON YearPartitionScheme (BusinessDate): Specifies that the table is partitioned based on BusinessDate using the partition scheme.

Managing Partitions

  • Switching Data In/Out: You can move data in and out of partitions for maintenance or archival using ALTER TABLE ... SWITCH.
  • Merging/Splitting Partitions: Adjust partitions using ALTER PARTITION FUNCTION.

Example Query for Partitioned Table

Querying the table automatically uses the appropriate partition based on the query’s predicate.

SELECT *
FROM FinancialTransactions
WHERE BusinessDate BETWEEN '2023-01-01' AND '2023-12-31';

Benefits of Partitioning in Azure SQL Database

  1. Improved Performance: Queries targeting a single partition can be faster.
  2. Easier Maintenance: Partitions can be managed individually for loading, indexing, or archiving.
  3. Scalability: Handles large datasets effectively by splitting them logically.

Limitation For Partition Table in Azure SQL Database

In Azure SQL Database, table partitioning is supported but has certain limitations compared to the on-premises SQL Server or SQL Server running on virtual machines. Below are the key limitations and considerations:


1. Single Filegroup Support

  • In Azure SQL Database, all partitions are stored within the primary filegroup.
  • Unlike on-premises SQL Server, where you can use multiple filegroups to spread partitions across different storage devices, Azure SQL Database does not support this.

2. Limited Partition Count

  • Azure SQL Database supports a maximum of 15,000 partitions per partitioned table.
  • This count is generally sufficient for most use cases, but large-scale systems with granular partitions (e.g., per day for decades) may hit this limit.

3. Resource Constraints

  • Partitioned tables in Azure SQL Database share the same resources (CPU, memory, storage) as the database.
  • Query performance can degrade if partitions are misconfigured or if the database exceeds its DTU (Database Transaction Units) or vCore limits.

4. No Partition-Level Statistics

  • Azure SQL Database does not support partition-level statistics.
  • This may lead to less optimal query execution plans for partitioned tables, as statistics are generated for the entire table rather than for individual partitions.

5. Lack of Partition-Level Locking

  • Partition-level locking is not supported. Operations that target a specific partition may still result in table-level locks, potentially impacting concurrency.

6. No Partition-Level Maintenance

  • Maintenance operations (e.g., rebuilding indexes) must be performed on the entire table. Azure SQL Database does not allow partition-level index rebuilds or other granular maintenance tasks.

7. No Data Compression for Individual Partitions

  • Data compression can be applied to the table but not to individual partitions within the table.

8. Limited Use Cases for SWITCH

  • The ALTER TABLE ... SWITCH command has restrictions:
    • Both tables involved in the SWITCH operation must have the same schema and partitioning scheme.
    • SWITCH is commonly used for partitioning large datasets, but misaligned configurations can cause issues.

9. Higher Costs for Large Tables

  • Partitioned tables often lead to increased storage and compute requirements due to:
    • Additional overhead for partition alignment.
    • Rebuilding and maintaining large indexes.

10. Inconsistent Query Performance

  • If queries do not leverage partition pruning (e.g., missing filters on the partition column), they may scan the entire table, leading to degraded performance.

Best Practices to Overcome Limitations

  1. Partition Key Choice: Select a partition key that is commonly used in queries (e.g., DATE columns).
  2. Indexing: Ensure indexes align with partitions to improve query performance.
  3. Query Optimization: Use predicates in queries to enable partition elimination.
  4. Database Tier: Use a higher-performance service tier if resource constraints impact partitioned table performance.

Summary Table of Limitations

LimitationDescription
Maximum Partitions15,000 partitions per table.
No Partition SwitchingMoving or replacing partitions is not supported.
Indexing ConstraintsNo global indexes; only aligned indexes are supported.
Partition Key DependencyQueries must include the partition key for optimal performance.
Static BoundariesPartition boundaries cannot be altered dynamically; table recreation may be required.
Storage LimitationsPartitioning does not increase database storage capacity limits.
Performance OverheadPoor design can result in unnecessary scans or inefficient indexing.
Unsupported OperationsCertain operations, like bulk inserts and schema modifications, require additional considerations.

What Does “15,000 Partitions” Mean?

It means that for any single partitioned table, you can create up to 15,000 partitions. This limit ensures the table remains performant and manageable.

Each partition is like a “bucket” where a subset of the data is stored. The partitioning helps improve query performance because the database engine can quickly narrow down the data to specific partitions instead of scanning the entire table.


Simple Example

Scenario:

You have a table Sales with millions of rows, and you want to divide the data by year to make it easier to query.

Partition Function:

You define a partition function that splits the data based on the Year in the SaleDate column.

CREATE PARTITION FUNCTION YearPartitionFunction (datetime)
AS RANGE RIGHT FOR VALUES ('2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01');

How Partitions are Created:

This creates the following 4 partitions:

  1. Partition 1: Sales before 2020-01-01
  2. Partition 2: Sales in 2020
  3. Partition 3: Sales in 2021
  4. Partition 4: Sales in 2022 and later

If your data grows, you can add more partitions. For example:

  • Partition 5 for 2023
  • Partition 6 for 2024, and so on.

Maximum Partitions Limit

You can keep adding partitions until the total reaches 15,000. For example:

  • If you partition data by year, you could handle data for 15,000 years.
  • If you partition data by day, you could handle approximately 41 years of daily data (15,000 days).

Benefits of Partitioning:

  1. Performance: Queries on partitioned tables only scan the relevant partitions, speeding up operations.
  2. Manageability: Easier to manage smaller chunks of data (e.g., archive older partitions).
  3. Scalability: Supports very large datasets.

Important Notes:

  • Each partitioned table must have 1 partition function.
  • The limit is for the number of partitions, not rows. Each partition can hold millions or billions of rows.
  • Proper indexing and partitioning design are essential to maximize performance.

How the partition function works with RANGE RIGHT and RANGE LEFT values in the provided query?

In Azure SQL Database, a partition function is used to divide data across different partitions based on specified values. Here’s an explanation of how the partition function works with RANGE RIGHT and RANGE LEFT values in the provided query:


Query Breakdown

CREATE PARTITION FUNCTION myRangePF1 (datetime2(0))
AS RANGE RIGHT FOR VALUES ('2022-04-01', '2022-05-01', '2022-06-01');
  • CREATE PARTITION FUNCTION: Defines a function that specifies how data should be distributed across partitions.
  • myRangePF1: The name of the partition function.
  • datetime2(0): Specifies the data type for the partition key, which is datetime2 with 0 fractional seconds precision.
  • AS RANGE RIGHT: Specifies that the partition includes the boundary value in the partition to the right.
  • FOR VALUES: Lists the boundary points used to divide the data into partitions.

What Happens in This Query?

  • The function divides the data based on the boundary values: '2022-04-01', '2022-05-01', and '2022-06-01'.
  • RANGE RIGHT means that the boundary value belongs to the partition on its right.

Partitions Created

  1. Partition 1: Includes all rows where the date is less than ‘2022-04-01’.
  2. Partition 2: Includes rows where the date is greater than or equal to ‘2022-04-01’ and less than ‘2022-05-01’.
  3. Partition 3: Includes rows where the date is greater than or equal to ‘2022-05-01’ and less than ‘2022-06-01’.
  4. Partition 4: Includes rows where the date is greater than or equal to ‘2022-06-01’.

How is RANGE LEFT Different?

If you replace RANGE RIGHT with RANGE LEFT:

CREATE PARTITION FUNCTION myRangePF1 (datetime2(0))
AS RANGE LEFT FOR VALUES ('2022-04-01', '2022-05-01', '2022-06-01');
  • RANGE LEFT means that the boundary value belongs to the partition on its left.

Partitions Created with RANGE LEFT

  1. Partition 1: Includes all rows where the date is less than or equal to ‘2022-04-01’.
  2. Partition 2: Includes rows where the date is greater than ‘2022-04-01’ and less than or equal to ‘2022-05-01’.
  3. Partition 3: Includes rows where the date is greater than ‘2022-05-01’ and less than or equal to ‘2022-06-01’.
  4. Partition 4: Includes rows where the date is greater than ‘2022-06-01’.

Visual Representation of RANGE RIGHT vs. RANGE LEFT

Boundary ValueRANGE RIGHT (Included in Right Partition)RANGE LEFT (Included in Left Partition)
2022-04-01Partition 2Partition 1
2022-05-01Partition 3Partition 2
2022-06-01Partition 4Partition 3

Use Cases

  • RANGE RIGHT: Often used when you want to include the boundary value with the later (future) data ranges.
  • RANGE LEFT: Useful when you want to group the boundary value with earlier (past) data ranges.

Practical Example

If you have a table of sales data with a SaleDate column, you can use this partition function to distribute the data based on date ranges. This helps improve query performance when accessing specific date ranges, as only the relevant partition(s) are scanned.

Partition Distribution for SaleDate (Using RANGE RIGHT):

  • Dates before '2022-04-01': Partition 1
  • Dates between '2022-04-01' and '2022-04-30': Partition 2
  • Dates between '2022-05-01' and '2022-05-31': Partition 3
  • Dates on or after '2022-06-01': Partition 4
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x