Here are real-world, scenario-based interview questions and answers related to partitioning in Azure SQL Database to help you prepare for advanced technical discussions.
Scenario 1: Query Performance Optimization for Large Data
Question:
You have a sales table with 100 million records, and queries frequently filter data by sale date (SaleDate
).
How would you optimize query performance using partitioning?
Answer:
To improve query performance, I would partition the table by SaleDate
using range-based partitioning.
Steps:
1️⃣ Create a Partition Function to Divide Data by Year:
CREATE PARTITION FUNCTION pf_SalesByYear (DATE)
AS RANGE LEFT FOR VALUES ('2021-12-31', '2022-12-31', '2023-12-31');
2️⃣ Create a Partition Scheme to Assign Partitions to Storage:
CREATE PARTITION SCHEME ps_SalesByYear
AS PARTITION pf_SalesByYear ALL TO ([PRIMARY]);
3️⃣ Create a Partitioned Table Using the Scheme:
CREATE TABLE SalesData (
SaleID INT PRIMARY KEY,
SaleDate DATE NOT NULL,
Amount DECIMAL(10,2)
) ON ps_SalesByYear(SaleDate);
✅ Benefit: When querying WHERE SaleDate = '2023-01-01'
, SQL Server will scan only the relevant partition, reducing query execution time.
Scenario 2: Data Archival Without Affecting Performance
Question:
Your company needs to delete old sales data older than 5 years, but deleting millions of rows at once locks the table. How would you handle this?
Answer:
I would use partition switching, which allows moving old data to an archive table instantly without affecting performance.
Steps:
1️⃣ Create an Archive Table with the Same Structure:
CREATE TABLE SalesArchive (
SaleID INT PRIMARY KEY,
SaleDate DATE NOT NULL,
Amount DECIMAL(10,2)
);
2️⃣ Move Old Data to Archive Using SWITCH
(Instant Move, No Deletion Cost)
ALTER TABLE SalesData SWITCH PARTITION 1 TO SalesArchive;
3️⃣ Drop the Empty Partition (Removes Old Data Without Locking)
ALTER PARTITION FUNCTION pf_SalesByYear()
MERGE RANGE ('2018-12-31');
✅ Benefit:
- No DELETE locks or long-running queries.
- Instantly moves data without affecting active transactions.
Scenario 3: Partition Elimination in Query Execution
Question:
A query filtering data for January 2023 runs slow on a partitioned table.
How do you ensure it uses partition elimination instead of scanning all partitions?
Answer:
Partition elimination works only when the query uses the partition key directly.
🔍 Fix the Query:
Instead of:
SELECT * FROM SalesData WHERE YEAR(SaleDate) = 2023;
✅ Use Partition Key Directly:
SELECT * FROM SalesData WHERE SaleDate BETWEEN '2023-01-01' AND '2023-01-31';
💡 Why?
- The first query (
YEAR(SaleDate)
) applies a function, preventing partition elimination. - The second query uses the actual partition key, so only relevant partitions are scanned.
Scenario 4: Handling Frequent Updates in a Partitioned Table
Question:
Your partitioned table has daily logs, but updates on the latest partition cause performance issues. How do you handle it?
Answer:
1️⃣ Create a Non-Clustered Index on the Partitioned Column
CREATE NONCLUSTERED INDEX idx_LogDate ON Logs (LogDate) ON ps_LogPartition(LogDate);
2️⃣ Use a Staging Table for Frequent Updates
Instead of updating directly:
UPDATE Logs SET Status = 'Processed' WHERE LogDate = '2023-12-01';
✅ Move Data to Staging, Update, Then Merge Back
-- Move relevant partition data to a temp table
SELECT * INTO Logs_Staging FROM Logs WHERE LogDate = '2023-12-01';
-- Update in staging (Faster than updating large tables)
UPDATE Logs_Staging SET Status = 'Processed';
-- Swap data back to the partitioned table
DELETE FROM Logs WHERE LogDate = '2023-12-01';
INSERT INTO Logs SELECT * FROM Logs_Staging;
💡 Why?
- Avoids locking large partitions during updates.
- Improves performance by batch-processing updates in a smaller dataset.
Scenario 5: Handling Data Growth Beyond Existing Partitions
Question:
Your table is partitioned by year, but you need to add a new partition for 2024. How do you do it?
Answer:
1️⃣ Add a New Partition Dynamically
ALTER PARTITION FUNCTION pf_SalesByYear()
SPLIT RANGE ('2024-12-31');
2️⃣ Verify Partition Split
SELECT * FROM sys.partition_range_values WHERE function_id = OBJECT_ID('pf_SalesByYear');
3️⃣ Insert Data for 2024
INSERT INTO SalesData (SaleID, SaleDate, Amount) VALUES (5001, '2024-01-01', 1000.00);
✅ Benefit:
- New partition is added without impacting existing partitions.
- Queries for 2024 data use the new partition efficiently.
Scenario 6: Indexing on Partitioned Tables
Question:
How do you create an index that aligns with your partitioned table?
Answer:
1️⃣ Create an Aligned Clustered Index
CREATE CLUSTERED INDEX idx_Sales ON SalesData(SaleDate)
ON ps_SalesByYear(SaleDate);
2️⃣ Create a Non-Clustered Index for Faster Lookups
CREATE NONCLUSTERED INDEX idx_Amount ON SalesData(Amount);
✅ Why?
- Clustered Indexes should match partitioning for better performance.
- Non-clustered Indexes help optimize lookup queries.
Scenario 7: Troubleshooting Partition Issues
Question:
A query on a partitioned table is running slow. How do you troubleshoot the issue?
Answer:
✅ Step 1: Check If Partitioning Is Being Used
SELECT SaleDate, $PARTITION.pf_SalesByYear(SaleDate) AS PartitionNumber
FROM SalesData WHERE SaleDate = '2023-01-01';
- If the query scans multiple partitions, partition elimination is not working.
✅ Step 2: Check Partition Fragmentation
SELECT * FROM sys.dm_db_partition_stats WHERE object_id = OBJECT_ID('SalesData');
- If
used_page_count
is too high, rebuild indexes.
✅ Step 3: Force Partition Elimination
SELECT * FROM SalesData WHERE SaleDate BETWEEN '2023-01-01' AND '2023-01-31';
- Ensure partition key is directly used in queries.
Final Thoughts
Partitioning in Azure SQL Database is a powerful tool for performance tuning, efficient data management, and query optimization. These scenario-based interview questions help in understanding real-world use cases.