Table constraints
Dedicated SQL pool supports these table constraints:
- PRIMARY KEY is only supported when NONCLUSTERED and NOT ENFORCED are both used.
- UNIQUE constraint is only supported when NOT ENFORCED is used.
Note : FOREIGN KEY constraint is not supported in dedicated SQL pool.
Having primary key and/or unique key allows dedicated SQL pool engine to generate an optimal execution plan for a query. All values in a primary key column or a unique constraint column should be unique.
Functionality:
- Primary Key:
- Uniquely identifies each row in a table.
- Enforces data integrity by preventing duplicate entries for the defined column(s).
- Can be used as a foreign key in another table to establish relationships.
- Can be automatically clustered, meaning data is physically organized based on the primary key, which can optimize certain queries.
- Unique Constraint:
- Simply ensures unique values for a specific column(s) within a table.
- Doesn’t automatically cluster data.
- Can’t be used as a foreign key.
Enforcement:
- Primary Key:
- Strictly enforced: Duplicate entries attempting to violate the primary key constraint will lead to insert or update failure.
- Clustered columnstore index (CCI) is typically created by default, further optimizing performance for queries involving the primary key.
- Unique Constraint:
- Can be enforced or not enforced:
- Enforced: Similar to a primary key, duplicate entries will be rejected.
- Not enforced: Duplicate entries are allowed, potentially compromising data integrity.
- No automatic CCI creation unless explicitly specified.
- Can be enforced or not enforced:
Usage:
- Primary Key:
- Ideal for uniquely identifying and referencing rows: Used in tables where each row represents a distinct entity and needs to be uniquely identifiable.
- Enforces data integrity by preventing duplicates.
- Can improve query performance through clustering.
- Unique Constraint:
- Useful for restricting specific data values: Used for columns where unique values are important but not necessarily for identifying each row.
- Can be non-enforced for scenarios where occasional duplicates are acceptable.
- Offers more flexibility than a primary key for specific use cases.
Limitations:
- Primary Key:
- Only one primary key can be defined per table.
- Must be defined on non-nullable columns.
- Not supported with replicated distribution in Synapse SQL Pool.
- Unique Constraint:
- Can be defined on multiple columns in a table.
- Can be defined on nullable columns.
- Can be used with any distribution type in Synapse SQL Pool.
Remember:
- Choose the best option based on your specific data needs and query patterns.
- A primary key offers stricter data integrity and potentially better performance than a unique constraint.
- A unique constraint provides more flexibility for scenarios where occasional duplicates are acceptable.
Examples
Create a dedicated SQL pool table with a primary key:
CREATE TABLE mytable (c1 INT PRIMARY KEY NONCLUSTERED NOT ENFORCED, c2 INT);
Create a dedicated SQL pool table with a unique constraint:
CREATE TABLE t6 (c1 INT UNIQUE NOT ENFORCED, c2 INT);
Enforced and Not Enforced Primary Key
Enforced Primary Key:
This is the traditional and default behavior of a primary key. It acts as a strict data integrity constraint, meaning:
- Uniqueness: No two rows can have the same value for the primary key column(s). Any attempt to insert or update data that violates this uniqueness rule will result in an error and the operation will fail.
- Foreign key relationships: The primary key can be used as a foreign key in another table to establish relationships and ensure data consistency between tables.
- Clustered Columnstore Index (CCI): By default, a clustered columnstore index (CCI) is created on the primary key column(s), physically ordering the data for efficient read performance on queries involving those columns.
Not Enforced Primary Key:
This is a less common scenario in Synapse SQL Pool, but it allows defining a primary key without its strict enforcement. This means:
- Uniqueness becomes optional: Duplicate values for the primary key column(s) are allowed, potentially compromising data integrity.
- Foreign key relationships: The primary key cannot be used as a foreign key since its uniqueness cannot be guaranteed.
- No CCI by default: A CCI is not automatically created on a not enforced primary key. You can manually create a CCI or another type of index if needed.
Why Use a Not Enforced Primary Key?
There are limited scenarios where using a not enforced primary key might be justified, such as:
- Temporary tables: You might define a primary key for internal organization within a temporary table without strict uniqueness requirements.
- Staging tables: For initial data ingestion, you might want to allow duplicates before processing and cleansing the data later.
- Specific data model needs: In rare cases, your data model might require a “soft” identifier that allows occasional duplicates for specific reasons.
Important Considerations:
- Using a not enforced primary key significantly weakens data integrity and should be done with caution and under specific circumstances.
- You lose the benefits of foreign key relationships and efficient CCI-based query performance.
- Always document your reasons for using a not enforced primary key and monitor your data closely to avoid inconsistencies.
Recommendation:
In most cases, stick with an enforced primary key to ensure data integrity, leverage foreign key relationships, and enjoy efficient query performance through CCIs. Only consider using a not enforced primary key if you have very specific and valid reasons that outweigh the potential drawbacks.